CN116755847B - Log pre-analysis and transaction management method for relieving lock conflict - Google Patents

Log pre-analysis and transaction management method for relieving lock conflict Download PDF

Info

Publication number
CN116755847B
CN116755847B CN202311037214.1A CN202311037214A CN116755847B CN 116755847 B CN116755847 B CN 116755847B CN 202311037214 A CN202311037214 A CN 202311037214A CN 116755847 B CN116755847 B CN 116755847B
Authority
CN
China
Prior art keywords
data
transaction
lock
determining
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311037214.1A
Other languages
Chinese (zh)
Other versions
CN116755847A (en
Inventor
刘雨蒙
苏毅
王中行
张睿恒
徐帆江
王碧聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Remote Sensing Equipment
Original Assignee
Beijing Institute of Remote Sensing Equipment
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Remote Sensing Equipment filed Critical Beijing Institute of Remote Sensing Equipment
Priority to CN202311037214.1A priority Critical patent/CN116755847B/en
Publication of CN116755847A publication Critical patent/CN116755847A/en
Application granted granted Critical
Publication of CN116755847B publication Critical patent/CN116755847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/466Transaction processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/52Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526Mutual exclusion algorithms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A log pre-analysis and transaction management method for relieving lock conflicts comprises the following steps: logically analyzing the transactions based on the offline transaction log, and determining the transaction construction prediction model of all the transactions in the database; determining thermal data information based on thermal statistics information of data to be mobilized when the transaction is executed, wherein the thermal data information is information of data of which the database calling frequency reaches a preset threshold value; and constructing a prediction model based on the hot data information and the transaction, and determining the scheduling strategy of the current transaction data lock. The data tuples that will be accessed by subsequent operations are predicted by building a markov model-based prediction graph. Then, the algorithm of the shortest path, topological ordering, critical path and the like of the graph information combined with the graph theory and the transaction scheduling strategy used are used for judging whether the transaction needs to be delayed to acquire the lock to schedule the transaction, so that the performance loss caused by a large number of lock conflicts is avoided.

Description

Log pre-analysis and transaction management method for relieving lock conflict
Technical Field
The invention relates to the technical field of database transaction scheduling, in particular to a log pre-analysis and transaction management method for relieving lock conflict.
Background
The current mainstream real-time database still adopts classical First Come First Served (FCFS) or custom priority and other strategies to process transaction operation, the generated execution sequence does not consider the current and future conflicts of the transaction, and only a lock mechanism is adopted to ensure safety when shared resources are in conflict, and the method ensures the safety of mutually exclusive use of the resources, but in a multithreading concurrent transaction model, a large number of threads are blocked when the shared resources are in conflict, and the execution efficiency and performance are sacrificed. For example, all lock information is deleted from the centralized lock data structure and co-exists with the original data.
However, even with the improved lock manager implementation, transaction performance may still be impacted by lock contention, especially under high competing workloads. In this case, the lock dependency chain easily becomes very long, resulting in longer and longer transaction latencies. Transactions waiting in the dependency chain can only be executed sequentially one after the other, which results in a reduction of concurrent execution. At the same time, lock dependencies will occur more frequently, and transaction locks are continually transferred, opened and closed in each thread, i.e., frequent lock conflict problems, resulting in increased transaction latency. The problem of lock conflict is currently the main bottleneck of lock-based concurrency control mechanism performance.
Thus, there is a need for a log preanalysis and transaction management method that mitigates lock conflicts.
Disclosure of Invention
A log pre-analysis and transaction management method for relieving lock conflict is used for solving the technical problem that a lock dependence chain is easy to become very long, so that transaction waiting time is longer and longer. The technical scheme of the invention is as follows:
a log pre-analysis and transaction management method for mitigating lock conflicts, the method comprising:
logically analyzing the transactions based on the offline transaction log, and determining the transaction construction prediction model of all the transactions in the database;
determining thermal data information based on thermal statistics information of data to be mobilized when the transaction is executed, wherein the thermal data information is information of data of which the database calling frequency reaches a preset threshold value;
and constructing a prediction model based on the hot data information and the transaction, and determining the scheduling strategy of the current transaction data lock.
Further, the constructing a prediction model based on the hot data information and the transaction, determining a scheduling policy of the current transaction data lock, includes:
determining data r required by the current transaction when the operation is executed based on a Markov prediction model corresponding to the current transaction, wherein the Markov prediction model constructs a model corresponding to a current transaction operation template in the prediction model for the transaction;
judging whether the data r of the current transaction is thermal data or not based on the thermal data information;
if the data r is hot data, the current transaction needs to be delayed to be executed;
if the data r is not hot, there is no need to delay execution of the current transaction.
Further, if the data r is hot data, the current transaction needs to be delayed, including:
if the state of the lock of the data r, which is obtained by the current transaction attempt, is not compatible with the state of the coded registration lock of the data r, the current transaction is delayed to be executed;
if the prediction key set of the Markov prediction model corresponding to the current transaction points to a thermal tuple and the state of the lock of the data r which is attempted to be acquired by the current transaction is not compatible with the state of the lock of the data r which is already acquired, delaying the execution of the current transaction;
and if the prediction key set of the Markov prediction model corresponding to the current transaction points to a thermal tuple and the state of the lock of the data r, which is obtained by the current transaction, is incompatible with the state of the coded registration lock of the data r, delaying the execution of the current transaction, wherein the thermal tuple is a data set containing thermal data.
Further, the logic analysis is performed on the transactions based on the offline transaction log, and determining the transactions of all the transactions in the database to construct a prediction model includes:
logically analyzing all the transactions in the database based on the offline transaction log, and determining the transaction mode of all the transactions in the database;
a corresponding Markov prediction model is established based on each transaction pattern, the Markov prediction model being used to determine data to be accessed by transactions for each transaction pattern.
Further, after the determining that all the transactions in the database construct the prediction model, the method further includes:
determining a current transaction Markov prediction model corresponding to the current transaction based on the operation template of the current transaction;
and acquiring an operation vertex of the current transaction Markov prediction model corresponding to the operation of the current transaction, filling the operation vertex with an existing parameter value, and determining an accurate prediction key set corresponding to the operation of the current transaction.
Further, the determining the thermal data information based on the thermal statistics of the mobilized data required during the execution of the transaction includes:
acquiring heat statistical information of data r required by current transaction execution through a prediction temperature method and a metal annealing method, wherein the data r is data required to be mobilized when the current transaction execution operation is performed;
and if the recorded temperature in the heat statistic information exceeds a hot spot threshold value, the data r corresponding to the heat statistic information is heat data, and the information of the heat data is heat data information.
Further, the predicted temperature method includes:
determining an initial temperature value of the data r based on the use frequency of the data r in the transaction construction prediction model and the use interval of the mobilized data in the offline transaction log;
and when the transaction is executed, acquiring the rising temperature value of the data r in real time.
Further, the metal annealing method includes:
determining the energy and temperature variation of the data r based on the calling information of the data r;
based on the energy difference of the data r and the current temperature, it is determined whether to accept the state change.
Further, the method for determining the energy of the data r includes:
where a is a parameter for controlling the growth rate, the energy increases with increasing temperature;is the temperature of data r.
Further, the method for determining the temperature variation of the data r includes:
in combination with the newton cooling theorem, the temperature decays exponentially, determined by the cooling factor and the time interval between the current time and the last update time, and the current energy, the following formula is used to calculate the temperature value:
for the sampling update operation of the data r, the formula firstly checks whether the current time is in the same period as the last cooling period; if the time interval is greater than one period, temp (r) decays according to a decay factor determined by the period difference and the cooling factor, i.e., epochcap period difference and the cooling factorCooling factor (+)><0) The method comprises the steps of carrying out a first treatment on the surface of the Adding an increment to temp (r); if the time interval between two adjacent updates is less than one period, this means that the two heating operationsVery close, no temperature decay is required; in this case, temp (r) only needs to increase the heating factor.
The invention has the following effects:
the invention provides a log pre-analysis and transaction management method for relieving lock conflicts. The ability to predict the transaction execution state transition pattern is provided by the transaction building prediction model. The innovation point lays a foundation for the subsequent scheduling policy optimization. Further, the lock conflict is relieved through hot data calculation, the performance loss of lock transfer is reduced, and the execution efficiency is improved. According to the scheme, the dependency relationship and the conflict relationship between the transactions are analyzed, and the preemptive hot data resources are actively allocated, so that frequent lock acquisition and release are reduced, and the performance loss caused by lock transfer is reduced. By optimizing the scheduling strategy, the waiting time can be reduced, and the execution efficiency of the transaction and the overall performance of the system can be improved.
Drawings
FIG. 1 is an overall architecture diagram of a log pre-analysis and transaction management method to mitigate lock conflicts;
FIG. 2 is an overall flow chart of a log pre-analysis and transaction management method for mitigating lock conflicts.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Specific example 1:
lock conflicts resulting from shared resource preemption: in the execution of a database transaction, lock conflict is one of common performance bottlenecks, when a certain resource is shared by a plurality of transactions, the resource lock is frequently transferred and called, and when the number of the transactions is too large, the resource load is too large, so that deep lock conflict is caused, and the execution performance of the database transaction is seriously affected. The invention aims to optimize a scheduling strategy by establishing an accurate dependency graph and a Markov model, thereby reducing lock conflict and waiting time, preventing frequent state transformation of a transaction lock, breaking through performance bottleneck caused by lock conflict and improving overall performance.
Problems of adaptivity and flexibility: conventional transaction lock scheduling policies often have difficulty accommodating for variations in different workloads and environmental conditions. The invention optimizes the transaction analysis strategy by utilizing the advantages of graph theory and feature vector algorithm, so that the transaction scheduling strategy has self-adaptability and flexibility, and can be adjusted and adapted according to the dynamic change and real-time feedback of the system.
The first step: reverse engineering analysis transaction logic is performed based on offline transaction logs, and a prediction model is built for each type of transaction
The scheduling algorithm needs to accurately predict which data will be accessed by the following transaction for use in subsequent transaction scheduling, and this prediction process will be inferred by the transaction access pattern and parameters of previously executed statements. In this part of the work, a Markov prediction model M of a transaction is defined as a directed acyclic graph of the transaction execution path. (Prior Art, clear) vertex vi ε V (M) represents a unique statement template that contains (1) the identifier of the template, (2) the parameter set involved (param-set), (3) the set of predictive keys for operations after vi (pred-set) (the set of data blocks that may be used by the current transaction). The edges from vertex vi represent the probability distribution of a transaction transitioning from vi's state to its subsequent state. The markov prediction graph is generated in two stages. In the first stage, a transaction access graph is constructed from the trace log. In the second stage, the parameter dependencies between vertices are mined and pred-set is filled for each vertex. The specific prediction model construction process is as follows:
1. transaction patterns are extracted from trace logs generated by a database. Each transaction log contains a series of statements, which may end with a COMMIT or an ABORT. We first convert all the original sentence sequences that appear in the transaction log into a set of template sequences by replacing each parameter in the query with a wild card character ($). The collected template sequences are then partitioned into different clusters. There are many ways to do this, and the clustering method of the present invention (which uses the calculation based on the feature vector algorithm to generate the edit distance) does so.
Firstly, selecting the characteristics: when the application scene is computationally intensive, the keyword frequency, the parameter number and the operator frequency of the SQL sentence are taken as vector features, and the feature vector of the distance calculation can be expressed as [ keyword frequency 1, keyword frequency 2, …, frequency of parameter number 0, frequency of parameter number 1, …, operator frequency 1, operator frequency 2, … ].
Similarly, when the application scene is query intensive, the number of parameters, sub-query logic, a connection table and the like are used as vector features, so that the subsequent transaction cluster analysis and thermal data calculation are more facilitated;
when the application scenario is a table operation intensive or library operation intensive transaction, the table frequency or library frequency may be operated as a feature vector.
Each log record in the transaction log is converted into a representation form of a feature vector, and then the K-means clustering algorithm is used for clustering the feature vector:
1. an initial value of the cluster center is selected. Randomly selected vectorAs an initial cluster center.
2. For each sample vector, its distance from each cluster center is calculated using Manhattan distance. For example: for feature vector a= [2, 4, 1, 3, 5]And a cluster center = [1, 3, 2, 5, 4]The Manhattan distance is |2-1|+|4-3|+|1-2|+|3-5|+|5-4|=6, and the A vector sequence +| is calculated>And find the minimum value therein.
3. Each sample vector is assigned to the cluster corresponding to its nearest cluster center.
4. Updating a clustering center: and updating the position of the clustering center to be the average value of samples in the cluster according to the samples in the current cluster.
5. And repeating the steps 1-3 until the termination condition is met (the maximum iteration times or the change of the clustering center is small).
6. Obtaining a final clustering result: each sample is assigned to a cluster to form the final cluster result.
If the count of similar sequences in the cluster is greater than a threshold, the sequence is considered to be a frequent transaction pattern in the current workload.
Compared with the simple calculation of the distance between the character strings by using the Levenstein distance, the method adopts the mode of calculating based on the feature vector to perform offline log analysis, can more accurately find out the character string sequence with high similarity, and can dynamically change the analysis preference according to the user-defined characteristics, thereby greatly improving the flexibility of the method.
2. The entire transaction log is converted into an access map of the predictive model consisting of the individual templates. The access graph is initialized without vertices and edges. When traversing the sequence of templates, a vertex is constructed and suspended from the vertex of its previous template. The pred-set of vertices is initialized to null. And counting the operation of the current node in the offline transaction log to obtain the access times of the current edge, and calculating the value of the edge by dividing the access times of the edge by the total access times of the precursor vertexes. After constructing a vertex and an edge, filling the vertex and the edge according to the same steps, and finally converting the whole transaction log into an access graph of a prediction model formed by all templates.
3. After constructing the access graph of the Markov prediction model, the prediction graph is refined by mining the parameter dependencies and filling pred-set for each vertex. After this prediction graph is implemented, the predictions can be made quickly in an online environment while the transaction is executing, without expensive computations. Parameter dependencies can be further divided into variable dependencies and result dependencies. The value of the variable dependent parameter is determined by the previously requested parameter. A mapping function may be used to find a variable dependenceDepending on the relationship. The value of p1 is calculated by using a mapping function between p1 and p2 to obtain the value of p 2. Basically, the mapping function may be a linear function for numeric values or a non-linear function for non-numeric values. Variable dependencies between parameters can be mined by using linear regression analysis. Instead, the mapping function of the non-numeric parameter may be trained as a classification task by using the parameter values from previous operations as inputs and the parameter values of subsequent operations as outputs. The result dependent parameters depend on the search result of the previous operation, for example, in a simple database transaction scenario, the user information is obtained through surname query, and then the user id parameters in the subsequent operation are the search result from the previous operation. The method of exploring the dependency function between the result set and the request parameters is similar to variable dependent parameters. In particular, for each vertexFinding out the parameter dependence +.>Input vertex set +.>For each vertex within the set, check +.>Whether it is the last dependent node, if so, its param-set will be appended to +.>In pred-set of>Will inherit its pred-set and remove the executed key.
The Markov prediction graph is analyzed based on a log and is built offline, and can be stored in a persistent mode. When the database is running, it first loads the prediction graph into its own workspace. When a transaction operation arrives, if it is a newly initiated transaction, the scheduler first needs to find the corresponding Markov prediction graph. The specific procedure is to extract the template of the transaction operation and calculate a hash value to match all graphics. If only one model matches, then a Markov prediction graph is found. If multiple matches are found, more operational information is needed to match the exact Markov diagram. If there is already one Markov prediction graph in the transaction context, then the vertices in the graph that match the operation are found. The pred-set for that vertex will then populate the existing parameter values to obtain an accurate set of predicted keys.
And a second step of: maintaining heat statistics (frequency and interval of use of data over a period of time) to identify hot records and schedule lock requests for those records
The hot spot record tracking mechanism in the present method is independent of the lock manager. The header of the record may use a 64-bit field temp-val to maintain temperature statistics. The application method is divided into the use of the predicted temperature to fix the data heat and the real-time calculation of the temperature to dynamically change the heat.
The method for predicting the temperature and thus fixing the data heat utilizes the frequency of data use in a prediction graph and the time interval of data use in a transaction log to pre-calculate the data heat, fixes the calculation result in a heat field of the data and is used in the subsequent transaction management, and the process of calculating the data heat by the method is performed offline and does not lose the performance of the system in operation.
In the real-time calculation of temperature for thermal dynamic changes, a hot spot record tracking mechanism tracks hot data at runtime and begins to heat the temperature of data that is frequently accessed during a period of time. If the access frequency is reduced, the temperature of the data is reduced, and the method ensures that the data heat is changed in real time according to the execution condition of the transaction in the system, thereby greatly improving the reliability of the data temperature.
In practical use, the two methods are combined, an initial temperature value is set for the data by using the predicted temperature, then the temperature of the used data is calculated in real time when the transaction is executed, and the advantages of the two modes are compatible, so that the transformation of the data heat is more real, and meanwhile, the temperature statistics are updated by low-probability online sampling (for example, 1%) due to the fact that the performance can be damaged by frequent updating temp-val.
The temperature gradually decreases with the passage of time. The method uses the idea of metal annealing to control the temperature information of data, when the temperature is heated to a certain degree, the temperature is kept for enough time, and then the temperature is cooled at a proper speed, and the temperature reduction is calculated by adopting a simulated annealing algorithm.
1. Defining an energy variation function:where a is a parameter for controlling the rate of increase, the energy increases with increasing temperature.
2. Defining a temperature change rule: in combination with the newton cooling theorem, the temperature decays exponentially, determined by the cooling factor and the time interval between the current time and the last update time, and the current energy. The following formula is used to calculate the temperature value:
for the sample update operation of record r, the formula first checks if the current time is within the same period as the last cooling period. If the time interval is greater than one cycle, temp (r) (data temperature) should be attenuated according to an attenuation coefficient determined by the cycle difference and the cooling factor, i.e., epochcap cycle difference and the cooling factorCooling factor (+)><0). In addition, an increment (++)>Temperature increment) to temp (r). If the time interval between two adjacent updates is less than one period, this means that the two additionsThe thermal operation is very close and no temperature decay is required. In this case, temp (r) only needs to increase the heating factor.
Time of current update data temperature, +.>The time at which the data temperature was last updated,one cycle, +_>And judging whether the current data is a threshold value of the hot spot data or not.
3. And (3) performing an annealing operation: the sampled data is subjected to calculation of energy and temperature changes, and then a probability acceptance criterion (Metropolis criterion) is used for deciding whether to accept state changes or not according to the energy difference and the current temperature. At high temperatures, the probability of data cooling is higher, and as the temperature decreases, the probability of a slow decrease in temperature is more favored, followed by an update of the changing state to the data head. The Metropolis probability formula is as follows:
e energy, T time, k is a probability coefficient, T is the number of times E is marked
In summary, the transaction management module of the database will maintain a threshold H for temperature, and if the temperature of a record exceeds the hot spot threshold H, the record is considered a hot spot record. The data corresponding to the hot spot record is hot data. This recorded hotspot information will be used in a later transaction scheduling process.
And a third step of: designing proper transaction scheduling algorithm, and managing transaction scheduling by combining thermal data information obtained by the previous two steps of execution and Markov prediction graph
The lightweight scheduling algorithm is designed in the part, has little additional overhead in the aspects of identifying and avoiding lock contention, and is mainly used for adapting the predictive diagram and the hot data processing method in the first two steps.
First, a transaction maintains the following fields in the local runtime environment:
the pred-set is used to maintain a predicted hot lock that the transaction is about to acquire. (the same as the data lock is referred to herein)
hold-hot-locks are used to keep track of the hot locks held by the transaction.
In the process of transactional prediction, pred-set performs prediction padding according to the method described in the first step. The hold-hot-locks store the acquired hot locks, and in the implementation of conventional database transaction concurrency control mechanisms, the hold locks in the execution context are maintained for release after execution is completed.
For each tracked record tuple (data for which the data lock corresponds), three additional fields will be added at runtime for use by the transaction schedule:
lock-state, a 64-bit variable, represents the state of the lock (both read and write states, belong to locks that have been taken by a transaction).
reg-lock, a 64-bit variable, encodes the state of the registered lock (not yet taken by the transaction, but hopefully taken by the transaction).
temp-val, a 64-bit variable, the recorded temperature was recorded.
the value of temp-val is updated according to the formula in the second step. The first bit of the lock-state indicates the mode of the lock, i.e., 1 indicates the write mode, and 0 indicates the read mode. If the lock is in write mode, the last 63 bits represent the identifier of the transaction holding its write lock. Otherwise, the remaining bits represent the number of transactions holding the read lock. The information in the lock-state is used to lightweight pre-determine whether a request can acquire a lock on the tuple. reg-lock is used to encode a lock on the tuple. The code is the same as the lock-state, and also has a read lock and a write lock. When a transaction acquires a hot lock, it registers the lock in its pred-set by updating reg-lock to inform other started transactions that already hold other hot locks that the lock has been pre-locked.
Since the present method works before the acquire lock of the lock manager is invoked, the transaction scheduling policy will be performed before the lock is acquired. When a transactionIn an attempt to acquire the lock lt for data r, it will be determined whether the operation is to be delayed for execution, based on the following:
case 1: if the requested lock is not a hot lock, orHaving acquired one or more hot locks, it is apparent that completing a transaction holding a hot lock as early as possible may reduce the probability of contention. Thus, this request need not be delayed.
Case 2: if the requested lock is not compatible with r.reg-lock, this means that continuing to hold a lock on r will block execution of other transactions that have acquired other hot locks, thus delaying the transaction may reduce the risk of long blocking chains.
Compatible functionsThe first bit of the lock-state is compatible if it is 0, i.e. currently in read mode, and not of write type, otherwise it is incompatible. If the first bit of the lock-state is 1, it is in write mode, which is incompatible if it is of read type. If lt is a write type, unless the rest of the bits of the lock-state are equal to +.>Or otherwise incompatible.
Case 3: if there is a predicted key in the Markov prediction graph of the transactionPoint to a thermal tuple and lt and +.>Or->Incompatibility, which means that the transaction must be +.>Blocking, thus prioritizing the request to avoid blocking the execution of r by the initiated transaction.
After the lock on the hot tuple r is acquired, the r.lock-state is updated and the lock on pred-set is registered to r.reg-lock. If lt is a write lock, set r.lock-state toId exclusive or 0x80000000. Otherwise, if the lock type is a read lock, the r.lock-state is incremented by 1. The lock registration operation on reg-lock is similar to the operation on lock-state.
The scheduling method can be compatible with other transaction scheduling methods for comprehensive scheduling. If no additional scheduling method is combined, the transaction performs lock allocation on the request for first applying hot data in the FCFS mode. When the method of scheduling the transaction priority is used, after the transaction priority is set in advance according to other transaction scheduling methods, the method is used for allocating preemptive data resources, when hot data is allocated for the first time, the hot data is allocated preferentially to the transaction with high priority, and then the hot data processing judges whether the hot data is delayed to be acquired according to the different conditions according to the scheduling method, and when the transaction scheduling mode is preemptive scheduling, in case 3, the current judgment is carried out after compatibility judgmentAnd->Or (b)The priority order of the transaction corresponding to the transaction identifier determines whether to preempt the hot tuple.
Innovation point
1. Performing reverse transaction analysis based on the offline log and the Markov model and generating a predictive graph of subsequent transactions: the reverse transaction analysis and prediction graph generation phase provides the present method with the ability to predict the transaction execution state transition pattern. The innovation point lays a foundation for the subsequent scheduling policy optimization. The scheme combines offline log analysis and a Markov model, analyzes a history log of transaction execution in a reverse direction, and constructs a state transition probability map of the transaction by using the Markov model. The reverse analysis method provides a basis for the prediction of subsequent transactions and the calculation of thermal data, and the state transition mode of the execution of the transactions can be better understood and predicted through a prediction graph.
2. Based on the prediction graph and the thermal data, the lock conflict is relieved, the performance loss of lock transfer is reduced, and the execution efficiency is improved: by means of the prediction graph and the thermal data algorithm, the scheme actively distributes preemptive thermal data resources by analyzing the dependency relationship and the conflict relationship among the transactions, so that frequent lock acquisition and release are reduced, and performance loss caused by lock transfer is reduced. By optimizing the scheduling strategy, the waiting time can be reduced, and the execution efficiency of the transaction and the overall performance of the system can be improved.
3. The log records are analyzed based on a feature vector algorithm. According to the method, the distances among sentences are calculated by using the method for calculating the editing distance based on the feature vector, so that the transaction templates are classified, and flexible template classification can be performed according to different selected features, so that the transaction log of the method can be subjected to multidimensional analysis, transaction management is more flexible, and the method can adapt to dynamic changes of a system and different data conflict scenes.
4. Thermal data calculations are performed based on a simulated annealing algorithm. The simulated annealing algorithm is used in the calculation of the data temperature, the temperature change during metal annealing is simulated, the simulated annealing algorithm for calculating the optimal solution in the solution space is expanded to the calculation of the data heat of the database, and the Newton cooling theorem is combined, so that the change of the data temperature is more in accordance with the natural law.

Claims (9)

1. A method for log pre-analysis and transaction management to mitigate lock conflicts, the method comprising:
logically analyzing the transactions based on the offline transaction log, and determining the transaction construction prediction model of all the transactions in the database;
determining thermal data information based on thermal statistics information of data to be mobilized when the transaction is executed, wherein the thermal data information is information of data of which the database calling frequency reaches a preset threshold value;
constructing a prediction model based on the hot data information and the transaction, and determining a scheduling strategy of a current transaction data lock;
the step of constructing a prediction model based on the hot data information and the transaction, and determining the scheduling policy of the current transaction data lock comprises the following steps:
determining data r required by the current transaction when the operation is executed based on a Markov prediction model corresponding to the current transaction, wherein the Markov prediction model constructs a model corresponding to a current transaction operation template in the prediction model for the transaction;
judging whether the data r of the current transaction is thermal data or not based on the thermal data information; if the data r is hot data, the current transaction needs to be delayed to be executed;
if the data r is not hot, there is no need to delay execution of the current transaction.
2. The method for log preanalysis and transaction management for lock conflict mitigation according to claim 1, wherein if the data r is hot data, then the current transaction needs to be delayed to be executed, comprising:
if the state of the lock of the data r, which is obtained by the current transaction attempt, is not compatible with the state of the coded registration lock of the data r, the current transaction is delayed to be executed;
if the prediction key set of the Markov prediction model corresponding to the current transaction points to a thermal tuple and the state of the lock of the data r which is attempted to be acquired by the current transaction is not compatible with the state of the lock of the data r which is already acquired, delaying the execution of the current transaction;
and if the prediction key set of the Markov prediction model corresponding to the current transaction points to a thermal tuple and the state of the lock of the data r, which is obtained by the current transaction, is incompatible with the state of the coded registration lock of the data r, delaying the execution of the current transaction, wherein the thermal tuple is a data set containing thermal data.
3. The method for log pre-analysis and transaction management for lock conflict mitigation according to claim 1, wherein the logically analyzing the transactions based on the offline transaction log to determine a transaction construction prediction model for all transactions in the database comprises:
logically analyzing all the transactions in the database based on the offline transaction log, and determining the transaction mode of all the transactions in the database;
a corresponding Markov prediction model is established based on each transaction pattern, the Markov prediction model being used to determine data to be accessed by transactions for each transaction pattern.
4. A method of log preanalysis and transaction management for lock conflict mitigation in accordance with claim 3, further comprising, after said determining that all transactions in the database construct a predictive model:
determining a current transaction Markov prediction model corresponding to the current transaction based on the operation template of the current transaction;
and acquiring an operation vertex of the current transaction Markov prediction model corresponding to the operation of the current transaction, filling the operation vertex with an existing parameter value, and determining an accurate prediction key set corresponding to the operation of the current transaction.
5. The method for log preanalysis and transaction management for lock conflict mitigation of claim 1, wherein the determining thermal data information based on thermal statistics of the mobilized data required for execution of the transaction comprises:
determining an ascending temperature value of data r required by the execution of the current transaction by a predictive temperature method;
the data r is the data which needs to be mobilized when the current transaction executes the operation;
determining a lower cooling degree value of the data r after the current transaction is executed by a metal annealing method, wherein the heat statistical information comprises an ascending temperature value and a lower cooling degree value;
and if the recorded temperature in the heat statistic information exceeds a hot spot threshold value, the data r corresponding to the heat statistic information is heat data, and the information of the heat data is heat data information.
6. The method for log preanalysis and transaction management for lock conflict mitigation in accordance with claim 5, wherein said determining an elevated temperature value of the data r required for the execution of the current transaction by a predictive temperature method comprises:
determining an initial temperature value of the data r based on the use frequency of the data r in the transaction construction prediction model and the use interval of the mobilized data in the offline transaction log;
and when the transaction is executed, determining an ascending temperature value based on the initial temperature value and the temperature variation quantity of the data r obtained in real time.
7. The method for log preanalysis and transaction management for lock conflict mitigation in accordance with claim 5, wherein said determining a desuperheat value of said data r after execution of a current transaction by a metal annealing method comprises:
determining the energy and temperature variation of the data r based on the calling information of the data r;
and determining a cooling down temperature value of the data r based on the energy difference and the current temperature of the data r, wherein the energy difference is a difference value of the energy of the data r in unit time.
8. The method for log preanalysis and transaction management for lock conflict mitigation of claim 7, wherein the method for determining the energy of the data r comprises:
E t =a*temp(r);
where a is a parameter for controlling the growth rate, the energy increases with increasing temperature; temp (r) is the temperature of data r.
9. The method for log preanalysis and transaction management for lock conflict mitigation in accordance with claim 7, wherein the method for determining the temperature change of the data r comprises:
in combination with the newton cooling theorem, the temperature decays exponentially, determined by the cooling factor and the time interval between the current time and the last update time, and the current energy, the following formula is used to calculate the temperature value:
the time of the current update data temperature of curTime and the time of the last update data temperature of lastTime; epoch is one cycle;judging whether the current data is a threshold value of hot spot data or not; epochcap is a period difference; h is a Δ Is the temperature increment;
for the sampling update operation of the data r, the formula firstly checks whether the current time is in the same period as the last cooling period; if the time interval is greater than one cycle, temp (r) decays according to a decay factor determined by the cycle difference and the cooling factor, i.e., epochcap cycle difference and C θ A cooling factor; increase a temperature increment h Δ Onto temp (r); if the time interval between two adjacent updates is less than one cycle, this means that the two heating operations are very close, no temperature decay is required; in this case, temp (r) only needs to increase the heating factor.
CN202311037214.1A 2023-08-17 2023-08-17 Log pre-analysis and transaction management method for relieving lock conflict Active CN116755847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311037214.1A CN116755847B (en) 2023-08-17 2023-08-17 Log pre-analysis and transaction management method for relieving lock conflict

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311037214.1A CN116755847B (en) 2023-08-17 2023-08-17 Log pre-analysis and transaction management method for relieving lock conflict

Publications (2)

Publication Number Publication Date
CN116755847A CN116755847A (en) 2023-09-15
CN116755847B true CN116755847B (en) 2023-11-14

Family

ID=87951862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311037214.1A Active CN116755847B (en) 2023-08-17 2023-08-17 Log pre-analysis and transaction management method for relieving lock conflict

Country Status (1)

Country Link
CN (1) CN116755847B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596239A (en) * 2018-04-20 2018-09-28 南京航空航天大学 A kind of theme temperature trend forecasting method based on Markov Chain and dynamic backtracking
CN110019468A (en) * 2017-12-05 2019-07-16 华为技术有限公司 A kind of Database Systems and data bank access method
CN111580933A (en) * 2020-05-12 2020-08-25 西安交通大学 Hardware acceleration-based virtual machine online migration method
CN115269314A (en) * 2022-07-13 2022-11-01 北京云集智造科技有限公司 Transaction abnormity detection method based on log
CN116383227A (en) * 2023-06-05 2023-07-04 北京成章数据科技发展有限公司 Distributed cache and data storage consistency processing system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110320228A1 (en) * 2010-06-24 2011-12-29 Bmc Software, Inc. Automated Generation of Markov Chains for Use in Information Technology

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019468A (en) * 2017-12-05 2019-07-16 华为技术有限公司 A kind of Database Systems and data bank access method
CN108596239A (en) * 2018-04-20 2018-09-28 南京航空航天大学 A kind of theme temperature trend forecasting method based on Markov Chain and dynamic backtracking
CN111580933A (en) * 2020-05-12 2020-08-25 西安交通大学 Hardware acceleration-based virtual machine online migration method
CN115269314A (en) * 2022-07-13 2022-11-01 北京云集智造科技有限公司 Transaction abnormity detection method based on log
CN116383227A (en) * 2023-06-05 2023-07-04 北京成章数据科技发展有限公司 Distributed cache and data storage consistency processing system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
一种基于预取的集群服务器调度算法;燕彩蓉;沈钧毅;彭勤科;;控制与决策(第03期);全文 *

Also Published As

Publication number Publication date
CN116755847A (en) 2023-09-15

Similar Documents

Publication Publication Date Title
US6567806B1 (en) System and method for implementing hash-based load-balancing query processing in a multiprocessor database system
CN111294234B (en) Parallel block chain fragmentation method based on intelligent contract optimization model
US20090287703A1 (en) Transaction processing system of database using multi-operation processing providing concurrency control of transactions
JPH02230373A (en) Data base processing system
Senthilkumar et al. A survey on job scheduling in big data
Wang et al. Elastic pipelining in an in-memory database cluster
Wang et al. Efficient concurrent execution of smart contracts in blockchain sharding
CN115904638A (en) Intelligent management method and system for database affairs
CN116755847B (en) Log pre-analysis and transaction management method for relieving lock conflict
CN117194502B (en) Database content cache replacement method based on long-term and short-term memory network
Abdul et al. Database workload management through CBR and fuzzy based characterization
CN111930484B (en) Power grid information communication server thread pool performance optimization method and system
US7908268B2 (en) Predictive database pool preparation
Park et al. BlinkML: Approximate machine learning with probabilistic guarantees
CN111309982A (en) Self-adaptive structure adjustment method and system of machine learning data index structure
KR101133516B1 (en) Method for optimizing multiple join queries over data streams
Anneser et al. Adaptive hybrid indexes
Wang et al. Discriminative admission control for shared-everything database under mixed OLTP workloads
Benkrid et al. A genetic optimization physical planner for big data warehouses
Sasak-Okoń Speculative query execution in Relational databases with Graph Modelling
Hu et al. Reloca: Optimize resource allocation for data-parallel jobs using deep learning
Sasak-Okoń et al. Speculative query execution in RDBMS based on analysis of query stream multigraphs
Dan et al. Database access characterization for buffer hit prediction
Seo et al. DARK: Deep automatic Redis knobs tuning system depending on the persistence method
CN116755848B (en) Transaction scheduling method and system based on prediction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant