CN116755848A

CN116755848A - Transaction scheduling method and system based on prediction

Info

Publication number: CN116755848A
Application number: CN202311037781.7A
Authority: CN
Inventors: 刘雨蒙; 赵怡婧; 王碧聪; 王潮; 徐帆江; 苏毅
Original assignee: Beijing Institute of Remote Sensing Equipment
Current assignee: Beijing Institute of Remote Sensing Equipment
Priority date: 2023-08-17
Filing date: 2023-08-17
Publication date: 2023-09-15
Anticipated expiration: 2043-08-17
Also published as: CN116755848B

Abstract

The invention discloses a prediction-based transaction scheduling method and system. The method comprises: parsing a new incoming transaction to obtain its feature expression; extracting the relevant database transaction logs from a database and parsing them to obtain a log parsing result; clustering the log parsing result with the unsupervised model DBSCAN to obtain the center feature of each class; and predicting, by cosine similarity, the similarity relationship between the new incoming transaction and the center feature of each class from the feature expression of the new incoming transaction and those center features, then scheduling the transaction according to the similarity relationship to adjust a transaction queue. Scheduling according to the similarity relationship reduces the probability of transaction aborts. Cosine similarity works well and requires little computation. Resources are allocated evenly, and the transaction throughput and transaction processing stability of the system are improved.

Description

Transaction scheduling method and system based on prediction

Technical Field

The invention belongs to the technical field of transaction scheduling, and particularly relates to a prediction-based transaction scheduling method and system.

Background

Transaction scheduling is a mechanism in a database management system (DBMS) that ensures multiple concurrently executing transactions maintain data consistency and data integrity, thereby avoiding serious data conflicts and resource contention. Without an effective transaction scheduling mechanism, a database is prone to aborting executing transactions prematurely, which causes significant performance loss. In particular, for the widely deployed online transaction processing (OLTP) database systems, whose defining characteristics are high concurrency, real-time responsiveness, stability, and availability, establishing an efficient transaction scheduling mechanism is the key to further improving database performance.

At present, most OLTP database architectures use a random scheduling mechanism to assign transactions to threads. Random scheduling distributes tasks across all CPU cores, reduces the chance that a core sits idle, and keeps the load on each core relatively balanced. However, it does nothing about transaction aborts: when transactions run in parallel and many of them conflict, much of the work the CPU has already completed is wasted when those transactions abort. Conversely, scheduling all transactions onto a single thread to eliminate conflicts entirely would forgo the advantages of hardware multithreading.

Besides random scheduling, some OLTP database systems use supervised learning for transaction scheduling; for example, a logistic regression model is built from the transaction log to predict the probability that a new incoming transaction will abort. However, this approach requires manual data labeling, which is impractical in OLTP database systems with very large data volumes. Researchers have also proposed a prediction-based transaction scheduling mechanism built on the unsupervised KMeans method, but KMeans is sensitive to outliers in the data, is suited mainly to spherically distributed data, and clusters data with other distributions poorly.

Disclosure of Invention

To address these shortcomings of the prior art, the invention provides a prediction-based transaction scheduling method and system.

In a first aspect, the present invention provides a prediction-based transaction scheduling method, including:

parsing a new incoming transaction to obtain the feature expression of the new incoming transaction;

extracting the relevant database transaction logs from a database and parsing them to obtain a log parsing result;

clustering the log parsing result with the unsupervised model DBSCAN to obtain the center feature of each class;

predicting, by cosine similarity, the similarity relationship between the new incoming transaction and the center feature of each class from the feature expression of the new incoming transaction and those center features, and scheduling the transaction according to the similarity relationship to adjust a transaction queue.

In some embodiments, scheduling the transaction according to the similarity relationship includes:

comparing each cosine similarity with a hyperparameter threshold $\theta$, taking the threads of the class centers whose similarity to the new incoming transaction does not reach $\theta$ as a candidate thread set, and randomly selecting a thread from the candidate thread set as the thread assigned to the new incoming transaction.

In some embodiments, if the cosine similarity does not reach the hyperparameter threshold $\theta$, the new incoming transaction is inserted into the transaction queue of the assigned thread immediately after the transaction currently being processed.

In some embodiments, parsing the new incoming transaction to obtain its feature expression, or parsing the logs to obtain the log parsing result, includes:

using a Hash function to convert each item from its original format into an encoding result;

the transaction parsing model is expressed as:

$$T_i = \{d_i,\ l_i,\ t_i^{\max},\ m_i,\ t_i^{c},\ s_i,\ p_i\}, \qquad X_i = H(T_i) = (x_{i,1}, x_{i,2}, \ldots, x_{i,n})$$

wherein $d_i$ is the data item requested by the transaction, $l_i$ is the requested lock, $t_i^{\max}$ is the maximum execution time of the transaction, $m_i$ is the memory occupied by the transaction, $t_i^{c}$ is the commit time of the transaction, $s_i$ is the commit status of the transaction, and $p_i$ is the thread in which the transaction executes; the encoding result is abbreviated as $X_i$, where $T_i$ is the transaction and $i$ is its transaction number; $x_{i,j}$ uniformly denotes the encoding result of each feature of the transaction, $j$ is the number of the transaction feature, and $n$ is the total number of transaction features.

In some embodiments, for transactions $T_i$ and $T_j$, suppose transaction $T_i$ aborts and $T_j$ denotes a transaction that conflicts with $T_i$. Let $T_{ij}$ denote the union of $T_i$ and $T_j$; the coding model $H$ records $T_i$, $T_j$, and the features that led to the conflict as follows:

$$H(T_{ij}) = H(T_i) \oplus H(T_j) \oplus h_{ij}$$

wherein $\oplus$ denotes concatenation of encoding results, $T_{ij}$ denotes transactions $T_i$ and $T_j$ running simultaneously, and $h_{ij}$ denotes the hash codes of the features that cause the transaction queue to abort when $T_i$ and $T_j$ run simultaneously.

In some embodiments, clustering the log parsing result with the unsupervised model DBSCAN to obtain the center feature of each class includes:

taking all data points in the log parsing result as the input of the DBSCAN model, which marks every data point as unvisited and randomly selects one of them as the initial data point $p$;

determining whether the $\varepsilon$-neighborhood of the initial data point $p$ contains at least $MinPts$ data points, the minimum number required for a core point;

if so, creating a new category and adding the point to it;

examining the data points in the neighborhood of the initial data point $p$ and determining whether each of them already has a category label;

assigning each unlabeled data point to the same category as the initial data point, finally obtaining a category set.

In some embodiments, during log parsing, the commit status of each transaction is checked, and each transaction whose commit status is aborted, together with the transactions from the same time period, is used as a data point of the log parsing result.

In some embodiments, clustering with the unsupervised model DBSCAN to obtain the center feature of each class further includes:

computing, for each class in the category set, the center of its data points: for a class $K_k = \{X_{k,1}, X_{k,2}, \ldots, X_{k,N_k}\}$, where $X_{k,m}$ is the feature expression of a data point in the class, $k$ is the category number, and $N_k$ is the total number of data points in the class, the class center $c_k$ is calculated as:

$$c_k = \frac{1}{N_k} \sum_{m=1}^{N_k} X_{k,m}$$

recording the center feature of each class into a new array.

In some embodiments, predicting the similarity relationship between the new incoming transaction and each class center feature by cosine similarity, based on the feature expression of the new incoming transaction and the class center features, includes:

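expressing the encoding results of the new incoming transaction $T_{\text{new}}$ and of class center $c_k$ as $X_{\text{new}} = (x_1, x_2, \ldots, x_n)$ and $c_k = (c_{k,1}, c_{k,2}, \ldots, c_{k,n})$, and calculating the cosine similarity between them as:

$$\cos(X_{\text{new}}, c_k) = \frac{\sum_{j=1}^{n} x_j\, c_{k,j}}{\sqrt{\sum_{j=1}^{n} x_j^2}\,\sqrt{\sum_{j=1}^{n} c_{k,j}^2}}$$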

In a second aspect, the present invention provides a prediction-based transaction scheduling system, comprising: a transaction parsing module, a log parsing module, a scheduler, an unsupervised model, and a transaction queue;

the transaction parsing module is configured to parse the new incoming transaction to obtain its feature expression;

the log parsing module is configured to extract the relevant database transaction logs from the database and parse them to obtain the log parsing result;

the scheduler is configured to cluster the log parsing result with the unsupervised model DBSCAN to obtain the center feature of each class, to predict the similarity relationship between the new incoming transaction and each class center feature by cosine similarity from the feature expression of the new incoming transaction and those center features, and to schedule the transaction according to the similarity relationship to adjust the transaction queue.

The invention provides a prediction-based transaction scheduling method that, building on statistics and analysis of the existing database transaction logs, uses the unsupervised model DBSCAN to make predictions for transaction scheduling: it compares the similarity between a newly arrived transaction and the transactions running in each thread, schedules the transaction according to the similarity relationship, adjusts the transaction queue, and reduces the probability of transaction aborts. Cosine similarity performs well and requires less computation than the correlation coefficient, Euclidean distance, Mahalanobis distance, and similar measures. Resources are allocated evenly, and the transaction throughput and transaction processing stability of the system are improved.

Drawings

FIG. 1 is a flow chart of a method for scheduling transactions based on prediction according to an embodiment of the present invention;

FIG. 2 is a general framework diagram of a database transaction intelligent management system including a prediction-based transaction scheduling system provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of a multi-threaded transaction provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of a transaction resolution model according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the DBSCAN clustering algorithm model provided by an embodiment of the present invention.

Detailed Description

Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Embodiments described herein may be described with reference to plan and/or cross-sectional views with the aid of idealized schematic diagrams of the present disclosure. Accordingly, the example illustrations may be modified in accordance with manufacturing techniques and/or tolerances. Thus, the embodiments are not limited to the embodiments shown in the drawings, but include modifications of the configuration formed based on the manufacturing process. Thus, the regions illustrated in the figures have schematic properties and the shapes of the regions illustrated in the figures illustrate the particular shapes of the regions of the elements, but are not intended to be limiting.

The invention provides a transaction scheduling method and system based on prediction. The following detailed description is provided with reference to the accompanying drawings of the embodiments of the invention.

In a first aspect, as shown in FIG. 1 and FIG. 2, the present invention provides a prediction-based transaction scheduling method, including:

step S101, parsing a new incoming transaction to obtain its feature expression;

step S102, extracting the relevant database transaction logs from a database and parsing them to obtain a log parsing result;

step S103, clustering the log parsing result with the unsupervised model DBSCAN to obtain the center feature of each class;

step S104, predicting, by cosine similarity, the similarity relationship between the new incoming transaction and each class center feature from the feature expression of the new incoming transaction and those center features, and scheduling the transaction according to the similarity relationship to adjust a transaction queue.

Step S101 converts the transaction from its original SQL representation into an encoding. The feature expression produced in step S101 is fed into the transaction scheduler, where it waits for the scheduler to assign an execution thread. Step S102 extracts the execution information of each relevant transaction from the log and converts it into a data format that the unsupervised model DBSCAN can process easily. Step S103 clusters the log parsing result and records the center feature of each class.

The invention is directed mainly at online transaction processing (OLTP) database management systems. Such systems are highly transactional, are characterized by large numbers of small transactions and small queries, and typically evaluate database performance by transaction throughput, that is, the number of transactions the system executes per unit time. Because of this high concurrency, OLTP databases carry a latent risk of transaction conflicts. With limited computing resources, such conflicts cause the transaction queues in threads to abort prematurely, wasting a large amount of computing resources. The invention therefore provides a prediction-based database transaction scheduling method that introduces the unsupervised model DBSCAN (Density-Based Spatial Clustering of Applications with Noise) in order to allocate resources evenly and improve the transaction throughput and transaction processing stability of the system.

The cosine similarity is compared with the hyperparameter threshold $\theta$: the threads of the class centers whose similarity to the new incoming transaction does not reach $\theta$ form a candidate thread set, and a thread is randomly selected from that set as the thread assigned to the new incoming transaction, so that the new incoming transaction is assigned to a thread whose transactions are less similar to it.

It should be noted that if the cosine similarity exceeds the hyperparameter threshold $\theta$, the new incoming transaction closely resembles the transactions of that class recorded in the log; because the classes are built from aborted transactions and the transactions that ran in the same time period, such a transaction should be kept apart from that class's thread. Separating strongly similar transactions in this way adjusts the transaction queue and reduces the probability of transaction aborts.

In some embodiments, as shown in FIG. 2 and FIG. 3, if the cosine similarity does not reach the hyperparameter threshold $\theta$, the new incoming transaction is inserted into the transaction queue of the assigned thread immediately after the transaction currently being processed.
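The scheduling rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `choose_thread` and `enqueue_after_running` are hypothetical, each class center is assumed to carry one thread number, and each thread's queue is assumed to hold the transaction currently being processed at index 0. The text does not say what happens when no class center falls below $\theta$, so the sketch returns `None` in that case and leaves the decision to the caller.

```python
import random
from collections import deque
from typing import Deque, Optional, Sequence


def choose_thread(similarities: Sequence[float],
                  center_threads: Sequence[int],
                  theta: float,
                  rng: random.Random) -> Optional[int]:
    """Pick a thread at random among class centers whose similarity is below theta.

    similarities[k] is the cosine similarity between the new transaction and
    class center k; center_threads[k] is the thread number attached to center k.
    """
    candidates = sorted({t for s, t in zip(similarities, center_threads) if s < theta})
    if not candidates:
        return None  # not specified in the text; the caller decides
    return rng.choice(candidates)


def enqueue_after_running(queue: Deque[str], txn: str) -> None:
    """Insert txn directly behind queue[0], the transaction being processed."""
    if queue:
        queue.insert(1, txn)
    else:
        queue.append(txn)


# Usage: three class centers, each tied to one thread.
queues = {0: deque(["a0", "a1"]), 1: deque(["b0"]), 2: deque()}
thread = choose_thread([0.93, 0.41, 0.55], [0, 1, 2], theta=0.6, rng=random.Random(7))
if thread is not None:
    enqueue_after_running(queues[thread], "new_txn")
```

Deduplicating the candidate set gives every eligible thread the same chance of being drawn, and sorting it makes the draw reproducible for a seeded `random.Random`.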

The transaction recording and parsing method of the model, and the transaction scheduling algorithm based on the unsupervised model DBSCAN (including the DBSCAN model, the cluster centers, and the similarity calculation), are described in more detail below.

First, the transaction recording and parsing method:

Since the purpose of transaction scheduling is to deal with transaction aborts, several causes of transaction conflicts are considered, including data item conflicts (different transactions request the same data item), lock contention (different transactions request the same lock), deadlocks (transaction A waits for transaction B to release a lock while transaction B waits for transaction A to release a lock), transaction timeouts (exceeding the maximum execution time causes the system to abort the transaction prematurely), and memory overhead (a single transaction consumes too much memory, causing the system to abort it prematurely).

As shown in FIG. 4, the database log records information about executed transactions, and the same information is recorded for a new incoming transaction, so that all transaction records share a consistent format. Based on the causes of conflict listed above, each transaction information record contains the data items requested by the transaction, the requested locks, the maximum execution time of the transaction, the memory occupied by the transaction, the commit time of the transaction, the commit status (commit/abort) of the transaction, and the thread in which the transaction executes.

A judging mechanism is established that determines the cause of a conflict from the correlation between an aborted transaction and the related transactions of the same time period in the log (shown in FIG. 4): a data item conflict is judged from the correlation between the data items requested by the transaction and those requested by other transactions in the same time period; lock contention is judged from the correlation between the locks requested by the transaction and those requested by other transactions in the same time period; a deadlock is judged from the correlation between the data items of the aborted transaction and the requested locks and requested data items of other transactions in the same time period; a timeout is judged from the correlation between the commit time of the transaction and its maximum execution time; and excessive memory usage is judged from the correlation between the memory occupied by the transaction and the maximum memory allowed by the database.
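A rough sketch of this judging mechanism follows, using a Python dataclass for the log record. The class `TxnRecord`, the function `abort_causes`, its `max_db_memory` parameter, the reading of "commit time" as elapsed seconds since the transaction started, and the use of set intersection as the "correlation" test are all hypothetical choices for illustration. The deadlock test in particular is a crude heuristic, not a wait-for-graph analysis.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class TxnRecord:
    """One log entry holding the seven recorded fields (hypothetical names)."""
    txn_id: int
    data_items: FrozenSet[str]   # data items requested
    locks: FrozenSet[str]        # locks requested
    max_exec_time: float         # maximum execution time, seconds
    memory_bytes: int            # memory occupied by the transaction
    commit_time: float           # assumed: seconds from start to commit/abort
    status: str                  # "commit" or "abort"
    thread: int                  # thread that executed the transaction


def abort_causes(aborted: TxnRecord, same_period: Iterable[TxnRecord],
                 max_db_memory: int) -> Set[str]:
    """Heuristically label the likely causes of an abort."""
    causes: Set[str] = set()
    for other in same_period:
        if other.txn_id == aborted.txn_id:
            continue
        if aborted.data_items & other.data_items:
            causes.add("data_item_conflict")
        if aborted.locks & other.locks:
            causes.add("lock_contention")
        # Crude deadlock signal: each side locks an item the other requests as data.
        if aborted.data_items & other.locks and other.data_items & aborted.locks:
            causes.add("deadlock")
    if aborted.commit_time > aborted.max_exec_time:
        causes.add("timeout")
    if aborted.memory_bytes > max_db_memory:
        causes.add("memory_overhead")
    return causes


a = TxnRecord(1, frozenset({"x"}), frozenset({"y"}), 1.0, 100, 2.0, "abort", 0)
b = TxnRecord(2, frozenset({"y"}), frozenset({"x"}), 1.0, 100, 0.5, "commit", 1)
causes = abort_causes(a, [b], max_db_memory=1000)  # {"deadlock", "timeout"}
```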

Thus, in some embodiments, parsing the new incoming transaction to obtain its feature expression, or parsing the logs to obtain the log parsing result, includes:

the transaction parsing model is expressed as:

$$T_i = \{d_i,\ l_i,\ t_i^{\max},\ m_i,\ t_i^{c},\ s_i,\ p_i\}, \qquad X_i = H(T_i) = (x_{i,1}, x_{i,2}, \ldots, x_{i,n})$$

wherein $d_i$ is the data item requested by the transaction, $l_i$ is the requested lock, $t_i^{\max}$ is the maximum execution time of the transaction, $m_i$ is the memory occupied by the transaction, $t_i^{c}$ is the commit time of the transaction, $s_i$ is the commit status (commit/abort) of the transaction, and $p_i$ is the thread in which the transaction executes; the encoding result is abbreviated as $X_i$, where $T_i$ is the transaction and $i$ is its transaction number; $x_{i,j}$ uniformly denotes the encoding result of each feature of the transaction, $j$ is the number of the transaction feature, and $n$ is the total number of transaction features.

In the step of parsing the new incoming transaction to obtain its feature expression, the new incoming transaction is parsed with this transaction parsing model; in the step of parsing the logs to obtain the log parsing result, the historical logs are parsed with the same model.

In the embodiment of the invention, because the recorded items of information differ greatly in form, a Hash function $H$ converts each of them from its original format into an encoding result. The result of parsing with $H$ is taken as the feature $X_i$ of transaction $T_i$, with $H$ applied separately to each item of information recorded for $T_i$:

$$X_i = H(T_i) = \big(H(d_i),\ H(l_i),\ H(t_i^{\max}),\ H(m_i),\ H(t_i^{c}),\ H(s_i),\ H(p_i)\big)$$
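One way to realize this per-feature Hash encoding is sketched below, under several assumptions. SHA-256 from Python's `hashlib` stands in for the unspecified Hash function $H$ (the built-in `hash()` is salted per process for strings, so it would not give stable encodings across runs), each hash is scaled into $[0, 1)$, and set-valued features such as requested data items are sorted before hashing so the encoding does not depend on iteration order. The keys in `FEATURES` are hypothetical names for the seven recorded items.

```python
import hashlib
from typing import Any, Dict, List

# Hypothetical field names, in the order x_{i,1} ... x_{i,n}.
FEATURES = ("data_items", "locks", "max_exec_time", "memory_bytes",
            "commit_time", "status", "thread")


def _canonical(value: Any) -> str:
    """Text form of a feature; sets are sorted so the result is order-independent."""
    if isinstance(value, (set, frozenset)):
        return repr(sorted(value))
    return repr(value)


def hash_feature(value: Any) -> float:
    """Stable hash of one feature value, scaled into [0, 1) with 53-bit precision."""
    digest = hashlib.sha256(_canonical(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def encode(txn: Dict[str, Any]) -> List[float]:
    """X_i = H(T_i): hash every recorded feature of the transaction separately."""
    return [hash_feature(txn.get(name)) for name in FEATURES]


# A new incoming transaction has no commit time, status or thread yet.
incoming = {"data_items": {"acct:7", "acct:9"}, "locks": {"acct:7"},
            "max_exec_time": 0.5, "memory_bytes": 4096,
            "commit_time": None, "status": None, "thread": None}
x_new = encode(incoming)
```

With this choice only identical feature values produce identical coordinates; nearby numeric values such as 0.5 s and 0.51 s hash to unrelated numbers.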

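In some embodiments, for transactions $T_i$ and $T_j$, suppose transaction $T_i$ aborts and $T_j$ denotes a transaction that conflicts with $T_i$. Let $T_{ij}$ denote the union of $T_i$ and $T_j$; the coding model $H$ records $T_i$, $T_j$, and the features that led to the conflict as follows:

$$H(T_{ij}) = H(T_i) \oplus H(T_j) \oplus h_{ij}$$

where $\oplus$ denotes concatenation of encoding results and $h_{ij}$ denotes the hash codes of the features that cause the abort when $T_i$ and $T_j$ run simultaneously.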

It should be noted that for concurrent transactions in which no abort occurs, the correlation between the transactions is not recorded; only their individual encoding results are recorded.
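The pair encoding $H(T_{ij}) = H(T_i) \oplus H(T_j) \oplus h_{ij}$ can be sketched as list concatenation. To keep every data point the same length for clustering, this sketch assumes that $h_{ij}$ has one slot per recorded feature, holding a hash of the feature name when that feature caused the abort and `0.0` otherwise. That slot layout, the `FEATURES` names, the helper names, and the SHA-256 stand-in for $H$ are assumptions, not details given in the text.

```python
import hashlib
from typing import Iterable, List, Sequence

# Hypothetical names of the seven recorded features.
FEATURES = ("data_items", "locks", "max_exec_time", "memory_bytes",
            "commit_time", "status", "thread")


def hash_code(text: str) -> float:
    """Stable hash of a string, scaled into [0, 1)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def encode_conflict_pair(x_i: Sequence[float], x_j: Sequence[float],
                         causes: Iterable[str]) -> List[float]:
    """Return H(T_i) + H(T_j) + h_ij, where + is list concatenation."""
    blamed = set(causes)
    h_ij = [hash_code(name) if name in blamed else 0.0 for name in FEATURES]
    return list(x_i) + list(x_j) + h_ij


x_i = [0.11] * len(FEATURES)   # encoding of the aborted transaction T_i
x_j = [0.42] * len(FEATURES)   # encoding of the conflicting transaction T_j
point = encode_conflict_pair(x_i, x_j, causes={"locks"})
```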

Second, transaction scheduling algorithm based on unsupervised model DBSCAN:

With the transaction logs and the new incoming transaction parsed, the unsupervised model DBSCAN clusters the parsing results, and cosine similarity is used to compute the similarity between the new incoming transaction and each cluster center, completing the scheduling of the new incoming transaction:

taking all data points in the log parsing result as the input of the DBSCAN model, that is, performing cluster analysis in the feature space of the transactions recorded in the log; the DBSCAN model marks every data point as unvisited and randomly selects one of them as the initial data point $p$;

determining whether the $\varepsilon$-neighborhood of the initial data point $p$ contains at least $MinPts$ data points, the minimum number required for a core point;

if so, creating a new category and adding the point to it (once labeled, the point no longer belongs to the set of unlabeled data points);

It should be noted that clustering the log parsing result with the unsupervised model DBSCAN to obtain the center feature of each class further includes initializing the algorithm hyperparameters: the DBSCAN neighborhood radius $\varepsilon$, the minimum number of data points $MinPts$ required in the neighborhood of a core point, and the similarity threshold $\theta$. The DBSCAN clustering algorithm model is shown in FIG. 5.
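The clustering steps above follow the standard DBSCAN procedure, sketched here from scratch in Python. Assumptions: Euclidean distance (the text does not name a metric), a point's own position counts toward $MinPts$, `eps` and `min_pts` play the roles of $\varepsilon$ and $MinPts$, and points are visited in index order rather than at random (this changes only cluster numbering and which cluster claims a border point reachable from two clusters). It is an $O(N^2)$ illustration; scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` provides an optimized equivalent.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i."""
    return [k for k, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id 0, 1, ... or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)   # None = unvisited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region_query(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1                     # i is a core point: open a new class
        labels[i] = cluster
        while seeds:
            q = seeds.pop()
            if labels[q] == NOISE:
                labels[q] = cluster      # border point: join but do not expand
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_seeds = region_query(points, q, eps)
            if len(q_seeds) >= min_pts:  # q is also a core point: keep growing
                seeds.extend(q_seeds)
    return labels


pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
       (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
       (9.0, 0.0)]
labels = dbscan(pts, eps=0.3, min_pts=3)   # [0, 0, 0, 1, 1, 1, -1]
```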

In some embodiments, as shown in FIG. 4, during log parsing the commit status of each transaction is checked, and each transaction whose commit status is aborted, together with the transactions from the same time period, is used as a data point of the log parsing result.
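The data-point selection can be sketched as grouping log records by time period and pairing each aborted transaction with the other transactions of its period. The text does not define "the same time period"; this sketch assumes fixed-width windows over commit time, and the record keys (`id`, `commit_time`, `status`), the function name, and the `window` parameter are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def same_period_groups(log: List[Record],
                       window: float) -> List[Tuple[Record, List[Record]]]:
    """Pair every aborted record with the other records in its time window."""
    buckets: Dict[int, List[Record]] = {}
    for rec in log:
        buckets.setdefault(int(rec["commit_time"] // window), []).append(rec)
    groups = []
    for rec in log:
        if rec["status"] == "abort":
            period = buckets[int(rec["commit_time"] // window)]
            groups.append((rec, [r for r in period if r is not rec]))
    return groups


log = [{"id": 1, "commit_time": 0.5, "status": "abort"},
       {"id": 2, "commit_time": 0.7, "status": "commit"},
       {"id": 3, "commit_time": 1.5, "status": "commit"},
       {"id": 4, "commit_time": 1.6, "status": "abort"}]
groups = same_period_groups(log, window=1.0)
```

Fixed windows are the simplest choice, but they separate transactions that straddle a window boundary; an overlap test on start and end times would avoid that at the cost of needing start times in the log.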

For each class in the category set, the center of its data points is computed: for a class $K_k = \{X_{k,1}, X_{k,2}, \ldots, X_{k,N_k}\}$, where $X_{k,m}$ is the feature expression of a data point in the class, $k$ is the category number, and $N_k$ is the total number of data points in the class, the class center $c_k$ is calculated as:

$$c_k = \frac{1}{N_k} \sum_{m=1}^{N_k} X_{k,m}$$

The center feature of each class is recorded into a new array.
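The class-center formula is an element-wise mean over each cluster. A minimal sketch, assuming the cluster labels come from a DBSCAN pass in which `-1` marks noise (noise points belong to no class and are skipped); the function name `class_centers` is hypothetical.

```python
from typing import Dict, List, Sequence


def class_centers(points: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> Dict[int, List[float]]:
    """c_k = (1 / N_k) * sum of the N_k points labelled k; noise (-1) is skipped."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for point, k in zip(points, labels):
        if k < 0:
            continue
        if k not in sums:
            sums[k] = [0.0] * len(point)
            counts[k] = 0
        sums[k] = [s + v for s, v in zip(sums[k], point)]
        counts[k] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


centers = class_centers([(0.0, 0.0), (2.0, 2.0), (10.0, 10.0), (99.0, 99.0)],
                        [0, 0, 1, -1])
```

Keying the result by class id keeps each center tied to its class, which the scheduling step later needs to look up the class's thread.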

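The encoding results of the new incoming transaction $T_{\text{new}}$ and of class center $c_k$ are expressed as $X_{\text{new}} = (x_1, x_2, \ldots, x_n)$ and $c_k = (c_{k,1}, c_{k,2}, \ldots, c_{k,n})$, and the cosine similarity between them is calculated as:

$$\cos(X_{\text{new}}, c_k) = \frac{\sum_{j=1}^{n} x_j\, c_{k,j}}{\sqrt{\sum_{j=1}^{n} x_j^2}\,\sqrt{\sum_{j=1}^{n} c_{k,j}^2}}$$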

In the embodiment of the invention, the cosine similarity between the new incoming transaction and each class center obtained by DBSCAN clustering is calculated separately.
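The cosine-similarity formula translates directly into code. This sketch assumes both vectors have the same length and, as an added convention not stated in the text, returns `0.0` when either vector is all zeros, where the formula is undefined. The function name is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], c: Sequence[float]) -> float:
    """cos(X, c_k) = sum_j x_j * c_kj / (||X|| * ||c_k||)."""
    if len(x) != len(c):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, c))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in c))
    if norm == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / norm


similarities = [cosine_similarity((1.0, 1.0), c)
                for c in [(2.0, 2.0), (1.0, 0.0), (0.0, 0.0)]]
```

If every feature is encoded as a non-negative number, as with a hash scaled into $[0, 1)$, the similarity lies in $[0, 1]$, so a single threshold $\theta$ in that range is meaningful.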

The pseudocode of the transaction scheduling algorithm based on the unsupervised model DBSCAN is as follows:

Algorithm: transaction scheduling based on the unsupervised model DBSCAN
Input: the parsing result of the new incoming transaction $T_{\text{new}}$ and the parsing results of the related transaction logs
Hyperparameters: DBSCAN neighborhood radius $\varepsilon$; minimum number of data points $MinPts$ in the neighborhood of a core point; similarity threshold $\theta$
Steps:
  create a category set $C \leftarrow \emptyset$ and a class center set $Z \leftarrow \emptyset$;
  mark all data points as unvisited, forming the unvisited set $U = \{u_1, u_2, \ldots, u_N\}$, where $N$ is the total number of unvisited data points;
  for each unvisited data point $u \in U$:
    if the $\varepsilon$-neighborhood of $u$ contains at least $MinPts$ data points:
      create a new class $K$ and add $u$ to $K$;
      for each data point $q$ in $\mathcal{N}(u)$, the set of points in the $\varepsilon$-neighborhood of $u$ other than $u$:
        if $q$ is unlabeled, add $q$ to $K$;
        if the $\varepsilon$-neighborhood of $q$ contains at least $MinPts$ data points, add the points in the $\varepsilon$-neighborhood of $q$ to $\mathcal{N}(u)$;
        if $q$ does not belong to any existing category, add $q$ to $K$;
      end loop; add class $K$ to the category set $C$;
    otherwise, mark $u$ as noise;
  for each class $K_k$ in $C$:
    compute the center $c_k$ of its data points in the feature space;
    add $c_k$ to the class center set $Z$;
  for each class center $c_k$ in $Z$:
    compute the cosine similarity $s_k = \cos(X_{\text{new}}, c_k)$ between the new incoming transaction and $c_k$;
    if $s_k < \theta$, add the thread number of $c_k$ to the candidate thread set $P$;
  randomly select one thread $p$ from $P$.

Output: the thread number $p$ assigned to the new incoming transaction.
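The pseudocode above can be turned into a runnable end-to-end sketch. It is a minimal illustration under stated assumptions, not the inventors' implementation: feature vectors are plain tuples of floats, DBSCAN uses Euclidean distance, the "thread number of a class center" (which the text does not define) is taken to be the most frequent thread among that class's members, and `None` is returned when no center is dissimilar enough, a case the pseudocode leaves open. All function and parameter names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence

Vector = Sequence[float]


def dbscan(points: Sequence[Vector], eps: float, min_pts: int) -> List[int]:
    """Standard DBSCAN; returns a class id per point, or -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)

    def near(i: int) -> List[int]:
        return [k for k, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = near(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # noise (may become a border point later)
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster      # border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_near = near(q)
            if len(q_near) >= min_pts:   # core point: expand the class
                seeds.extend(q_near)
    return labels


def cosine(x: Vector, c: Vector) -> float:
    norm = math.hypot(*x) * math.hypot(*c)
    return sum(a * b for a, b in zip(x, c)) / norm if norm else 0.0


def schedule(x_new: Vector, log_points: Sequence[Vector], log_threads: Sequence[int],
             eps: float, min_pts: int, theta: float,
             rng: random.Random) -> Optional[int]:
    """Return the thread assigned to the new transaction, or None."""
    labels = dbscan(log_points, eps, min_pts)
    candidates = set()
    for k in sorted({lab for lab in labels if lab >= 0}):
        members = [i for i, lab in enumerate(labels) if lab == k]
        # Class center c_k: element-wise mean of the member feature vectors.
        center = [sum(log_points[i][j] for i in members) / len(members)
                  for j in range(len(x_new))]
        # Assumed thread of the center: most common thread among its members.
        thread = Counter(log_threads[i] for i in members).most_common(1)[0][0]
        if cosine(x_new, center) < theta:
            candidates.add(thread)
    return rng.choice(sorted(candidates)) if candidates else None


log_points = [(5.0, 0.0), (5.0, 0.1), (5.1, 0.0), (0.0, 5.0), (0.1, 5.0), (0.0, 5.1)]
log_threads = [0, 0, 1, 2, 2, 2]
assigned = schedule((1.0, 0.0), log_points, log_threads,
                    eps=0.3, min_pts=3, theta=0.9, rng=random.Random(1))
```

Following the text, similarity is measured against class centers rather than against every logged transaction, so each scheduling decision costs one cosine evaluation per class.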

The embodiment of the invention adopts a different transaction encoding scheme that considers more parameters, including the data items requested by the transaction, the requested locks, the maximum execution time, the occupied memory, the commit time, the commit status, and the execution thread, so that multiple indicators are optimized together. Meanwhile, unsupervised learning with DBSCAN as the base model is used to make predictions for transaction scheduling, and cosine similarity is used for the calculation; cosine similarity performs well and requires less computation than the correlation coefficient, Euclidean distance, Mahalanobis distance, and similar measures.

In a second aspect, as shown in FIG. 2, a database transaction intelligent management system is provided that includes the prediction-based transaction scheduling system of the present invention, comprising: a transaction parsing module, a log parsing module, a scheduler, an unsupervised model, and a transaction queue;

In some embodiments, the scheduler is specifically configured to:

It is to be understood that the above embodiments are merely illustrative of the application of the principles of the present invention, but not in limitation thereof. Various modifications and improvements may be made by those skilled in the art without departing from the spirit and substance of the invention, and are also considered to be within the scope of the invention.

Claims

1. A prediction-based transaction scheduling method, comprising:

2. The prediction-based transaction scheduling method of claim 1, wherein performing transaction scheduling according to the similarity relationship comprises:

3. The prediction-based transaction scheduling method of claim 2, wherein if the cosine similarity does not reach the hyperparameter threshold $\theta$, the new incoming transaction is inserted into the transaction queue of the assigned thread immediately after the transaction currently being processed.

4. The prediction-based transaction scheduling method of claim 1, wherein parsing a new incoming transaction to obtain its feature expression, or parsing the logs to obtain the log parsing result, includes:

the transaction parsing model is expressed as: $T_i = \{d_i, l_i, t_i^{\max}, m_i, t_i^{c}, s_i, p_i\}$, $X_i = H(T_i) = (x_{i,1}, x_{i,2}, \ldots, x_{i,n})$; wherein $d_i$ is the data item requested by the transaction, $l_i$ is the requested lock, $t_i^{\max}$ is the maximum execution time of the transaction, $m_i$ is the memory occupied by the transaction, $t_i^{c}$ is the commit time of the transaction, $s_i$ is the commit status of the transaction, and $p_i$ is the thread in which the transaction executes; the encoding result is abbreviated as $X_i$, where $T_i$ is the transaction and $i$ is its transaction number; $x_{i,j}$ uniformly denotes the encoding result of each feature of the transaction, $j$ is the number of the transaction feature, and $n$ is the total number of transaction features.

5. The prediction-based transaction scheduling method of claim 4, wherein, for transactions $T_i$ and $T_j$, supposing transaction $T_i$ aborts and $T_j$ denotes a transaction that conflicts with $T_i$, $T_{ij}$ is denoted as the union of $T_i$ and $T_j$, and the coding model $H$ records $T_i$, $T_j$, and the features that led to the conflict as follows:

$H(T_{ij}) = H(T_i) \oplus H(T_j) \oplus h_{ij}$; wherein $\oplus$ denotes concatenation of encoding results, $T_{ij}$ denotes transactions $T_i$ and $T_j$ running simultaneously, and $h_{ij}$ denotes the hash codes of the features that cause the transaction queue to abort when $T_i$ and $T_j$ run simultaneously.

6. The prediction-based transaction scheduling method of claim 1, wherein clustering the log parsing result with the unsupervised model DBSCAN to obtain the center feature of each class comprises:

If so, a new category is created and the point is added to the category;

7. The prediction-based transaction scheduling method of claim 6, wherein during log parsing the commit status of each transaction is checked, and each transaction whose commit status is aborted, together with the transactions from the same time period, is used as a data point of the log parsing result.

8. The prediction-based transaction scheduling method of claim 6, wherein clustering with the unsupervised model DBSCAN to obtain the center feature of each class further comprises:

the method comprises the steps of carrying out a first treatment on the surface of the Record each class of center features into the new array.

9. The prediction-based transaction scheduling method of claim 8, wherein predicting the similarity relationship between the new incoming transaction and each class center feature by cosine similarity, based on the feature expression of the new incoming transaction and the class center features, comprises:

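$\cos(X_{\text{new}}, c_k) = \dfrac{\sum_{j=1}^{n} x_j\, c_{k,j}}{\sqrt{\sum_{j=1}^{n} x_j^2}\,\sqrt{\sum_{j=1}^{n} c_{k,j}^2}}$.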

10. A prediction-based transaction scheduling system, comprising: a transaction parsing module, a log parsing module, a scheduler, an unsupervised model, and a transaction queue;