CN116560998A - I/O (input/output) sequence-oriented database performance problem detection method - Google Patents


Info

Publication number
CN116560998A
CN116560998A (application CN202310551096.XA; granted as CN116560998B)
Authority
CN
China
Prior art keywords: conf, data, configuration item, module, management system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310551096.XA
Other languages
Chinese (zh)
Other versions
CN116560998B (en)
Inventor
李姗姗
王戟
陈立前
马俊
李小玲
张元良
王腾
刘浩然
白林枭
彭博铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by National University of Defense Technology
Priority to CN202310551096.XA
Publication of CN116560998A
Application granted
Publication of CN116560998B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; error correction; monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3668 Software testing
    • G06F 11/3672 Test management
    • G06F 11/3676 Test management for coverage analysis
    • G06F 11/3684 Test management for test design, e.g. generating new test cases
    • G06F 11/3688 Test management for test execution, e.g. scheduling of test suites
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an I/O (input/output) ordering-oriented database performance problem detection method, which aims to accurately detect I/O-ordering-related performance mismatch problems in a database management system under test. The technical scheme is as follows: construct a database performance problem detection system consisting of a variable analysis module, a taint analysis module, a test case generation module, a data collection module and a problem matching module. The variable analysis module locates the set Conf of configuration item variables corresponding to the configuration items in the database management system under test; the taint analysis module performs taint analysis on the configuration item variables in Conf to obtain the set of I/O-ordering-related configuration items; the test case generation module generates a test case set T; the data collection module runs the test cases in T and collects runtime data; the problem matching module detects performance mismatch problems from the runtime data. The invention achieves high recall and precision at low testing cost.

Description

I/O (input/output) sequence-oriented database performance problem detection method
Technical Field
The invention relates to techniques for detecting software performance problems caused by software configuration problems, and in particular to a method for detecting performance problems caused by I/O-ordering mismatch on new hardware.
Background
With continuous social progress and development, software systems are ever more widely applied and play a role in all aspects of society. At the same time, software defects cause various problems and bring unnecessary trouble to users, developers and others. One such class is software performance problems, which concern the speed, response time, throughput and similar characteristics of software at runtime. Performance problems can make software run slowly, crash or stop responding, harming user experience and work efficiency, and have therefore become a focus of attention for all software stakeholders. Amazon has noted that every additional 0.1 second of web-page delay directly causes a 1% loss in sales. Effectively preventing performance problems from reaching the production environment, and thereby avoiding economic loss and extra labor cost, is accordingly a shared concern of all software stakeholders. Software configuration refers to the settings and adjustments made to software during development and deployment to meet specific needs and circumstances; it includes configuration files, environment variables, database connection strings, log settings, security settings and so on. Software configuration items are an important interface for the software's external interaction: they usually control software behavior and the allocation of system resources in the form of conditional expressions, so that the software adapts to different environments and loads (a process called software configuration). Software configuration is therefore closely related to software performance.
In recent years, with growing user demands and the continuous evolution of software running environments, software has become increasingly configurable, making configuration-related performance problems ever more prominent.
However, handling configuration-related performance problems is not trivial, for several reasons. First, such problems can lie hidden in a software system, require specific configurations and environments to trigger, and may produce no explicit hints such as logs or errors once triggered. Second, configuration items are the key interface by which software adapts to its environment; the related code usually must fit the environment, and environmental change has a pronounced effect on performance, so configuration-related performance faults depend not only on the software itself but often also on characteristics of the running environment. Finally, the number of configuration items is huge, the relationship between configuration and user intentions such as performance is complex, configuration documentation is hard to understand, and users lack domain knowledge, time and energy; all of this makes detecting problematic configuration items challenging.
As software running environments keep evolving, configuration-related code that contains no defect may still cause performance problems because it is not adapted to new environment characteristics. Such mismatch problems caused by environmental change are a typical class of configuration problems, including those caused by replacing the hardware environment. There is ample evidence that current mainstream software often fails to exploit the characteristics of new hardware; sometimes, even after a hardware upgrade, software performance fails to improve or even degrades.
Take database management systems as an example. They are a key component of data-intensive systems and are widely deployed on commercial platforms for caching, metadata management, message delivery, online shopping and the like. Practitioners have tried many ways to improve database performance, the most direct and typical being to substitute better-performing storage devices. Storage devices have undergone revolutionary development in recent years; the NVMe SSD is one of the most representative new storage media, with throughput up to 6 GB/s and latency down to the 10 μs level, far surpassing the previous generation of SATA SSDs and HDDs in every respect. However, directly deploying an NVMe SSD under a database management system often fails to achieve the expected effect. Abundant user feedback shows that after upgrading storage to an NVMe SSD, the improvement in performance indicators such as latency and throughput is very limited, and sometimes performance fails to improve at all. Research shows that many mainstream database management systems suffer performance mismatch, and I/O-ordering mismatch is one of its major forms.
Much prior work addresses the detection of performance problems broadly. A significant portion focuses only on application-level performance problems, such as "Caramel: Detecting and Fixing Performance Problems That Have Non-Intrusive Fixes" by Adrian Nistor et al. in ICSE 2015, "Directed Test Generation to Detect Loop Inefficiencies" by Monika Dhok et al. in FSE 2016, and "Redundant Loads: A Software Inefficiency Indicator" by Pengfei Su et al. in ICSE 2019; none of these considers potential mismatch between the application and the underlying devices. Other work detects configuration-related performance problems but has drawbacks. For example, Violet, proposed by Yigong Hu et al. in "Automated Reasoning and Detection of Specious Configuration in Large Systems with Symbolic Execution" (OSDI 2020), systematically reasons about the performance impact of configuration parameters with selective symbolic execution and can derive the combined effects among parameters and their relationship to inputs, but it relies on a large amount of symbolic execution, consuming substantial computational resources and time. Moreover, since the precision of symbolic execution is limited by program size and complexity, its accuracy may suffer on large or complex programs.
In addition, a mainstream approach judges mismatch problems with heuristic rules over performance changes, testing with the device under test, configuration items, configuration item combinations and configuration values (hereinafter, the benchmark method); but the rules used as mismatch criteria are coarse, and both effectiveness and testing efficiency are low.
Disclosure of Invention
Aimed at the performance mismatch problems in current mainstream database management systems that go unnoticed by developers and users, the invention provides a combined dynamic-static method for detecting database performance mismatch related to I/O ordering. It detects potential I/O-ordering-related performance mismatch problems in the database management system under test and helps developers continuously optimize the database management system.
To solve the above problems, a database performance problem detection system is first constructed, consisting of a variable analysis module, a taint analysis module, a test case generation module, a data collection module and a problem matching module. The variable analysis module then analyzes the configuration items of the database management system under test and locates the set Conf of configuration item variables corresponding to them. The taint analysis module performs taint analysis on the configuration item variables in the Conf output by the variable analysis module to obtain the I/O-ordering-related configuration item set Conf'. The test case generation module combines the I/O-ordering-related configuration items in Conf', different test loads B, hardware devices D and database management system types S to generate a test case set T. The data collection module tests the database management system under test with the test cases in T and collects the runtime data produced while the test cases run. The problem matching module detects performance mismatch problems of the database management system under test from the runtime data.
The specific technical scheme of the invention is as follows:
First, construct the database performance problem detection system, consisting of a variable analysis module, a taint analysis module, a test case generation module, a data collection module and a problem matching module. The variable analysis module is connected to the taint analysis module; it performs variable analysis on the source code and configuration items of the database management system under test, obtains the configuration item variable set Conf corresponding to the configuration items, and sends Conf to the taint analysis module. The taint analysis module is connected to the variable analysis module and the test case generation module; it receives Conf from the variable analysis module, receives the manually screened I/O-ordering-related system calls from the keyboard, performs taint analysis, screens out the I/O-ordering-related configuration item set Conf', and sends Conf' to the test case generation module. The test case generation module is connected to the taint analysis module and the data collection module; it receives Conf' from the taint analysis module and generates the configuration item combination set C. The test case generation module then generates the test case set T from the configuration item combination set C, the test load command set B, the hardware device set D and the set S of database management system types under test, and sends T to the data collection module. The data collection module is connected to the test case generation module and the problem matching module; it receives the test case set T from the test case generation module, tests the database management system under test with the test cases in T, records the runtime data DATA produced while the database management system under test runs the test cases, and sends the runtime data to the problem matching module.
The problem matching module is connected to the data collection module; it receives the runtime data DATA from the data collection module and judges from DATA whether the database management system under test has an I/O-ordering mismatch problem.
Second, the variable analysis module locates the configuration item variables corresponding to the configuration items, obtains the configuration item variable set Conf, and sends Conf to the taint analysis module. The method is as follows:
The variable analysis module analyzes the source code and configuration items of the database management system under test using the ConfMapper algorithm (see pages 3-7 of "ConfMapper: Automated Variable Finding for Configuration Items in Source Code", published by Shulin Zhou et al. in QRS 2016) and locates the configuration item variable set Conf corresponding to the configuration items in the source code of the database management system under test: Conf = {conf_1, ..., conf_n, ..., conf_N}, where N is the number of configuration item variables in Conf, 1 ≤ n ≤ N, and conf_n is the n-th configuration item variable in Conf. Conf is then sent to the taint analysis module.
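The ConfMapper-style mapping of the second step rests on matching configuration item names against source variable names. The sketch below is illustrative only: the identifiers `innodb_buffer_pool_size`, `innodbBufferPoolSize`, etc. are hypothetical examples, not taken from the patent, and real ConfMapper uses richer heuristics than pure name normalization.

```python
import re

def normalize(name: str) -> str:
    # Lower-case and strip separators so "innodb_buffer_pool_size"
    # and "innodbBufferPoolSize" compare equal.
    return re.sub(r"[_\-]", "", name).lower()

def map_config_to_variables(config_items, variables):
    # Build Conf: for each documented configuration item, collect the
    # source-code variables whose normalized name matches it.
    return {item: [v for v in variables if normalize(v) == normalize(item)]
            for item in config_items}

config_items = ["innodb_buffer_pool_size", "sync_binlog"]
variables = ["innodbBufferPoolSize", "sync_binlog", "log_bin"]
conf = map_config_to_variables(config_items, variables)
```

Items whose list stays empty simply have no matching variable under this heuristic; ConfMapper's additional analyses would then take over.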
Third, the taint analysis module receives Conf from the variable analysis module and uses static taint analysis to identify the configuration item variables in Conf related to I/O-ordering system calls, confirming whether each configuration item variable in Conf is related to an I/O-ordering system call and screening out the I/O-ordering-related configuration item set Conf'. The steps are:
3.1 Identify the system calls related to I/O ordering. By reading the official Linux manual (version 5.4.0 or above is required), survey the MM system calls of the Linux kernel, filter out those that can affect I/O ordering, and cross-filter each candidate (several people filter independently and then check one another's results), obtaining the MM' system calls among the kernel's MM system calls that affect I/O ordering. MM is a positive integer; for Linux 5.4.0, MM = 335, and 1 ≤ MM' ≤ MM.
3.2 Classify the MM' I/O-ordering-affecting system calls filtered out in 3.1 into a read series S1 and a write series S2. Taking Linux 5.4.0 as an example, the results are:
The read series S1 contains five read system calls: read, pread64, readv, preadv, preadv2.
The write series S2 contains five write system calls: write, pwrite64, writev, pwritev, pwritev2.
Thus MM' = 10 in this case.
3.3 The search space of configuration item combinations is enormous, and many sampling techniques cannot be applied directly because they change two or more configuration items per sampled configuration. Therefore the taint analysis module traverses each configuration item in Conf, uses static taint analysis to track the data flow of each configuration item, checks whether any I/O-related system call is tainted, and excludes the configuration item from Conf if not, yielding the I/O-ordering-related configuration item set Conf' and markedly shrinking the combination search space. The static taint analysis is based on use-define chains, with the following steps:
3.3.1 Compile the source code of the database management system under test into LLVM IR (LLVM Intermediate Representation);
3.3.2 Let variable n = 1 and let the I/O-ordering-related configuration item set Conf' = { };
3.3.3 Construct the use-define chain of the configuration item variable conf_n. Mark conf_n as tainted data, and mark the function return value or pointer produced when conf_n is read as a taint propagation source. Following the use-define chain of conf_n, track its propagation through the source code of the database management system under test and determine in which code segments of the database management system under test conf_n may be modified or used.
3.3.4 Operations involving tainted data in the database management system under test fall into four categories: using tainted data, propagating tainted data, tainting other data with tainted data, and operations on data other than tainted data. Connect all taint paths (the propagation tracks of tainted data through the program) formed by tainted-data use and propagation in the database management system under test into a complete taint propagation graph. Let F be the set of all code segments on the taint propagation graph, F = {f_1, f_2, ..., f_h, ..., f_H}, where H is the number of code segments in F, 1 ≤ h ≤ H, and f_h is the h-th code segment in F;
3.3.5 Check whether any code segment in F invokes any system call in the read series S1 or the write series S2 from step 3.2. If so, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; if not, conf_n taints no I/O-related system call in that process and the configuration item is excluded from Conf. The method is as follows:
3.3.5.1 Let h = 1;
3.3.5.2 Check whether f_h invokes any system call in the read series S1 or the write series S2 of step 3.2. If so, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; go to 3.3.5.4. If not, no I/O-related system call is tainted in f_h; go to 3.3.5.3.
3.3.5.3 If h < H, let h = h + 1 and go to 3.3.5.2; otherwise go to 3.3.6.
3.3.5.4 Add conf_n to the set Conf'.
3.3.6 If n < N, let n = n + 1 and go to 3.3.3. If n = N, the I/O-ordering-related configuration item set Conf' has been obtained: Conf' = {conf_1, ..., conf_z, ..., conf_N'}, where 1 ≤ z ≤ N', N' is the number of I/O-ordering-related configuration items in Conf', and conf_z is the z-th I/O-ordering-related configuration item in Conf'. Go to 3.3.7.
3.3.7 The taint analysis module sends Conf' to the test case generation module.
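The screening loop of steps 3.3.2 to 3.3.6 can be sketched as worklist-style taint propagation over a def-use graph. The graph, node names and per-node syscall sets below are hypothetical simplifications (the real analysis works on LLVM IR, not Python dicts), but S1 and S2 are the sets from step 3.2:

```python
# S1 and S2 from step 3.2 (Linux 5.4.0)
READ_SYSCALLS = {"read", "pread64", "readv", "preadv", "preadv2"}
WRITE_SYSCALLS = {"write", "pwrite64", "writev", "pwritev", "pwritev2"}
IO_SYSCALLS = READ_SYSCALLS | WRITE_SYSCALLS

def is_io_ordering_related(conf_var, def_use, syscalls_in):
    # Mark conf_var tainted and propagate along def-use edges; the item
    # belongs in Conf' iff some tainted code segment invokes S1 or S2.
    tainted, worklist = {conf_var}, [conf_var]
    while worklist:
        node = worklist.pop()
        if IO_SYSCALLS & syscalls_in.get(node, set()):
            return True
        for succ in def_use.get(node, []):
            if succ not in tainted:
                tainted.add(succ)
                worklist.append(succ)
    return False

# Hypothetical def-use edges and syscall sites:
def_use = {"flush_method": ["do_flush"], "do_flush": ["issue_io"],
           "query_cache_size": ["cache_alloc"]}
syscalls_in = {"issue_io": {"pwrite64"}}
conf_prime = [c for c in ["flush_method", "query_cache_size"]
              if is_io_ordering_related(c, def_use, syscalls_in)]
```

Here `flush_method` reaches a `pwrite64` call site and is kept, while `query_cache_size` never taints an I/O system call and is screened out.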
Fourth, the test case generation module generates test cases. The method is as follows:
4.1 The test case generation module extracts the syntax types and value ranges of the software configuration items of the database management system under test using the Spex algorithm (see pages 7-10 of "Do Not Blame Users for Misconfigurations", published by Tianyin Xu et al. in SOSP 2013) and classifies the extracted syntax types into four categories: numeric (int), Boolean (bool), enumeration (enum) and string (string);
4.2 The test case generation module generates for Conf' a set V of value sets to be tested, V = {vv_1, vv_2, ..., vv_z, ..., vv_N'}, where vv_z is the z-th value set in V, i.e. the set of values to be tested for the z-th configuration item conf_z in Conf'; vv_z = {v_z^1, ..., v_z^k, ..., v_z^(K_z)}, where v_z^k is the k-th value of conf_z, K_z is the number of values the test case generation module generates for conf_z, and 1 ≤ k ≤ K_z. The method is as follows:
4.2.1 Initialize variable z = 1;
4.2.2 If the type label extracted for conf_z is empty, let vv_z = { } (K_z = 0), put vv_z into V, and go to 4.2.7;
4.2.3 If conf_z is Boolean (bool), let vv_z = {0, 1} (K_z = 2), put vv_z into V, and go to 4.2.7;
4.2.4 If conf_z is an enumeration (enum), let vv_z be the set of all possible values of conf_z extracted by the Spex algorithm, put vv_z into V, and go to 4.2.7;
4.2.5 If conf_z is a string (string), let vv_z = { } (K_z = 0), following the conclusion of the article "Tuning backfired? Not (always) your fault: understanding and detecting configuration-related performance bugs" published by He Haochen et al. in ESEC/FSE 2019; put vv_z into V and go to 4.2.7;
4.2.6 If conf_z is numeric (int), sample the value range of conf_z as follows: let Min and Max be the minimum and maximum values of conf_z extracted by the Spex algorithm, and let vv_z = {Min, 10·Min, 10^2·Min, Max, 10^-1·Max, 10^-2·Max} (K_z = 6); put vv_z into V and go to 4.2.7;
4.2.7 If z = N', go to 4.3; otherwise let z = z + 1 and go to 4.2.2;
4.3 Take the Cartesian product of vv_1, vv_2, ..., vv_z, ..., vv_N' to obtain VCartesian = vv_1 × vv_2 × ... × vv_z × ... × vv_N'.
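Steps 4.2 and 4.3 can be sketched as follows: the per-type rules mirror 4.2.2 to 4.2.6, and `itertools.product` plays the role of the Cartesian product VCartesian. The concrete bounds and item kinds are illustrative, not taken from the patent:

```python
from itertools import product

def value_set(ctype, domain=None, lo=None, hi=None):
    # Step 4.2 rules: bool -> {0,1}; enum -> all extracted values;
    # int -> sampled around Min and Max; string/unlabeled -> empty.
    if ctype == "bool":
        return [0, 1]
    if ctype == "enum":
        return list(domain)
    if ctype == "int":
        return [lo, 10 * lo, 100 * lo, hi, hi // 10, hi // 100]
    return []

vv1 = value_set("bool")                 # e.g. an on/off flush switch
vv2 = value_set("int", lo=1, hi=1000)   # e.g. a buffer-size item
v_cartesian = list(product(vv1, vv2))   # VCartesian of step 4.3
```

With one Boolean item (2 values) and one numeric item (6 sampled values), VCartesian holds 2 × 6 = 12 combinations, which is why screening Conf down to Conf' first matters so much.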
4.4 The test case generation module generates test commands using a performance testing tool (e.g. sysbench version 1.0.14 or apache-benchmark version 2.4). The method is as follows: sample the parameters of the performance testing tool with the classical pair-wise method (pair-wise testing is a combinatorial method in the software testing field that, for each pair of input parameters to a system, tests all possible discrete combinations of those parameters; see "Pragmatic Software Testing: Becoming an Effective and Efficient Test Professional"), then feed the sampled parameter values of the performance testing tool (such as concurrency, load type, data table size, number of data tables, read-operation ratio and write-operation ratio) into the tool and output test commands, obtaining the test command set B = {b_1, b_2, b_3, ..., b_y, ..., b_Y}, where 1 ≤ y ≤ Y, Y is the number of test commands in B, and b_y is the y-th test command in B;
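The pair-wise sampling of step 4.4 can be sketched with a simple greedy generator: keep selecting test commands until every pair of parameter values appears together in at least one command, which needs fewer commands than the full Cartesian product. The parameter names and values below are hypothetical stand-ins for sysbench options:

```python
from itertools import combinations, product

def pairwise_cases(params):
    # Greedy pair-wise sampling: repeatedly pick the candidate case that
    # covers the most still-uncovered value pairs.
    names = list(params)
    uncovered = {(a, va, b, vb)
                 for a, b in combinations(names, 2)
                 for va in params[a] for vb in params[b]}
    all_cases = [dict(zip(names, vals)) for vals in product(*params.values())]
    chosen = []
    while uncovered:
        best = max(all_cases,
                   key=lambda c: len({(a, c[a], b, c[b])
                                      for a, b in combinations(names, 2)}
                                     & uncovered))
        chosen.append(best)
        uncovered -= {(a, best[a], b, best[b])
                      for a, b in combinations(names, 2)}
    return chosen

params = {"threads": [1, 16],
          "table_size": [1000, 100000],
          "rw": ["read", "write"]}
cases = pairwise_cases(params)  # covers all pairs with fewer than 8 cases
```

Production tools use more sophisticated pair-wise generators, but the coverage guarantee is the same: every two-way interaction of parameter values is exercised at least once.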
4.5 The test case generation module receives a user-defined hardware device set D for experimental comparison, covering NVMe SSD and SATA SSD/HDD storage, with D = D1 ∪ D2. D1 = {d11, d12} describes the host platform used in this method (d11 = AMD Ryzen 5700XT processor, d12 = 64 GB DDR4 memory); D2 = {d21, d22, d23, d24} is the set of storage devices under comparison (this method selects the Samsung 980 Pro, Western Digital SN850, Samsung 860 EVO and Western Digital Blue, i.e. d21 = Samsung 980 Pro, d22 = Western Digital SN850, d23 = Samsung 860 EVO, d24 = Western Digital Blue).
4.6 The test case generation module generates the configuration item combination set C, C = {c_1, ..., c_l, ..., c_L}, where c_l = (v_1^(l1), v_2^(l2), ..., v_N'^(lN')) is one configuration item combination generated by the test case generation module: v_1^(l1) is any element of the set vv_1 (1 ≤ l1 ≤ K_1); v_z^(lz) is any element of the set vv_z (1 ≤ lz ≤ K_z); ...; v_N'^(lN') is any element of the set vv_N'. L is a positive integer denoting the total number of distinct configuration item combinations obtained when every configuration item ranges freely over its value set to be tested.
4.7 The test case generation module generates the database management system type set S = {MySQL, PostgreSQL, MariaDB, SQLite, MongoDB, Redis}, covering six currently mainstream database management systems.
4.8 The test case generation module generates the test case set T, T = {t_1, t_2, t_3, ..., t_a, ..., t_A}, 1 ≤ a ≤ A, where t_a is a quadruple (b_a, c_a, d_a, s_a): c_a is any element of the configuration item combination set C, s_a is any element of the database management system type set S, b_a is any element of the test command set B, and d_a is any element of the hardware device set D. A is the total number of test cases in T; for the test cases to contain all quadruple combinations, A ≥ 36·Y·L is required.
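Step 4.8's quadruple construction is a plain Cartesian product over B, C, D and S. A minimal sketch follows; all concrete commands, configuration values and device names are illustrative, not the patent's actual sets:

```python
from itertools import product

def build_test_cases(B, C, D, S):
    # Step 4.8: each test case t_a is a quadruple (b_a, c_a, d_a, s_a);
    # enumerating all combinations gives |T| = |B| * |C| * |D| * |S|.
    return [{"load": b, "config": c, "device": d, "dbms": s}
            for b, c, d, s in product(B, C, D, S)]

T = build_test_cases(
    B=["sysbench oltp_read_write --threads=16"],   # hypothetical b_y
    C=[{"flush_method": "O_DIRECT"}, {"flush_method": "fsync"}],
    D=["NVMe SSD", "SATA SSD"],
    S=["MySQL", "PostgreSQL"],
)
```

With the patent's 6 devices and 6 database management systems, |D| · |S| = 36, which is where the A ≥ 36·Y·L bound comes from.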
Fifth, the data collection module receives the test case set T, runs the test cases in T, and collects runtime data to obtain the runtime data set DATA. The method is as follows:
5.1 Let variable a = 1 and let the runtime data set DATA = { };
5.2 Run test case t_a;
5.3 The data collection module obtains, via any of blktrace, Linux kernel event tracing, and eBPF diagnostics (a virtual machine technology running in the Linux kernel), the I/O path from the type-s_a database management system under test to the device driver.
5.4 Using hardware storage device d_a, the data collection module creates, with the open-source tool fio (version 3.2 or above), a test suite independent of the database management system under test that simulates the I/O behavior of the type-s_a database management system under test, and tests the type-s_a database management system under test with configuration item combination c_a under load command b_a.
5.5 From the moment test case t_a starts running, the data collection module continuously uses the fio tool to adjust configuration parameters and modify system settings, adjusts the configuration items, and dynamically monitors and records the runtime data data_a of the type-s_a database management system under test, including CPU utilization, NVMe queue utilization and I/O request order, adding data_a to the set DATA, until test case t_a finishes running.
5.6 If a < A, let a = a + 1 and go to 5.2; otherwise go to 5.7.
5.7 The data collection module outputs the runtime data set DATA to the problem matching module, DATA = {data_1, ..., data_a, ..., data_A}, where data_a is the a-th runtime data in DATA.
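Step 5.4's fio-based simulation could be configured with a job file along the following lines. This is a hedged sketch only: the device path `/dev/nvme0n1`, block size and queue depth are assumptions rather than the patent's actual settings, and only standard fio options (`ioengine`, `direct`, `rw`, `bs`, `iodepth`, `time_based`, `runtime`) are used.

```ini
; Hypothetical fio job approximating step 5.4: drive DBMS-like I/O on
; the device under test, independently of the real database.
[global]
ioengine=libaio
direct=1
time_based=1
runtime=60

[dbms-like-io]
filename=/dev/nvme0n1   ; assumed device d_a under test
rw=randwrite            ; switch to "write" to compare sequential ordering
bs=16k
iodepth=32
```

Running the same job with `rw=write` versus `rw=randwrite` gives a device-level baseline for how much the I/O request order alone affects throughput on d_a.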
Sixth, the problem matching module determines from DATA whether the database management system under test has performance mismatch problems.
6.1 Let variable a = 1 and let the number of performance mismatch problems M = 0.
6.2 The problem matching module judges the runtime data data_a in DATA against the following criteria to conclude whether the database management system under test has a performance mismatch problem. The method is as follows:
6.2.1 Judge whether the database management system under test undergoes a "sequential-to-random" conversion after the configuration adjustment. The method is to analyze data_a and determine whether the order of I/O requests has changed. If the degree of I/O request ordering of the database management system under test differs before and after the fio tool modified the parameters and system settings, a sequential-to-random conversion is considered to have occurred; go to 6.2.2. Otherwise go to 6.4.
6.2.2 Judge whether the performance of the database management system has decreased. The method is to compare, in data_a, the CPU utilization and the NVMe queue utilization before and after the modification. If both the CPU utilization and the NVMe queue utilization dropped after the configuration adjustment, the final performance of the database management system is considered to have decreased; go to 6.3. Otherwise go to 6.4.
6.3 the problem matching module judges that the I/O sequential unadapted problem exists in the database management system to be detected, makes M=M+1, records the I/O sequential unadapted problem existing in the database management system to be detected at the moment as a defect, and obtains the defect ID of the defect (the official does not use a standard fault report management system, namely the ID converted by the tinyurl. Mobi, and can be directly accessed through splicing).
6.4 if a < a, let a=a+1, turn 6.2; otherwise, turning to 6.5.
And 6.5, ending.
Compared with the prior art, the invention has the following beneficial effects:
1. The third step of the invention uses static taint analysis to identify all configuration item variables in Conf related to I/O-ordering system calls, confirming whether each configuration item variable in Conf is related to an I/O-ordering system call; by testing only the screened I/O-related configuration items, the testing cost is greatly reduced. Compared with the reference method in the background art, the time cost of testing one database management system is 0.6 to 5.5 hours, while the average time cost of testing one database with the reference method exceeds 2000 hours.
2. The invention can effectively detect database I/O sequentiality mismatch problems: 15 I/O sequentiality mismatch problems of 5 kinds were detected in the three mainstream database management systems MySQL, PostgreSQL, and MariaDB, with defect IDs #10355l, #107362, #103272, #yVIz, and #26790 respectively, and all of them were confirmed by the developers after being reported.
3. The test cases generated by the test case generation module in the fourth step have the characteristic of comprehensive coverage: the large number of permutations and combinations of different configuration items, load commands, and hardware devices lets the test case set cover most I/O sequentiality mismatch problems, greatly improving detection accuracy.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a logical block diagram of an I/O sequential mismatch problem detection system constructed in accordance with the present invention;
Detailed Description
The present invention will be described below with reference to the accompanying drawings.
As shown in fig. 1, the present invention includes the steps of:
First, a database performance problem detection system is constructed. The database performance problem detection system consists of a variable analysis module, a taint analysis module, a test case generation module, a data collection module, and a problem matching module. The variable analysis module is connected with the taint analysis module; it performs variable analysis on the source code and configuration items of the database management system under test, obtains the configuration item variable set Conf corresponding to the configuration items, and sends Conf to the taint analysis module. The taint analysis module is connected with the variable analysis module and the test case generation module; it receives Conf from the variable analysis module, receives the manually screened I/O-ordering-related system calls from the keyboard, performs taint analysis, screens out the I/O-ordering-related configuration item set Conf', and sends Conf' to the test case generation module. The test case generation module is connected with the taint analysis module and the data collection module; it receives Conf' from the taint analysis module and generates the configuration item combination set C. The test case generation module then generates the test case set T from the configuration item combination set C, the test load command set B, the hardware device set D, and the set S of database management system types under test, and sends T to the data collection module. The data collection module is connected with the test case generation module and the problem matching module; it receives the test case set T from the test case generation module, tests the database management system under test with the test cases in T, records the runtime data DATA produced while the database management system under test runs the test cases, and sends the runtime data to the problem matching module.
The problem matching module is connected with the data collection module; it receives the runtime data DATA from the data collection module and judges from DATA whether the database management system under test has an I/O sequentiality mismatch problem.
Secondly, the variable analysis module locates the configuration item variables corresponding to the configuration items, obtains the configuration item variable set Conf, and sends Conf to the taint analysis module, as follows:
The variable analysis module uses the ConfMapper algorithm (see "ConfMapper: Automated Variable Finding for Configuration Items in Source Code", Shulin Zhou et al., QRS 2016, pages 3-7, a method of automatically discovering the initial variables of configuration items from software source code) to analyze the source code and configuration items of the database management system under test, locating the configuration item variable set Conf corresponding to the configuration items in the source code: Conf = {conf_1, ..., conf_n, ..., conf_N}, where N is the number of configuration item variables in Conf, 1 ≤ n ≤ N, and conf_n is the n-th configuration item variable in Conf. Conf is then sent to the taint analysis module.
Thirdly, the taint analysis module receives Conf from the variable analysis module, uses static taint analysis to identify the configuration item variables in Conf related to I/O-ordering system calls, confirms whether each configuration item variable in Conf is related to an I/O-ordering system call, and screens out the I/O-ordering-related configuration item set Conf', as follows:
3.1 Identify system calls related to I/O ordering. By reading the official Linux manual (version 5.4.0 or above is required), investigate the MM system calls of the Linux kernel, filter out from them the system calls that may affect I/O ordering, and cross-filter each candidate (several people filter independently and then check each other's results) to obtain the MM' system calls among the kernel's MM system calls that affect I/O ordering. MM is a positive integer; for Linux 5.4.0, MM = 335, and 1 ≤ MM' ≤ MM.
3.2 Classify the MM' I/O-ordering-affecting system calls filtered in 3.1 into a read series S1 and a write series S2. Taking the analysis of Linux 5.4.0 as an example, the results are:
The read series S1 includes five read system calls: read, pread64, readv, preadv, and preadv2.
The write series S2 includes five write system calls: write, pwrite64, writev, pwritev, and pwritev2.
In this case MM' = 10.
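The two series in 3.2 can be written down directly as a small lookup that the later taint check consults; a minimal Python sketch (the syscall names follow the Linux 5.4 syscall table):

```python
# Step 3.2 as data: the MM' = 10 I/O-ordering-related system calls,
# split into a read series S1 and a write series S2.
READ_SERIES = {"read", "pread64", "readv", "preadv", "preadv2"}
WRITE_SERIES = {"write", "pwrite64", "writev", "pwritev", "pwritev2"}

IO_ORDER_SYSCALLS = READ_SERIES | WRITE_SERIES  # MM' = 10


def affects_io_order(syscall_name: str) -> bool:
    """True iff the syscall belongs to the filtered read/write series."""
    return syscall_name in IO_ORDER_SYSCALLS
```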
3.3 The search space of configuration item combinations is known to be enormous, and many sampling techniques cannot be applied directly because they change two or more configuration items in each sampled configuration. Therefore, the taint analysis module traverses each configuration item in Conf, uses static taint analysis to track the data flow of each configuration item in Conf, and checks whether any I/O-related system call is tainted; if not, the configuration item is excluded from Conf. This yields the I/O-ordering-related configuration item set Conf' and markedly reduces the search space of configuration item combinations. The static taint analysis adopts a use-define-chain-based taint analysis method, with the following specific steps:
3.3.1 Compile the source code of the database management system under test into LLVM IR (LLVM Intermediate Representation);
3.3.2 Let variable n = 1 and let the I/O-ordering-related configuration item set Conf' = {};
3.3.3 Construct the use-define chain of configuration item variable conf_n. conf_n is marked as taint data, and the function return value or pointer produced when conf_n is read is marked as the propagation source of the taint data. Along the use-define chain of conf_n, track the propagation of conf_n in the source code of the database management system under test and determine in which code segments of the database management system under test conf_n may be modified or used.
3.3.4 Operations involving taint data in the database management system under test are divided into four categories: using taint data, propagating taint data, tainting other data, and propagating data other than the taint data. For all operations in the database management system under test, connect the taint paths (the propagation tracks of the taint data in the program) formed by taint data and taint propagation to form a complete taint propagation graph. Let the set of all code segments on the taint propagation graph be F, F = {f_1, f_2, ..., f_h, ..., f_H}, where H is the number of code segments in F, 1 ≤ h ≤ H, and f_h is the h-th code segment in F;
3.3.5 Check whether any code segment in F invokes a system call in the read series S1 or the write series S2 from step 3.2. If so, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; if not, conf_n taints no I/O-related system call while being read and used by the database management system, and the configuration item is excluded from Conf. The method is:
3.3.5.1 Let h = 1;
3.3.5.2 Check f_h. If it invokes any system call in the read series S1 or the write series S2 from step 3.2, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; go to 3.3.5.4. If not, no I/O-related system call is tainted in this code segment; go to 3.3.5.3.
3.3.5.3 If h < H, let h = h+1 and go to 3.3.5.2; otherwise go to 3.3.5.5.
3.3.5.4 Add conf_n to the set Conf'.
3.3.6 If n < N, let n = n+1 and go to 3.3.3; if n = N, the I/O-ordering-related configuration item set Conf' has been obtained: Conf' = {conf_1, ..., conf_z, ..., conf_N'}, where 1 ≤ z ≤ N', N' is the number of I/O-ordering-related configuration items in Conf', and conf_z is the z-th I/O-ordering-related configuration item in Conf'. Go to 3.3.7.
3.3.7 The taint analysis module sends Conf' to the test case generation module.
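Steps 3.3.5 through 3.3.6 amount to a screening loop over the taint propagation graph of each configuration item variable. A minimal sketch, assuming each code segment f_h is represented simply by the set of system calls it invokes (the patent's actual graph structure is richer):

```python
READ_SERIES = {"read", "pread64", "readv", "preadv", "preadv2"}
WRITE_SERIES = {"write", "pwrite64", "writev", "pwritev", "pwritev2"}


def screen_io_related(taint_graphs: dict[str, list[set[str]]]) -> list[str]:
    """taint_graphs maps conf_n -> the code segments F on its taint
    propagation graph, each segment given as the set of syscalls it calls.
    Returns Conf': the items whose segments taint a read/write-series call."""
    conf_prime = []
    for conf_n, segments in taint_graphs.items():
        # 3.3.5.2: check every segment f_h for a read/write-series call
        if any(seg & (READ_SERIES | WRITE_SERIES) for seg in segments):
            conf_prime.append(conf_n)  # 3.3.5.4: add conf_n to Conf'
    return conf_prime
```

Configuration items whose taint never reaches S1 or S2 (e.g. a connection-count knob) are dropped, which is what shrinks the later combination search space.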
Fourth, the test case generation module generates test cases as follows:
4.1 The test case generation module uses the Spex algorithm (see "Do Not Blame Users for Misconfigurations", Tianyin Xu et al., SOSP 2013, pages 7-10) to extract the syntax types and value ranges of the software configuration items of the database management system under test, and classifies the extracted syntax types into four categories: numeric (int), Boolean (bool), enumeration (enum), and string (string);
4.2 The test case generation module generates for Conf' the set V of value sets to be tested, V = {vv_1, vv_2, ..., vv_z, ..., vv_N'}, where vv_z is the z-th value set to be tested in V, i.e., the set of values to be tested for the z-th configuration item conf_z in Conf'; v_z^k is the k-th value of conf_z, K_z is the number of values the test case generation module generates for conf_z, and 1 ≤ k ≤ K_z. The method is:
4.2.1 Initialize variable z = 1;
4.2.2 If the expected label corresponding to conf_z is empty, let vv_z = {} (in this case K_z = 0), put vv_z into V, and go to 4.2.7;
4.2.3 If conf_z is of Boolean type (bool), let vv_z = {0, 1} (K_z = 2), put vv_z into V, and go to 4.2.7;
4.2.4 If conf_z is of enumeration type (enum), let vv_z be the set of all possible values of conf_z extracted by the Spex algorithm, put vv_z into V, and go to 4.2.7;
4.2.5 If conf_z is of string type (string), choose the values of vv_z according to the conclusions of "Tuning backfired? Not (always) your fault: understanding and detecting configuration-related performance bugs" published by He Haochen et al. in ESEC/FSE 2019, put vv_z into V, and go to 4.2.7;
4.2.6 If conf_z is of numeric type (int), sample the values of conf_z as follows: denote the minimum value of conf_z extracted by the Spex algorithm as Min and the maximum value as Max, and let vv_z = {Min, 10·Min, 10^2·Min, Max, 10^-1·Max, 10^-2·Max} (K_z = 6); put vv_z into V and go to 4.2.7;
4.2.7 If z = N', go to 4.3; otherwise let z = z+1 and go to 4.2.2;
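The per-type value-set construction of 4.2 can be sketched as one dispatch function. `enum_values`, `min_val`, and `max_val` stand in for the ranges extracted by Spex; the empty-label and string cases are collapsed into the fallback for brevity:

```python
def value_set(syntax_type, enum_values=None, min_val=None, max_val=None):
    """Build the value set vv_z for one configuration item (step 4.2)."""
    if syntax_type == "bool":                        # 4.2.3: K_z = 2
        return [0, 1]
    if syntax_type == "enum":                        # 4.2.4: all Spex values
        return list(enum_values or [])
    if syntax_type == "int":                         # 4.2.6: order-of-magnitude
        return [min_val, 10 * min_val, 100 * min_val,
                max_val, max_val // 10, max_val // 100]
    return []                                        # 4.2.2 / string simplified
```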
4.3 Take the Cartesian product of vv_1, vv_2, ..., vv_z, ..., vv_N' to obtain VCartesian = vv_1 × vv_2 × ... × vv_z × ... × vv_N'.
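Step 4.3 is exactly `itertools.product` over the per-item value sets; the sets below are illustrative:

```python
import itertools

# Illustrative value sets vv_1, vv_2, vv_3 for three configuration items.
vv = [[0, 1], ["O_DIRECT", "fsync"], [64, 640]]

# VCartesian = vv_1 x vv_2 x ... x vv_N'
v_cartesian = list(itertools.product(*vv))

# |VCartesian| = K_1 * K_2 * ... * K_N'
assert len(v_cartesian) == 2 * 2 * 2
```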
4.4 The test case generation module uses a performance testing tool (e.g., sysbench (version 1.0.14) or apache-benchmark (version 2.4)) to generate test commands. The method is: sample the parameters of the performance testing tool with the classical pair-wise method (pair-wise testing is a combinatorial method in the field of software testing: for each pair of input parameters of a system, all possible discrete combinations of those two parameters' values are tested; it is an efficient testing technique), input the sampled parameter values of the performance testing tool (such as concurrency, load type, data table size, number of data tables, read operation proportion, and write operation proportion) into the performance testing tool, and output the test commands, obtaining the test command set B = {b_1, b_2, b_3, ..., b_y, ..., b_Y}, where 1 ≤ y ≤ Y, Y is the number of test commands in B, and b_y is the y-th test command in B;
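The pair-wise sampling in 4.4 can be illustrated with a simple greedy all-pairs generator: repeatedly pick the candidate row that covers the most not-yet-covered parameter-value pairs. Real pair-wise tools are more sophisticated; this only demonstrates the coverage criterion:

```python
import itertools


def pairwise_sample(params: list[list]) -> list[tuple]:
    """Greedy pair-wise sampling: return rows so that every pair of values
    from any two parameters appears together in at least one row."""
    # Enumerate every (parameter index, value) pair that must be covered.
    uncovered = set()
    for (i, vi), (j, vj) in itertools.combinations(enumerate(params), 2):
        uncovered |= {(i, a, j, b) for a in vi for b in vj}

    rows, all_rows = [], list(itertools.product(*params))
    while uncovered:
        # Pick the full-product row covering the most uncovered pairs.
        best = max(all_rows, key=lambda r: sum(
            (i, r[i], j, r[j]) in uncovered
            for i, j in itertools.combinations(range(len(r)), 2)))
        rows.append(best)
        uncovered -= {(i, best[i], j, best[j])
                      for i, j in itertools.combinations(range(len(best)), 2)}
    return rows
```

For three two-valued parameters this needs far fewer rows than the full product, which is why pair-wise sampling keeps the test command set B small.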
4.5 The test case generation module receives a user-defined hardware device set D, composed of devices such as NVMe and SATA SSDs, where D = D1 ∪ D2 for experimental comparison. D1 denotes the conventional hard disk (HDD) set, D1 = {d11, d12} (this method selects the AMD Ryzen 5700XT and 64 GB DDR4 memory, i.e., d11 = AMD Ryzen 5700XT, d12 = 64 GB DDR4 memory); D2 denotes the solid state drive (SSD) set, D2 = {d21, d22, d23, d24} (this method selects the Samsung 980 Pro, Western Digital SN850, Samsung 860 EVO, and Western Digital Blue, i.e., d21 = Samsung 980 Pro, d22 = Western Digital SN850, d23 = Samsung 860 EVO, d24 = Western Digital Blue).
4.6 The test case generation module generates the configuration item combination set C, C = {c_1, ..., c_l, ..., c_L}, where c_l = (v_1^{l1}, ..., v_z^{lz}, ..., v_{N'}^{lN'}) is a configuration item combination generated by the test case generation module; v_1^{l1} is any element of the set vv_1, 1 ≤ l1 ≤ K_1; v_z^{lz} is any element of the set vv_z, 1 ≤ lz ≤ K_z; ...; v_{N'}^{lN'} is any element of the set vv_{N'}. L is a positive integer representing the total number of different configuration item combinations obtained by permuting and combining the configuration items when each configuration item may freely take a value from its value set to be tested.
4.7 The test case generation module generates the database management system type set S = {MySQL, PostgreSQL, MariaDB, SQLite, MongoDB, Redis}. These are six currently mainstream database management systems.
4.8 The test case generation module generates the test case set T, T = {t_1, t_2, t_3, ..., t_a, ..., t_A}, 1 ≤ a ≤ A, where t_a is a four-tuple {b_a, c_a, d_a, s_a}: c_a is any element of the configuration item combination set C, s_a is any element of the database management system type set S, b_a is any element of the test command set B, and d_a is any element of the hardware device set D. A is the total number of test cases in T; for the test cases to contain all four-tuple combinations, A ≥ 36·Y·L is required (36 = |S|·|D|, six database management system types times six hardware devices).
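Step 4.8's four-tuples are the full product B × C × D × S; a sketch with illustrative (not the patent's) command, combination, and device values:

```python
import itertools

# Illustrative inputs: one test command, one configuration combination,
# two devices, and the six DBMS types from step 4.7.
B = ["sysbench oltp_read_write --threads=16"]
C = [("innodb_flush_method=O_DIRECT",)]
D = ["Samsung 980 Pro", "WD SN850"]
S = ["MySQL", "PostgreSQL", "MariaDB", "SQLite", "MongoDB", "Redis"]

# Each test case t_a is a four-tuple {b_a, c_a, d_a, s_a};
# the full product guarantees A = |B| * |C| * |D| * |S| combinations.
T = [{"b": b, "c": c, "d": d, "s": s}
     for b, c, d, s in itertools.product(B, C, D, S)]
```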
Fifth, the data collection module receives the test case set T, runs the test cases in T, and collects runtime data to obtain the runtime data set DATA, as follows:
5.1 Let variable a = 1 and let the runtime data set DATA = {};
5.2 Run test case t_a.
5.3 The data collection module obtains the I/O path from the database management system of type s_a under test to the device driver by any one of blktrace, Linux kernel event tracing, or eBPF diagnostics (a virtual-machine technology running in the Linux kernel).
5.4 Using hardware storage device d_a, the data collection module creates with the open-source tool fio (version 3.2 or above) a test suite independent of the database management system under test, simulates the I/O behavior of the database management system of type s_a under test, and tests the database management system of type s_a under test with configuration item combination c_a under load command b_a.
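One way 5.4 could drive fio is to assemble a command line per test case. The flags used below (`--name`, `--filename`, `--rw`, `--bs`, `--iodepth`, `--direct`, `--runtime`, `--time_based`) are standard fio options, while the default values are illustrative assumptions, not the patent's:

```python
def fio_command(device: str, rw: str = "randwrite", bs: str = "16k",
                iodepth: int = 32, runtime_s: int = 60) -> list[str]:
    """Build an fio invocation replaying a DBMS-like I/O pattern on
    device d_a (values here are illustrative defaults)."""
    return ["fio", "--name=dbms-replay",
            f"--filename={device}", f"--rw={rw}", f"--bs={bs}",
            f"--iodepth={iodepth}", "--direct=1",
            f"--runtime={runtime_s}", "--time_based"]
```

The resulting list can be handed to `subprocess.run` while the collection module records CPU utilization, NVMe queue utilization, and the I/O request order.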
5.5 From the moment test case t_a starts running, the data collection module continuously uses the fio tool to modify parameters and system settings, adjusts the configuration items, and dynamically monitors and records the runtime data data_a of the database management system of type s_a under test, including CPU utilization, NVMe queue utilization, and I/O request order; data_a is added to the set DATA until test case t_a finishes running.
5.6 If a < A, let a = a+1 and go to 5.2; otherwise go to 5.7.
5.7 The data collection module outputs the runtime data set DATA to the problem matching module, DATA = {data_1, ..., data_a, ..., data_A}, where data_a is the a-th runtime data in DATA.
Step six: the problem matching module determines from DATA whether the database management system under test has a performance mismatch problem.
6.1 Let variable a = 1 and let the number of performance mismatch problems M = 0.
6.2 The problem matching module judges the runtime data data_a in DATA according to the following criteria to conclude whether the database management system under test has a performance mismatch problem:
6.2.1 Judge whether the database management system under test underwent a "random-to-sequential" conversion after the configuration adjustment. The judging method is to analyze data_a and determine whether the order of I/O requests changed. If the degree of I/O request ordering of the database management system under test is inconsistent before and after the fio tool modified the parameters and system settings, a random-to-sequential conversion is considered to have occurred; go to 6.2.2. Otherwise go to 6.4.
6.2.2 Judge whether the performance of the database management system dropped. The judging method is to compare whether the CPU utilization and NVMe queue utilization in data_a decreased from before to after the modification. If both the CPU utilization and the NVMe queue utilization dropped after the configuration adjustment, the final performance of the database management system is considered to have decreased; go to 6.3. Otherwise go to 6.4.
6.3 The problem matching module judges that the database management system under test has an I/O sequentiality mismatch problem, lets M = M+1, records the I/O sequentiality mismatch problem present in the database management system under test as a defect, and obtains the defect ID of the defect (where the vendor does not use a standard bug-report management system, the ID is the tinyurl-converted one and can be accessed directly by concatenation).
6.4 If a < A, let a = a+1 and go to 6.2; otherwise go to 6.5.
6.5 End.
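The two-stage criterion of 6.2 reduces to a predicate over before/after measurements: the I/O request ordering must have changed across the configuration adjustment (6.2.1) and both CPU and NVMe-queue utilization must have dropped (6.2.2). The field names below are illustrative, not the patent's data schema:

```python
def has_io_order_mismatch(before: dict, after: dict) -> bool:
    """Apply the 6.2 criteria to one run's before/after measurements."""
    # 6.2.1: the degree of I/O request ordering is inconsistent
    # before and after the fio-driven modification.
    order_changed = before["io_order"] != after["io_order"]
    # 6.2.2: both CPU utilization and NVMe queue utilization dropped.
    perf_dropped = (after["cpu_util"] < before["cpu_util"] and
                    after["nvme_queue_util"] < before["nvme_queue_util"])
    return order_changed and perf_dropped
```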
To verify the invention's ability to find database performance mismatch problems related to I/O ordering, experiments were carried out on a computer with an 8-core Intel Core i7-9700K, 32 GB of memory, and a 500 GB NVMe SSD. The system calls extracted from the Linux 5.4.0 kernel and the six mainstream open-source database management systems MySQL, PostgreSQL, MariaDB, SQLite, MongoDB, and Redis were chosen as the experimental objects for evaluation. In the experiment, the source code and configuration items of the six database management systems and the manually extracted system calls of the Linux 5.4.0 kernel were used as system input; program variables were extracted by the variable analysis module and output to the taint analysis module. The taint analysis module takes the system calls and program variables as input, screens out the I/O-ordering-related configuration items through taint analysis, and outputs them to the test case generation module for test case generation; the generated test cases are output to the data collection module, which tests the six database management systems and collects data; finally the problem matching module judges whether an I/O sequentiality mismatch problem occurred. The experimental results are shown in Table 1: the invention detected 15 I/O sequentiality mismatch problems of 5 kinds in the three databases MySQL, PostgreSQL, and MariaDB, all confirmed by the developers, while for SQLite, MongoDB, and Redis no I/O-ordering-related mismatch problems were detected.
Table 1. I/O sequentiality mismatch problems detected by the present invention

Claims (10)

1. An I/O-ordering-oriented database performance problem detection method, characterized by comprising the following steps:
firstly, constructing a database performance problem detection system; the database performance problem detection system consists of a variable analysis module, a taint analysis module, a test case generation module, a data collection module, and a problem matching module; the variable analysis module is connected with the taint analysis module, performs variable analysis on the source code and configuration items of the database management system under test, obtains the configuration item variable set Conf corresponding to the configuration items, and sends Conf to the taint analysis module; the taint analysis module is connected with the variable analysis module and the test case generation module, receives Conf from the variable analysis module, receives the manually screened I/O-ordering-related system calls from a keyboard, performs taint analysis, screens out the I/O-ordering-related configuration item set Conf', and sends Conf' to the test case generation module; the test case generation module is connected with the taint analysis module and the data collection module, receives Conf' from the taint analysis module, and generates the configuration item combination set C; the test case generation module generates the test case set T from the configuration item combination set C, the test load command set B, the hardware device set D, and the set S of database management system types under test, and sends the test case set T to the data collection module; the data collection module is connected with the test case generation module and the problem matching module, receives the test case set T from the test case generation module, tests the database management system under test with the test cases in T, records the runtime data DATA produced while the database management system under test runs the test cases, and sends the runtime data to the problem matching module; the problem matching module is connected with the data collection module, receives the runtime data DATA from the data collection module, and judges from DATA whether the database management system under test has an I/O sequentiality mismatch problem;
secondly, the variable analysis module locates the configuration item variables corresponding to the configuration items, obtains the configuration item variable set Conf corresponding to the configuration items, and sends Conf to the taint analysis module, as follows:
the variable analysis module analyzes the source code and configuration items of the database management system under test using the ConfMapper algorithm, locating the configuration item variable set Conf corresponding to the configuration items in the source code of the database management system under test: Conf = {conf_1, ..., conf_n, ..., conf_N}, where N is the number of configuration item variables in Conf, 1 ≤ n ≤ N, and conf_n is the n-th configuration item variable in Conf; Conf is sent to the taint analysis module;
thirdly, the taint analysis module receives Conf from the variable analysis module, uses static taint analysis to identify the configuration item variables in Conf related to I/O-ordering system calls, confirms whether each configuration item variable in Conf is related to an I/O-ordering system call, and screens out the I/O-ordering-related configuration item set Conf', as follows:
3.1 identifying system calls related to I/O ordering; by reading the official Linux manual, the MM system calls of the Linux kernel are investigated, the system calls that may affect I/O ordering are filtered out from the MM system calls, and each candidate is cross-filtered to obtain the MM' system calls among the kernel's MM system calls that affect I/O ordering; MM is a positive integer, and 1 ≤ MM' ≤ MM;
3.2 classifying the MM' I/O-ordering-affecting system calls filtered in 3.1 into a read series S1 and a write series S2, wherein the read series S1 comprises read system calls and the write series S2 comprises write system calls;
3.3 the taint analysis module traverses each configuration item in Conf, uses static taint analysis to track the data flow of each configuration item in Conf, and checks whether any I/O-related system call is tainted; if not, the configuration item is excluded from Conf, yielding the I/O-ordering-related configuration item set Conf'; the static taint analysis adopts a use-define-chain-based taint analysis method, with the following specific steps:
3.3.1 compiling the source code of the database management system to be detected into LLVM intermediate representation;
3.3.2 let variable n = 1 and let the I/O-ordering-related configuration item set Conf' = {};
3.3.3 construct the use-define chain of configuration item variable conf_n; conf_n is marked as taint data, and the function return value or pointer produced when conf_n is read is marked as the propagation source of the taint data; along the use-define chain of conf_n, track the propagation of conf_n in the source code of the database management system under test and determine in which code segments of the database management system under test conf_n may be modified or used;
3.3.4 operations involving taint data in the database management system under test are divided into four categories: using taint data, propagating taint data, tainting other data, and propagating data other than the taint data; for all operations in the database management system under test, the taint paths formed by taint data and taint propagation are connected to form a complete taint propagation graph; let the set of all code segments on the taint propagation graph be F, F = {f_1, f_2, ..., f_h, ..., f_H}, where H is the number of code segments in F, 1 ≤ h ≤ H, and f_h is the h-th code segment in F;
3.3.5 check whether any code segment in F invokes a system call in the read series S1 or the write series S2 from step 3.2; if so, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; if not, the configuration item variable conf_n taints no I/O-related system call while being read and used by the database management system, and the configuration item is excluded from Conf, yielding the I/O-ordering-related configuration item set Conf';
3.3.6 if n < N, let n = n+1 and go to 3.3.3; if n = N, the I/O-ordering-related configuration item set Conf' has been obtained: Conf' = {conf_1, ..., conf_z, ..., conf_N'}, where 1 ≤ z ≤ N', N' is the number of I/O-ordering-related configuration items in Conf', and conf_z is the z-th I/O-ordering-related configuration item in Conf'; go to 3.3.7;
3.3.7 the taint analysis module sends Conf' to the test case generation module;
fourthly, the test case generation module generates test cases as follows:
4.1 the test case generation module extracts the syntax types and value ranges of the software configuration items of the database management system under test using the Spex algorithm, and classifies the extracted syntax types into four categories: numeric, Boolean, enumeration, and string;
4.2 the test case generation module generates for Conf' the set V of value sets to be tested, V = {vv_1, vv_2, ..., vv_z, ..., vv_N'}, where vv_z is the z-th value set to be tested in V, i.e., the set of values to be tested for the z-th configuration item conf_z in Conf'; v_z^k is the k-th value of conf_z, K_z is the number of values the test case generation module generates for conf_z, and 1 ≤ k ≤ K_z;
4.3 take the Cartesian product of vv_1, vv_2, ..., vv_z, ..., vv_N' to obtain VCartesian = vv_1 × vv_2 × ... × vv_z × ... × vv_N';
4.4 the test case generation module uses a performance testing tool to generate test commands as follows: sample the parameters of the performance testing tool with the pair-wise method, input the sampled parameter values of the performance testing tool into the performance testing tool, and output the test commands, obtaining the test command set B = {b_1, b_2, b_3, ..., b_y, ..., b_Y}, where 1 ≤ y ≤ Y, Y is the number of test commands in B, and b_y is the y-th test command in B;
4.5 the test case generation module receives a user-defined hardware device set D, where D = D1 ∪ D2;
wherein D1 represents the traditional hard disk (HDD) set and D2 represents the solid state drive (SSD) set;
4.6 the test case generation module generates the configuration item combination set C; C = {c_1, ..., c_l, ..., c_L}, where c_l = (v_1^{l1}, ..., v_z^{lz}, ..., v_{N'}^{lN'}) is a configuration item combination generated by the test case generation module; v_1^{l1} is any element of the set vv_1, 1 ≤ l1 ≤ K_1; v_z^{lz} is any element of the set vv_z, 1 ≤ lz ≤ K_z; ...; v_{N'}^{lN'} is any element of the set vv_{N'}; L is a positive integer representing the total number of different configuration item combinations obtained by permuting and combining the configuration items when each configuration item may freely take a value from its value set to be tested;
4.7 the test case generation module generates the database management system type set S = {MySQL, PostgreSQL, MariaDB, SQLite, MongoDB, Redis};
4.8 the test case generation module generates the test case set T, T = {t_1, t_2, t_3, …, t_a, …, t_A}, 1 ≤ a ≤ A; t_a is a quadruple (b_a, c_a, d_a, s_a), where b_a is any element of the test command set B, c_a is any element of the configuration item combination set C, d_a is any element of the hardware device set D, s_a is any element of the database management system type set S, and A is the total number of test cases in T;
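The construction of T in step 4.8 is itself a Cartesian product over the four sets; a minimal sketch, with placeholder members for each set:

```python
# Sketch of step 4.8: each test case t_a is the quadruple
# (b_a, c_a, d_a, s_a) drawn from B x C x D x S.
from itertools import product

B = ["sysbench oltp_read_write run"]            # test commands
C = [{"sync_binlog": 0}, {"sync_binlog": 1}]    # configuration combos
D = ["hdd0", "ssd0"]                            # hardware devices (D1 u D2)
S = ["MySQL", "PostgreSQL"]                     # DBMS types

T = [{"b": b, "c": c, "d": d, "s": s}
     for b, c, d, s in product(B, C, D, S)]
```

With the full sets of the method, |T| = |B|·|C|·|D|·|S|, consistent with the bound on A in claim 9.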
Step five: the data collection module receives the test case set T, runs the test cases in T, and collects data during the runs, obtaining the runtime DATA set DATA, as follows:
5.1 let variable a = 1 and let the runtime DATA set DATA = { };
5.2 run test case t_a;
5.3 the data collection module obtains, using any one of blktrace, Linux kernel event tracing, and eBPF diagnostic tools, the I/O path from the database management system under test of type s_a to the device driver;
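One way the I/O request order on this path can be recovered is from a blktrace/blkparse text trace; the sketch below parses pre-captured trace lines. The sample lines and the assumed default blkparse field layout (device, cpu, seq, time, pid, action, rwbs, sector, "+", blocks, process) are illustrative assumptions.

```python
# Sketch: extracting the order of I/O requests from blkparse output.
def request_order(trace_lines):
    """Return the sector of each queued request (action 'Q'), in order."""
    sectors = []
    for line in trace_lines:
        parts = line.split()
        # parts[5] = trace action, parts[7] = starting sector
        if len(parts) >= 8 and parts[5] == "Q":
            sectors.append(int(parts[7]))
    return sectors

# Hypothetical trace excerpt for a mysqld process.
trace = [
    "8,0 0 1 0.000000000 4321 Q W 2048 + 8 [mysqld]",
    "8,0 0 2 0.000100000 4321 Q W 2056 + 8 [mysqld]",
    "8,0 0 3 0.000200000 4321 C W 2048 + 8 [0]",   # completion, ignored
]
```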
5.4 in the case of using hardware storage device d_a, the data collection module creates, with the open-source tool fio, a test suite independent of the database management system under test that simulates the I/O behavior of the database management system under test of type s_a, and tests the database management system under test of type s_a using configuration item combination c_a under load command b_a;
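Driving fio for step 5.4 might look as follows; only standard fio options are used, but the flag values (pattern, block size, depth) are illustrative and would in practice be derived from the observed I/O behavior of the DBMS under test.

```python
# Sketch: building a fio command line that replays a DBMS-like I/O
# pattern on device d_a, independent of the DBMS itself.
def build_fio_command(device_file, rw="randwrite", block_size="4k",
                      size="1g", iodepth=32):
    """Return an argv list for fio simulating the given I/O pattern."""
    return ["fio",
            "--name=dbms_sim",
            f"--filename={device_file}",
            f"--rw={rw}",              # read/write/randread/randwrite
            f"--bs={block_size}",
            f"--size={size}",
            f"--iodepth={iodepth}",
            "--direct=1",              # bypass page cache, as many DBMSs do
            "--output-format=json"]

cmd = build_fio_command("/dev/nvme0n1", rw="randwrite")
# A real run would be: subprocess.run(cmd, capture_output=True)
```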
5.5 from the start of test case t_a, the data collection module continuously uses the fio tool to modify parameters and system settings, dynamically monitors and records runtime data data_a of the database management system under test of type s_a, including CPU utilization, NVMe queue utilization, and I/O request order, and adds data_a to the set DATA, until test case t_a finishes running;
5.6 if a < A, let a = a + 1 and go to 5.2; otherwise go to 5.7;
5.7 the data collection module outputs the runtime DATA set DATA to the problem matching module, DATA = {data_1, …, data_a, …, data_A}, where data_a is the a-th runtime data in DATA;
Step six: the problem matching module determines from DATA whether the database management system under test has a performance mismatch problem;
6.1 let variable a = 1 and let the number of performance mismatch problems M = 0;
6.2 the problem matching module judges the runtime data data_a in DATA according to the following criteria, concluding whether the database management system under test has a performance mismatch problem, as follows:
6.2.1 judge whether the database management system under test undergoes a random-to-sequential conversion after the configuration adjustment; the judging method is to analyze data_a and determine whether the order of the I/O requests has changed; if the degree of I/O request sequentiality of the database management system under test differs before and after the fio tool modified the parameters and system settings, a random-to-sequential conversion is considered to have occurred, go to 6.2.2; otherwise go to 6.4;
6.2.2 judge whether the performance of the database management system has decreased by comparing, in data_a, the CPU utilization and NVMe queue utilization before and after the modification; if both the CPU utilization and the NVMe queue utilization decreased after the configuration item adjustment, the final performance of the database management system is considered to have decreased, go to 6.3; otherwise go to 6.4;
6.3 the problem matching module judges that the database management system under test has an I/O sequentiality mismatch problem, lets M = M + 1, records the I/O sequentiality mismatch problem present in the database management system under test at this moment as a defect, and obtains the defect ID of the defect;
6.4 if a < A, let a = a + 1 and go to 6.2; otherwise go to 6.5;
6.5 end.
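A minimal sketch of the matching criteria of 6.2, assuming the conversion of 6.2.1 is detected by comparing a simple sequentiality degree (the fraction of consecutive requests with non-decreasing offsets) against a threshold; the threshold value and the exact utilization fields are assumptions, not fixed by the method.

```python
# Sketch of the two-part judgment in step 6.2: a random-to-sequential
# conversion (6.2.1) combined with dropping CPU and NVMe queue
# utilization (6.2.2) flags an I/O sequentiality mismatch (6.3).
def sequentiality(offsets):
    """Fraction of consecutive request pairs with non-decreasing offsets."""
    if len(offsets) < 2:
        return 1.0
    inc = sum(1 for a, b in zip(offsets, offsets[1:]) if b >= a)
    return inc / (len(offsets) - 1)

def is_mismatch(before, after, threshold=0.5):
    """before/after: dicts with 'offsets', 'cpu', 'nvme_queue'."""
    turned_sequential = (sequentiality(after["offsets"]) >= threshold >
                         sequentiality(before["offsets"]))
    degraded = (after["cpu"] < before["cpu"] and
                after["nvme_queue"] < before["nvme_queue"])
    return turned_sequential and degraded

# Hypothetical data_a snapshots around a configuration adjustment.
before = {"offsets": [512, 8, 4, 0, 128], "cpu": 0.80, "nvme_queue": 0.70}
after = {"offsets": [0, 8, 16, 24, 32], "cpu": 0.45, "nvme_queue": 0.30}
```

On an SSD, forcing sequential I/O buys little device-side benefit while the reordering work lowers CPU and queue utilization, which is the intuition behind flagging this combination as a mismatch.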
2. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein Linux in step 3.1 is required to be version 5.4.0 or above, and the cross filtering means that several people filter independently and then check one another's results.
3. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the read series S1 in step 3.2 comprises five types of read system calls: read, pread64, readv, preadv, and preadv2; the write series S2 comprises five types of write system calls: write, pwrite64, writev, pwritev, and pwritev2.
4. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the method in step 3.3.5 of checking whether the code fragments in F call any system call in the read series S1 or the write series S2 of step 3.2, obtaining the I/O-sequentiality-related configuration item set Conf′, is:
3.3.5.1 let h=1;
3.3.5.2 check whether f_h calls any system call in the read series S1 or the write series S2; if so, the configuration item variable conf_n taints I/O-related system calls while being read and used by the database management system, go to 3.3.5.4; if not, the configuration item variable conf_n taints no I/O-related system call while being read and used by the database management system, go to 3.3.5.3;
3.3.5.3 if h < H, let h = h + 1 and go to 3.3.5.2; otherwise go to 3.3.5.5;
3.3.5.4 add conf_n to the set Conf′.
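The check of steps 3.3.5.1-3.3.5.4 can be sketched as a name-based static approximation; the source fragments are hypothetical, and real taint analysis would track data flow rather than merely match call names.

```python
# Sketch of claim 4: does any code fragment f_h that reads/uses
# configuration item conf_n reach a read- or write-series system call?
import re

S1 = {"read", "pread64", "readv", "preadv", "preadv2"}       # read series
S2 = {"write", "pwrite64", "writev", "pwritev", "pwritev2"}  # write series

def touches_io(fragment):
    """True if the fragment calls any S1/S2 system call by name."""
    calls = set(re.findall(r"\b(\w+)\s*\(", fragment))
    return bool(calls & (S1 | S2))

def io_related_confs(conf_fragments):
    """conf_fragments: {conf_name: [code fragments using it]} -> Conf'."""
    return {conf for conf, frags in conf_fragments.items()
            if any(touches_io(f) for f in frags)}

# Hypothetical fragments: one item flows into pwrite64, one does not.
frags = {
    "innodb_flush_method": ["n = pwrite64(fd, buf, len, off);"],
    "max_connections": ["limit = atoi(value);"],
}
```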
5. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the method by which the test case generation module generates the set V of value sets to be tested for Conf′ in step 4.2 is:
4.2.1 initialize variable z = 1;
4.2.2 if the expectation label corresponding to conf_z is empty, then let vv_z = …, put vv_z into V, and go to 4.2.7;
4.2.3 if conf_z is of Boolean type, let vv_z = {0, 1}, put vv_z into V, and go to 4.2.7;
4.2.4 if conf_z is of enumeration type, let vv_z be the set of all possible values of conf_z extracted by the Spex algorithm, put vv_z into V, and go to 4.2.7;
4.2.5 if conf_z is of string type, let vv_z = …, put vv_z into V, and go to 4.2.7;
4.2.6 if conf_z is of numeric type, sample the values of conf_z by the following method: let Min be the minimum value and Max the maximum value of conf_z extracted by the Spex algorithm, and let vv_z = {Min, 10·Min, 10^2·Min, Max, 10^-1·Max, 10^-2·Max}; put vv_z into V and go to 4.2.7;
4.2.7 if z = N′, the set V of value sets to be tested is obtained, V = {vv_1, vv_2, …, vv_z, …, vv_N′}, end; otherwise let z = z + 1 and go to 4.2.2.
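The type-directed rules of claim 5 can be sketched as below; the empty-label and string cases are elided in the source, so the placeholder return for them is an assumption, and the configuration item names and ranges are hypothetical.

```python
# Sketch of step 4.2: generating the set of values to test for a
# configuration item according to its type (claim 5).
def sample_values(conf_type, spec=None):
    if conf_type == "boolean":
        return [0, 1]                       # rule 4.2.3
    if conf_type == "enum":
        return list(spec)                   # all values Spex extracted (4.2.4)
    if conf_type == "numeric":
        lo, hi = spec                       # Min, Max from Spex (4.2.6)
        return [lo, 10 * lo, 100 * lo, hi, hi // 10, hi // 100]
    return []                               # empty label / string: elided in source

V = {
    "flush": sample_values("boolean"),
    "method": sample_values("enum", ["fsync", "O_DSYNC", "O_DIRECT"]),
    "io_capacity": sample_values("numeric", (100, 1_000_000)),
}
```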
6. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the performance test tool in step 4.4 comprises sysbench version 1.0.14 or apache-benchmark version 2.4.
7. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the parameters of the performance test tool in step 4.4 include concurrency, load type, data table size, number of data tables, read operation ratio, and write operation ratio.
8. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein in step 4.5, D1 = {D11, D12}, D11 = AMD Ryzen 9 5700XT, D12 = 64 GB DDR4 memory, D2 = {D21, D22, D23, D24}, D21 = Samsung 980 Pro, D22 = Western Digital SN850, D23 = Samsung 860 EVO, D24 = Western Digital Blue.
9. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein A ≥ 18 × Y × L in step 4.8.
10. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the open-source tool fio in step 5.4 is required to be version 3.2 or above.
CN202310551096.XA 2023-05-16 2023-05-16 I/O (input/output) sequence-oriented database performance problem detection method Active CN116560998B (en)

Publications (2)

Publication Number Publication Date
CN116560998A true CN116560998A (en) 2023-08-08
CN116560998B CN116560998B (en) 2023-12-01

Family

ID=87487633


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100198799A1 (en) * 2007-06-20 2010-08-05 Sanjeev Krishnan Method and Apparatus for Software Simulation
CN105955877A (en) * 2016-04-19 2016-09-21 西安交通大学 Taint analysis method for dynamic parallel program based on symbolic computation
US20200019494A1 (en) * 2017-02-28 2020-01-16 Sparriw Co., Ltd Method and apparatus for performing test by using test case
CN111611177A (en) * 2020-06-29 2020-09-01 中国人民解放军国防科技大学 Software performance defect detection method based on configuration item performance expectation
KR20200106124A (en) * 2019-02-28 2020-09-11 한국정보통신기술협회 Test automation framework for dbms for analysis of bigdata and method of test automation
CN112632105A (en) * 2020-01-17 2021-04-09 华东师范大学 System and method for verifying correctness of large-scale transaction load generation and database isolation level
KR102299640B1 (en) * 2020-10-21 2021-09-08 한국과학기술원 Method and system for similarity analysis among kernel system calls using fuzz testing
US20210303696A1 (en) * 2020-03-30 2021-09-30 Software Ag Systems and/or methods for static-dynamic security testing using a test configurator to identify vulnerabilities and automatically repair defects




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant