CN116560998A - I/O (input/output) sequence-oriented database performance problem detection method - Google Patents


Info

Publication number
CN116560998A
CN116560998A (application CN202310551096.XA; granted as CN116560998B)
Authority
CN
China
Prior art keywords: conf, data, configuration item, module, management system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310551096.XA
Other languages
Chinese (zh)
Other versions
CN116560998B (en)
Inventor
李姗姗
王戟
陈立前
马俊
李小玲
张元良
王腾
刘浩然
白林枭
彭博铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by National University of Defense Technology
Priority to CN202310551096.XA
Publication of CN116560998A
Application granted
Publication of CN116560998B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; error correction; monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3668 Software testing
    • G06F 11/3672 Test management
    • G06F 11/3676 Test management for coverage analysis
    • G06F 11/3684 Test management for test design, e.g. generating new test cases
    • G06F 11/3688 Test management for test execution, e.g. scheduling of test suites
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an I/O (input/output) ordering-oriented database performance problem detection method, which aims to accurately detect I/O-ordering-related performance mismatch problems in a database management system under test. The technical scheme is as follows: construct a database performance problem detection system consisting of a variable analysis module, a taint analysis module, a test case generation module, a data collection module and a problem matching module. The variable analysis module locates the set Conf of configuration item variables corresponding to the configuration items in the database management system under test; the taint analysis module performs taint analysis on the configuration item variables in Conf to obtain the set of I/O-ordering-related configuration items; the test case generation module generates a test case set T; the data collection module runs the test cases in T and collects runtime data; the problem matching module detects performance mismatch problems from the runtime data. The invention achieves high recall and precision at low testing cost.

Description

I/O (input/output) sequence-oriented database performance problem detection method
Technical Field
The invention relates to techniques for detecting software performance problems caused by software configuration problems, and in particular to a method for detecting performance problems caused by I/O-ordering mismatch on new hardware.
Background
With continuous social progress and development, software systems are ever more widely applied and play a role in all aspects of society. At the same time, software defects cause various problems and bring unnecessary trouble to users, developers and others. One such class is software performance problems, which concern the speed, response time, throughput and similar characteristics of software at runtime. Performance problems can make software run slowly, crash or stop responding, harming user experience and work efficiency, and have therefore become a focus of attention for all software stakeholders. Amazon has noted that every additional 0.1 second of web-page delay directly causes a 1% loss in sales. Effectively preventing performance problems from reaching the production environment, and thereby avoiding economic loss and extra labor cost, is accordingly a shared concern of all software stakeholders. Software configuration refers to the settings and adjustments made to software during development and deployment to meet specific needs and circumstances; it includes configuration files, environment variables, database connection strings, log settings, security settings and so on. Software configuration items are an important interface for the software's external interaction: they usually control software behavior and the allocation of system resources in the form of conditional expressions, so that the software adapts to different environments and loads (a process called software configuration). Software configuration is therefore closely related to software performance.
In recent years, with growing user demands and the continuous evolution of software running environments, software has become increasingly configurable, making configuration-related performance problems ever more prominent.
However, handling configuration-related performance problems is not trivial, for several reasons. First, such problems can lie hidden in a software system, require specific configurations and environments to trigger, and may produce no explicit hints such as logs or errors once triggered. Second, configuration items are the key interface by which software adapts to its environment; the related code usually must fit the environment, and environmental change has a pronounced effect on performance, so configuration-related performance faults depend not only on the software itself but often also on characteristics of the running environment. Finally, the number of configuration items is huge, the relationship between configuration and user intentions such as performance is complex, configuration documentation is hard to understand, and users lack domain knowledge, time and energy; all of this makes detecting problematic configuration items challenging.
As software running environments keep evolving, configuration-related code that contains no defect may still cause performance problems because it is not adapted to new environment characteristics. Such mismatch problems caused by environmental change are a typical class of configuration problems, including those caused by replacing the hardware environment. There is ample evidence that current mainstream software often fails to exploit the characteristics of new hardware; sometimes, even after a hardware upgrade, software performance fails to improve or even degrades.
Take database management systems as an example. They are a key component of data-intensive systems and are widely deployed on commercial platforms for caching, metadata management, message delivery, online shopping and the like. Practitioners have tried many ways to improve database performance, the most direct and typical being to substitute better-performing storage devices. Storage devices have undergone revolutionary development in recent years; the NVMe SSD is one of the most representative new storage media, with throughput up to 6 GB/s and latency down to the 10 μs level, far surpassing the previous generation of SATA SSDs and HDDs in every respect. However, directly deploying an NVMe SSD under a database management system often fails to achieve the expected effect. Abundant user feedback shows that after upgrading storage to an NVMe SSD, the improvement in performance indicators such as latency and throughput is very limited, and sometimes performance fails to improve at all. Research shows that many mainstream database management systems suffer performance mismatch, and I/O-ordering mismatch is one of its major forms.
Much prior work addresses the detection of performance problems broadly. A significant portion focuses only on application-level performance problems, such as "Caramel: Detecting and Fixing Performance Problems That Have Non-Intrusive Fixes" by Adrian Nistor et al. in ICSE 2015, "Directed Test Generation to Detect Loop Inefficiencies" by Monika Dhok et al. in FSE 2016, and "Redundant Loads: A Software Inefficiency Indicator" by Pengfei Su et al. in ICSE 2019; none of these considers potential mismatch between the application and the underlying devices. Other work detects configuration-related performance problems but has drawbacks. For example, Violet, proposed by Yigong Hu et al. in "Automated Reasoning and Detection of Specious Configuration in Large Systems with Symbolic Execution" (OSDI 2020), systematically reasons about the performance impact of configuration parameters with selective symbolic execution and can derive the combined effects among parameters and their relationship to inputs, but it relies on a large amount of symbolic execution, consuming substantial computational resources and time. Moreover, since the precision of symbolic execution is limited by program size and complexity, its accuracy may suffer on large or complex programs.
In addition, a mainstream approach judges mismatch problems with heuristic rules over performance changes, testing with the device under test, configuration items, configuration item combinations and configuration values (hereinafter, the benchmark method); but the rules used as mismatch criteria are coarse, and both effectiveness and testing efficiency are low.
Disclosure of Invention
Aimed at the performance mismatch problems in current mainstream database management systems that go unnoticed by developers and users, the invention provides a combined dynamic-static method for detecting database performance mismatch related to I/O ordering. It detects potential I/O-ordering-related performance mismatch problems in the database management system under test and helps developers continuously optimize the database management system.
To solve the above problems, a database performance problem detection system is first constructed, consisting of a variable analysis module, a taint analysis module, a test case generation module, a data collection module and a problem matching module. The variable analysis module then analyzes the configuration items of the database management system under test and locates the set Conf of configuration item variables corresponding to them. The taint analysis module performs taint analysis on the configuration item variables in the Conf output by the variable analysis module to obtain the I/O-ordering-related configuration item set Conf'. The test case generation module combines the I/O-ordering-related configuration items in Conf', different test loads B, hardware devices D and database management system types S to generate a test case set T. The data collection module tests the database management system under test with the test cases in T and collects the runtime data produced while the test cases run. The problem matching module detects performance mismatch problems of the database management system under test from the runtime data.
The specific technical scheme of the invention is as follows:
First, construct the database performance problem detection system, consisting of a variable analysis module, a taint analysis module, a test case generation module, a data collection module and a problem matching module. The variable analysis module is connected to the taint analysis module; it performs variable analysis on the source code and configuration items of the database management system under test, obtains the configuration item variable set Conf corresponding to the configuration items, and sends Conf to the taint analysis module. The taint analysis module is connected to the variable analysis module and the test case generation module; it receives Conf from the variable analysis module, receives the manually screened I/O-ordering-related system calls from the keyboard, performs taint analysis, screens out the I/O-ordering-related configuration item set Conf', and sends Conf' to the test case generation module. The test case generation module is connected to the taint analysis module and the data collection module; it receives Conf' from the taint analysis module and generates the configuration item combination set C. The test case generation module then generates the test case set T from the configuration item combination set C, the test load command set B, the hardware device set D and the set S of database management system types under test, and sends T to the data collection module. The data collection module is connected to the test case generation module and the problem matching module; it receives the test case set T from the test case generation module, tests the database management system under test with the test cases in T, records the runtime data DATA produced while the database management system under test runs the test cases, and sends the runtime data to the problem matching module.
The problem matching module is connected to the data collection module; it receives the runtime data DATA from the data collection module and judges from DATA whether the database management system under test has an I/O-ordering mismatch problem.
Second, the variable analysis module locates the configuration item variables corresponding to the configuration items, obtains the configuration item variable set Conf, and sends Conf to the taint analysis module. The method is as follows:
The variable analysis module analyzes the source code and configuration items of the database management system under test using the ConfMapper algorithm (see pages 3-7 of "ConfMapper: Automated Variable Finding for Configuration Items in Source Code", published by Shulin Zhou et al. in QRS 2016) and locates the configuration item variable set Conf corresponding to the configuration items in the source code of the database management system under test: Conf = {conf_1, ..., conf_n, ..., conf_N}, where N is the number of configuration item variables in Conf, 1 ≤ n ≤ N, and conf_n is the n-th configuration item variable in Conf. Conf is then sent to the taint analysis module.
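The ConfMapper-style mapping of the second step rests on matching configuration item names against source variable names. The sketch below is illustrative only: the identifiers `innodb_buffer_pool_size`, `innodbBufferPoolSize`, etc. are hypothetical examples, not taken from the patent, and real ConfMapper uses richer heuristics than pure name normalization.

```python
import re

def normalize(name: str) -> str:
    # Lower-case and strip separators so "innodb_buffer_pool_size"
    # and "innodbBufferPoolSize" compare equal.
    return re.sub(r"[_\-]", "", name).lower()

def map_config_to_variables(config_items, variables):
    # Build Conf: for each documented configuration item, collect the
    # source-code variables whose normalized name matches it.
    return {item: [v for v in variables if normalize(v) == normalize(item)]
            for item in config_items}

config_items = ["innodb_buffer_pool_size", "sync_binlog"]
variables = ["innodbBufferPoolSize", "sync_binlog", "log_bin"]
conf = map_config_to_variables(config_items, variables)
```

Items whose list stays empty simply have no matching variable under this heuristic; ConfMapper's additional analyses would then take over.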
Third, the taint analysis module receives Conf from the variable analysis module and uses static taint analysis to identify the configuration item variables in Conf related to I/O-ordering system calls, confirming whether each configuration item variable in Conf is related to an I/O-ordering system call and screening out the I/O-ordering-related configuration item set Conf'. The steps are:
3.1 Identify the system calls related to I/O ordering. By reading the official Linux manual (version 5.4.0 or above is required), survey the MM system calls of the Linux kernel, filter out those that can affect I/O ordering, and cross-filter each candidate (several people filter independently and then check one another's results), obtaining the MM' system calls among the kernel's MM system calls that affect I/O ordering. MM is a positive integer; for Linux 5.4.0, MM = 335, and 1 ≤ MM' ≤ MM.
3.2 Classify the MM' I/O-ordering-affecting system calls filtered out in 3.1 into a read series S1 and a write series S2. Taking Linux 5.4.0 as an example, the results are:
The read series S1 contains five read system calls: read, pread64, readv, preadv, preadv2.
The write series S2 contains five write system calls: write, pwrite64, writev, pwritev, pwritev2.
Thus MM' = 10 in this case.
3.3 The search space of configuration item combinations is enormous, and many sampling techniques cannot be applied directly because they change two or more configuration items per sampled configuration. Therefore the taint analysis module traverses each configuration item in Conf, uses static taint analysis to track the data flow of each configuration item, checks whether any I/O-related system call is tainted, and excludes the configuration item from Conf if not, yielding the I/O-ordering-related configuration item set Conf' and markedly shrinking the combination search space. The static taint analysis is based on use-define chains, with the following steps:
3.3.1 Compile the source code of the database management system under test into LLVM IR (LLVM Intermediate Representation);
3.3.2 Let variable n = 1 and let the I/O-ordering-related configuration item set Conf' = { };
3.3.3 Construct the use-define chain of the configuration item variable conf_n. Mark conf_n as tainted data, and mark the function return value or pointer produced when conf_n is read as a taint propagation source. Following the use-define chain of conf_n, track its propagation through the source code of the database management system under test and determine in which code segments of the database management system under test conf_n may be modified or used.
3.3.4 Operations involving tainted data in the database management system under test fall into four categories: using tainted data, propagating tainted data, tainting other data with tainted data, and operations on data other than tainted data. Connect all taint paths (the propagation tracks of tainted data through the program) formed by tainted-data use and propagation in the database management system under test into a complete taint propagation graph. Let F be the set of all code segments on the taint propagation graph, F = {f_1, f_2, ..., f_h, ..., f_H}, where H is the number of code segments in F, 1 ≤ h ≤ H, and f_h is the h-th code segment in F;
3.3.5 Check whether any code segment in F invokes any system call in the read series S1 or the write series S2 from step 3.2. If so, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; if not, conf_n taints no I/O-related system call in that process and the configuration item is excluded from Conf. The method is as follows:
3.3.5.1 Let h = 1;
3.3.5.2 Check whether f_h invokes any system call in the read series S1 or the write series S2 of step 3.2. If so, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; go to 3.3.5.4. If not, no I/O-related system call is tainted in f_h; go to 3.3.5.3.
3.3.5.3 If h < H, let h = h + 1 and go to 3.3.5.2; otherwise go to 3.3.6.
3.3.5.4 Add conf_n to the set Conf'.
3.3.6 If n < N, let n = n + 1 and go to 3.3.3. If n = N, the I/O-ordering-related configuration item set Conf' has been obtained: Conf' = {conf_1, ..., conf_z, ..., conf_N'}, where 1 ≤ z ≤ N', N' is the number of I/O-ordering-related configuration items in Conf', and conf_z is the z-th I/O-ordering-related configuration item in Conf'. Go to 3.3.7.
3.3.7 The taint analysis module sends Conf' to the test case generation module.
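The screening loop of steps 3.3.2 to 3.3.6 can be sketched as worklist-style taint propagation over a def-use graph. The graph, node names and per-node syscall sets below are hypothetical simplifications (the real analysis works on LLVM IR, not Python dicts), but S1 and S2 are the sets from step 3.2:

```python
# S1 and S2 from step 3.2 (Linux 5.4.0)
READ_SYSCALLS = {"read", "pread64", "readv", "preadv", "preadv2"}
WRITE_SYSCALLS = {"write", "pwrite64", "writev", "pwritev", "pwritev2"}
IO_SYSCALLS = READ_SYSCALLS | WRITE_SYSCALLS

def is_io_ordering_related(conf_var, def_use, syscalls_in):
    # Mark conf_var tainted and propagate along def-use edges; the item
    # belongs in Conf' iff some tainted code segment invokes S1 or S2.
    tainted, worklist = {conf_var}, [conf_var]
    while worklist:
        node = worklist.pop()
        if IO_SYSCALLS & syscalls_in.get(node, set()):
            return True
        for succ in def_use.get(node, []):
            if succ not in tainted:
                tainted.add(succ)
                worklist.append(succ)
    return False

# Hypothetical def-use edges and syscall sites:
def_use = {"flush_method": ["do_flush"], "do_flush": ["issue_io"],
           "query_cache_size": ["cache_alloc"]}
syscalls_in = {"issue_io": {"pwrite64"}}
conf_prime = [c for c in ["flush_method", "query_cache_size"]
              if is_io_ordering_related(c, def_use, syscalls_in)]
```

Here `flush_method` reaches a `pwrite64` call site and is kept, while `query_cache_size` never taints an I/O system call and is screened out.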
Fourth, the test case generation module generates test cases. The method is as follows:
4.1 The test case generation module extracts the syntax types and value ranges of the software configuration items of the database management system under test using the Spex algorithm (see pages 7-10 of "Do Not Blame Users for Misconfigurations", published by Tianyin Xu et al. in SOSP 2013) and classifies the extracted syntax types into four categories: numeric (int), Boolean (bool), enumeration (enum) and string (string);
4.2 The test case generation module generates for Conf' a set V of value sets to be tested, V = {vv_1, vv_2, ..., vv_z, ..., vv_N'}, where vv_z is the z-th value set in V, i.e. the set of values to be tested for the z-th configuration item conf_z in Conf'; vv_z = {v_z^1, ..., v_z^k, ..., v_z^(K_z)}, where v_z^k is the k-th value of conf_z, K_z is the number of values the test case generation module generates for conf_z, and 1 ≤ k ≤ K_z. The method is as follows:
4.2.1 Initialize variable z = 1;
4.2.2 If the type label extracted for conf_z is empty, let vv_z = { } (K_z = 0), put vv_z into V, and go to 4.2.7;
4.2.3 If conf_z is Boolean (bool), let vv_z = {0, 1} (K_z = 2), put vv_z into V, and go to 4.2.7;
4.2.4 If conf_z is an enumeration (enum), let vv_z be the set of all possible values of conf_z extracted by the Spex algorithm, put vv_z into V, and go to 4.2.7;
4.2.5 If conf_z is a string (string), let vv_z = { } (K_z = 0), following the conclusion of the article "Tuning backfired? Not (always) your fault: understanding and detecting configuration-related performance bugs" published by He Haochen et al. in ESEC/FSE 2019; put vv_z into V and go to 4.2.7;
4.2.6 If conf_z is numeric (int), sample the value range of conf_z as follows: let Min and Max be the minimum and maximum values of conf_z extracted by the Spex algorithm, and let vv_z = {Min, 10·Min, 10^2·Min, Max, 10^-1·Max, 10^-2·Max} (K_z = 6); put vv_z into V and go to 4.2.7;
4.2.7 If z = N', go to 4.3; otherwise let z = z + 1 and go to 4.2.2;
4.3 Take the Cartesian product of vv_1, vv_2, ..., vv_z, ..., vv_N' to obtain VCartesian = vv_1 × vv_2 × ... × vv_z × ... × vv_N'.
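Steps 4.2 and 4.3 can be sketched as follows: the per-type rules mirror 4.2.2 to 4.2.6, and `itertools.product` plays the role of the Cartesian product VCartesian. The concrete bounds and item kinds are illustrative, not taken from the patent:

```python
from itertools import product

def value_set(ctype, domain=None, lo=None, hi=None):
    # Step 4.2 rules: bool -> {0,1}; enum -> all extracted values;
    # int -> sampled around Min and Max; string/unlabeled -> empty.
    if ctype == "bool":
        return [0, 1]
    if ctype == "enum":
        return list(domain)
    if ctype == "int":
        return [lo, 10 * lo, 100 * lo, hi, hi // 10, hi // 100]
    return []

vv1 = value_set("bool")                 # e.g. an on/off flush switch
vv2 = value_set("int", lo=1, hi=1000)   # e.g. a buffer-size item
v_cartesian = list(product(vv1, vv2))   # VCartesian of step 4.3
```

With one Boolean item (2 values) and one numeric item (6 sampled values), VCartesian holds 2 × 6 = 12 combinations, which is why screening Conf down to Conf' first matters so much.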
4.4 The test case generation module generates test commands using a performance testing tool (e.g. sysbench version 1.0.14 or apache-benchmark version 2.4). The method is as follows: sample the parameters of the performance testing tool with the classical pair-wise method (pair-wise testing is a combinatorial method in the software testing field that, for each pair of input parameters to a system, tests all possible discrete combinations of those parameters; see "Pragmatic Software Testing: Becoming an Effective and Efficient Test Professional"), then feed the sampled parameter values of the performance testing tool (such as concurrency, load type, data table size, number of data tables, read-operation ratio and write-operation ratio) into the tool and output test commands, obtaining the test command set B = {b_1, b_2, b_3, ..., b_y, ..., b_Y}, where 1 ≤ y ≤ Y, Y is the number of test commands in B, and b_y is the y-th test command in B;
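The pair-wise sampling of step 4.4 can be sketched with a simple greedy generator: keep selecting test commands until every pair of parameter values appears together in at least one command, which needs fewer commands than the full Cartesian product. The parameter names and values below are hypothetical stand-ins for sysbench options:

```python
from itertools import combinations, product

def pairwise_cases(params):
    # Greedy pair-wise sampling: repeatedly pick the candidate case that
    # covers the most still-uncovered value pairs.
    names = list(params)
    uncovered = {(a, va, b, vb)
                 for a, b in combinations(names, 2)
                 for va in params[a] for vb in params[b]}
    all_cases = [dict(zip(names, vals)) for vals in product(*params.values())]
    chosen = []
    while uncovered:
        best = max(all_cases,
                   key=lambda c: len({(a, c[a], b, c[b])
                                      for a, b in combinations(names, 2)}
                                     & uncovered))
        chosen.append(best)
        uncovered -= {(a, best[a], b, best[b])
                      for a, b in combinations(names, 2)}
    return chosen

params = {"threads": [1, 16],
          "table_size": [1000, 100000],
          "rw": ["read", "write"]}
cases = pairwise_cases(params)  # covers all pairs with fewer than 8 cases
```

Production tools use more sophisticated pair-wise generators, but the coverage guarantee is the same: every two-way interaction of parameter values is exercised at least once.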
4.5 The test case generation module receives a user-defined hardware device set D for experimental comparison, covering NVMe SSD and SATA SSD/HDD storage, with D = D1 ∪ D2. D1 = {d11, d12} describes the host platform used in this method (d11 = AMD Ryzen 5700XT processor, d12 = 64 GB DDR4 memory); D2 = {d21, d22, d23, d24} is the set of storage devices under comparison (this method selects the Samsung 980 Pro, Western Digital SN850, Samsung 860 EVO and Western Digital Blue, i.e. d21 = Samsung 980 Pro, d22 = Western Digital SN850, d23 = Samsung 860 EVO, d24 = Western Digital Blue).
4.6 The test case generation module generates the configuration item combination set C, C = {c_1, ..., c_l, ..., c_L}, where c_l = (v_1^(l1), v_2^(l2), ..., v_N'^(lN')) is one configuration item combination generated by the test case generation module: v_1^(l1) is any element of the set vv_1 (1 ≤ l1 ≤ K_1); v_z^(lz) is any element of the set vv_z (1 ≤ lz ≤ K_z); ...; v_N'^(lN') is any element of the set vv_N'. L is a positive integer denoting the total number of distinct configuration item combinations obtained when every configuration item ranges freely over its value set to be tested.
4.7 The test case generation module generates the database management system type set S = {MySQL, PostgreSQL, MariaDB, SQLite, MongoDB, Redis}, covering six currently mainstream database management systems.
4.8 The test case generation module generates the test case set T, T = {t_1, t_2, t_3, ..., t_a, ..., t_A}, 1 ≤ a ≤ A, where t_a is a quadruple (b_a, c_a, d_a, s_a): c_a is any element of the configuration item combination set C, s_a is any element of the database management system type set S, b_a is any element of the test command set B, and d_a is any element of the hardware device set D. A is the total number of test cases in T; for the test cases to contain all quadruple combinations, A ≥ 36·Y·L is required.
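Step 4.8's quadruple construction is a plain Cartesian product over B, C, D and S. A minimal sketch follows; all concrete commands, configuration values and device names are illustrative, not the patent's actual sets:

```python
from itertools import product

def build_test_cases(B, C, D, S):
    # Step 4.8: each test case t_a is a quadruple (b_a, c_a, d_a, s_a);
    # enumerating all combinations gives |T| = |B| * |C| * |D| * |S|.
    return [{"load": b, "config": c, "device": d, "dbms": s}
            for b, c, d, s in product(B, C, D, S)]

T = build_test_cases(
    B=["sysbench oltp_read_write --threads=16"],   # hypothetical b_y
    C=[{"flush_method": "O_DIRECT"}, {"flush_method": "fsync"}],
    D=["NVMe SSD", "SATA SSD"],
    S=["MySQL", "PostgreSQL"],
)
```

With the patent's 6 devices and 6 database management systems, |D| · |S| = 36, which is where the A ≥ 36·Y·L bound comes from.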
Fifth, the data collection module receives the test case set T, runs the test cases in T, and collects runtime data to obtain the runtime data set DATA. The method is as follows:
5.1 Let variable a = 1 and let the runtime data set DATA = { };
5.2 Run test case t_a;
5.3 The data collection module obtains, via any of blktrace, Linux kernel event tracing, and eBPF diagnostics (a virtual machine technology running in the Linux kernel), the I/O path from the type-s_a database management system under test to the device driver.
5.4 Using hardware storage device d_a, the data collection module creates, with the open-source tool fio (version 3.2 or above), a test suite independent of the database management system under test that simulates the I/O behavior of the type-s_a database management system under test, and tests the type-s_a database management system under test with configuration item combination c_a under load command b_a.
5.5 From the moment test case t_a starts running, the data collection module continuously uses the fio tool to adjust configuration parameters and modify system settings, adjusts the configuration items, and dynamically monitors and records the runtime data data_a of the type-s_a database management system under test, including CPU utilization, NVMe queue utilization and I/O request order, adding data_a to the set DATA, until test case t_a finishes running.
5.6 If a < A, let a = a + 1 and go to 5.2; otherwise go to 5.7.
5.7 The data collection module outputs the runtime data set DATA to the problem matching module, DATA = {data_1, ..., data_a, ..., data_A}, where data_a is the a-th runtime data in DATA.
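Step 5.4's fio-based simulation could be configured with a job file along the following lines. This is a hedged sketch only: the device path `/dev/nvme0n1`, block size and queue depth are assumptions rather than the patent's actual settings, and only standard fio options (`ioengine`, `direct`, `rw`, `bs`, `iodepth`, `time_based`, `runtime`) are used.

```ini
; Hypothetical fio job approximating step 5.4: drive DBMS-like I/O on
; the device under test, independently of the real database.
[global]
ioengine=libaio
direct=1
time_based=1
runtime=60

[dbms-like-io]
filename=/dev/nvme0n1   ; assumed device d_a under test
rw=randwrite            ; switch to "write" to compare sequential ordering
bs=16k
iodepth=32
```

Running the same job with `rw=write` versus `rw=randwrite` gives a device-level baseline for how much the I/O request order alone affects throughput on d_a.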
Sixth, the problem matching module determines from DATA whether the database management system under test has performance mismatch problems.
6.1 Let variable a = 1 and let the number of performance mismatch problems M = 0.
6.2 The problem matching module judges the runtime data data_a in DATA against the following criteria to conclude whether the database management system under test has a performance mismatch problem. The method is as follows:
6.2.1 Judge whether the database management system under test undergoes a "sequential-to-random" conversion after the configuration adjustment. The method is to analyze data_a and determine whether the order of I/O requests has changed. If the degree of I/O request ordering of the database management system under test differs before and after the fio tool modified the parameters and system settings, a sequential-to-random conversion is considered to have occurred; go to 6.2.2. Otherwise go to 6.4.
6.2.2 Judge whether the performance of the database management system has decreased. The method is to compare, in data_a, the CPU utilization and the NVMe queue utilization before and after the modification. If both the CPU utilization and the NVMe queue utilization dropped after the configuration adjustment, the final performance of the database management system is considered to have decreased; go to 6.3. Otherwise go to 6.4.
6.3 the problem matching module judges that the I/O sequential unadapted problem exists in the database management system to be detected, makes M=M+1, records the I/O sequential unadapted problem existing in the database management system to be detected at the moment as a defect, and obtains the defect ID of the defect (the official does not use a standard fault report management system, namely the ID converted by the tinyurl. Mobi, and can be directly accessed through splicing).
6.4 if a < a, let a=a+1, turn 6.2; otherwise, turning to 6.5.
And 6.5, ending.
Compared with the prior art, the invention has the following beneficial effects:
1. The third step of the invention uses static taint analysis to identify all configuration item variables in Conf related to I/O-ordering system calls, confirming whether each configuration item variable in Conf is related to an I/O-ordering system call; by testing only the screened I/O-related configuration items, the testing cost is greatly reduced. Compared with the reference method in the background art, the time cost of testing one database management system is 0.6 to 5.5 hours, while the average time cost of testing one database with the reference method exceeds 2000 hours.
2. The invention can effectively detect database I/O sequentiality mismatch problems: 15 I/O sequentiality mismatch problems of 5 kinds were detected in the three mainstream database management systems MySQL, PostgreSQL, and MariaDB, with defect IDs #10355l, #107362, #103272, #yVIz, and #26790 respectively, and all of them were confirmed by the developers after being reported.
3. The test cases generated by the test case generation module in the fourth step have the characteristic of comprehensive coverage: the large number of permutations and combinations of different configuration items, load commands, and hardware devices lets the test case set cover most I/O sequentiality mismatch problems, greatly improving detection accuracy.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a logical block diagram of an I/O sequential mismatch problem detection system constructed in accordance with the present invention;
Detailed Description
The present invention will be described below with reference to the accompanying drawings.
As shown in fig. 1, the present invention includes the steps of:
First, a database performance problem detection system is constructed. The database performance problem detection system consists of a variable analysis module, a taint analysis module, a test case generation module, a data collection module, and a problem matching module. The variable analysis module is connected with the taint analysis module; it performs variable analysis on the source code and configuration items of the database management system under test, obtains the configuration item variable set Conf corresponding to the configuration items, and sends Conf to the taint analysis module. The taint analysis module is connected with the variable analysis module and the test case generation module; it receives Conf from the variable analysis module, receives the manually screened I/O-ordering-related system calls from the keyboard, performs taint analysis, screens out the I/O-ordering-related configuration item set Conf', and sends Conf' to the test case generation module. The test case generation module is connected with the taint analysis module and the data collection module; it receives Conf' from the taint analysis module and generates the configuration item combination set C. The test case generation module then generates the test case set T from the configuration item combination set C, the test load command set B, the hardware device set D, and the set S of database management system types under test, and sends T to the data collection module. The data collection module is connected with the test case generation module and the problem matching module; it receives the test case set T from the test case generation module, tests the database management system under test with the test cases in T, records the runtime data DATA produced while the database management system under test runs the test cases, and sends the runtime data to the problem matching module.
The problem matching module is connected with the data collection module; it receives the runtime data DATA from the data collection module and judges from DATA whether the database management system under test has an I/O sequentiality mismatch problem.
Secondly, the variable analysis module locates the configuration item variables corresponding to the configuration items, obtains the configuration item variable set Conf, and sends Conf to the taint analysis module, as follows:
The variable analysis module uses the ConfMapper algorithm (see "ConfMapper: Automated Variable Finding for Configuration Items in Source Code", Shulin Zhou et al., QRS 2016, pages 3-7, a method of automatically discovering the initial variables of configuration items from software source code) to analyze the source code and configuration items of the database management system under test, locating the configuration item variable set Conf corresponding to the configuration items in the source code: Conf = {conf_1, ..., conf_n, ..., conf_N}, where N is the number of configuration item variables in Conf, 1 ≤ n ≤ N, and conf_n is the n-th configuration item variable in Conf. Conf is then sent to the taint analysis module.
Thirdly, the taint analysis module receives Conf from the variable analysis module, uses static taint analysis to identify the configuration item variables in Conf related to I/O-ordering system calls, confirms whether each configuration item variable in Conf is related to an I/O-ordering system call, and screens out the I/O-ordering-related configuration item set Conf', as follows:
3.1 Identify system calls related to I/O ordering. By reading the official Linux manual (version 5.4.0 or above is required), investigate the MM system calls of the Linux kernel, filter out from them the system calls that may affect I/O ordering, and cross-filter each candidate (several people filter independently and then check each other's results) to obtain the MM' system calls among the kernel's MM system calls that affect I/O ordering. MM is a positive integer; for Linux 5.4.0, MM = 335, and 1 ≤ MM' ≤ MM.
3.2 Classify the MM' I/O-ordering-affecting system calls filtered in 3.1 into a read series S1 and a write series S2. Taking the analysis of Linux 5.4.0 as an example, the results are:
The read series S1 includes five read system calls: read, pread64, readv, preadv, and preadv2.
The write series S2 includes five write system calls: write, pwrite64, writev, pwritev, and pwritev2.
In this case MM' = 10.
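The two series in 3.2 can be written down directly as a small lookup that the later taint check consults; a minimal Python sketch (the syscall names follow the Linux 5.4 syscall table):

```python
# Step 3.2 as data: the MM' = 10 I/O-ordering-related system calls,
# split into a read series S1 and a write series S2.
READ_SERIES = {"read", "pread64", "readv", "preadv", "preadv2"}
WRITE_SERIES = {"write", "pwrite64", "writev", "pwritev", "pwritev2"}

IO_ORDER_SYSCALLS = READ_SERIES | WRITE_SERIES  # MM' = 10


def affects_io_order(syscall_name: str) -> bool:
    """True iff the syscall belongs to the filtered read/write series."""
    return syscall_name in IO_ORDER_SYSCALLS
```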
3.3 The search space of configuration item combinations is known to be enormous, and many sampling techniques cannot be applied directly because they change two or more configuration items in each sampled configuration. Therefore, the taint analysis module traverses each configuration item in Conf, uses static taint analysis to track the data flow of each configuration item in Conf, and checks whether any I/O-related system call is tainted; if not, the configuration item is excluded from Conf. This yields the I/O-ordering-related configuration item set Conf' and markedly reduces the search space of configuration item combinations. The static taint analysis adopts a use-define-chain-based taint analysis method, with the following specific steps:
3.3.1 Compile the source code of the database management system under test into LLVM IR (LLVM Intermediate Representation);
3.3.2 Let variable n = 1 and let the I/O-ordering-related configuration item set Conf' = {};
3.3.3 Construct the use-define chain of configuration item variable conf_n. conf_n is marked as taint data, and the function return value or pointer produced when conf_n is read is marked as the propagation source of the taint data. Along the use-define chain of conf_n, track the propagation of conf_n in the source code of the database management system under test and determine in which code segments of the database management system under test conf_n may be modified or used.
3.3.4 Operations involving taint data in the database management system under test are divided into four categories: using taint data, propagating taint data, tainting other data, and propagating data other than the taint data. For all operations in the database management system under test, connect the taint paths (the propagation tracks of the taint data in the program) formed by taint data and taint propagation to form a complete taint propagation graph. Let the set of all code segments on the taint propagation graph be F, F = {f_1, f_2, ..., f_h, ..., f_H}, where H is the number of code segments in F, 1 ≤ h ≤ H, and f_h is the h-th code segment in F;
3.3.5 Check whether any code segment in F invokes a system call in the read series S1 or the write series S2 from step 3.2. If so, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; if not, conf_n taints no I/O-related system call while being read and used by the database management system, and the configuration item is excluded from Conf. The method is:
3.3.5.1 Let h = 1;
3.3.5.2 Check f_h. If it invokes any system call in the read series S1 or the write series S2 from step 3.2, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; go to 3.3.5.4. If not, no I/O-related system call is tainted in this code segment; go to 3.3.5.3.
3.3.5.3 If h < H, let h = h+1 and go to 3.3.5.2; otherwise go to 3.3.5.5.
3.3.5.4 Add conf_n to the set Conf'.
3.3.6 If n < N, let n = n+1 and go to 3.3.3; if n = N, the I/O-ordering-related configuration item set Conf' has been obtained: Conf' = {conf_1, ..., conf_z, ..., conf_N'}, where 1 ≤ z ≤ N', N' is the number of I/O-ordering-related configuration items in Conf', and conf_z is the z-th I/O-ordering-related configuration item in Conf'. Go to 3.3.7.
3.3.7 The taint analysis module sends Conf' to the test case generation module.
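Steps 3.3.5 through 3.3.6 amount to a screening loop over the taint propagation graph of each configuration item variable. A minimal sketch, assuming each code segment f_h is represented simply by the set of system calls it invokes (the patent's actual graph structure is richer):

```python
READ_SERIES = {"read", "pread64", "readv", "preadv", "preadv2"}
WRITE_SERIES = {"write", "pwrite64", "writev", "pwritev", "pwritev2"}


def screen_io_related(taint_graphs: dict[str, list[set[str]]]) -> list[str]:
    """taint_graphs maps conf_n -> the code segments F on its taint
    propagation graph, each segment given as the set of syscalls it calls.
    Returns Conf': the items whose segments taint a read/write-series call."""
    conf_prime = []
    for conf_n, segments in taint_graphs.items():
        # 3.3.5.2: check every segment f_h for a read/write-series call
        if any(seg & (READ_SERIES | WRITE_SERIES) for seg in segments):
            conf_prime.append(conf_n)  # 3.3.5.4: add conf_n to Conf'
    return conf_prime
```

Configuration items whose taint never reaches S1 or S2 (e.g. a connection-count knob) are dropped, which is what shrinks the later combination search space.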
Fourth, the test case generation module generates test cases as follows:
4.1 The test case generation module uses the Spex algorithm (see "Do Not Blame Users for Misconfigurations", Tianyin Xu et al., SOSP 2013, pages 7-10) to extract the syntax types and value ranges of the software configuration items of the database management system under test, and classifies the extracted syntax types into four categories: numeric (int), Boolean (bool), enumeration (enum), and string (string);
4.2 The test case generation module generates for Conf' the set V of value sets to be tested, V = {vv_1, vv_2, ..., vv_z, ..., vv_N'}, where vv_z is the z-th value set to be tested in V, i.e., the set of values to be tested for the z-th configuration item conf_z in Conf'; v_z^k is the k-th value of conf_z, K_z is the number of values the test case generation module generates for conf_z, and 1 ≤ k ≤ K_z. The method is:
4.2.1 Initialize variable z = 1;
4.2.2 If the expected label corresponding to conf_z is empty, let vv_z = {} (in this case K_z = 0), put vv_z into V, and go to 4.2.7;
4.2.3 If conf_z is of Boolean type (bool), let vv_z = {0, 1} (K_z = 2), put vv_z into V, and go to 4.2.7;
4.2.4 If conf_z is of enumeration type (enum), let vv_z be the set of all possible values of conf_z extracted by the Spex algorithm, put vv_z into V, and go to 4.2.7;
4.2.5 If conf_z is of string type (string), choose the values of vv_z according to the conclusions of "Tuning backfired? Not (always) your fault: understanding and detecting configuration-related performance bugs" published by He Haochen et al. in ESEC/FSE 2019, put vv_z into V, and go to 4.2.7;
4.2.6 If conf_z is of numeric type (int), sample the values of conf_z as follows: denote the minimum value of conf_z extracted by the Spex algorithm as Min and the maximum value as Max, and let vv_z = {Min, 10·Min, 10^2·Min, Max, 10^-1·Max, 10^-2·Max} (K_z = 6); put vv_z into V and go to 4.2.7;
4.2.7 If z = N', go to 4.3; otherwise let z = z+1 and go to 4.2.2;
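The per-type value-set construction of 4.2 can be sketched as one dispatch function. `enum_values`, `min_val`, and `max_val` stand in for the ranges extracted by Spex; the empty-label and string cases are collapsed into the fallback for brevity:

```python
def value_set(syntax_type, enum_values=None, min_val=None, max_val=None):
    """Build the value set vv_z for one configuration item (step 4.2)."""
    if syntax_type == "bool":                        # 4.2.3: K_z = 2
        return [0, 1]
    if syntax_type == "enum":                        # 4.2.4: all Spex values
        return list(enum_values or [])
    if syntax_type == "int":                         # 4.2.6: order-of-magnitude
        return [min_val, 10 * min_val, 100 * min_val,
                max_val, max_val // 10, max_val // 100]
    return []                                        # 4.2.2 / string simplified
```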
4.3 Take the Cartesian product of vv_1, vv_2, ..., vv_z, ..., vv_N' to obtain VCartesian = vv_1 × vv_2 × ... × vv_z × ... × vv_N'.
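Step 4.3 is exactly `itertools.product` over the per-item value sets; the sets below are illustrative:

```python
import itertools

# Illustrative value sets vv_1, vv_2, vv_3 for three configuration items.
vv = [[0, 1], ["O_DIRECT", "fsync"], [64, 640]]

# VCartesian = vv_1 x vv_2 x ... x vv_N'
v_cartesian = list(itertools.product(*vv))

# |VCartesian| = K_1 * K_2 * ... * K_N'
assert len(v_cartesian) == 2 * 2 * 2
```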
4.4 The test case generation module uses a performance testing tool (e.g., sysbench (version 1.0.14) or apache-benchmark (version 2.4)) to generate test commands. The method is: sample the parameters of the performance testing tool with the classical pair-wise method (pair-wise testing is a combinatorial method in the field of software testing: for each pair of input parameters of a system, all possible discrete combinations of those two parameters' values are tested; it is an efficient testing technique), input the sampled parameter values of the performance testing tool (such as concurrency, load type, data table size, number of data tables, read operation proportion, and write operation proportion) into the performance testing tool, and output the test commands, obtaining the test command set B = {b_1, b_2, b_3, ..., b_y, ..., b_Y}, where 1 ≤ y ≤ Y, Y is the number of test commands in B, and b_y is the y-th test command in B;
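The pair-wise sampling in 4.4 can be illustrated with a simple greedy all-pairs generator: repeatedly pick the candidate row that covers the most not-yet-covered parameter-value pairs. Real pair-wise tools are more sophisticated; this only demonstrates the coverage criterion:

```python
import itertools


def pairwise_sample(params: list[list]) -> list[tuple]:
    """Greedy pair-wise sampling: return rows so that every pair of values
    from any two parameters appears together in at least one row."""
    # Enumerate every (parameter index, value) pair that must be covered.
    uncovered = set()
    for (i, vi), (j, vj) in itertools.combinations(enumerate(params), 2):
        uncovered |= {(i, a, j, b) for a in vi for b in vj}

    rows, all_rows = [], list(itertools.product(*params))
    while uncovered:
        # Pick the full-product row covering the most uncovered pairs.
        best = max(all_rows, key=lambda r: sum(
            (i, r[i], j, r[j]) in uncovered
            for i, j in itertools.combinations(range(len(r)), 2)))
        rows.append(best)
        uncovered -= {(i, best[i], j, best[j])
                      for i, j in itertools.combinations(range(len(best)), 2)}
    return rows
```

For three two-valued parameters this needs far fewer rows than the full product, which is why pair-wise sampling keeps the test command set B small.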
4.5 The test case generation module receives a user-defined hardware device set D, composed of devices such as NVMe and SATA SSDs, where D = D1 ∪ D2 for experimental comparison. D1 denotes the conventional hard disk (HDD) set, D1 = {d11, d12} (this method selects the AMD Ryzen 5700XT and 64 GB DDR4 memory, i.e., d11 = AMD Ryzen 5700XT, d12 = 64 GB DDR4 memory); D2 denotes the solid state drive (SSD) set, D2 = {d21, d22, d23, d24} (this method selects the Samsung 980 Pro, Western Digital SN850, Samsung 860 EVO, and Western Digital Blue, i.e., d21 = Samsung 980 Pro, d22 = Western Digital SN850, d23 = Samsung 860 EVO, d24 = Western Digital Blue).
4.6 The test case generation module generates the configuration item combination set C, C = {c_1, ..., c_l, ..., c_L}, where c_l = (v_1^{l1}, ..., v_z^{lz}, ..., v_{N'}^{lN'}) is a configuration item combination generated by the test case generation module; v_1^{l1} is any element of the set vv_1, 1 ≤ l1 ≤ K_1; v_z^{lz} is any element of the set vv_z, 1 ≤ lz ≤ K_z; ...; v_{N'}^{lN'} is any element of the set vv_{N'}. L is a positive integer representing the total number of different configuration item combinations obtained by permuting and combining the configuration items when each configuration item may freely take a value from its value set to be tested.
4.7 The test case generation module generates the database management system type set S = {MySQL, PostgreSQL, MariaDB, SQLite, MongoDB, Redis}. These are six currently mainstream database management systems.
4.8 The test case generation module generates the test case set T, T = {t_1, t_2, t_3, ..., t_a, ..., t_A}, 1 ≤ a ≤ A, where t_a is a four-tuple {b_a, c_a, d_a, s_a}: c_a is any element of the configuration item combination set C, s_a is any element of the database management system type set S, b_a is any element of the test command set B, and d_a is any element of the hardware device set D. A is the total number of test cases in T; for the test cases to contain all four-tuple combinations, A ≥ 36·Y·L is required (36 = |S|·|D|, six database management system types times six hardware devices).
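Step 4.8's four-tuples are the full product B × C × D × S; a sketch with illustrative (not the patent's) command, combination, and device values:

```python
import itertools

# Illustrative inputs: one test command, one configuration combination,
# two devices, and the six DBMS types from step 4.7.
B = ["sysbench oltp_read_write --threads=16"]
C = [("innodb_flush_method=O_DIRECT",)]
D = ["Samsung 980 Pro", "WD SN850"]
S = ["MySQL", "PostgreSQL", "MariaDB", "SQLite", "MongoDB", "Redis"]

# Each test case t_a is a four-tuple {b_a, c_a, d_a, s_a};
# the full product guarantees A = |B| * |C| * |D| * |S| combinations.
T = [{"b": b, "c": c, "d": d, "s": s}
     for b, c, d, s in itertools.product(B, C, D, S)]
```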
Fifth, the data collection module receives the test case set T, runs the test cases in T, and collects runtime data to obtain the runtime data set DATA, as follows:
5.1 Let variable a = 1 and let the runtime data set DATA = {};
5.2 Run test case t_a.
5.3 The data collection module obtains the I/O path from the database management system of type s_a under test to the device driver by any one of blktrace, Linux kernel event tracing, or eBPF diagnostics (a virtual-machine technology running in the Linux kernel).
5.4 Using hardware storage device d_a, the data collection module creates with the open-source tool fio (version 3.2 or above) a test suite independent of the database management system under test, simulates the I/O behavior of the database management system of type s_a under test, and tests the database management system of type s_a under test with configuration item combination c_a under load command b_a.
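One way 5.4 could drive fio is to assemble a command line per test case. The flags used below (`--name`, `--filename`, `--rw`, `--bs`, `--iodepth`, `--direct`, `--runtime`, `--time_based`) are standard fio options, while the default values are illustrative assumptions, not the patent's:

```python
def fio_command(device: str, rw: str = "randwrite", bs: str = "16k",
                iodepth: int = 32, runtime_s: int = 60) -> list[str]:
    """Build an fio invocation replaying a DBMS-like I/O pattern on
    device d_a (values here are illustrative defaults)."""
    return ["fio", "--name=dbms-replay",
            f"--filename={device}", f"--rw={rw}", f"--bs={bs}",
            f"--iodepth={iodepth}", "--direct=1",
            f"--runtime={runtime_s}", "--time_based"]
```

The resulting list can be handed to `subprocess.run` while the collection module records CPU utilization, NVMe queue utilization, and the I/O request order.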
5.5 From the moment test case t_a starts running, the data collection module continuously uses the fio tool to modify parameters and system settings, adjusts the configuration items, and dynamically monitors and records the runtime data data_a of the database management system of type s_a under test, including CPU utilization, NVMe queue utilization, and I/O request order; data_a is added to the set DATA until test case t_a finishes running.
5.6 If a < A, let a = a+1 and go to 5.2; otherwise go to 5.7.
5.7 The data collection module outputs the runtime data set DATA to the problem matching module, DATA = {data_1, ..., data_a, ..., data_A}, where data_a is the a-th runtime data in DATA.
Step six: the problem matching module determines from DATA whether the database management system under test has a performance mismatch problem.
6.1 Let variable a = 1 and let the number of performance mismatch problems M = 0.
6.2 The problem matching module judges the runtime data data_a in DATA according to the following criteria to conclude whether the database management system under test has a performance mismatch problem:
6.2.1 Judge whether the database management system under test underwent a "random-to-sequential" conversion after the configuration adjustment. The judging method is to analyze data_a and determine whether the order of I/O requests changed. If the degree of I/O request ordering of the database management system under test is inconsistent before and after the fio tool modified the parameters and system settings, a random-to-sequential conversion is considered to have occurred; go to 6.2.2. Otherwise go to 6.4.
6.2.2 Judge whether the performance of the database management system dropped. The judging method is to compare whether the CPU utilization and NVMe queue utilization in data_a decreased from before to after the modification. If both the CPU utilization and the NVMe queue utilization dropped after the configuration adjustment, the final performance of the database management system is considered to have decreased; go to 6.3. Otherwise go to 6.4.
6.3 The problem matching module judges that the database management system under test has an I/O sequentiality mismatch problem, lets M = M+1, records the I/O sequentiality mismatch problem present in the database management system under test as a defect, and obtains the defect ID of the defect (where the vendor does not use a standard bug-report management system, the ID is the tinyurl-converted one and can be accessed directly by concatenation).
6.4 If a < A, let a = a+1 and go to 6.2; otherwise go to 6.5.
6.5 End.
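The two-stage criterion of 6.2 reduces to a predicate over before/after measurements: the I/O request ordering must have changed across the configuration adjustment (6.2.1) and both CPU and NVMe-queue utilization must have dropped (6.2.2). The field names below are illustrative, not the patent's data schema:

```python
def has_io_order_mismatch(before: dict, after: dict) -> bool:
    """Apply the 6.2 criteria to one run's before/after measurements."""
    # 6.2.1: the degree of I/O request ordering is inconsistent
    # before and after the fio-driven modification.
    order_changed = before["io_order"] != after["io_order"]
    # 6.2.2: both CPU utilization and NVMe queue utilization dropped.
    perf_dropped = (after["cpu_util"] < before["cpu_util"] and
                    after["nvme_queue_util"] < before["nvme_queue_util"])
    return order_changed and perf_dropped
```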
To verify the invention's ability to find database performance mismatch problems related to I/O ordering, experiments were carried out on a computer with an 8-core Intel Core i7-9700K, 32 GB of memory, and a 500 GB NVMe SSD. The system calls extracted from the Linux 5.4.0 kernel and the six mainstream open-source database management systems MySQL, PostgreSQL, MariaDB, SQLite, MongoDB, and Redis were chosen as the experimental objects for evaluation. In the experiment, the source code and configuration items of the six database management systems and the manually extracted system calls of the Linux 5.4.0 kernel were used as system input; program variables were extracted by the variable analysis module and output to the taint analysis module. The taint analysis module takes the system calls and program variables as input, screens out the I/O-ordering-related configuration items through taint analysis, and outputs them to the test case generation module for test case generation; the generated test cases are output to the data collection module, which tests the six database management systems and collects data; finally the problem matching module judges whether an I/O sequentiality mismatch problem occurred. The experimental results are shown in Table 1: the invention detected 15 I/O sequentiality mismatch problems of 5 kinds in the three databases MySQL, PostgreSQL, and MariaDB, all confirmed by the developers, while for SQLite, MongoDB, and Redis no I/O-ordering-related mismatch problems were detected.
Table 1. I/O sequentiality mismatch problems detected by the present invention

Claims (10)

1. An I/O-ordering-oriented database performance problem detection method, characterized by comprising the following steps:
firstly, constructing a database performance problem detection system; the database performance problem detection system consists of a variable analysis module, a taint analysis module, a test case generation module, a data collection module, and a problem matching module; the variable analysis module is connected with the taint analysis module, performs variable analysis on the source code and configuration items of the database management system under test, obtains the configuration item variable set Conf corresponding to the configuration items, and sends Conf to the taint analysis module; the taint analysis module is connected with the variable analysis module and the test case generation module, receives Conf from the variable analysis module, receives the manually screened I/O-ordering-related system calls from a keyboard, performs taint analysis, screens out the I/O-ordering-related configuration item set Conf', and sends Conf' to the test case generation module; the test case generation module is connected with the taint analysis module and the data collection module, receives Conf' from the taint analysis module, and generates the configuration item combination set C; the test case generation module generates the test case set T from the configuration item combination set C, the test load command set B, the hardware device set D, and the set S of database management system types under test, and sends the test case set T to the data collection module; the data collection module is connected with the test case generation module and the problem matching module, receives the test case set T from the test case generation module, tests the database management system under test with the test cases in T, records the runtime data DATA produced while the database management system under test runs the test cases, and sends the runtime data to the problem matching module; the problem matching module is connected with the data collection module, receives the runtime data DATA from the data collection module, and judges from DATA whether the database management system under test has an I/O sequentiality mismatch problem;
secondly, the variable analysis module locates the configuration item variables corresponding to the configuration items, obtains the configuration item variable set Conf corresponding to the configuration items, and sends Conf to the taint analysis module, as follows:
the variable analysis module analyzes the source code and configuration items of the database management system under test using the ConfMapper algorithm, locating the configuration item variable set Conf corresponding to the configuration items in the source code of the database management system under test: Conf = {conf_1, ..., conf_n, ..., conf_N}, where N is the number of configuration item variables in Conf, 1 ≤ n ≤ N, and conf_n is the n-th configuration item variable in Conf; Conf is sent to the taint analysis module;
thirdly, the taint analysis module receives Conf from the variable analysis module, uses static taint analysis to identify the configuration item variables in Conf related to I/O-ordering system calls, confirms whether each configuration item variable in Conf is related to an I/O-ordering system call, and screens out the I/O-ordering-related configuration item set Conf', as follows:
3.1 identifying system calls related to I/O ordering; by reading the official Linux manual, the MM system calls of the Linux kernel are investigated, the system calls that may affect I/O ordering are filtered out from the MM system calls, and each candidate is cross-filtered to obtain the MM' system calls among the kernel's MM system calls that affect I/O ordering; MM is a positive integer, and 1 ≤ MM' ≤ MM;
3.2 classifying the MM' I/O-ordering-affecting system calls filtered in 3.1 into a read series S1 and a write series S2, wherein the read series S1 comprises read system calls and the write series S2 comprises write system calls;
3.3 the taint analysis module traverses each configuration item in Conf, uses static taint analysis to track the data flow of each configuration item in Conf, and checks whether any I/O-related system call is tainted; if not, the configuration item is excluded from Conf, yielding the I/O-ordering-related configuration item set Conf'; the static taint analysis adopts a use-define-chain-based taint analysis method, with the following specific steps:
3.3.1 compiling the source code of the database management system to be detected into LLVM intermediate representation;
3.3.2 let variable n = 1 and let the I/O-ordering-related configuration item set Conf' = {};
3.3.3 construct the use-define chain of configuration item variable conf_n; conf_n is marked as taint data, and the function return value or pointer produced when conf_n is read is marked as the propagation source of the taint data; along the use-define chain of conf_n, track the propagation of conf_n in the source code of the database management system under test and determine in which code segments of the database management system under test conf_n may be modified or used;
3.3.4 operations involving taint data in the database management system under test are divided into four categories: using taint data, propagating taint data, tainting other data, and propagating data other than the taint data; for all operations in the database management system under test, the taint paths formed by taint data and taint propagation are connected to form a complete taint propagation graph; let the set of all code segments on the taint propagation graph be F, F = {f_1, f_2, ..., f_h, ..., f_H}, where H is the number of code segments in F, 1 ≤ h ≤ H, and f_h is the h-th code segment in F;
3.3.5 check whether any code segment in F invokes a system call in the read series S1 or the write series S2 from step 3.2; if so, the configuration item variable conf_n taints an I/O-related system call while being read and used by the database management system; if not, the configuration item variable conf_n taints no I/O-related system call while being read and used by the database management system, and the configuration item is excluded from Conf, yielding the I/O-ordering-related configuration item set Conf';
3.3.6 if n < N, let n = n+1 and go to 3.3.3; if n = N, the I/O-ordering-related configuration item set Conf' has been obtained: Conf' = {conf_1, ..., conf_z, ..., conf_N'}, where 1 ≤ z ≤ N', N' is the number of I/O-ordering-related configuration items in Conf', and conf_z is the z-th I/O-ordering-related configuration item in Conf'; go to 3.3.7;
3.3.7 the taint analysis module sends Conf' to the test case generation module;
fourthly, the test case generation module generates test cases as follows:
4.1 the test case generation module extracts the syntax types and value ranges of the software configuration items of the database management system under test using the Spex algorithm, and classifies the extracted syntax types into four categories: numeric, Boolean, enumeration, and string;
4.2 the test case generation module generates for Conf' the set V of value sets to be tested, V = {vv_1, vv_2, ..., vv_z, ..., vv_N'}, where vv_z is the z-th value set to be tested in V, i.e., the set of values to be tested for the z-th configuration item conf_z in Conf'; v_z^k is the k-th value of conf_z, K_z is the number of values the test case generation module generates for conf_z, and 1 ≤ k ≤ K_z;
4.3 take the Cartesian product of vv_1, vv_2, ..., vv_z, ..., vv_N' to obtain VCartesian = vv_1 × vv_2 × ... × vv_z × ... × vv_N';
4.4 the test case generation module uses a performance testing tool to generate test commands as follows: sample the parameters of the performance testing tool with the pair-wise method, input the sampled parameter values of the performance testing tool into the performance testing tool, and output the test commands, obtaining the test command set B = {b_1, b_2, b_3, ..., b_y, ..., b_Y}, where 1 ≤ y ≤ Y, Y is the number of test commands in B, and b_y is the y-th test command in B;
4.5 the test case generation module receives a user-defined hardware device set D, where D = D1 ∪ D2;
wherein D1 represents the traditional hard disk (HDD) set and D2 represents the solid state drive (SSD) set;
4.6 the test case generation module generates the configuration item combination set C; C = {c_1, ..., c_l, ..., c_L}, where c_l = (v_1^{l1}, ..., v_z^{lz}, ..., v_{N'}^{lN'}) is a configuration item combination generated by the test case generation module; v_1^{l1} is any element of the set vv_1, 1 ≤ l1 ≤ K_1; v_z^{lz} is any element of the set vv_z, 1 ≤ lz ≤ K_z; ...; v_{N'}^{lN'} is any element of the set vv_{N'}; L is a positive integer representing the total number of different configuration item combinations obtained by permuting and combining the configuration items when each configuration item may freely take a value from its value set to be tested;
4.7 the test case generation module generates the database management system type set S = {MySQL, PostgreSQL, MariaDB, SQLite, MongoDB, Redis};
4.8 the test case generation module generates the test case set T, T = {t_1, t_2, t_3, …, t_a, …, t_A}, 1 ≤ a ≤ A; t_a is a quadruple (b_a, c_a, d_a, s_a), where b_a is any element of the test command set B, c_a is any element of the configuration item combination set C, d_a is any element of the hardware device set D, s_a is any element of the database management system type set S, and A is the total number of test cases in T;
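The construction of T in step 4.8 is itself a Cartesian product over the four sets; a minimal sketch, with placeholder members for each set:

```python
# Sketch of step 4.8: each test case t_a is the quadruple
# (b_a, c_a, d_a, s_a) drawn from B x C x D x S.
from itertools import product

B = ["sysbench oltp_read_write run"]            # test commands
C = [{"sync_binlog": 0}, {"sync_binlog": 1}]    # configuration combos
D = ["hdd0", "ssd0"]                            # hardware devices (D1 u D2)
S = ["MySQL", "PostgreSQL"]                     # DBMS types

T = [{"b": b, "c": c, "d": d, "s": s}
     for b, c, d, s in product(B, C, D, S)]
```

With the full sets of the method, |T| = |B|·|C|·|D|·|S|, consistent with the bound on A in claim 9.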
Step five: the data collection module receives the test case set T, runs the test cases in T, and collects data during the runs, obtaining the runtime DATA set DATA, as follows:
5.1 let variable a = 1 and let the runtime DATA set DATA = { };
5.2 run test case t_a;
5.3 the data collection module obtains, using any one of blktrace, Linux kernel event tracing, and eBPF diagnostic tools, the I/O path from the database management system under test of type s_a to the device driver;
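One way the I/O request order on this path can be recovered is from a blktrace/blkparse text trace; the sketch below parses pre-captured trace lines. The sample lines and the assumed default blkparse field layout (device, cpu, seq, time, pid, action, rwbs, sector, "+", blocks, process) are illustrative assumptions.

```python
# Sketch: extracting the order of I/O requests from blkparse output.
def request_order(trace_lines):
    """Return the sector of each queued request (action 'Q'), in order."""
    sectors = []
    for line in trace_lines:
        parts = line.split()
        # parts[5] = trace action, parts[7] = starting sector
        if len(parts) >= 8 and parts[5] == "Q":
            sectors.append(int(parts[7]))
    return sectors

# Hypothetical trace excerpt for a mysqld process.
trace = [
    "8,0 0 1 0.000000000 4321 Q W 2048 + 8 [mysqld]",
    "8,0 0 2 0.000100000 4321 Q W 2056 + 8 [mysqld]",
    "8,0 0 3 0.000200000 4321 C W 2048 + 8 [0]",   # completion, ignored
]
```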
5.4 in the case of using hardware storage device d_a, the data collection module creates, with the open-source tool fio, a test suite independent of the database management system under test that simulates the I/O behavior of the database management system under test of type s_a, and tests the database management system under test of type s_a using configuration item combination c_a under load command b_a;
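Driving fio for step 5.4 might look as follows; only standard fio options are used, but the flag values (pattern, block size, depth) are illustrative and would in practice be derived from the observed I/O behavior of the DBMS under test.

```python
# Sketch: building a fio command line that replays a DBMS-like I/O
# pattern on device d_a, independent of the DBMS itself.
def build_fio_command(device_file, rw="randwrite", block_size="4k",
                      size="1g", iodepth=32):
    """Return an argv list for fio simulating the given I/O pattern."""
    return ["fio",
            "--name=dbms_sim",
            f"--filename={device_file}",
            f"--rw={rw}",              # read/write/randread/randwrite
            f"--bs={block_size}",
            f"--size={size}",
            f"--iodepth={iodepth}",
            "--direct=1",              # bypass page cache, as many DBMSs do
            "--output-format=json"]

cmd = build_fio_command("/dev/nvme0n1", rw="randwrite")
# A real run would be: subprocess.run(cmd, capture_output=True)
```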
5.5 from the start of test case t_a, the data collection module continuously uses the fio tool to modify parameters and system settings, dynamically monitors and records runtime data data_a of the database management system under test of type s_a, including CPU utilization, NVMe queue utilization, and I/O request order, and adds data_a to the set DATA, until test case t_a finishes running;
5.6 if a < A, let a = a + 1 and go to 5.2; otherwise go to 5.7;
5.7 the data collection module outputs the runtime DATA set DATA to the problem matching module, DATA = {data_1, …, data_a, …, data_A}, where data_a is the a-th runtime data in DATA;
Step six: the problem matching module determines from DATA whether the database management system under test has a performance mismatch problem;
6.1 let variable a = 1 and let the number of performance mismatch problems M = 0;
6.2 the problem matching module judges the runtime data data_a in DATA according to the following criteria, concluding whether the database management system under test has a performance mismatch problem, as follows:
6.2.1 judge whether the database management system under test undergoes a random-to-sequential conversion after the configuration adjustment; the judging method is to analyze data_a and determine whether the order of the I/O requests has changed; if the degree of I/O request sequentiality of the database management system under test differs before and after the fio tool modified the parameters and system settings, a random-to-sequential conversion is considered to have occurred, go to 6.2.2; otherwise go to 6.4;
6.2.2 judge whether the performance of the database management system has decreased by comparing, in data_a, the CPU utilization and NVMe queue utilization before and after the modification; if both the CPU utilization and the NVMe queue utilization decreased after the configuration item adjustment, the final performance of the database management system is considered to have decreased, go to 6.3; otherwise go to 6.4;
6.3 the problem matching module judges that the database management system under test has an I/O sequentiality mismatch problem, lets M = M + 1, records the I/O sequentiality mismatch problem present in the database management system under test at this moment as a defect, and obtains the defect ID of the defect;
6.4 if a < A, let a = a + 1 and go to 6.2; otherwise go to 6.5;
6.5 end.
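A minimal sketch of the matching criteria of 6.2, assuming the conversion of 6.2.1 is detected by comparing a simple sequentiality degree (the fraction of consecutive requests with non-decreasing offsets) against a threshold; the threshold value and the exact utilization fields are assumptions, not fixed by the method.

```python
# Sketch of the two-part judgment in step 6.2: a random-to-sequential
# conversion (6.2.1) combined with dropping CPU and NVMe queue
# utilization (6.2.2) flags an I/O sequentiality mismatch (6.3).
def sequentiality(offsets):
    """Fraction of consecutive request pairs with non-decreasing offsets."""
    if len(offsets) < 2:
        return 1.0
    inc = sum(1 for a, b in zip(offsets, offsets[1:]) if b >= a)
    return inc / (len(offsets) - 1)

def is_mismatch(before, after, threshold=0.5):
    """before/after: dicts with 'offsets', 'cpu', 'nvme_queue'."""
    turned_sequential = (sequentiality(after["offsets"]) >= threshold >
                         sequentiality(before["offsets"]))
    degraded = (after["cpu"] < before["cpu"] and
                after["nvme_queue"] < before["nvme_queue"])
    return turned_sequential and degraded

# Hypothetical data_a snapshots around a configuration adjustment.
before = {"offsets": [512, 8, 4, 0, 128], "cpu": 0.80, "nvme_queue": 0.70}
after = {"offsets": [0, 8, 16, 24, 32], "cpu": 0.45, "nvme_queue": 0.30}
```

On an SSD, forcing sequential I/O buys little device-side benefit while the reordering work lowers CPU and queue utilization, which is the intuition behind flagging this combination as a mismatch.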
2. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein Linux in step 3.1 is required to be version 5.4.0 or above, and the cross filtering means that several people filter independently and then check one another's results.
3. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the read series S1 in step 3.2 comprises five types of read system calls: read, pread64, readv, preadv, and preadv2; the write series S2 comprises five types of write system calls: write, pwrite64, writev, pwritev, and pwritev2.
4. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the method in step 3.3.5 of checking whether the code fragments in F call any system call in the read series S1 or the write series S2 of step 3.2, obtaining the I/O-sequentiality-related configuration item set Conf′, is:
3.3.5.1 let h=1;
3.3.5.2 check whether f_h calls any system call in the read series S1 or the write series S2; if so, the configuration item variable conf_n taints I/O-related system calls while being read and used by the database management system, go to 3.3.5.4; if not, the configuration item variable conf_n taints no I/O-related system call while being read and used by the database management system, go to 3.3.5.3;
3.3.5.3 if h < H, let h = h + 1 and go to 3.3.5.2; otherwise go to 3.3.5.5;
3.3.5.4 add conf_n to the set Conf′.
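The check of steps 3.3.5.1-3.3.5.4 can be sketched as a name-based static approximation; the source fragments are hypothetical, and real taint analysis would track data flow rather than merely match call names.

```python
# Sketch of claim 4: does any code fragment f_h that reads/uses
# configuration item conf_n reach a read- or write-series system call?
import re

S1 = {"read", "pread64", "readv", "preadv", "preadv2"}       # read series
S2 = {"write", "pwrite64", "writev", "pwritev", "pwritev2"}  # write series

def touches_io(fragment):
    """True if the fragment calls any S1/S2 system call by name."""
    calls = set(re.findall(r"\b(\w+)\s*\(", fragment))
    return bool(calls & (S1 | S2))

def io_related_confs(conf_fragments):
    """conf_fragments: {conf_name: [code fragments using it]} -> Conf'."""
    return {conf for conf, frags in conf_fragments.items()
            if any(touches_io(f) for f in frags)}

# Hypothetical fragments: one item flows into pwrite64, one does not.
frags = {
    "innodb_flush_method": ["n = pwrite64(fd, buf, len, off);"],
    "max_connections": ["limit = atoi(value);"],
}
```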
5. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the method by which the test case generation module generates the set V of value sets to be tested for Conf′ in step 4.2 is:
4.2.1 initialize variable z = 1;
4.2.2 if the expectation label corresponding to conf_z is empty, then let vv_z = …, put vv_z into V, and go to 4.2.7;
4.2.3 if conf_z is of Boolean type, let vv_z = {0, 1}, put vv_z into V, and go to 4.2.7;
4.2.4 if conf_z is of enumeration type, let vv_z be the set of all possible values of conf_z extracted by the Spex algorithm, put vv_z into V, and go to 4.2.7;
4.2.5 if conf_z is of string type, let vv_z = …, put vv_z into V, and go to 4.2.7;
4.2.6 if conf_z is of numeric type, sample the values of conf_z by the following method: let Min be the minimum value and Max the maximum value of conf_z extracted by the Spex algorithm, and let vv_z = {Min, 10·Min, 10^2·Min, Max, 10^-1·Max, 10^-2·Max}; put vv_z into V and go to 4.2.7;
4.2.7 if z = N′, the set V of value sets to be tested is obtained, V = {vv_1, vv_2, …, vv_z, …, vv_N′}, end; otherwise let z = z + 1 and go to 4.2.2.
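The type-directed rules of claim 5 can be sketched as below; the empty-label and string cases are elided in the source, so the placeholder return for them is an assumption, and the configuration item names and ranges are hypothetical.

```python
# Sketch of step 4.2: generating the set of values to test for a
# configuration item according to its type (claim 5).
def sample_values(conf_type, spec=None):
    if conf_type == "boolean":
        return [0, 1]                       # rule 4.2.3
    if conf_type == "enum":
        return list(spec)                   # all values Spex extracted (4.2.4)
    if conf_type == "numeric":
        lo, hi = spec                       # Min, Max from Spex (4.2.6)
        return [lo, 10 * lo, 100 * lo, hi, hi // 10, hi // 100]
    return []                               # empty label / string: elided in source

V = {
    "flush": sample_values("boolean"),
    "method": sample_values("enum", ["fsync", "O_DSYNC", "O_DIRECT"]),
    "io_capacity": sample_values("numeric", (100, 1_000_000)),
}
```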
6. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the performance test tool in step 4.4 comprises sysbench version 1.0.14 or apache-benchmark version 2.4.
7. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the parameters of the performance test tool in step 4.4 include concurrency, load type, data table size, number of data tables, read operation ratio, and write operation ratio.
8. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein in step 4.5, D1 = {D11, D12}, D11 = AMD Ryzen 9 5700XT, D12 = 64 GB DDR4 memory, D2 = {D21, D22, D23, D24}, D21 = Samsung 980 Pro, D22 = Western Digital SN850, D23 = Samsung 860 EVO, D24 = Western Digital Blue.
9. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein A ≥ 18 × Y × L in step 4.8.
10. The I/O-sequence-oriented database performance problem detection method according to claim 1, wherein the open-source tool fio in step 5.4 is required to be version 3.2 or above.
CN202310551096.XA 2023-05-16 2023-05-16 I/O (input/output) sequence-oriented database performance problem detection method Active CN116560998B (en)

Publications (2)

Publication Number Publication Date
CN116560998A true CN116560998A (en) 2023-08-08
CN116560998B CN116560998B (en) 2023-12-01

Family

ID=87487633


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100198799A1 (en) * 2007-06-20 2010-08-05 Sanjeev Krishnan Method and Apparatus for Software Simulation
CN105955877A (en) * 2016-04-19 2016-09-21 西安交通大学 Taint analysis method for dynamic parallel program based on symbolic computation
US20200019494A1 (en) * 2017-02-28 2020-01-16 Sparriw Co., Ltd Method and apparatus for performing test by using test case
CN111611177A (en) * 2020-06-29 2020-09-01 中国人民解放军国防科技大学 Software performance defect detection method based on configuration item performance expectation
KR20200106124A (en) * 2019-02-28 2020-09-11 한국정보통신기술협회 Test automation framework for dbms for analysis of bigdata and method of test automation
CN112632105A (en) * 2020-01-17 2021-04-09 华东师范大学 System and method for verifying correctness of large-scale transaction load generation and database isolation level
KR102299640B1 (en) * 2020-10-21 2021-09-08 한국과학기술원 Method and system for similarity analysis among kernel system calls using fuzz testing
US20210303696A1 (en) * 2020-03-30 2021-09-30 Software Ag Systems and/or methods for static-dynamic security testing using a test configurator to identify vulnerabilities and automatically repair defects




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant