CN114780411A - Software configuration item preselection method oriented to performance tuning - Google Patents
- Publication number
- CN114780411A CN114780411A CN202210450353.6A CN202210450353A CN114780411A CN 114780411 A CN114780411 A CN 114780411A CN 202210450353 A CN202210450353 A CN 202210450353A CN 114780411 A CN114780411 A CN 114780411A
- Authority
- CN
- China
- Prior art keywords
- configuration item
- label
- configuration
- intention
- software
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3628—Software debugging of optimised code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a software configuration item preselection method oriented to performance tuning, aiming at the problems that existing configuration-item performance tuning methods are time-consuming and consider only a single intention. The technical scheme is as follows: construct a performance-tuning-oriented software configuration item preselection system composed of a configuration item intention data automatic amplification module and a configuration item preselection module; select part of the configuration items from the data-set source software and label them to obtain a labeled configuration item set; the configuration item intention data automatic amplification module iteratively amplifies the labeled configuration item set to obtain the amplified labeled configuration item set D; train the configuration item preselection module with D; the trained configuration item preselection module then classifies the configuration items of the target software according to the target software's configuration documents. The invention can tune the configuration item set of the corresponding category according to different intentions, greatly reduces overhead, and recommends performance-related configuration items more comprehensively, thereby improving both efficiency and accuracy.
Description
Technical Field
The invention relates to the field of performance tuning of large-scale software, in particular to a software configuration item preselecting method.
Background
In order to adapt software to different application scenarios and production environments without modifying the source code, developers typically provide configuration items as an interface for users to adjust software behavior. However, as application scenarios and user requirements become more diverse, the size and complexity of modern software increase, and so does the number of configuration items. For example, MySQL has more than 900 configuration items, and GCC has more than 1,000. The huge number of configuration items makes configuring the software very difficult and raises the barrier to using it; it is hard for users to satisfy their intentions by adjusting software configuration items.
Users usually have various intentions when using software, such as improving software performance (e.g., throughput, execution time, read/write speed), improving reliability, and preventing information leakage. Improving software performance is one of the most common intentions and the one users care about most. Since software performance is easier to measure quantitatively than other intentions, how to adjust software configuration items to achieve the best performance, i.e., performance tuning by adjusting the software configuration, is a hot issue in current research.
Current configuration tuning work generally takes all configuration items as input and, under a specific workload, performs a large number of performance tests while varying the configuration item values, in order to obtain the correspondence between configuration values and software performance. This suffers from a huge configuration search space: performance tuning takes a long time, and finding the configuration with the best performance requires a large amount of time.
For the problem of an overly large configuration search space, the prior art mainly shrinks the search space by pre-screening the configuration items that have an important influence on performance; there are two main methods. The first is Carver, published by Zhen Cao et al. at FAST 2020, which selects key configuration items for storage system performance tuning (background method one). Carver samples the configuration space by Latin Hypercube Sampling (LHS), evaluates the importance of different configuration items to performance with a variance-based metric after performance testing, and finally uses a greedy algorithm to select the N configuration items (N is specified by the user) with the largest influence on performance, presenting this preselection to the user as the input of an automatic tuning tool. This research demonstrates that different configuration items differ in how strongly they influence performance and that a small number of configuration items matter most for improving software performance, which establishes the importance of performance-tuning-oriented configuration item preselection. The second is the method of the paper "Too Many Knobs to Tune?" (background method two). It first samples the configuration space by Latin Hypercube Sampling, then tests the correspondence between software configuration and the performance of two database systems, Cassandra and PostgreSQL, under different workloads, ranks the configuration items by the importance of their influence on software performance, and, by comparing the top 15 configuration items with the largest influence under different workloads, shows that the few configuration items with the largest influence on performance are usually fixed.
They experimentally demonstrated that when only the top 5 configuration items of Cassandra with the largest performance influence are tuned, the throughput, read latency, and write latency reach levels similar to tuning 30 configuration items, and the read and write latency can even be better. Both methods achieve configuration item preselection, but obtaining the data from which configuration item importance is derived still requires a large number of performance tests; moreover, the preselected configuration items differ somewhat across workloads, so the preselection result depends strongly on the workload chosen for the performance tests. In addition, these methods only consider how to improve software performance, without considering whether hidden dangers are introduced for software reliability and security, and thus lack a comprehensive consideration of user intentions.
In summary, how to construct a lightweight, multi-intention-aware, and workload-independent configuration item preselection method that assists existing performance tuning work and warns users of the possible side effects of tuning is a problem to be solved urgently by researchers in the field.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a software configuration item preselection method oriented to performance tuning, aimed at the problems that existing configuration-item performance tuning methods are time-consuming and consider only the single intention of performance (that is, the existing techniques cannot work when a user has intentions other than performance). The method runs quickly, comprehensively considers the user's various intentions, and assists the user in configuration tuning with respect to intentions other than performance.
In order to solve the technical problems, the technical scheme of the invention is as follows: firstly, constructing a software configuration item pre-selection system which is formed by an automatic configuration item intention data amplification module and a configuration item pre-selection module and is oriented to performance tuning; then randomly selecting part of configuration items from data set source software, and manually marking intentions to obtain a marking configuration item set; the automatic configuration item intention data amplification module iteratively amplifies the labeling configuration item set to obtain an amplified labeling configuration item set; then training a configuration item preselection module by using the amplified labeling configuration item set; and finally, classifying the configuration items of the target software by the trained configuration item preselection module according to the configuration documents of the target software, and selecting configuration item sets corresponding to intentions of different categories. The configuration item sets corresponding to the intentions of different categories reflect several main factors considered when the user performs performance tuning, and the user can perform performance tuning by adopting the configuration item sets corresponding to the intentions of the corresponding categories according to the requirement on software tuning so as to achieve the purpose of improving the software performance.
The invention comprises the following steps:
the first step is to construct a software configuration item preselection system oriented to performance tuning, wherein the software configuration item preselection system oriented to performance tuning is composed of a configuration item intention data automatic amplification module and a configuration item preselection module.
The configuration item intention data automatic amplification module is connected with the configuration item preselection module and is also connected with data set source software. The data set source software comprises two parts: a set of annotated configuration items and a set of unlabeled configuration items. The annotation configuration item set refers to a data set constructed by performing intention type annotation on configuration items according to each configuration item document in a manual annotation mode. The configuration item intention data automatic amplification module preprocesses a labeling configuration item set, labels the unlabeled configuration items in the unlabeled configuration item set, adds newly labeled data into the labeling configuration item set from the unlabeled configuration item set until the number of configuration items in the labeling configuration item set is not changed any more, obtains an amplified labeling configuration item set, and sends the amplified labeling configuration item set to the configuration item preselection module.
The configuration item pre-selection module is connected with the configuration item intention data automatic amplification module and receives the amplified labeling configuration item set from the configuration item intention data automatic amplification module. The configuration item preselection module comprises a TF-IDF encoder and a configuration item preselection model RF. The encoder encodes the sentences in the configuration item document to obtain vectors corresponding to the sentences; and the RF is a random forest model with a two-layer structure, and the model is trained by using the amplified label configuration item set to obtain parameters of the random forest model. The configuration item pre-selection module classifies the configuration items of the target software according to the configuration data (generally by manual extraction) of the target software, and pre-selects the configuration items corresponding to different intention categories to obtain a pre-selected configuration item set.
Second, randomly select part of the configuration items from the configuration item set D0 of the data-set source software and label their intentions, obtaining the labeled configuration item set D1.
2.1 The data-set source software comprises 13 software systems: MySQL, Cassandra, MariaDB, Apache-Httpd, Nginx, Hadoop-Common, MapReduce, Apache-Flink, HDFS, Keystone, Nova, GCC, and Clang. Configuration items are selected from the data-set source software according to the following conditions: 1) the software is server-side software, which generally has high requirements on performance, reliability, security, and so on, and is therefore well suited for studying the influence of configuration items; 2) the software has more than 2,000 stars on GitHub, the world's largest code-hosting platform (the star count reflects user attention; more stars indicate that more users use and follow the software), so labeling its configuration items has greater impact; 3) the software has more than 100 configuration items, so that performance tuning is more necessary. The configuration items of the software satisfying all three conditions, more than 7,000 in total, form the configuration item set D0. From D0, a proportion s (s ≥ 0.2) of the configuration items is randomly selected by hand. Denote the total number of configuration items as S and the number of randomly selected configuration items as N; then N = S × s, rounded to an integer.
2.2 According to the official document descriptions of the selected configuration items, label the intentions of the N configuration items to obtain the labeled configuration item set D1. The method is: according to the document description of a configuration item, if adjusting the configuration item can improve software performance but the improvement simultaneously reduces software reliability, the intention label of the configuration item is Label1; if adjusting it can improve performance but simultaneously reduces software security, the intention label is Label2; if adjusting it can improve performance but simultaneously degrades software functionality, the intention label is Label3; if adjusting it can improve performance but simultaneously increases the cost of using the software, the intention label is Label4; if adjusting it can improve performance but simultaneously degrades the performance experienced by other users of the software, the intention label is Label5; if adjusting it can improve performance without causing any of the five side effects above, the intention label is Label6; if adjusting the configuration item does not affect software performance, the intention label is Label7.
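The seven-way label taxonomy above can be made concrete with a small sketch (the English side-effect names merely paraphrase the label descriptions; the dict layout is illustrative, not part of the patent):

```python
# Sketch of the seven intention labels defined above. The English
# side-effect names paraphrase the document's label descriptions;
# the dict layout itself is illustrative, not part of the method.
INTENT_LABELS = {
    1: "performance up, reliability down",                # Label1
    2: "performance up, security down",                   # Label2
    3: "performance up, functionality down",              # Label3
    4: "performance up, usage cost up",                   # Label4
    5: "performance up, other users' performance down",   # Label5
    6: "performance up, none of the above side effects",  # Label6
    7: "no effect on performance",                        # Label7
}

def is_performance_related(label: int) -> bool:
    # Labels 1-6 affect performance; Label7 does not.
    return 1 <= label <= 6
```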
2.3 The labeled configuration item set is D1 = {<(c_n, d_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, where c_n is the name of the n-th configuration item in D1, d_n is the document of configuration item c_n, and d_n can be expressed as the word sequence (w_n^1, …, w_n^j, …, w_n^Wn), where Wn is the total number of words in d_n; label_n is the intention category of configuration item c_n, and Labels = {Labeli | 1 ≤ i ≤ 7} is the set of intention label categories.
Denote the set of the S − N configuration items not selected in step 2.1 as the unlabeled configuration item set D2 = {<(cc_t, dd_t)> | 1 ≤ t ≤ T}, where T = S − N, cc_t is the name of the t-th configuration item in D2, and dd_t is the document of configuration item cc_t. dd_t can be expressed as the word sequence (u_t^1, …, u_t^j, …, u_t^Ut), where Ut is the total number of words in dd_t.
Third, the configuration item intention data automatic amplification module preprocesses the labeled configuration item set D1, iteratively labels the unlabeled configuration items in the unlabeled configuration item set D2, and amplifies D1 with the newly labeled configuration items, obtaining the amplified labeled configuration item set D. The method is:
3.1 The configuration item intention data automatic amplification module preprocesses D1. The method is:
3.1.1 Define a dictionary-type variable f_label for encoding the intention label categories (for the dictionary type see https://docs.python.org/3/c-api/dict.html; a dictionary variable dict consists of key-value pairs (key_1, value_1), …, (key_k, value_k), …, (key_K, value_K) and satisfies dict[key_k] = value_k, where K ≥ 0 is the number of key-value pairs, the dictionary is empty when K = 0, and key_1, …, key_K are pairwise distinct). f_label satisfies f_label[Label1] = 1, …, f_label[Labeli] = i, …, f_label[Label7] = 7 (1 ≤ i ≤ 7);
3.1.2 Initialize the maximum word-mapping index: index = 8;
3.1.3 Define a dictionary-type variable f_token for encoding words. Initialize f_token as an empty dictionary, i.e. its key set is empty; in subsequent steps, pairs of the form <part of speech, root> will gradually be added to the key set, so that words are encoded according to their part of speech and root;
3.1.4 Encode the words and build f_token step by step. The method is:
3.1.4.1 Initialize the variable n = 1;
3.1.4.2 Encode the Wn words in d_n to obtain the encoded document d'_n. The method is:
3.1.4.2.1 Initialize the word index variable w = 1;
3.1.4.2.2 Convert the w-th word w_n^w into the pair <pos_n^w, root_n^w>, where pos_n^w is the part of speech of w_n^w (noun, verb, adjective, adverb, etc.) and root_n^w is its root (e.g. "writes" and "writing" both have the root "write");
3.1.4.2.3 Determine whether <pos_n^w, root_n^w> is a key of f_token. If not, encode w_n^w as index, add the key-value pair (<pos_n^w, root_n^w>, index) to f_token, and go to 3.1.4.2.4; if so, encode w_n^w as the value f_token[<pos_n^w, root_n^w>] corresponding to the key <pos_n^w, root_n^w>, and go to 3.1.4.2.5;
3.1.4.2.4 Let index = index + 1, go to 3.1.4.2.5;
3.1.4.2.5 If w = Wn, the encoding of every word in d_n is complete and the encoded document d'_n is obtained; go to 3.1.4.3. If w < Wn, go to 3.1.4.2.6;
3.1.4.2.6 Let w = w + 1, go to 3.1.4.2.2;
3.1.4.3 If n = N, replace each d_n in D1 with its encoded d'_n, obtaining the preprocessed labeled configuration item set D'1 = {<(c_n, d'_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, and go to 3.2; if n < N, go to 3.1.4.4;
3.1.4.4 Let n = n + 1, go to 3.1.4.2.
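Steps 3.1.4.1 to 3.1.4.4 amount to building an incremental vocabulary over <part of speech, root> pairs, with codes starting at 8 because codes 1 through 7 are reserved for the intention labels. A minimal runnable sketch, where pos_and_root() is a toy stand-in for a real part-of-speech tagger and stemmer (which the document does not name):

```python
# Sketch of the incremental word-encoding scheme of step 3.1.4:
# each distinct <part-of-speech, root> pair receives the next integer
# code, starting at 8 (codes 1-7 are reserved for the intention labels).
# pos_and_root() is a toy stand-in for a real POS tagger and stemmer.

def pos_and_root(word: str):
    # Toy normalisation: strip a trailing "s" as the "root"; fixed POS tag.
    root = word[:-1] if word.endswith("s") else word
    return ("n", root)  # pretend everything is a noun, for illustration

def encode_docs(docs):
    f_token = {}   # maps <pos, root> -> integer code
    index = 8      # first free code; 1-7 encode Label1..Label7
    encoded = []
    for doc in docs:
        codes = []
        for word in doc.split():
            key = pos_and_root(word)
            if key not in f_token:
                f_token[key] = index
                index += 1
            codes.append(f_token[key])
        encoded.append(codes)
    return encoded, f_token

# "write" and "writes" share a root, so they receive the same code:
enc, f_token = encode_docs(["write writes buffer", "buffer size"])
```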
3.2 The configuration item intention data automatic amplification module mines sequence patterns from D'1 to obtain the sequence pattern set SP. The method is:
3.2.1 Use D'1 to construct the sequence set SeqDB = {seq_1, …, seq_n, …, seq_N}, where seq_n is the sequence formed by appending the code f_label[label_n] of the intention label of configuration item c_n to its encoded document d'_n;
3.2.2 Perform sequence pattern mining on SeqDB with the FEAT algorithm from "Efficient mining of frequent sequence generators", published by Chuancong Gao et al. at WWW 2008, obtaining the sequence set P = {p_1, …, p_m, …, p_M}, where M is the total number of sequence patterns. Each p_m is a frequently occurring sequence in SeqDB, corresponding to expressions commonly used in configuration documents (frequently occurring words and phrases); p_m = (pp_1, …, pp_x, …, pp_X), where X is the length of p_m computed by the FEAT algorithm, and pp_x, the x-th item of p_m, is the code of a word or of an intention label, satisfying 1 ≤ pp_x < index. Specifically, 1 ≤ pp_x ≤ 7 means pp_x is the f_label code of the intention label Label_{pp_x}, and 8 ≤ pp_x < index means pp_x is the f_token code of some pair <part of speech, root> produced by step 3.1.4.2.2;
3.2.3 Process P: retain the sequences in P related to the intention categories and compute the support and confidence of each retained sequence, obtaining the sequence pattern set SP. The method is:
3.2.3.1 Initialize the sequence pattern set SP as an empty set;
3.2.3.2 Initialize the sequence traversal variable m = 1;
3.2.3.3 Initialize the sequence pattern counter m' = 0;
3.2.3.4 Determine whether the last item pp_X of p_m satisfies 1 ≤ pp_X ≤ 7. If so, pp_X is the code of an intention category and p_m is relevant to determining the intention categories of unlabeled configuration items; go to 3.2.3.5. Otherwise, p_m is irrelevant to determining the intention categories of unlabeled configuration items; go directly to 3.2.3.6;
3.2.3.5 Let m' = m' + 1 and p_m' = p_m. Compute the support and confidence of p_m' and add the processed sequence pattern to the sequence pattern set SP. The method is:
3.2.3.5.1 Initialize the configuration item index variable n = 1;
3.2.3.5.2 Initialize the support variable support_m' = 0;
3.2.3.5.3 Initialize the match-count variable matched_m' = 0, used to count the configuration items matched by the pattern;
3.2.3.5.4 Let the intention category corresponding to p_m' be l_m' = Label_{pp_X}, and let the sequence pattern reflecting l_m' be pattern_m' = (pp_1, …, pp_x, …, pp_{X-1});
3.2.3.5.5 Determine whether pattern_m' is a subsequence of d'_n. If so, a matching sequence is found: let matched_m' = matched_m' + 1 and go to 3.2.3.5.6. If not, go to 3.2.3.5.7;
3.2.3.5.6 If l_m' = label_n, the intention label is matched correctly along with the sequence: let support_m' = support_m' + 1 and go to 3.2.3.5.7. If l_m' ≠ label_n, the sequence matches but its intention label does not; go to 3.2.3.5.7;
3.2.3.5.7 If n = N, go to 3.2.3.5.9; if n < N, go to 3.2.3.5.8;
3.2.3.5.8 Let n = n + 1, go to 3.2.3.5.5;
3.2.3.5.9 Compute the confidence of p_m': confidence_m' = support_m' / matched_m' (the FEAT algorithm guarantees that p_m' is a subsequence of at least one sequence in SeqDB, so matched_m' ≥ 1 always holds). Record the processed sequence pattern as Pattern_m' = (pattern_m', l_m', confidence_m') and add Pattern_m' to the sequence pattern set SP;
3.2.3.6 If m = M, the sequence pattern set SP = {Pattern_m' | 1 ≤ m' ≤ M'} is obtained, where M' ≤ M is the total number of patterns in SP; go to 3.3. If m < M, let m = m + 1 and go to 3.2.3.4;
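The support/confidence computation of step 3.2.3 can be sketched as follows (FEAT itself is not reimplemented here; mined patterns are assumed given as tuples whose last item is a label code, and the subsequence test is order-preserving but not necessarily contiguous, as usual in sequential pattern mining):

```python
# Sketch of step 3.2.3: given mined patterns whose last item is a label
# code (1-7), count support/matched over the encoded labeled documents and
# keep (pattern, label, confidence) triples with confidence = support/matched.

def is_subsequence(pattern, seq):
    # Order-preserving, not necessarily contiguous, containment test.
    it = iter(seq)
    return all(any(item == x for x in it) for item in pattern)

def score_patterns(patterns, labeled_docs):
    # labeled_docs: list of (encoded_doc, label_code) pairs
    sp = []
    for p in patterns:
        if not (1 <= p[-1] <= 7):
            continue                      # last item must encode a label
        body, label = p[:-1], p[-1]
        matched = support = 0
        for doc, doc_label in labeled_docs:
            if is_subsequence(body, doc):
                matched += 1
                if doc_label == label:
                    support += 1
        if matched:
            sp.append((body, label, support / matched))
    return sp
```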
3.3 The configuration item intention data automatic amplification module encodes D2. The method is:
3.3.1 Initialize the variable t = 1;
3.3.2 Encode the Ut words in dd_t. The method is:
3.3.2.1 Initialize the word index variable u = 1;
3.3.2.2 Convert the u-th word u_t^u into the pair <pos_t^u, root_t^u>, where pos_t^u is its part of speech (noun, verb, adjective, adverb, etc.) and root_t^u is its root;
3.3.2.3 Determine whether <pos_t^u, root_t^u> is a key of f_token. If so, encode u_t^u as f_token[<pos_t^u, root_t^u>] and go to 3.3.2.4. If not, f_token cannot encode u_t^u, so encode u_t^u directly as 0 and go to 3.3.2.4;
3.3.2.4 If u = Ut, the encoding of dd_t is complete and dd_t is encoded as dd'_t; go to 3.3.3. If u < Ut, let u = u + 1 and go to 3.3.2.2;
3.3.3 If t = T, add each pair (cc_t, dd'_t) corresponding to <(cc_t, dd_t)> in D2 to the encoded unlabeled configuration item set D'2, obtaining D'2 = {(cc_t, dd'_t) | 1 ≤ t ≤ T}, and go to 3.4; if t < T, go to 3.3.4;
3.3.4 Let t = t + 1, go to 3.3.2.
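Step 3.3 reuses the f_token mapping from step 3.1 and falls back to code 0 for unseen words; a minimal sketch (pos_and_root() is again a toy stand-in for real part-of-speech tagging and stemming tooling, which the document does not name):

```python
# Sketch of step 3.3: encode an unlabeled document with the f_token mapping
# built during preprocessing; any <pos, root> pair never seen in the labeled
# set is encoded as 0. pos_and_root() is a toy stand-in for real NLP tooling.

def pos_and_root(word):
    # strip a trailing "s" as the "root"; fixed POS tag, for illustration
    return ("n", word[:-1] if word.endswith("s") else word)

def encode_unlabeled(doc, f_token):
    return [f_token.get(pos_and_root(w), 0) for w in doc.split()]
```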
3.4 The configuration item intention data automatic amplification module labels D'2 using SP. The method is:
3.4.1 Set a confidence threshold threshold with 0 < threshold ≤ 1; preferably 0.7 < threshold ≤ 1;
3.4.2 Initialize the variable t = 1;
3.4.3 Initialize the set R1 of configuration items that receive a label as an empty set;
3.4.4 Initialize the set R2 of configuration items that receive no label as an empty set;
3.4.5 Initialize the dictionary-type variable selector, used to select an intention label for the t-th unlabeled configuration item: let selector[Label1] = 0, …, selector[Labeli] = 0, …, selector[Label7] = 0, where selector[Labeli] is the confidence of labeling the t-th unlabeled configuration item as Labeli;
3.4.6 Update selector according to the sequence pattern set SP obtained in 3.2. The method is:
3.4.6.1 Initialize the variable m' = 1;
3.4.6.2 Read confidence_m', l_m', and pattern_m' from Pattern_m'. If confidence_m' ≥ threshold, go to 3.4.6.3 to determine whether the pattern matches; if confidence_m' < threshold, Pattern_m' does not meet the confidence requirement, go to 3.4.6.5;
3.4.6.3 If pattern_m' is a subsequence of dd'_t, the pattern matches; go to 3.4.6.4. If not, go to 3.4.6.5;
3.4.6.4 If confidence_m' > selector[l_m'], update selector[l_m'], i.e. let selector[l_m'] = confidence_m', and go to 3.4.6.5; otherwise go directly to 3.4.6.5;
3.4.6.5 If m' = M', all sequence patterns have been traversed and the update of selector is complete; go to 3.4.7. If m' < M', let m' = m' + 1 and go to 3.4.6.2;
3.4.7 Select a label for dd'_t according to selector. The method is:
3.4.7.1 Initialize the candidate label LC_t = Label1;
3.4.7.2 Initialize the label index variable i = 2;
3.4.7.3 If selector[Labeli] > selector[LC_t], the confidence of choosing Labeli as the label is higher than that of choosing LC_t, so let LC_t = Labeli and go to 3.4.7.4; if selector[Labeli] ≤ selector[LC_t], go directly to 3.4.7.4;
3.4.7.4 If i = 7, go to 3.4.7.5; if i < 7, let i = i + 1 and go to 3.4.7.3;
3.4.7.5 If selector[LC_t] > 0, take LC_t as the intention label of the t-th unlabeled configuration item, add <(cc_t, dd_t), LC_t> to R1, and go to 3.4.8. If selector[LC_t] = 0, no pattern in SP matches dd'_t, so no intention label is selected for the t-th unlabeled configuration item; add <(cc_t, dd_t)> to R2 and go to 3.4.8;
3.4.8 If t = T, the labeling of the unlabeled configuration item set D2 is complete, yielding R1 and R2; go to 3.4.10. If t < T, go to 3.4.9;
3.4.9 Let t = t + 1, go to 3.4.5;
3.4.10 Determine whether R1 is an empty set. If so, the iterative amplification of D1 terminates, the amplified labeled configuration item set is obtained, and go to 3.4.12; if not, go to 3.4.11;
3.4.11 Let D1 = D1 + R1 and D2 = R2, then go to 3.1;
3.4.12 Denote the labeled configuration item set D1 at this point as the amplified labeled configuration item set D = {<(c_n', d_n'), label_n'> | 1 ≤ n' ≤ N', label_n' ∈ Labels}, where d_n' is the document of configuration item c_n', label_n' is the intention category of configuration item c_n', and N' is the number of configuration items in D, with N' ≥ N.
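Taken together, steps 3.4.1 to 3.4.12 form a pattern-based self-training loop. A condensed runnable sketch, under two simplifications stated explicitly here: the pattern set stays fixed across rounds (whereas the document re-runs steps 3.1 to 3.3 each round), and the per-label selector is collapsed into a single highest-confidence choice:

```python
# Condensed sketch of the step-3.4 amplification loop: repeatedly label
# unlabeled documents with the highest-confidence matching pattern
# (confidence >= threshold), move newly labeled documents into the labeled
# set, and stop when nothing new is labeled (R1 empty). Patterns are
# (body, label, confidence) triples as in step 3.2.3.

def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(any(item == x for x in it) for item in pattern)

def augment(labeled, unlabeled, patterns, threshold=0.7):
    while True:
        newly, rest = [], []
        for doc in unlabeled:
            # best confidence among matching, sufficiently confident patterns
            best = max(
                ((conf, lab) for body, lab, conf in patterns
                 if conf >= threshold and is_subsequence(body, doc)),
                default=None,
            )
            (newly if best else rest).append((doc, best[1]) if best else doc)
        if not newly:          # R1 empty: amplification terminates
            return labeled
        labeled += newly
        unlabeled = rest
```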
Fourth, use the amplified labeled configuration item set D to train the configuration item preselection module of the performance-tuning-oriented software configuration item preselection system. The method for training the configuration item preselection module is as follows:
4.1 Use the N' configuration item documents d_1, …, d_n', …, d_N' in D as the training set to train the TF-IDF encoder in the configuration item preselection module, following the method of "Using TF-IDF to determine word relevance in document queries", published by Ramos et al. at the 1st Instructional Conference on Machine Learning (2003); the input of the encoder is a sentence and its output is the vector corresponding to the sentence;
4.2 Encode the N' documents in D with the encoder to obtain the encoded vector set V'. The method is:
4.2.1 Initialize the vector set V' as an empty set;
4.2.2 Initialize the loop index variable n' = 1;
4.2.3 Use the encoder to encode d_n' as the n'-th vector v_n';
4.2.4 Add v_n' to V';
4.2.5 If n' = N', the encoding of the N' configuration item documents in D is complete, yielding the encoded vector set V'; go to 4.3. If n' < N', let n' = n' + 1 and go to 4.2.3;
4.3 Use the training set {<v_n', label_n'> | 1 ≤ n' ≤ N'} to train the configuration item preselection model RF with the hierarchical random forest algorithm proposed by Yoni Gavish et al. in "Comparing the performance of flat and hierarchical Habitat/Land-Cover classification models in a NATURA 2000 site" (ISPRS Journal of Photogrammetry and Remote Sensing, vol. 136, 2018), obtaining the configuration item preselection model parameters.
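The TF-IDF encoding of step 4.1 can be made concrete in pure Python (a real implementation would use a library; the formula below, tf·idf(w, d) = tf(w, d) · log(N / df(w)), is the common form of the scheme Ramos describes, shown here without smoothing):

```python
# Pure-Python sketch of the TF-IDF encoding used by the encoder in step 4.1:
# each document becomes a vector over the sorted vocabulary, with component
# tf(w, d) * log(N / df(w)), where N is the number of documents and df(w)
# the number of documents containing w. No smoothing, for clarity.
import math

def tfidf_encode(docs):
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    df = {w: sum(w in doc for doc in tokenized) for w in vocab}
    n = len(tokenized)
    vectors = []
    for doc in tokenized:
        vec = [doc.count(w) * math.log(n / df[w]) for w in vocab]
        vectors.append(vec)
    return vocab, vectors
```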
Fifth, the trained configuration item preselection module preselects configuration items for the target software, obtaining the preselected configuration item sets. The data set of target software configuration items is DT = {<dtc_a, dt_a> | 1 ≤ a ≤ A}, where A is the number of configuration items in the target software, dtc_a is the name of the a-th configuration item, and dt_a is the document of the a-th configuration item. The method is:
5.1 encode the A configuration item documents of the target software with the encoder obtained by the training in 4.1, and denote the encoded vector set of the target software as V_dt; the method is:
5.1.1 initialize the target software's vector set V_dt as an empty set;
5.1.2 initialize loop subscript variable a ═ 1;
5.1.3 use the encoder to encode dt_a as the a-th vector vv_a of the target software;
5.1.4 add vv_a to V_dt;
5.1.5 if a = A, the A configuration item documents in DT are all encoded, yielding the target software's encoded vector set V_dt; go to 5.2. If a < A, let a = a+1 and go to 5.1.3;
5.2 the trained configuration item preselection module generates an intention label from the vector corresponding to each configuration item in V_dt, obtaining the predicted intention label list O; the method is:
5.2.1 initializing predicted intention tag list O as an empty list;
5.2.2 initialize cycle index variable a ═ 1;
5.2.3 input vv_a into the model RF of the trained configuration item preselection module to predict the intention label o_a of the a-th configuration item of the target software; the method is:
5.2.3.1 initialize the candidate intention label o_a of the a-th configuration item: let o_a = Label_7;
5.2.3.2 input vv_a into the model RF of the trained configuration item preselection module to obtain the first-layer output [pprob, npprob] and the second-layer output [prob_1, prob_2, prob_3, prob_4, prob_5, prob_6], where pprob is the probability that the configuration item to be predicted is performance-related, npprob is the probability that it is not performance-related, and prob_i is the probability that the intention label of the configuration item to be predicted is Label_i;
5.2.3.3 if pprob < npprob, the RF predicts that the configuration item is more likely not performance-related than performance-related; let o_a = Label_7 and go to 5.2.4. If pprob ≥ npprob, the RF predicts that the configuration item is more likely performance-related, i.e. it is a performance-related configuration item; go to 5.2.3.4 to further judge which other user intentions it affects besides performance, i.e. to determine which of Label_1,…,Label_i,…,Label_6 is the configuration item's intention label;
5.2.3.4, judging the intention label of the configuration item related to the performance, the method is:
5.2.3.4.1 initializes the candidate intention label subscript ci to 1;
5.2.3.4.2 initializing loop index variable i ═ 1;
5.2.3.4.3 if prob_i > prob_ci, let ci = i and go to 5.2.3.4.4; otherwise go directly to 5.2.3.4.4;
5.2.3.4.4 if i = 6, the traversal of the RF second-layer output is complete; let o_a = Label_ci and go to 5.2.4. If i < 6, let i = i+1 and go to 5.2.3.4.3;
5.2.4 add o_a to the predicted intention label list O;
5.2.5 if a = A, the prediction for all configuration items in DT is complete, yielding the predicted intention label list O; go to 5.3. If a < A, let a = a+1 and go to 5.2.3;
5.3 classifying the configuration items according to the intention labels to obtain a set consisting of the configuration items with the same intention types, wherein the method comprises the following steps:
5.3.1 initialize the configuration item sets corresponding to the 7 intention labels as empty sets, i.e. let Set_i = ∅ (1 ≤ i ≤ 7), where Set_i is the configuration item set corresponding to the i-th intention label;
5.3.2 initialize cycle index variable a ═ 1;
5.3.3 according to the intention label o_a of the a-th configuration item, add the configuration item name dtc_a to the corresponding configuration item set, i.e. add dtc_a to Set_i when o_a = Label_i;
5.3.4 if a < A, let a = a+1 and go to 5.3.3. If a = A, the classification of all configuration items in DT is complete, yielding the preselected configuration item sets Set_1,…,Set_7, where Set_i = {dtc_{i,1},…,dtc_{i,j},…,dtc_{i,J_i}}, dtc_{i,j} is the j-th configuration item whose intention label the trained preselection model RF predicts as Label_i, and J_i is the total number of configuration items whose RF-predicted intention label is Label_i.
So far, the trained configuration item pre-selection module completes the goal of pre-selecting configuration items according to the intention categories.
The user can select the configuration item set corresponding to the appropriate intention category according to the intention of the performance tuning. For example, when the user needs to guarantee software reliability during performance tuning, the configuration items in Set_1 can bring a performance improvement but reduce software reliability, contrary to the user's intention of guaranteeing reliability, and the configuration items in Set_7 are irrelevant to software performance, so adjusting them has no effect on performance, contrary to the intention of performance tuning; the user can therefore perform performance tuning on the configuration items preselected by the present invention into Set_2,…,Set_6 to meet the performance tuning requirement.
Compared with the prior art, the invention can achieve the following beneficial effects:
1. With the invention, configuration item preselection can be performed on target software from its configuration documents alone, without performance-testing the configuration items, and the result is independent of the performance test workload; preselection is therefore lighter-weight, the time spent on performance tuning is reduced, and the problem of incomplete preselected configuration items caused by the workload limitations of the prior art is greatly alleviated.
2. With the invention, performance-critical configuration items can be preselected while the diverse intentions of users during performance tuning are taken into account, and corresponding configuration items can be preselected for the different intention categories. Compared with the background-art method ("Too many knobs to tune? Towards faster database tuning by pre-selecting important knobs", Konstantinos Kanellis et al., HotStorage 2020), which is concerned only with performance and ignores users' multi-intention characteristics, the invention comprehensively considers the potential influence on other intentions that performance tuning brings, so that users tuning with the preselected configuration items can also satisfy their intentions beyond performance.
3. The invention can automatically amplify the data set. The third step of the invention provides a method that mines sequence patterns from labeled data and amplifies the unlabeled data, effectively reducing the manpower and time consumed in data labeling. Experiments show that when labeled data accounts for 20% of the total data (s = 0.2) and the confidence threshold is set to 0.85 (threshold = 0.85), 59.4% of the unlabeled data can be amplified with this method at an accuracy of 86.4%, greatly reducing the manpower and time consumed in data labeling and improving labeling efficiency. Compared with the prior art, the invention can greatly reduce model training's dependence on labeled data while keeping the accuracy at the same level as the prior art.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a logical block diagram of a first step of the present invention to build a multiple intent sensitive software configuration item preselection system;
FIG. 3 is a flow chart of the third step of the present invention, in which the configuration item intention data automatic amplification module preprocesses the labeled configuration item set D_1, iteratively labels the unlabeled configuration items in the unlabeled configuration item set D_2, and amplifies D_1 with the newly labeled configuration items to obtain the amplified labeled configuration item set D.
Detailed Description
The present invention will be described with reference to the accompanying drawings.
As shown in fig. 1, the present invention comprises the steps of:
firstly, a software configuration item preselection system oriented to performance tuning is constructed, and the software configuration item preselection system oriented to performance tuning is composed of a configuration item intention data automatic amplification module and a configuration item preselection module as shown in fig. 2.
The automatic configuration item intention data amplification module is connected with the configuration item preselection module and is also connected with data set source software. The data set source software comprises two parts: a set of annotated configuration items and a set of unlabeled configuration items. The annotation configuration item set refers to a data set constructed by performing intention type annotation on configuration items according to each configuration item document in a manual annotation mode. The configuration item intention data automatic amplification module preprocesses a labeling configuration item set, labels the unlabeled configuration items in the unlabeled configuration item set, adds newly labeled data into the labeling configuration item set from the unlabeled configuration item set until the number of configuration items in the labeling configuration item set is not changed any more, obtains an amplified labeling configuration item set, and sends the amplified labeling configuration item set to the configuration item preselection module.
The configuration item preselection module is connected with the configuration item intention data automatic amplification module, and receives the amplified annotation configuration item set from the configuration item intention data automatic amplification module. The configuration item preselection module comprises a TF-IDF coder encoder and a configuration item preselection model RF. The encoder encodes sentences in the configuration item documents to obtain vectors corresponding to the sentences; and the RF is a random forest model with a two-layer structure, and the model is trained by using the amplified label configuration item set to obtain parameters of the random forest model. The configuration item pre-selection module classifies the configuration items of the target software according to the configuration data of the target software, pre-selects the configuration items corresponding to different intention categories, and obtains a pre-selected configuration item set.
Second, randomly select some configuration items from the configuration item set D_0 of the data set source software and label their intentions, obtaining the labeled configuration item set D_1.
2.1 The data set source software comprises 13 types of software: MySQL, Cassandra, MariaDB, Apache-Httpd, Nginx, Hadoop-Common, MapReduce, Apache-Flink, HDFS, Keystone, Nova, GCC and Clang. Configuration items are selected from data set source software satisfying the following conditions: 1) it is server-side software — such software generally has high requirements on performance, reliability, security and the like, which is conducive to studying the influence of configuration items on the software; 2) it has a large number of users and more than 2,000 stars on GitHub, the world's largest code hosting platform — labeling the configuration items of widely used software has greater impact; 3) it has more than 100 configuration items — software with many configuration items is in greater need of performance tuning. From the configuration item set D_0, consisting of more than 7,000 configuration items of software satisfying all 3 conditions simultaneously, a proportion s (s ≥ 0.2) of configuration items is selected manually at random. The total number of configuration items is denoted S and the number of randomly selected configuration items is N, where N = S × s rounded to an integer.
2.2 According to the official document descriptions of the selected configuration items, label the intentions of the N configuration items, obtaining the labeled configuration item set D_1. The method is: according to the configuration item's document description, if adjusting the configuration item can improve software performance but the performance improvement simultaneously reduces software reliability, the configuration item's intention label is Label_1; if adjusting it can improve software performance but the improvement reduces software security, its intention label is Label_2; if adjusting it can improve software performance but the improvement degrades software functionality, its intention label is Label_3; if adjusting it can improve software performance but the improvement increases the cost of using the software, its intention label is Label_4; if adjusting it can improve software performance but the improvement degrades performance for other users of the software, its intention label is Label_5; if adjusting it can improve software performance without causing any of the first five side effects, its intention label is Label_6; if adjusting the configuration item does not affect software performance, its intention label is Label_7.
2.3 The labeled configuration item set is D_1 = {<(c_n, d_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, where c_n is the name of the n-th configuration item in D_1 and d_n is the document of configuration item c_n; d_n can be expressed as a word sequence of length W_n, where W_n is the total number of words in d_n; label_n is the intention category of configuration item c_n, and Labels = {Label_i | 1 ≤ i ≤ 7} is the set of intention label categories.
Denote the set of the T = S−N configuration items not selected in step 2.1 as the unlabeled configuration item set D_2, D_2 = {<(cc_t, dd_t)> | 1 ≤ t ≤ T}, where cc_t is the name of the t-th configuration item in D_2 and dd_t is the document of configuration item cc_t; dd_t can be expressed as a word sequence of length U_t, where U_t is the total number of words in dd_t.
Third, the configuration item intention data automatic amplification module preprocesses the labeled configuration item set D_1, iteratively labels the unlabeled configuration items in the unlabeled configuration item set D_2, and amplifies D_1 with the newly labeled configuration items, obtaining the amplified labeled configuration item set D. The method is shown in fig. 3:
3.1 the configuration item intention data automatic amplification module preprocesses D_1; the method is:
3.1.1 define the dictionary-type variable f_label for encoding intention label categories, satisfying f_label[Label_1] = 1,…,f_label[Label_i] = i,…,f_label[Label_7] = 7 (1 ≤ i ≤ 7);
3.1.2 initialize the word-mapping maximum index: index = 8;
3.1.3 define the dictionary-type variable f_token for encoding words; initialize f_token as an empty dictionary, i.e. the key set of f_token is an empty set; in subsequent steps, 2-tuples <part of speech, root word> will be added to its key set step by step, so that words are encoded according to their part of speech and root word;
3.1.4 encode the words and build f_token step by step; the method is:
3.1.4.1 initializing variable n ═ 1;
3.1.4.2 encode each of the W_n words in d_n to obtain d'_n, the encoding of d_n; the method is:
3.1.4.2.1 initialize the word subscript variable w_n = 1;
3.1.4.2.2 convert the w_n-th word of d_n into the 2-tuple <pos, root>, where pos is the word's part of speech and root is its root word;
3.1.4.2.3 judge whether <pos, root> is in the key set of f_token; if not, encode the word as index, add the key-value pair <(pos, root), index> to f_token, and go to 3.1.4.2.4; if so, encode the word as the value corresponding to the key <pos, root> in f_token, and go to 3.1.4.2.5;
3.1.4.2.4 let index be index + 1;
3.1.4.2.5 if w_n = W_n, the encoding of every word in d_n is complete and d'_n, the encoding of d_n, is obtained; go to 3.1.4.3. If w_n < W_n, go to 3.1.4.2.6;
3.1.4.2.6 let w_n = w_n+1, go to 3.1.4.2.2;
3.1.4.3 if n = N, replace each d_n in D_1 by its encoding d'_n, obtaining the preprocessed labeled configuration item set D'_1 = {<(c_n, d'_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, and go to 3.2; if n < N, go to 3.1.4.4;
3.1.4.4 let n = n+1, go to 3.1.4.2;
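The preprocessing of steps 3.1.1–3.1.4 can be sketched in Python as follows. The `analyze` function is a hypothetical stand-in: the patent does not name a concrete part-of-speech tagger or stemmer, so the toy rule below merely illustrates producing a <part of speech, root word> 2-tuple.

```python
# Sketch of steps 3.1.1-3.1.4: encode intention labels (codes 1..7) and
# words (codes starting at 8) via an incrementally built dictionary f_token.

def analyze(word):
    """Hypothetical stand-in for POS tagging + stemming."""
    pos = "verb" if word.endswith("ing") else "noun"
    root = word[:-3] if word.endswith("ing") else word
    return (pos, root)

def build_f_token(docs):
    """docs: list of word lists. Returns f_label, f_token and the
    encoded documents d'_n."""
    f_label = {f"Label{i}": i for i in range(1, 8)}   # step 3.1.1
    index = 8                                          # step 3.1.2
    f_token = {}                                       # step 3.1.3
    encoded_docs = []
    for words in docs:                                 # step 3.1.4
        encoded = []
        for w in words:
            key = analyze(w)                           # step 3.1.4.2.2
            if key not in f_token:                     # step 3.1.4.2.3
                f_token[key] = index
                index += 1                             # step 3.1.4.2.4
            encoded.append(f_token[key])
        encoded_docs.append(encoded)
    return f_label, f_token, encoded_docs

f_label, f_token, enc = build_f_token([["caching", "cache", "size"],
                                       ["cache", "size", "limit"]])
print(enc)  # [[8, 9, 10], [9, 10, 11]]
```

Note that two surface forms with the same part of speech and root ("cache" in both documents here) share one code, which is the point of keying f_token on the <pos, root> 2-tuple.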
3.2 the configuration item intention data automatic amplification module mines sequence patterns from D'_1 to obtain the sequence pattern set SP; the method is:
3.2.1 use D'_1 to construct the sequence set SeqDB = {seq_1,…,seq_n,…,seq_N}, where seq_n is the sequence formed by concatenating d'_n, the encoding of configuration item c_n's document d_n, with f_label(label_n), the encoding of c_n's intention label label_n, i.e. seq_n = (d'_n, f_label(label_n));
3.2.2 perform sequence pattern mining on the sequence set SeqDB with the FEAT algorithm from "Efficient mining of frequent sequence generators" (Chuancong Gao et al., WWW 2008), obtaining the sequence set P = {p_1,…,p_m,…,p_M}, where M is the total number of sequence patterns and p_m is a frequently occurring subsequence in SeqDB, corresponding to common expressions in the configuration documents such as frequently occurring words and phrases; p_m = (pp_1,…,pp_x,…,pp_X), where X is the length of p_m computed by the FEAT algorithm and pp_x, the x-th item of p_m, is the encoding of a word or an intention label, satisfying 1 ≤ pp_x < index; specifically, 1 ≤ pp_x ≤ 7 means pp_x is the f_label mapping of an intention label, and 8 ≤ pp_x < index means pp_x is the f_token mapping of some <part of speech, root word> 2-tuple produced by step 3.1.4.2.2;
3.2.3 processing the P, reserving sequences related to the intention category, and calculating the corresponding support degree and confidence degree of each sequence to obtain a sequence mode set SP, wherein the method comprises the following steps:
3.2.3.1 initializing sequence pattern set SP as an empty set;
3.2.3.2 initialization sequence traversal variable m ═ 1;
3.2.3.3 initialize the sequence pattern count variable m' = 0;
3.2.3.4 judge whether pp_X, the last item of p_m, satisfies 1 ≤ pp_X ≤ 7; if so, pp_X is the encoding of an intention category and p_m is relevant to determining the intention categories of unlabeled configuration items, go to 3.2.3.5; otherwise p_m is irrelevant to determining unlabeled configuration item intention categories, go directly to 3.2.3.6;
3.2.3.5 let m' = m'+1 and let p_m' = p_m; calculate the support and confidence of p_m' and add the processed sequence pattern to the sequence pattern set SP. The method is:
3.2.3.5.1 initialize the configuration item subscript loop variable n = 1;
3.2.3.5.2 initialize the support variable support_m' = 0;
3.2.3.5.3 initialize the matched-count variable matched_m' = 0, used to count the number of configuration items matched by the pattern;
3.2.3.5.4 let the intention category corresponding to p_m' be l_m' = pp_X, and let the label-free sequence pattern reflected by p_m' be pattern_m' = (pp_1,…,pp_x,…,pp_{X−1});
3.2.3.5.5 judge whether pattern_m' is a subsequence of d'_n; if so, a matching sequence is found, let matched_m' = matched_m'+1 and go to 3.2.3.5.6; if not, go to 3.2.3.5.7;
3.2.3.5.6 if l_m' = label_n, the intention label is matched correctly along with the sequence, so let support_m' = support_m'+1 and go to 3.2.3.5.7; if l_m' ≠ label_n, a sequence was matched but its intention label does not match, go to 3.2.3.5.7;
3.2.3.5.7 if n = N, go to 3.2.3.5.9; if n < N, go to 3.2.3.5.8;
3.2.3.5.8 let n = n+1, go to 3.2.3.5.5;
3.2.3.5.9 calculate the confidence of p_m': confidence_m' = support_m'/matched_m' (the FEAT algorithm guarantees that p_m' is at least a subsequence of some sequence in SeqDB, so matched_m' ≥ 1 always); denote the processed sequence pattern as Pattern_m' = (pattern_m', l_m', confidence_m'), and add Pattern_m' to the sequence pattern set SP;
3.2.3.6 if m = M, the sequence pattern set SP = {Pattern_m' | 1 ≤ m' ≤ M'} is obtained, where M' is the total number of patterns in SP and M' ≤ M; go to 3.3. If m < M, let m = m+1 and go to 3.2.3.4;
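The support/confidence computation of step 3.2.3.5 can be sketched as follows, using toy integer encodings. The subsequence test is in the usual sequential-pattern-mining sense (in order, but not necessarily contiguous).

```python
# Sketch of step 3.2.3.5: score one mined pattern against the encoded
# labeled set D'_1. Encodings and labels below are toy values.

def is_subsequence(pattern, seq):
    """True if `pattern` occurs in `seq` as a (not necessarily
    contiguous) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)  # `in` consumes the iterator

def score_pattern(p, labeled_docs):
    """p = (pp_1,...,pp_{X-1}, label_code); labeled_docs is a list of
    (encoded_doc, label_code) pairs. Returns (pattern, label, confidence)."""
    *pattern, label = p                    # step 3.2.3.5.4: split off l_m'
    support = matched = 0
    for doc, doc_label in labeled_docs:
        if is_subsequence(pattern, doc):   # the word sequence matches
            matched += 1
            if doc_label == label:         # the intention label matches too
                support += 1
    return (tuple(pattern), label, support / matched if matched else 0.0)

docs = [([8, 9, 10], 1), ([8, 11, 9], 1), ([9, 8, 12], 2)]
print(score_pattern((8, 9, 1), docs))  # ((8, 9), 1, 1.0)
```

Here the pattern (8, 9) matches the first two documents, both labeled 1, so its confidence for label 1 is 1.0; the third document contains 8 and 9 but not in order, so it does not count as matched.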
3.3 the configuration item intention data automatic amplification module encodes D_2; the method is:
3.3.1 initialize the variable t = 1;
3.3.2 encode the U_t words in dd_t; the method is:
3.3.2.1 initialize word index variable ut=1;
3.3.2.2 convert the u_t-th word of dd_t into the 2-tuple <pos, root>, where pos is the word's part of speech (noun, verb, adjective, adverb, etc.) and root is its root word;
3.3.2.3 judge whether <pos, root> is in the key set of f_token; if so, encode the word as the value corresponding to the key in f_token and go to 3.3.2.4; if not, f_token cannot encode the word, so encode it directly as 0 and go to 3.3.2.4;
3.3.2.4 if u_t = U_t, the encoding of dd_t is complete and dd'_t, the encoding of dd_t, is obtained; go to 3.3.3. If not, let u_t = u_t+1 and go to 3.3.2.2;
3.3.3 if t = T, take the 2-tuples (cc_t, dd'_t), corresponding to the elements <(cc_t, dd_t)> of D_2, as the encoded unlabeled configuration item set D'_2 = {(cc_t, dd'_t) | 1 ≤ t ≤ T}, and go to 3.4; if t < T, go to 3.3.4;
3.3.4 let t = t+1, go to 3.3.2;
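The fallback rule of step 3.3.2.3 — words whose <part of speech, root word> key was never seen in the labeled data are encoded as 0 — can be sketched in one line; the f_token contents here are toy values.

```python
# Sketch of step 3.3: encoding an unlabeled document with an f_token built
# from labeled data; unknown <pos, root> keys map to code 0.

f_token = {("noun", "cache"): 9, ("noun", "size"): 10}  # toy dictionary

def encode_unlabeled(keys, f_token):
    # step 3.3.2.3: known keys map through f_token, unknown keys map to 0
    return [f_token.get(k, 0) for k in keys]

dd = [("noun", "cache"), ("adjective", "fast"), ("noun", "size")]
print(encode_unlabeled(dd, f_token))  # [9, 0, 10]
```

Since code 0 lies outside both the label range (1–7) and the word range (≥ 8), it can never match an item of a mined pattern, so unseen words simply never contribute to pattern matching.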
3.4 the configuration item intention data automatic amplification module labels D'_2 using SP; the method is:
3.4.1 set the confidence threshold threshold, 0 < threshold ≤ 1; preferably 0.7 < threshold ≤ 1;
3.4.2 initializing variable t ═ 1;
3.4.3 initializing a tagged set of configuration items R1Is an empty set;
3.4.4 initializing a set R of untagged configuration items2Is an empty set;
3.4.5 initialize the dictionary-type variable selector used to select the intention label of the t-th unlabeled configuration item: let selector[Label_1] = 0,…,selector[Label_i] = 0,…,selector[Label_7] = 0, where selector[Label_i] denotes the confidence of labeling the t-th unlabeled configuration item as Label_i;
3.4.6 update selector according to the sequence pattern set SP obtained in 3.2; the method is:
3.4.6.1 initializing variable m' ═ 1;
3.4.6.2 read confidence_m', l_m' and pattern_m' from the sequence pattern Pattern_m'; if confidence_m' ≥ threshold, go to 3.4.6.3 to judge whether the pattern matches; if confidence_m' < threshold, Pattern_m' does not meet the confidence requirement, go to 3.4.6.5;
3.4.6.3 if pattern_m' is a subsequence of dd'_t, the pattern matches; go to 3.4.6.4. If not, go to 3.4.6.5;
3.4.6.4 if confidence_m' > selector[l_m'], update selector[l_m'], i.e. let selector[l_m'] = confidence_m', and go to 3.4.6.5; otherwise go directly to 3.4.6.5;
3.4.6.5 if m' = M', all sequence patterns have been traversed and the update of selector is complete; go to 3.4.7. If m' < M', let m' = m'+1 and go to 3.4.6.2;
3.4.7 select a label for dd'_t according to selector; the method is:
3.4.7.1 initialize the candidate label LC_t = Label_1;
3.4.7.2 initializing a tag index variable i-2;
3.4.7.3 if selector[Label_i] > selector[LC_t], Label_i has a higher confidence as the label than the currently selected LC_t, so let LC_t = Label_i and go to 3.4.7.4; if selector[Label_i] ≤ selector[LC_t], go directly to 3.4.7.4;
3.4.7.4 if i is 7, go to 3.4.7.5; if i <7, making i equal to i +1, and switching to 3.4.7.3;
3.4.7.5 if selector[LC_t] > 0, take LC_t as the intention label of the t-th unlabeled configuration item and add <(cc_t, dd_t), LC_t> to R_1, go to 3.4.8; if selector[LC_t] = 0, no pattern in SP matches dd'_t, so no intention label is selected for the t-th unlabeled configuration item; add <(cc_t, dd_t)> to R_2 and go to 3.4.8;
3.4.8 if t = T, the labeling of the unlabeled configuration item set D_2 is complete, yielding R_1 and R_2; go to 3.4.10. If t < T, go to 3.4.9;
3.4.9 let t = t+1, go to 3.4.5;
3.4.10 judge whether R_1 is an empty set; if so, the iterative amplification of D_1 terminates, yielding the amplified labeled configuration item set; go to 3.4.12. If not, go to 3.4.11;
3.4.11 let D_1 = D_1 + R_1 and D_2 = R_2, then go to 3.1;
3.4.12 denote the labeled configuration item set D_1 at this step as the amplified labeled configuration item set D = {<(c_n', d_n'), label_n'> | 1 ≤ n' ≤ N', label_n' ∈ Labels}, where d_n' is the document of configuration item c_n', label_n' is the intention label of configuration item c_n', and N' is the number of configuration items in the amplified labeled configuration item set D, N' ≥ N.
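Steps 3.4.5–3.4.7 — choosing an intention label for one unlabeled configuration item from the mined patterns — can be sketched as follows. The patterns, threshold and encodings are toy values; each pattern is a (pattern, label, confidence) triple as produced in step 3.2.3.5.

```python
# Sketch of steps 3.4.5-3.4.7: per-item label selection from sequence
# patterns, keeping the highest-confidence matching pattern per label.

def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(item in it for item in pattern)

def select_label(encoded_doc, patterns, threshold):
    selector = {i: 0.0 for i in range(1, 8)}          # step 3.4.5
    for pattern, label, conf in patterns:             # step 3.4.6
        if conf >= threshold and is_subsequence(pattern, encoded_doc):
            selector[label] = max(selector[label], conf)
    best = max(selector, key=selector.get)            # step 3.4.7
    return best if selector[best] > 0 else None       # None: stays unlabeled

patterns = [((8, 9), 1, 0.9), ((9, 12), 2, 0.95), ((8,), 3, 0.5)]
print(select_label([8, 9, 10], patterns, threshold=0.85))  # 1
print(select_label([13, 14], patterns, threshold=0.85))    # None
```

Items that return `None` go to R_2 and remain unlabeled; labeled items go to R_1, and the outer loop of 3.4.10–3.4.11 repeats the whole of step 3 until R_1 comes back empty.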
Fourth, train the configuration item preselection module of the performance-tuning-oriented software configuration item preselection system with the amplified labeled configuration item set D. The method for training the configuration item preselection module is:
4.1 use the N' configuration item documents d_1,…,d_n',…,d_N' in D as a training set, and use the TF-IDF method to train the TF-IDF encoder in the configuration item preselection module to encode configuration item documents; the encoder's input is a sentence and its output is the vector corresponding to that sentence;
4.2, encoding N 'documents in the D by using an encoder to obtain a vector set V' after encoding, wherein the method comprises the following steps:
4.2.1 initializing vector set V' as an empty set;
4.2.2 initializing loop index variable n ═ 1;
4.2.3 use the encoder to encode d_n' as the n'-th vector v_n';
4.2.4 add v_n' to V';
4.2.5 if n' = N', the N' configuration item documents in D are all encoded, yielding the encoded vector set V'; go to 4.3. If n' < N', let n' = n'+1 and go to 4.2.3;
4.3 use the training set {<v_n', label_n'> | 1 ≤ n' ≤ N'} to train the configuration item preselection model RF with the hierarchical random forest algorithm, obtaining the configuration item preselection model parameters.
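The TF-IDF encoding of steps 4.1–4.2 can be sketched with a minimal pure-Python encoder. This is only an illustration of the TF-IDF idea, not the patent's exact encoder; the add-one smoothing in the idf term is an assumption, as TF-IDF variants differ on this point.

```python
# Sketch of steps 4.1-4.2: fit a tiny TF-IDF encoder on the document set,
# then encode one document as a vector over the fitted vocabulary.
import math

class TfidfEncoder:
    def fit(self, docs):
        """docs: list of word lists (the amplified labeled documents)."""
        self.vocab = sorted({w for d in docs for w in d})
        n = len(docs)
        # idf with add-one document-frequency smoothing (an assumption)
        self.idf = {w: math.log(n / (1 + sum(w in d for d in docs))) + 1
                    for w in self.vocab}
        return self

    def encode(self, doc):
        """Encode one document (a word list) as a TF-IDF vector."""
        return [doc.count(w) / len(doc) * self.idf[w] if w in doc else 0.0
                for w in self.vocab]

docs = [["cache", "size", "cache"], ["buffer", "size"]]
enc = TfidfEncoder().fit(docs)
v = enc.encode(docs[0])   # vector over vocab ['buffer', 'cache', 'size']
```

The resulting vectors <v_n', label_n'> are what the hierarchical random forest of 4.3 is trained on; in practice a library implementation (e.g. a TfidfVectorizer-style encoder) would replace this sketch.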
Fifth, the trained configuration item preselection module preselects configuration items for the target software, obtaining the preselected configuration item sets. The data set of target software configuration items is DT = {<dtc_a, dt_a> | 1 ≤ a ≤ A}, where A is the number of configuration items in the target software, dtc_a is the name of the a-th configuration item, and dt_a is the document of the a-th configuration item. The method is:
5.1 encode the A configuration item documents of the target software with the encoder obtained by the training in 4.1, and denote the encoded vector set of the target software as V_dt; the method is:
5.1.1 initialize the target software's vector set V_dt as an empty set;
5.1.2 initialize loop subscript variable a ═ 1;
5.1.3 use the encoder to encode dt_a as the a-th vector vv_a of the target software;
5.1.4 add vv_a to V_dt;
5.1.5 if a = A, the A configuration item documents in DT are all encoded, yielding the target software's encoded vector set V_dt; go to 5.2. If a < A, let a = a+1 and go to 5.1.3;
5.2 the trained configuration item preselection module generates an intention label from the vector corresponding to each configuration item in V_dt, obtaining the predicted intention label list O; the method is:
5.2.1 initializing predicted intention tag list O to an empty list;
5.2.2 initialize cycle index variable a ═ 1;
5.2.3 input vv_a into the model RF of the trained configuration item preselection module to predict the intention label o_a of the a-th configuration item of the target software; the method is:
5.2.3.1 initialize the candidate intention label o_a of the a-th configuration item: let o_a = Label_7;
5.2.3.2 input vv_a into the model RF of the trained configuration item preselection module to obtain the first-layer output [pprob, npprob] and the second-layer output [prob_1, prob_2, prob_3, prob_4, prob_5, prob_6], where pprob is the probability that the configuration item to be predicted is performance-related, npprob is the probability that it is not performance-related, and prob_i is the probability that the intention label of the configuration item to be predicted is Label_i;
5.2.3.3 if pprob < npprob, the RF predicts that the configuration item is more likely not performance-related than performance-related; let o_a = Label_7 and go to 5.2.4. If pprob ≥ npprob, the RF predicts that the configuration item is more likely performance-related, i.e. it is a performance-related configuration item; go to 5.2.3.4 to further judge which other user intentions it affects besides performance, i.e. to determine which of Label_1,…,Label_i,…,Label_6 is the configuration item's intention label;
5.2.3.4, judging the intention label of the configuration item related to the performance, the method is:
5.2.3.4.1 initializes the candidate intention label subscript ci to 1;
5.2.3.4.2 initializing loop index variable i ═ 1;
5.2.3.4.3 if prob_i > prob_ci, let ci = i and go to 5.2.3.4.4; otherwise go directly to 5.2.3.4.4;
5.2.3.4.4 if i = 6, the traversal of the RF second-layer output is complete; let o_a = Label_ci and go to 5.2.4. If i < 6, let i = i+1 and go to 5.2.3.4.3;
5.2.4 Add o_a to the predicted intention label list O;
5.2.5 If a = A, the prediction of all configuration items in DT is complete and the predicted intention label list O is obtained; go to 5.3. If a < A, let a = a + 1 and go to 5.2.3;
5.3 Classify the configuration items according to their intention labels to obtain sets of configuration items with the same intention category. The method is:
5.3.1 Initialize the configuration item sets corresponding to the 7 intention labels as empty sets, i.e., let the configuration item set corresponding to the i-th intention label be empty;
5.3.2 Initialize the loop subscript variable a = 1;
5.3.3 According to the intention label o_a of the a-th configuration item, add the name dtc_a of the a-th configuration item to the corresponding configuration item set;
5.3.4 If a < A, let a = a + 1 and go to 5.3.3. If a = A, the classification of all configuration items in DT is complete and the preselected configuration item sets are obtained, where the set for Label_i contains the configuration items whose intention label predicted by the trained configuration item preselection model RF is Label_i, and J_i denotes the total number of configuration items whose RF-predicted intention label is Label_i.
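Step 5.3 is a simple bucketing of configuration item names by predicted label; a minimal sketch (the function name is assumed, not the patent's):

```python
from collections import defaultdict

def group_by_label(names, predicted_labels):
    """Steps 5.3.1-5.3.4: put each configuration item name dtc_a into
    the set corresponding to its predicted intention label o_a."""
    groups = defaultdict(list)          # one bucket per intention label
    for name, label in zip(names, predicted_labels):
        groups[label].append(name)
    return dict(groups)
```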
To verify the effect of the invention, a comparison experiment between the invention and the background art was carried out on a computer with the Ubuntu 18.04 operating system, a 48-core Intel Xeon 2.2 GHz CPU, a Tesla V100 GPU, and 64 GB of memory. The primary coding language is Python 3.8.6. The training process was carried out according to the steps in the specification, with PostgreSQL and Cassandra as the target software for testing; PostgreSQL and Cassandra provide 252 and 117 configuration item documents, respectively. Since the other prior art does not disclose source code or experimental results, the comparison is made only with prior art II. As shown in Table 1, the experiment shows that when the labeled data accounts for 20% of the total data (s = 0.2) and the confidence threshold is set to 0.85 (threshold = 0.85), the method of the invention can augment 59.4% of the unlabeled data with an accuracy of 86.4%, greatly reducing the manpower and time consumed by data labeling and improving labeling efficiency. The invention can recommend performance-related configuration items more comprehensively while greatly reducing overhead. Meanwhile, configuration items can also be recommended for user intentions other than performance, assisting the user in tuning and satisfying a variety of user intentions.
Table 1 Comparison of the software configuration item preselection method of the invention and background art II
The method for preselecting software configuration items oriented to performance tuning provided by the invention is described in detail above. The principles and embodiments of the invention are explained herein; the above description is intended to assist in understanding the core concepts of the invention. It should be noted that those skilled in the art can make various improvements and modifications to the invention without departing from its principles, and such improvements and modifications also fall within the scope of the claims of the invention.
Claims (9)
1. A software configuration item preselection method oriented to performance tuning is characterized by comprising the following steps:
the method comprises the following steps: in the first step, a performance-tuning-oriented software configuration item preselection system is constructed, composed of a configuration item intention data automatic augmentation module and a configuration item preselection module;
the configuration item intention data automatic augmentation module is connected with the configuration item preselection module and with the data set source software; the data set source software comprises two parts: a labeled configuration item set and an unlabeled configuration item set; the labeled configuration item set is a data set constructed by labeling the intention category of each configuration item according to its configuration item document; the configuration item intention data automatic augmentation module preprocesses the labeled configuration item set, labels the unlabeled configuration items in the unlabeled configuration item set, adds the newly labeled data from the unlabeled configuration item set to the labeled configuration item set until the number of configuration items in the labeled configuration item set no longer changes, obtains the augmented labeled configuration item set, and sends it to the configuration item preselection module;
the configuration item preselection module is connected with the configuration item intention data automatic augmentation module and receives the augmented labeled configuration item set from it; the configuration item preselection module comprises a TF-IDF encoder and a configuration item preselection model RF; the encoder encodes the sentences in the configuration item documents to obtain the vectors corresponding to the sentences; the RF is a random forest model with a two-layer structure, trained with the augmented labeled configuration item set to obtain the parameters of the random forest model; the configuration item preselection module classifies the configuration items of the target software according to the configuration data of the target software and preselects the configuration items corresponding to different intention categories to obtain a preselected configuration item set;
in the second step, from the configuration item set D_0 of the data set source software, randomly select some configuration items and label their intentions to obtain a labeled configuration item set D_1; the method is:
2.1 Select the data set source software according to the following conditions: 1) server-side software; 2) software with a large user base, having more than 2,000 stars on the code hosting platform GitHub; 3) software with more than 100 configuration items. From the configuration item set D_0, composed of the more than 7,000 configuration items of the software that simultaneously satisfies these 3 conditions, randomly select configuration items in proportion s; denote the total number of configuration items as S, and the number of randomly selected configuration items as N, where N = S × s, rounded to an integer;
2.2 According to the official document descriptions of the selected configuration items, label the intentions of the N configuration items to obtain the labeled configuration item set D_1; the intention labels of configuration items are Label_1, Label_2, Label_3, Label_4, Label_5, Label_6, and Label_7, seven kinds in total;
2.3 The labeled configuration item set D_1 = {<(c_n, d_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, where c_n is the name of the n-th configuration item in D_1 and d_n is the document of configuration item c_n, expressed as the sequence of its W_n words, where W_n is the total number of words in d_n; label_n is the intention category of configuration item c_n, and Labels = {Label_i | 1 ≤ i ≤ 7} is the set of intention label categories;
Note that the set formed by the T = S − N configuration items not selected in step 2.1 is denoted as the unlabeled configuration item set D_2, D_2 = {<(cc_t, dd_t)> | 1 ≤ t ≤ T}, where cc_t is the name of the t-th configuration item in D_2 and dd_t is the document of configuration item cc_t, expressed as the sequence of its U_t words, where U_t is the total number of words in dd_t;
in the third step, the configuration item intention data automatic augmentation module preprocesses the labeled configuration item set D_1, iteratively labels the unlabeled configuration items in the unlabeled configuration item set D_2, and augments D_1 with the newly labeled configuration items to obtain the augmented labeled configuration item set D; the method is:
3.1 The configuration item intention data automatic augmentation module preprocesses D_1. The method is:
3.1.1 Define a dictionary-type variable f_label for encoding the intention label categories, satisfying f_label[Label_1] = 1, ..., f_label[Label_i] = i, ..., f_label[Label_7] = 7, 1 ≤ i ≤ 7;
3.1.2 Initialize the maximum word-mapping index: index = 8;
3.1.3 Define a dictionary-type variable f_token for encoding words; initialize f_token as an empty dictionary, i.e., its key set is the empty set; in subsequent steps, tuples of the form <part of speech, word root> are gradually added to the key set, so that words are encoded according to their part of speech and word root;
3.1.4 Encode words and build f_token step by step. The method is:
3.1.4.1 Initialize the variable n = 1;
3.1.4.2 Encode each of the W_n words in d_n to obtain the encoded document d'_n, in which each word is replaced by the code of its <part of speech, word root> tuple;
3.1.4.3 If n = N, replace each d_n in D_1 with its encoded d'_n to obtain the preprocessed labeled configuration item set D'_1, D'_1 = {<(c_n, d'_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, and go to 3.2; if n < N, go to 3.1.4.4;
3.1.4.4 Let n = n + 1 and go to 3.1.4.2;
3.2 The configuration item intention data automatic augmentation module mines sequence patterns from D'_1 to obtain a sequence pattern set SP. The method is:
3.2.1 Use D'_1 to construct a sequence set SeqDB = {seq_1, ..., seq_n, ..., seq_N}, where seq_n is the sequence formed by concatenating d'_n, the encoded document of configuration item c_n, with f_label(label_n), the code corresponding to the intention label label_n of c_n;
3.2.2 Perform sequence pattern mining on the sequence set SeqDB with the FEAT algorithm to obtain a sequence set P, P = {p_1, ..., p_m, ..., p_M}, where M is the total number of sequence patterns and p_m is a frequently occurring sequence in SeqDB, p_m = (pp_1, ..., pp_x, ..., pp_X), where X is the length of p_m computed by the FEAT algorithm and pp_x, the x-th item of p_m, is the code corresponding to a word or an intention label, satisfying 1 ≤ pp_x < index: 1 ≤ pp_x ≤ 7 represents the mapping of an intention label under f_label, and 8 ≤ pp_x < index represents the mapping under f_token of a word transformed into its <part of speech, word root> form by step 3.1.4.2;
3.2.3 Process P, retaining the sequences related to the intention categories, and compute the support and confidence corresponding to each sequence to obtain the sequence pattern set SP. The method is:
3.2.3.1 Initialize the sequence pattern set SP as an empty set;
3.2.3.2 Initialize the sequence traversal variable m = 1;
3.2.3.3 Initialize the sequence pattern count variable m' = 0;
3.2.3.4 Determine whether the last item pp_X of p_m satisfies 1 ≤ pp_X ≤ 7. If so, pp_X is the code of an intention category and p_m is relevant to determining the intention categories of unlabeled configuration items; go to 3.2.3.5. Otherwise, p_m is irrelevant to determining the intention categories of unlabeled configuration items; go to 3.2.3.6;
3.2.3.5 Let m' = m' + 1 and let p_m' = p_m; compute the confidence confidence_m' of p_m' and add the processed sequence pattern Pattern_m' to the sequence pattern set SP, Pattern_m' = (pattern_m', l_m', confidence_m'), where pattern_m' is the sequence reflected by p_m' that is related to l_m', and l_m' is the intention category corresponding to p_m';
3.2.3.6 If m = M, the sequence pattern set SP is obtained, SP = {Pattern_m' | 1 ≤ m' ≤ M'}, Pattern_m' = (pattern_m', l_m', confidence_m'), where M' is the total number of patterns in SP, M' ≤ M; go to 3.3. If m < M, let m = m + 1 and go to 3.2.3.4;
3.3 The configuration item intention data automatic augmentation module encodes D_2. The method is:
3.3.1 Initialize the variable t = 1;
3.3.2 Encode the U_t words in dd_t to obtain the encoded document dd'_t;
3.3.3 If t = T, take the tuple (cc_t, dd'_t) as the encoding of <(cc_t, dd_t)> in D_2 and add it to the encoded unlabeled configuration item set D'_2, obtaining D'_2 = {(cc_t, dd'_t) | 1 ≤ t ≤ T}; go to 3.4. If t < T, go to 3.3.4;
3.3.4 Let t = t + 1 and go to 3.3.2;
3.4 The configuration item intention data automatic augmentation module uses SP to label D'_2. The method is:
3.4.1 Set a confidence threshold threshold, 0 < threshold ≤ 1;
3.4.2 Initialize the variable t = 1;
3.4.3 Initialize the set R_1 of newly labeled configuration items as an empty set;
3.4.4 Initialize the set R_2 of still-unlabeled configuration items as an empty set;
3.4.5 Initialize a dictionary-type variable selector for selecting an intention label for the t-th unlabeled configuration item: let selector[Label_1] = 0, ..., selector[Label_i] = 0, ..., selector[Label_7] = 0, where selector[Label_i] is the confidence of labeling the t-th unlabeled configuration item as Label_i;
3.4.6 Update selector according to the sequence pattern set SP obtained in 3.2. The method is:
3.4.6.1 Initialize the variable m' = 1;
3.4.6.2 Read confidence_m', l_m', and pattern_m' from the sequence pattern Pattern_m'. If confidence_m' ≥ threshold, go to 3.4.6.3 to judge whether the pattern matches; if confidence_m' < threshold, Pattern_m' does not meet the confidence requirement, go to 3.4.6.5;
3.4.6.3 If pattern_m' is a subsequence of dd'_t, the pattern matches; go to 3.4.6.4. If not, go to 3.4.6.5;
3.4.6.4 If confidence_m' > selector[l_m'], update selector[l_m'], i.e., let selector[l_m'] = confidence_m', and go to 3.4.6.5; otherwise, go directly to 3.4.6.5;
3.4.6.5 If m' = M', all sequence patterns have been traversed and the update of selector is complete; go to 3.4.7. If m' < M', let m' = m' + 1 and go to 3.4.6.2;
3.4.7 Select a label for dd'_t according to selector. The method is:
3.4.7.1 Initialize the candidate label LC_t = Label_1;
3.4.7.2 Initialize the label subscript variable i = 2;
3.4.7.3 If selector[Label_i] > selector[LC_t], the confidence of selecting Label_i as the label is higher than that of selecting the current LC_t; let LC_t = Label_i and go to 3.4.7.4. If selector[Label_i] ≤ selector[LC_t], go directly to 3.4.7.4;
3.4.7.4 If i = 7, go to 3.4.7.5; if i < 7, let i = i + 1 and go to 3.4.7.3;
3.4.7.5 If selector[LC_t] > 0, take LC_t as the intention label of the t-th unlabeled configuration item and add <(cc_t, dd_t), LC_t> to R_1; go to 3.4.8. If selector[LC_t] = 0, SP contains no pattern matching dd'_t and no intention label is selected for the t-th unlabeled configuration item; add <(cc_t, dd_t)> to R_2 and go to 3.4.8;
3.4.8 If t = T, the labeling of the unlabeled configuration item set D_2 is complete and R_1 and R_2 are obtained; go to 3.4.10. If t < T, go to 3.4.9;
3.4.9 Let t = t + 1 and go to 3.4.5;
3.4.10 Determine whether R_1 is an empty set. If so, the iterative augmentation of D_1 terminates, the augmented labeled configuration item set is obtained, and the method goes to 3.4.12; if not, go to 3.4.11;
3.4.11 Let D_1 = D_1 + R_1 and D_2 = R_2, then go to 3.1;
3.4.12 The labeled configuration item set D_1 at the time this step is reached is the augmented labeled configuration item set, denoted D = {<(c_n', d_n'), label_n'> | 1 ≤ n' ≤ N', label_n' ∈ Labels}, where d_n' is the document of configuration item c_n', label_n' is the intention label of configuration item c_n', and N' is the number of configuration items in the augmented labeled configuration item set D, N' ≥ N;
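The per-item labeling of steps 3.4.5 through 3.4.7 keeps, for each intention label, the highest confidence among the matching patterns, then takes the best label, or leaves the item unlabeled when nothing matches. A sketch under the assumption that SP is a list of (pattern, label, confidence) triples; note that ties are resolved arbitrarily here, whereas the patent's loop keeps the first maximum:

```python
def is_subsequence(seq, doc):
    """True if seq occurs in doc in order, not necessarily contiguously."""
    it = iter(doc)
    return all(item in it for item in seq)

def label_one_item(encoded_doc, sp, threshold):
    """Steps 3.4.5-3.4.7 for one unlabeled configuration item: returns the
    chosen intention label (the item goes to R1), or None when no pattern
    with confidence >= threshold matches (the item goes to R2)."""
    selector = {}                                   # label -> best confidence
    for pattern, label, conf in sp:                 # 3.4.6
        if conf >= threshold and is_subsequence(pattern, encoded_doc):
            if conf > selector.get(label, 0.0):     # 3.4.6.4
                selector[label] = conf
    if not selector:                                # 3.4.7.5: no match at all
        return None
    return max(selector, key=selector.get)          # 3.4.7.1-3.4.7.4
```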
in the fourth step, the configuration item preselection module of the performance-tuning-oriented software configuration item preselection system is trained with the augmented labeled configuration item set D; the method for training the configuration item preselection module is:
4.1 Use the N' configuration item documents d_1, ..., d_n', ..., d_N' in D as the training set and train the TF-IDF encoder in the configuration item preselection module using the TF-IDF method to encode configuration item documents; the encoder's input is a sentence and its output is the vector corresponding to the sentence;
4.2 Encode the N' documents in D with the encoder to obtain the encoded vector set V', which contains N' encoded vectors; the n'-th vector v_n' is the vector obtained by encoding d_n' with the encoder;
4.3 Use the training set {<v_n', label_n'> | 1 ≤ n' ≤ N'} to train the configuration item preselection model RF with a layered random forest algorithm to obtain the configuration item preselection model parameters;
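Step 4 can be approximated with scikit-learn. The "layered random forest" below is sketched as two ordinary random forests, a binary performance/non-performance layer and a second layer over the performance-related intention labels, which is one plausible reading of the two-layer structure; the patent does not disclose the exact layering, and all names and the toy data are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

class TwoLayerRF:
    """Sketch of the configuration item preselection model RF."""
    def __init__(self):
        self.layer1 = RandomForestClassifier(random_state=0)  # related vs. not
        self.layer2 = RandomForestClassifier(random_state=0)  # Label1..Label6

    def fit(self, vectors, labels):
        related = [lab != "Label7" for lab in labels]
        self.layer1.fit(vectors, related)
        idx = [i for i, r in enumerate(related) if r]
        self.layer2.fit(vectors[idx], [labels[i] for i in idx])
        return self

    def predict(self, vectors):
        preds = []
        for i, rel in enumerate(self.layer1.predict(vectors)):
            preds.append(self.layer2.predict(vectors[i])[0] if rel else "Label7")
        return preds

# Steps 4.1-4.3 on toy documents (invented for illustration)
docs = ["sets shared memory buffer size", "sets the server port number",
        "cache size used for query results", "path of the server log file"]
labels = ["Label6", "Label7", "Label6", "Label7"]
encoder = TfidfVectorizer().fit(docs)           # TF-IDF encoder of step 4.1
model = TwoLayerRF().fit(encoder.transform(docs), labels)
```

The split into two forests mirrors the first-layer [pprob, npprob] / second-layer [prob_1, ..., prob_6] outputs described in step 5.2.3.2.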
in the fifth step, the trained configuration item preselection module preselects configuration items according to the configuration items of the target software to obtain a preselected configuration item set; denote the target software configuration item data set DT = {<dtc_a, dt_a> | 1 ≤ a ≤ A}, where A is the number of configuration items in the target software, dtc_a is the name of the a-th configuration item, and dt_a is the document of the a-th configuration item; the method is:
5.1 Use the encoder obtained by the training in 4.1 to encode the A configuration item documents of the target software, and denote the encoded vector set of the target software as V_dt. The method is:
5.1.1 Initialize the vector set V_dt of the target software as an empty set;
5.1.2 Initialize the loop subscript variable a = 1;
5.1.3 Use the encoder to encode dt_a as the a-th vector vv_a of the target software;
5.1.4 Add vv_a to V_dt;
5.1.5 If a = A, the encoding of the A configuration item documents in DT is complete and the encoded vector set V_dt of the target software is obtained; go to 5.2. If a < A, let a = a + 1 and go to 5.1.3;
5.2 The trained configuration item preselection module generates a corresponding intention label from the vector corresponding to each configuration item in V_dt to obtain a predicted intention label list O. The method is:
5.2.1 Initialize the predicted intention label list O as an empty list;
5.2.2 Initialize the loop subscript variable a = 1;
5.2.3 Input vv_a into the model RF of the trained configuration item preselection module to obtain the predicted intention label o_a of the a-th configuration item of the target software;
5.2.4 Add o_a to the predicted intention label list O;
5.2.5 If a = A, the prediction of all configuration items in DT is complete and the predicted intention label list O is obtained; go to 5.3. If a < A, let a = a + 1 and go to 5.2.3;
5.3 Classify the configuration items according to their intention labels to obtain sets of configuration items with the same intention category. The method is:
5.3.1 Initialize the configuration item sets corresponding to the 7 intention labels as empty sets, i.e., let the configuration item set corresponding to the i-th intention label be empty;
5.3.2 Initialize the loop subscript variable a = 1;
5.3.3 According to the intention label o_a of the a-th configuration item, add the name dtc_a of the a-th configuration item to the corresponding configuration item set;
5.3.4 If a < A, let a = a + 1 and go to 5.3.3. If a = A, the classification of all configuration items in DT is complete and the preselected configuration item sets are obtained, where the set for Label_i contains the configuration items whose intention label predicted by the trained configuration item preselection model RF is Label_i, and J_i denotes the total number of configuration items whose RF-predicted intention label is Label_i.
2. The performance-tuning-oriented software configuration item preselection method of claim 1, wherein the data set source software in the second step comprises 13 kinds of software: MySQL, Cassandra, MariaDB, Apache-Httpd, Nginx, Hadoop-Common, MapReduce, Apache-Flink, HDFS, Keystone, Nova, GCC, and Clang.
3. The performance-tuning-oriented software configuration item preselection method of claim 1, wherein the proportion s in step 2.1 satisfies 0.2 ≤ s ≤ 1, and the confidence threshold in step 3.4.1 satisfies 0.7 < threshold ≤ 1.
4. The method of claim 1, wherein the method of step 2.2 for intention labeling of the N configuration items is: according to the document description of a configuration item, if adjusting the configuration item can improve software performance but the performance improvement reduces software reliability, the intention label of the configuration item is Label_1; if adjusting the configuration item can improve software performance but the performance improvement reduces software security, the intention label of the configuration item is Label_2; if adjusting the configuration item can improve software performance but the performance improvement degrades software functionality, the intention label of the configuration item is Label_3; if adjusting the configuration item can improve software performance but the performance improvement increases the cost of using the software, the intention label of the configuration item is Label_4; if adjusting the configuration item can improve software performance but the performance improvement degrades performance for other users of the software, the intention label of the configuration item is Label_5; if adjusting the configuration item can improve software performance without causing the first five side effects, the intention label of the configuration item is Label_6; if adjusting the configuration item does not affect software performance, the intention label of the configuration item is Label_7.
5. The method of claim 1, wherein the method of step 3.1.4.2 for encoding the W_n words in d_n to obtain the encoded d'_n is:
3.1.4.2.1 Initialize the word subscript variable w_n = 1;
3.1.4.2.2 Convert the w_n-th word into the tuple <part of speech, word root> consisting of the word's part of speech and word root;
3.1.4.2.3 Judge whether the tuple is in the key set of f_token. If not, encode the tuple as index, add the key-value pair <tuple, index> to f_token, and go to 3.1.4.2.4. If so, encode the tuple as the value corresponding to that key in f_token and go to 3.1.4.2.5;
3.1.4.2.4 Let index = index + 1;
3.1.4.2.5 If w_n = W_n, the encoding of each word in d_n is complete and the encoded d'_n is obtained; end. If w_n < W_n, go to 3.1.4.2.6;
3.1.4.2.6 Let w_n = w_n + 1 and go to 3.1.4.2.2.
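Steps 3.1.4.2.1 through 3.1.4.2.6 amount to interning <part of speech, word root> pairs as integer codes starting from index = 8. A minimal sketch; the analyze stub stands in for the patent's unspecified part-of-speech tagger and stemmer:

```python
def encode_document(words, f_token, state, analyze):
    """Encode each word as the integer code of its <part of speech,
    word root> tuple, extending f_token on unseen tuples
    (steps 3.1.4.2.1-3.1.4.2.6)."""
    encoded = []
    for w in words:
        key = analyze(w)                  # 3.1.4.2.2: (pos, root) tuple
        if key not in f_token:            # 3.1.4.2.3: unseen tuple
            f_token[key] = state["index"]
            state["index"] += 1           # 3.1.4.2.4
        encoded.append(f_token[key])
    return encoded

# toy analyzer: every word a NOUN, with a crude plural-stripping "root"
analyze = lambda w: ("NOUN", w.rstrip("s"))
f_token, state = {}, {"index": 8}         # index starts at 8 (step 3.1.2)
print(encode_document(["buffers", "buffer", "size"], f_token, state, analyze))
# → [8, 8, 9]; "buffers" and "buffer" share a root, so they share a code
```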
6. The method of claim 1, wherein the method of step 3.2.3.5 for computing the confidence confidence_m' of p_m' and adding the processed sequence pattern to the sequence pattern set SP is:
3.2.3.5.1 Initialize the configuration item loop variable n = 1 and let m' = m' − 1;
3.2.3.5.2 Let m' = m' + 1;
3.2.3.5.3 Initialize the support variable support_m' = 0;
3.2.3.5.4 Initialize the match-count variable matched_m' = 0, which counts the number of configuration items matched by the pattern;
3.2.3.5.5 Let the intention category corresponding to p_m' be l_m' = pp_X, and let the sequence reflected by p_m' that is related to l_m' be pattern_m' = (pp_1, ..., pp_x, ..., pp_{X−1});
3.2.3.5.6 Judge whether pattern_m' is a subsequence of d'_n. If so, a matching sequence is found; let matched_m' = matched_m' + 1 and go to 3.2.3.5.7. If not, go to 3.2.3.5.8;
3.2.3.5.7 If l_m' = label_n, the intention label is correctly matched along with the sequence match; let support_m' = support_m' + 1 and go to 3.2.3.5.8. If l_m' ≠ label_n, the sequence matches but the intention label corresponding to the sequence does not; go to 3.2.3.5.8;
3.2.3.5.8 If n = N, go to 3.2.3.5.10; if n < N, go to 3.2.3.5.9;
3.2.3.5.9 Let n = n + 1 and go to 3.2.3.5.6;
3.2.3.5.10 Compute the confidence confidence_m' of p_m': confidence_m' = support_m' / matched_m'. The processed sequence pattern is Pattern_m' = (pattern_m', l_m', confidence_m'); add Pattern_m' to the sequence pattern set SP.
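Claim 6's bookkeeping counts, over the preprocessed labeled set, how many encoded documents contain the pattern as a subsequence (matched) and how many of those also carry the pattern's intention label (support). A sketch with assumed data shapes:

```python
def is_subsequence(seq, doc):
    """True if seq occurs in doc in order, not necessarily contiguously."""
    it = iter(doc)
    return all(item in it for item in seq)

def pattern_confidence(pattern, label, corpus):
    """confidence_m' = support_m' / matched_m' (step 3.2.3.5.10).
    corpus: list of (encoded_doc, intention_label) pairs, i.e. D'_1."""
    matched = support = 0
    for doc, doc_label in corpus:
        if is_subsequence(pattern, doc):          # 3.2.3.5.6
            matched += 1
            if doc_label == label:                # 3.2.3.5.7
                support += 1
    return support / matched if matched else 0.0
```

A confidence of 1.0 means every document matching the pattern carries the pattern's label, so the pattern is a reliable labeling rule once it clears the threshold of step 3.4.1.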
7. The performance-tuning-oriented software configuration item preselection method of claim 1, wherein the method of step 3.3.2 for encoding the U_t words in dd_t is:
3.3.2.1 Initialize the word subscript variable u_t = 1;
3.3.2.2 Convert the u_t-th word into the tuple <part of speech, word root> consisting of the word's part of speech and word root;
3.3.2.3 Judge whether the tuple is in the key set of f_token. If so, encode the tuple as the value corresponding to that key and go to 3.3.2.4. If not, the tuple cannot be encoded with f_token; directly encode it as 0 and go to 3.3.2.4;
8. The method of claim 1, wherein the method of step 4.2 for encoding the N' documents in D with the encoder to obtain the encoded vector set V' is:
4.2.1 Initialize the vector set V' as an empty set;
4.2.2 Initialize the loop subscript variable n' = 1;
4.2.3 Use the encoder to encode d_n' as the n'-th vector v_n';
4.2.4 Add v_n' to V';
4.2.5 If n' = N', the encoding of the N' configuration item documents in D is complete and the encoded vector set V' is obtained; end. If n' < N', let n' = n' + 1 and go to 4.2.3.
9. The performance-tuning-oriented software configuration item preselection method of claim 1, wherein the method of step 5.2.3 for inputting vv_a into the model RF of the trained configuration item preselection module to obtain the predicted intention label o_a of the a-th configuration item of the target software is:
5.2.3.1 Initialize the candidate intention label of the a-th configuration item as o_a, letting o_a = Label_7;
5.2.3.2 Input vv_a into the model RF of the trained configuration item preselection module to obtain the first-layer output [pprob, npprob] and the second-layer output [prob_1, prob_2, prob_3, prob_4, prob_5, prob_6], where pprob is the probability that the configuration item to be predicted is related to performance, npprob is the probability that it is not related to performance, and prob_i is the probability that the intention label of the configuration item to be predicted is Label_i;
5.2.3.3 If pprob < npprob, the RF predicts that the probability that the configuration item is not related to performance is greater than the probability that it is related to performance; let o_a = Label_7 and end. If pprob > npprob, the RF predicts that the probability that the configuration item is related to performance is greater than the probability that it is not, i.e., the configuration item is a performance-related configuration item; go to 5.2.3.4 to further determine which other user intention the configuration item affects along with performance, i.e., to determine which of Label_1, ..., Label_i, ..., Label_6 is the intention label of the configuration item;
5.2.3.4 Determine the intention label of the performance-related configuration item. The method is:
5.2.3.4.1 Initialize the candidate intention label subscript ci = 1;
5.2.3.4.2 Initialize the loop subscript variable i = 1;
5.2.3.4.3 If prob_i > prob_ci, let ci = i and go to 5.2.3.4.4; otherwise, go directly to 5.2.3.4.4;
5.2.3.4.4 If i = 6, the traversal of the RF second-layer output is complete; let o_a = Label_ci and end. If i < 6, let i = i + 1 and go to 5.2.3.4.3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210450353.6A CN114780411B (en) | 2022-04-26 | 2022-04-26 | Software configuration item preselection method oriented to performance tuning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114780411A true CN114780411A (en) | 2022-07-22 |
CN114780411B CN114780411B (en) | 2023-04-07 |
Family
ID=82432902
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210450353.6A Active CN114780411B (en) | 2022-04-26 | 2022-04-26 | Software configuration item preselection method oriented to performance tuning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114780411B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116225965A (en) * | 2023-04-11 | 2023-06-06 | 中国人民解放军国防科技大学 | IO size-oriented database performance problem detection method |
CN116561002A (en) * | 2023-05-16 | 2023-08-08 | 中国人民解放军国防科技大学 | Database performance problem detection method for I/O concurrency |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804136A (en) * | 2018-05-31 | 2018-11-13 | 中国人民解放军国防科技大学 | Configuration item type constraint inference method based on name semantics |
CN111611177A (en) * | 2020-06-29 | 2020-09-01 | 中国人民解放军国防科技大学 | Software performance defect detection method based on configuration item performance expectation |
Non-Patent Citations (1)
Title |
---|
SHANSHAN LI ET AL.: "Detecting Performance Bottlenecks Guided by Resource Usage" * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116225965A (en) * | 2023-04-11 | 2023-06-06 | 中国人民解放军国防科技大学 | IO size-oriented database performance problem detection method |
CN116225965B (en) * | 2023-04-11 | 2023-10-10 | 中国人民解放军国防科技大学 | IO size-oriented database performance problem detection method |
CN116561002A (en) * | 2023-05-16 | 2023-08-08 | 中国人民解放军国防科技大学 | Database performance problem detection method for I/O concurrency |
CN116561002B (en) * | 2023-05-16 | 2023-10-10 | 中国人民解放军国防科技大学 | Database performance problem detection method for I/O concurrency |
Also Published As
Publication number | Publication date |
---|---|
CN114780411B (en) | 2023-04-07 |
Kishimoto et al. | MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network | |
CN113239192B (en) | Text structuring technology based on sliding window and random discrete sampling |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |