CN114780411A - Software configuration item preselection method oriented to performance tuning - Google Patents


Info

Publication number
CN114780411A
Authority
CN
China
Prior art keywords
configuration item
label
configuration
intention
software
Prior art date
Legal status
Granted
Application number
CN202210450353.6A
Other languages
Chinese (zh)
Other versions
CN114780411B (en)
Inventor
李姗姗
贾周阳
马俊
李小玲
何浩辰
董威
陈立前
陈振邦
周成龙
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202210450353.6A priority Critical patent/CN114780411B/en
Publication of CN114780411A publication Critical patent/CN114780411A/en
Application granted granted Critical
Publication of CN114780411B publication Critical patent/CN114780411B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/362Software debugging
    • G06F11/3628Software debugging of optimised code
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a software configuration item preselection method oriented to performance tuning, aiming at the problems that existing configuration-item performance tuning methods are time-consuming and consider only a single intention. The technical scheme is: construct a performance-tuning-oriented software configuration item preselection system composed of a configuration item intention data automatic amplification module and a configuration item preselection module; select some configuration items from the data set source software and label them to obtain a labeled configuration item set; the configuration item intention data automatic amplification module iteratively amplifies the labeled configuration item set to obtain the amplified labeled configuration item set D; train the configuration item preselection module with D; the trained configuration item preselection module then classifies the configuration items of the target software according to the target software's configuration documents. The invention can tune the configuration item set of the corresponding category according to different intentions, greatly reducing overhead while recommending performance-related configuration items more comprehensively, thereby improving both efficiency and accuracy.

Description

Software configuration item preselection method oriented to performance tuning
Technical Field
The invention relates to the field of performance tuning of large-scale software, and in particular to a software configuration item preselection method.
Background
In order to adapt software to different application scenarios and production environments without modifying its source code, developers typically provide configuration items that give users an interface for adjusting software behavior. However, as application scenarios and user requirements grow more diverse, the size and complexity of modern software increase, and so does the number of software configuration items. For example, MySQL has more than 900 configuration items, and GCC has more than 1,000. This huge number of configuration items makes configuring the software very difficult and raises the barrier to using it: it is hard for users to satisfy their intentions by adjusting software configuration items.
Users usually have various intentions when using software, such as improving software performance (e.g., throughput, execution time, read/write speed) and reliability, preventing information leakage, and so on. Among these, improving software performance is one of the most common intentions and the one users care about most. Since software performance is easier to measure quantitatively than other intentions, how to adjust software configuration items to achieve the best software performance, i.e., performance tuning by adjusting software configuration, is a hot issue of current research.
Current configuration tuning work generally takes all configuration items as input and, under a specific workload, performs a large number of performance tests with varying configuration item values to obtain the correspondence between configuration item values and software performance. This work suffers from a large configuration search space, so performance tuning is time-consuming, and a great deal of time is needed to obtain the configuration corresponding to optimal performance.
For the problem of an overly large configuration search space, the prior art mainly reduces the search space by pre-screening the configuration items that have an important influence on performance; there are two representative methods. The first is Carver, published by Zhen Cao et al. at FAST 2020, which selects key configuration items for storage system performance tuning. The Carver method (background method one for short) samples the configuration space by Latin Hypercube Sampling (LHS), evaluates the importance of different configuration items to performance after performance testing using a variance-based performance metric, and finally uses a greedy algorithm to select the N configuration items with the largest influence on performance (N is specified by the user), presenting these preselected configuration items to the user as the input of an automatic tuning tool. This research proves that different configuration items differ in how strongly they influence performance and that a small number of configuration items are especially important for improving software performance, which establishes the importance of software configuration item preselection oriented to performance tuning. The second is the "Too Many Knobs to Tune?" method (background method two for short). This method first samples the configuration space with Latin hypercube sampling, then tests the correspondence between software configuration and the performance of two database system software, Cassandra and PostgreSQL, under different workloads, analyzes the importance ranking of different configuration items' influence on software performance, and compares the top 15 configuration items with the largest influence on performance under different workloads, showing that the few configuration items with the largest influence on performance in a piece of software are usually fixed.
They experimentally demonstrated that tuning only the top 5 configuration items with the largest performance influence in Cassandra achieves throughput, read latency, and write latency similar to tuning 30 configuration items, and can even achieve better read and write latency. Both methods achieve preselection of software configuration items, but obtaining the data, and further the importance of configuration items to software performance, still requires a large number of performance tests; the preselected configuration items differ somewhat across workloads, and the preselection result depends strongly on the workload chosen for performance testing. In addition, these methods only consider how to improve software performance, without considering whether hidden dangers are introduced for software reliability and security, and thus lack comprehensive consideration of user intentions.
In summary, how to construct a multi-intention-aware, workload-independent, lightweight configuration item preselection method that assists existing performance tuning work and warns users of possible side effects of tuning is a problem that researchers in the field urgently need to solve.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a software configuration item preselection method oriented to performance tuning, addressing the problems that existing configuration-item performance tuning methods are time-consuming and consider only the single intention of performance (i.e., when a user has an intention other than performance, the existing technology cannot work). The method has a short running time, comprehensively considers a user's various intentions, and assists the user in configuration tuning for intentions beyond performance.
In order to solve the technical problems, the technical scheme of the invention is: first, construct a performance-tuning-oriented software configuration item preselection system composed of a configuration item intention data automatic amplification module and a configuration item preselection module; then randomly select some configuration items from the data set source software and manually label their intentions to obtain a labeled configuration item set; the configuration item intention data automatic amplification module iteratively amplifies the labeled configuration item set to obtain an amplified labeled configuration item set; then train the configuration item preselection module with the amplified labeled configuration item set; finally, the trained configuration item preselection module classifies the configuration items of the target software according to the target software's configuration documents and selects the configuration item sets corresponding to the different intention categories. These sets reflect the main factors users consider when performing performance tuning; according to their tuning requirements, users can tune with the configuration item set of the corresponding intention category so as to improve software performance.
The invention comprises the following steps:
the first step is to construct a software configuration item preselection system oriented to performance tuning, wherein the software configuration item preselection system oriented to performance tuning is composed of a configuration item intention data automatic amplification module and a configuration item preselection module.
The configuration item intention data automatic amplification module is connected with the configuration item preselection module and with the data set source software. The data set source software comprises two parts: a labeled configuration item set and an unlabeled configuration item set. The labeled configuration item set is a data set constructed by manually labeling the intention category of each configuration item according to its configuration item document. The configuration item intention data automatic amplification module preprocesses the labeled configuration item set, labels the configuration items in the unlabeled configuration item set, and moves newly labeled data from the unlabeled configuration item set into the labeled configuration item set until the number of configuration items in the labeled configuration item set no longer changes, obtaining the amplified labeled configuration item set, which it sends to the configuration item preselection module.
The configuration item preselection module is connected with the configuration item intention data automatic amplification module and receives the amplified labeled configuration item set from it. The configuration item preselection module comprises a TF-IDF encoder and a configuration item preselection model RF. The encoder encodes the sentences in a configuration item document to obtain the vectors corresponding to the sentences; RF is a random forest model with a two-layer structure, whose parameters are obtained by training the model with the amplified labeled configuration item set. The configuration item preselection module classifies the configuration items of the target software according to the target software's configuration documents (generally obtained by manual extraction) and preselects the configuration items corresponding to the different intention categories, obtaining the preselected configuration item sets.
Second, randomly select some configuration items from the configuration item set D0 of the data set source software and label their intentions, obtaining the labeled configuration item set D1.
2.1 The data set source software includes 13 pieces of software: MySQL, Cassandra, MariaDB, Apache-Httpd, Nginx, Hadoop-Common, MapReduce, Apache-Flink, HDFS, Keystone, Nova, GCC, and Clang. Select configuration items from the data set source software according to the following conditions: 1) the software is server-side software, which generally has higher requirements on performance, reliability, security, etc., and is favorable for studying the influence of configuration items on the software; 2) the software has more than 2,000 stars on GitHub, the world's largest code hosting platform (a star reflects a user's attention to the software; a larger number of stars indicates that more users use and follow it), so labeling its configuration items has greater impact; 3) the software has more than 100 configuration items, since software with many configuration items needs performance tuning more. From the configuration item set D0, consisting of more than 7,000 configuration items of the software satisfying all 3 conditions simultaneously, randomly select configuration items with proportion s (where s ≥ 0.2). Record the total number of configuration items as S and the number of randomly selected configuration items as N, where N = S × s, rounded to an integer (e.g., if S = 7,000 and s = 0.2, then N = 1,400).
2.2 According to the official document descriptions of the selected configuration items, label the intentions of the N configuration items to obtain the labeled configuration item set D1. The method is: according to the document description of a configuration item, if adjusting the configuration item can improve software performance but the improvement simultaneously reduces software reliability, the intention label of the configuration item is Label1; if adjusting the configuration item can improve software performance but the improvement simultaneously reduces software security, the intention label is Label2; if adjusting the configuration item can improve software performance but the improvement simultaneously degrades software functionality, the intention label is Label3; if adjusting the configuration item can improve software performance but the improvement simultaneously increases the cost of using the software, the intention label is Label4; if adjusting the configuration item can improve software performance but the improvement simultaneously degrades performance for other users of the software, the intention label is Label5; if adjusting the configuration item can improve software performance without causing any of the first five side effects, the intention label is Label6; if adjusting the configuration item does not affect software performance, the intention label is Label7.
2.3 The labeled configuration item set D1 = {<(c_n, d_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, where c_n is the name of the n-th configuration item in D1 and d_n is the document of configuration item c_n, which can be expressed as d_n = (w^n_1, …, w^n_{w_n}, …, w^n_{W_n}), where W_n is the total number of words in d_n; label_n is the intention category of configuration item c_n, and Labels = {Label_i | 1 ≤ i ≤ 7} is the set of intention label categories.
The set of S − N configuration items not selected in step 2.1 is recorded as the unlabeled configuration item set D2 = {<(cc_t, dd_t)> | 1 ≤ t ≤ T}, where T = S − N, cc_t is the t-th configuration item name in D2, and dd_t is the document of configuration item cc_t, which can be expressed as dd_t = (u^t_1, …, u^t_{u_t}, …, u^t_{U_t}), where U_t is the total number of words in dd_t.
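The shape of the two sets can be pictured with a minimal Python sketch (the configuration item names, document words, and label assignments below are illustrative assumptions for exposition, not the patent's actual annotated data):

```python
# Labels = {Label_i | 1 <= i <= 7}: the seven intention categories of step 2.2.
LABELS = {f"Label{i}" for i in range(1, 8)}

# D1 = {<(c_n, d_n), label_n>}: each document d_n is a tuple of words;
# label assignments here are hypothetical examples.
D1 = [
    (("innodb_buffer_pool_size",
      ("size", "of", "the", "buffer", "pool")), "Label4"),   # perf up, cost up
    (("max_connections",
      ("maximum", "number", "of", "client", "connections")), "Label1"),  # perf up, reliability down
]

# D2 = {<(cc_t, dd_t)>}: same shape, but with no intention label yet.
D2 = [
    ("tmp_table_size", ("maximum", "size", "of", "internal", "tables")),
]

assert all(label in LABELS for (_, label) in D1)
```

The augmentation procedure in step 3 will move entries from D2 into D1 once it can infer their labels.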
Third, the configuration item intention data automatic amplification module preprocesses the labeled configuration item set D1, iteratively labels the unlabeled configuration items in the unlabeled configuration item set D2, and amplifies D1 with the newly labeled configuration items, obtaining the amplified labeled configuration item set D. The method is:
3.1 The configuration item intention data automatic amplification module preprocesses D1. The method is:
3.1.1 Define a dictionary-type variable f_label for encoding intention label categories (for the dictionary type see https://docs.python.org/3/c-api/dict.html; a dictionary-type variable dict consists of several key-value pairs (key_1, value_1), …, (key_k, value_k), …, (key_K, value_K) and satisfies dict[key_k] = value_k, where K ≥ 0 is the number of key-value pairs in dict, the dictionary is empty when K = 0, and key_1, …, key_k, …, key_K are distinct and constitute the key set of dict). f_label satisfies f_label[Label_1] = 1, …, f_label[Label_i] = i, …, f_label[Label_7] = 7 (1 ≤ i ≤ 7);
3.1.2 Initialize the word-mapping maximum index: index = 8;
3.1.3 Define a dictionary-type variable f_token for encoding words; initialize f_token as an empty dictionary, i.e., its key set is empty. In subsequent steps, pairs of the form <part of speech, root> will be added to its key set step by step, so that words are encoded according to their part of speech and root;
3.1.4 Encode words and build f_token step by step. The method is:
3.1.4.1 Initialize variable n = 1;
3.1.4.2 Encode the W_n words in d_n to obtain the encoded d'_n. The method is:
3.1.4.2.1 Initialize the word index variable w_n = 1;
3.1.4.2.2 Convert w^n_{w_n} (the w_n-th word of d_n) into the pair <pos^n_{w_n}, root^n_{w_n}>, where pos^n_{w_n} is the part of speech of w^n_{w_n} (e.g., noun, verb, adjective, adverb) and root^n_{w_n} is the root of w^n_{w_n}; for example, write and writes have the same root write.
3.1.4.2.3 Judge whether <pos^n_{w_n}, root^n_{w_n}> is in the key set of f_token. If not, encode w^n_{w_n} as index and add the key-value pair (<pos^n_{w_n}, root^n_{w_n}>, index) into f_token, i.e., let f_token[<pos^n_{w_n}, root^n_{w_n}>] = index, and go to 3.1.4.2.4. If so, encode w^n_{w_n} as the value corresponding to the key <pos^n_{w_n}, root^n_{w_n}>, i.e., encode w^n_{w_n} as f_token[<pos^n_{w_n}, root^n_{w_n}>], a natural number no less than 8 (values 1 to 7 are reserved for intention label codes), and go to 3.1.4.2.5;
3.1.4.2.4 let index be index + 1;
3.1.4.2.5 If w_n = W_n, the encoding of every word in d_n is complete and the encoded form of d_n is obtained: d'_n = (token^n_1, …, token^n_{w_n}, …, token^n_{W_n}), where token^n_{w_n} = f_token[<pos^n_{w_n}, root^n_{w_n}>] is the code of w^n_{w_n}; go to 3.1.4.3. If w_n < W_n, go to 3.1.4.2.6;
3.1.4.2.6 Let w_n = w_n + 1, go to 3.1.4.2.2;
3.1.4.3 If n = N, replace each d_n in D1 with its encoded d'_n, obtaining the preprocessed labeled configuration item set D'1 = {<(c_n, d'_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, and go to 3.2; if n < N, go to 3.1.4.4;
3.1.4.4 Let n = n + 1, go to 3.1.4.2;
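The word-encoding loop of step 3.1.4 can be sketched in Python as follows. The sketch assumes a simple <part-of-speech, root> analyzer; a real implementation would use an NLP toolkit for tagging and stemming, which is an assumption here, not part of the patent text:

```python
def encode_documents(docs, analyze):
    """Encode every word as an integer id shared by all words with the same
    <part-of-speech, root> pair. Ids 1..7 are reserved for intention labels,
    so word ids start at 8 (the 'index' variable of step 3.1.2)."""
    f_token = {}   # dictionary variable f_token: (pos, root) -> id
    index = 8      # word-mapping index starts after the 7 label codes
    encoded = []
    for doc in docs:
        coded = []
        for word in doc:
            key = analyze(word)        # <pos, root> pair for this word
            if key not in f_token:     # unseen pair: assign a fresh id (3.1.4.2.3)
                f_token[key] = index
                index += 1             # step 3.1.4.2.4
            coded.append(f_token[key])
        encoded.append(tuple(coded))
    return encoded, f_token

# Toy analyzer: treat every word as a noun and strip a trailing "s" as a crude
# root, so "write" and "writes" share one root (the example in 3.1.4.2.2).
toy = lambda w: ("noun", w[:-1] if w.endswith("s") else w)
enc, f_token = encode_documents([("write", "writes", "cache")], toy)
# "write" and "writes" receive the same code; "cache" gets the next one.
```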
3.2 The configuration item intention data automatic amplification module mines sequence patterns from D'1 to obtain the sequence pattern set SP. The method is:
3.2.1 Use D'1 to construct the sequence set SeqDB = {seq_1, …, seq_n, …, seq_N}, where seq_n is the sequence formed by appending the code f_label[label_n] of the intention label of configuration item c_n to the encoded document d'_n, i.e., seq_n = (token^n_1, …, token^n_{W_n}, f_label[label_n]).
3.2.2 Perform sequence pattern mining on SeqDB using the FEAT algorithm from "Efficient mining of frequent sequence generators", published by Chuancong Gao et al. at WWW 2008, obtaining the sequence set P = {p_1, …, p_m, …, p_M}, where M is the total number of sequence patterns. Each p_m is a frequently occurring sequence in the sequence set SeqDB, corresponding to expressions commonly used in configuration documents (frequently occurring words and phrases); p_m = (pp_1, …, pp_x, …, pp_X), where X, the length of p_m, is calculated by the FEAT algorithm, and pp_x, the x-th item of p_m, is the code of a word or an intention label, satisfying 1 ≤ pp_x < index. Specifically, 1 ≤ pp_x ≤ 7 means pp_x is the mapping of intention label Label_{pp_x} under f_label, and 8 ≤ pp_x < index means pp_x is the mapping under f_token of some <pos, root> pair, the form into which a word is converted by step 3.1.4.2.2, i.e., pp_x = f_token[<pos, root>].
3.2.3 Process P: keep the sequences in P that are related to intention categories and compute the support and confidence corresponding to each sequence, obtaining the sequence pattern set SP. The method is:
3.2.3.1 initializing sequence pattern set SP as an empty set;
3.2.3.2 initialization sequence traversal variable m ═ 1;
3.2.3.3 initializing sequence pattern count variable m' ═ 0;
3.2.3.4 Judge whether the last item pp_X of p_m satisfies 1 ≤ pp_X ≤ 7. If so, pp_X is the code of an intention category and p_m is related to determining the intention categories of unlabeled configuration items; go to 3.2.3.5. Otherwise, p_m is unrelated to determining unlabeled configuration item intention categories; go directly to 3.2.3.6;
3.2.3.5 Let m' = m' + 1 and p_{m'} = p_m. Compute the support and confidence of p_{m'} and add the processed sequence pattern to the sequence pattern set SP. The method is:
3.2.3.5.1 Initialize the configuration item subscript loop variable n = 1;
3.2.3.5.2 Initialize the support variable support_{m'} = 0;
3.2.3.5.3 Initialize the match-count variable matched_{m'} = 0, which counts the number of configuration items matched by the pattern;
3.2.3.5.4 Let the intention category reflected by p_{m'} be l_{m'} = Label_{pp_X} (the intention label whose f_label code equals pp_X), and let the sequence pattern related to l_{m'} be pattern_{m'} = (pp_1, …, pp_x, …, pp_{X-1});
3.2.3.5.5 Judge whether pattern_{m'} is a subsequence of d'_n. If so, a matching sequence is found: let matched_{m'} = matched_{m'} + 1 and go to 3.2.3.5.6. If not, go to 3.2.3.5.7;
3.2.3.5.6 If l_{m'} = label_n, the intention label is matched correctly along with the sequence: let support_{m'} = support_{m'} + 1 and go to 3.2.3.5.7. If l_{m'} ≠ label_n, a sequence is matched but its intention label is not: go to 3.2.3.5.7;
3.2.3.5.7 If n = N, go to 3.2.3.5.9; if n < N, go to 3.2.3.5.8;
3.2.3.5.8 Let n = n + 1, go to 3.2.3.5.5;
3.2.3.5.9 Compute the confidence of p_{m'}: confidence_{m'} = support_{m'} / matched_{m'} (the FEAT algorithm guarantees that p_{m'} is a subsequence of at least one sequence in SeqDB, so matched_{m'} ≥ 1 always). Denote the processed sequence pattern as Pattern_{m'} = (pattern_{m'}, l_{m'}, confidence_{m'}) and add Pattern_{m'} to the sequence pattern set SP;
3.2.3.6 If m = M, the sequence pattern set SP = {Pattern_{m'} | 1 ≤ m' ≤ M'} is obtained, where M' is the total number of patterns in SP and M' ≤ M; go to 3.3. If m < M, let m = m + 1 and go to 3.2.3.4;
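The support/confidence computation of step 3.2.3 can be sketched as follows (a stand-in with hypothetical helper names, not the patent's code; the toy document codes are made up):

```python
def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order, not necessarily contiguously."""
    it = iter(seq)
    return all(item in it for item in pattern)   # 'in' consumes the iterator

def score_pattern(p, labeled_docs):
    """p = (pp_1, .., pp_X); labeled_docs = [(encoded_doc, label_code), ...].
    Returns (body, label_code, confidence), or None when the last item of p
    is not an intention-label code (step 3.2.3.4)."""
    *body, label_code = p
    if not 1 <= label_code <= 7:   # pattern says nothing about intention
        return None
    matched = support = 0
    for doc, label in labeled_docs:
        if is_subsequence(body, doc):   # step 3.2.3.5.5: sequence match
            matched += 1
            if label == label_code:     # step 3.2.3.5.6: label also agrees
                support += 1
    # FEAT guarantees matched >= 1 for mined patterns (step 3.2.3.5.9)
    return tuple(body), label_code, support / matched

# Toy encoded documents with label codes: two docs labeled 3, one labeled 5.
docs = [((8, 9, 10), 3), ((8, 11, 9), 3), ((9, 12), 5)]
result = score_pattern((8, 9, 3), docs)   # body (8, 9) carrying label code 3
```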
3.3 The configuration item intention data automatic amplification module encodes D2. The method is:
3.3.1 initializing variable t ═ 1;
3.3.2 Encode the U_t words in dd_t. The method is:
3.3.2.1 initialize word index variable ut=1;
3.3.2.2 Convert u^t_{u_t} (the u_t-th word of dd_t) into the pair <pos^t_{u_t}, root^t_{u_t}>, where pos^t_{u_t} is the part of speech of u^t_{u_t} (e.g., noun, verb, adjective, adverb) and root^t_{u_t} is the root of u^t_{u_t}.
3.3.2.3 Judge whether <pos^t_{u_t}, root^t_{u_t}> is in the key set of f_token. If so, encode u^t_{u_t} as f_token[<pos^t_{u_t}, root^t_{u_t}>] and go to 3.3.2.4. If not, f_token cannot encode u^t_{u_t}; directly encode u^t_{u_t} as 0 and go to 3.3.2.4;
3.3.2.4 If u_t = U_t, the encoding of dd_t is complete and the encoded form of dd_t is obtained: dd'_t = (utoken^t_1, …, utoken^t_{u_t}, …, utoken^t_{U_t}), where utoken^t_{u_t} is the code of u^t_{u_t} obtained in 3.3.2.3; go to 3.3.3. If u_t < U_t, let u_t = u_t + 1 and go to 3.3.2.2;
3.3.3 If t = T, take each binary tuple (cc_t, dd'_t), corresponding to <(cc_t, dd_t)> in D2, into the encoded unlabeled configuration item set D'2, obtaining D'2 = {(cc_t, dd'_t) | 1 ≤ t ≤ T}; go to 3.4. If t < T, go to 3.3.4;
3.3.4 Let t = t + 1, go to 3.3.2;
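Step 3.3 differs from the training-set preprocessing only in its handling of unseen words, which can be sketched briefly (helper names and the toy analyzer are assumptions, not from the patent):

```python
def encode_unlabeled(doc, f_token, analyze):
    """Encode an unlabeled document with the f_token dictionary built during
    preprocessing; <part-of-speech, root> pairs never seen in the labeled set
    get code 0, as specified in step 3.3.2.3."""
    return tuple(f_token.get(analyze(w), 0) for w in doc)

# Toy analyzer matching the earlier preprocessing sketch: every word a noun,
# trailing "s" stripped as a crude root.
toy = lambda w: ("noun", w[:-1] if w.endswith("s") else w)
f_token = {("noun", "write"): 8, ("noun", "cache"): 9}   # assumed prior state
dd_encoded = encode_unlabeled(("writes", "cache", "latency"), f_token, toy)
# "latency" was never seen in the labeled set, so it is encoded as 0.
```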
3.4 The configuration item intention data automatic amplification module labels D'2 using SP. The method is:
3.4.1 setting a confidence threshold, with 0< threshold ≦ 1, which is preferably set to 0.7< threshold ≦ 1;
3.4.2 initializing variable t ═ 1;
3.4.3 Initialize the set R1 of configuration items with labels as an empty set;
3.4.4 Initialize the set R2 of configuration items without labels as an empty set;
3.4.5 Initialize the dictionary-type variable selector for selecting an intention label for the t-th unlabeled configuration item: let selector[Label_1] = 0, …, selector[Label_i] = 0, …, selector[Label_7] = 0, where selector[Label_i] denotes the confidence of labeling the t-th unlabeled configuration item as Label_i;
3.4.6 update selector according to pattern set SP obtained by 3.2, the method is:
3.4.6.1 initializing variable m' ═ 1;
3.4.6.2 Read confidence_{m'}, l_{m'}, and pattern_{m'} from the sequence pattern Pattern_{m'}. If confidence_{m'} ≥ threshold, go to 3.4.6.3 to judge whether the pattern matches; if confidence_{m'} < threshold, Pattern_{m'} does not meet the confidence requirement, go to 3.4.6.5;
3.4.6.3 If pattern_{m'} is a subsequence of dd'_t, the pattern matches; go to 3.4.6.4. If not, go to 3.4.6.5;
3.4.6.4 If confidence_{m'} > selector[l_{m'}], update selector[l_{m'}], i.e., let selector[l_{m'}] = confidence_{m'}, and go to 3.4.6.5; otherwise go directly to 3.4.6.5;
3.4.6.5 If m' = M', all sequence patterns have been traversed and the update of selector is complete; go to 3.4.7. If m' < M', let m' = m' + 1 and go to 3.4.6.2;
3.4.7 Select a label for dd'_t according to selector. The method is:
3.4.7.1 Initialize the candidate label LC_t = Label_1;
3.4.7.2 Initialize the label subscript variable i = 2;
3.4.7.3 If selector[Label_i] > selector[LC_t], the confidence of choosing Label_i as the label is higher than that of choosing LC_t, so let LC_t = Label_i and go to 3.4.7.4; if selector[Label_i] ≤ selector[LC_t], go directly to 3.4.7.4;
3.4.7.4 If i = 7, go to 3.4.7.5; if i < 7, let i = i + 1 and go to 3.4.7.3;
3.4.7.5 If selector[LC_t] > 0, take LC_t as the intention label of the t-th unlabeled configuration item and add <(cc_t, dd_t), LC_t> to R1; go to 3.4.8. If selector[LC_t] = 0, no pattern in SP matches dd'_t, so no intention label is selected for the t-th unlabeled configuration item; add <(cc_t, dd_t)> to R2 and go to 3.4.8;
3.4.8 if T is T, the set D of configuration items not marked is completed2Is marked to obtain R1And R2Turning to 3.4.10; if t<T, go to 3.4.9;
3.4.9 converting t to t +1 to 3.4.5;
3.4.10 determination of R1If the result is an empty set, finishing pair D1The iterative amplification is terminated to obtain an amplified labeled configuration item set, and 3.4.12 is transferred; if not, turning to 3.4.11;
3.4.11 order D1=D1+R1Let D2=R2And then, rotating to 3.1;
3.4.12 set D of labeled configuration items at the time of this step1The set of label placement items after amplification is denoted as D ═<(cn′,dn′),labeln′>|1≤n′≤N′,labeln′E.g. Labels, wherein dn′As configuration item cn′Description of (1), labeln′As configuration item cn′N' is the number of configuration items in the amplified labeled configuration item set D. N' is more than or equal to N.
Fourth, train the configuration item preselection module of the performance-tuning-oriented software configuration item preselection system using the amplified labeled configuration item set D. The method for training the configuration item preselection module is:
4.1 Use the N′ configuration documents d1, …, dn′, …, dN′ in D as a training set and, following the TF-IDF method of the article "Using TF-IDF to determine word relevance in document queries" published by Ramos et al. at the 1st Instructional Conference on Machine Learning (2003), train the TF-IDF encoder in the configuration item preselection module to encode configuration item documents; the encoder takes a sentence as input and outputs the vector corresponding to the sentence;
4.2 Encode the N′ documents in D with encoder to obtain the encoded vector set V′, as follows:
4.2.1 Initialize the vector set V′ as an empty set;
4.2.2 Initialize the loop index variable n′ = 1;
4.2.3 Use encoder to encode dn′ as the n′-th vector vn′;
4.2.4 Add vn′ to V′;
4.2.5 If n′ = N′, the encoding of the N′ configuration item documents in D is complete, yielding the encoded vector set V′; go to 4.3. If n′ < N′, let n′ = n′ + 1 and go to 4.2.3;
4.3 Use the training set {<vn′, labeln′> | 1 ≤ n′ ≤ N′} and the hierarchical random forest algorithm proposed in "Comparing the performance of flat and hierarchical habitat/land-cover classification models in a NATURA 2000 site" published by Yoni Gavish et al. in the ISPRS Journal of Photogrammetry and Remote Sensing (vol. 136, 2018) to train the configuration item preselection model RF, obtaining the configuration item preselection model parameters.
Fifth, the trained configuration item preselection module preselects configuration items for the target software, obtaining the preselected configuration item set. The target software configuration item data set is DT = {<dtca, dta> | 1 ≤ a ≤ A}, where A is the number of configuration items in the target software, dtca is the name of the a-th configuration item and dta is the document of the a-th configuration item. The method is:
5.1 Use the encoder obtained by the training in 4.1 to encode the A configuration item documents of the target software; denote the encoded vector set of the target software as Vdt. The method is:
5.1.1 Initialize the vector set Vdt of the target software as an empty set;
5.1.2 Initialize the loop index variable a = 1;
5.1.3 Use encoder to encode dta as the a-th vector vva of the target software;
5.1.4 Add vva to Vdt;
5.1.5 If a = A, the encoding of the A configuration item documents in DT is complete, yielding the encoded vector set Vdt of the target software; go to 5.2. If a < A, let a = a + 1 and go to 5.1.3;
5.2 The trained configuration item preselection module generates the corresponding intent label for the vector of each configuration item in Vdt, obtaining the predicted intent label list O, as follows:
5.2.1 Initialize the predicted intent label list O as an empty list;
5.2.2 Initialize the loop index variable a = 1;
5.2.3 Input vva into the model RF of the trained configuration item preselection module to predict the intent label oa of the a-th configuration item of the target software, as follows:
5.2.3.1 Initialize the candidate intent label oa of the a-th configuration item: let oa = Label7;
5.2.3.2 Input vva into the model RF of the trained configuration item preselection module to obtain the first-layer output [pprob, npprob] and the second-layer output [prob1, prob2, prob3, prob4, prob5, prob6], where pprob is the probability that the configuration item to be predicted is performance-related, npprob is the probability that it is not performance-related, and probi is the probability that the intent label of the configuration item to be predicted is Labeli;
5.2.3.3 If pprob < npprob, RF predicts that the probability that the configuration item is not performance-related is greater than the probability that it is performance-related; let oa = Label7 and go to 5.2.4. If pprob ≥ npprob, RF predicts that the configuration item is performance-related; go to 5.2.3.4 to further determine which other user intent the configuration item affects besides performance, i.e. determine which of Label1, …, Labeli, …, Label6 is the intent label of the configuration item;
5.2.3.4 Determine the intent label of the performance-related configuration item, as follows:
5.2.3.4.1 Initialize the candidate intent label index ci = 1;
5.2.3.4.2 Initialize the loop index variable i = 1;
5.2.3.4.3 If probi > probci, let ci = i and go to 5.2.3.4.4; otherwise, go directly to 5.2.3.4.4;
5.2.3.4.4 If i = 6, the traversal of the RF second-layer output is complete; let oa = Labelci and go to 5.2.4. If i < 6, let i = i + 1 and go to 5.2.3.4.3;
5.2.4 Add oa to the predicted intent label list O;
5.2.5 If a = A, the prediction of all configuration items in DT is complete, yielding the predicted intent label list O; go to 5.3. If a < A, let a = a + 1 and go to 5.2.3;
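The two-layer decision of steps 5.2.3.2 to 5.2.3.4 can be sketched as follows; the function below is a minimal illustration (the name predict_intent is ours, not from the patent), assuming the two layer outputs are already available:

```python
# Sketch of steps 5.2.3.2-5.2.3.4: combine the two RF output layers into
# one intent label. Layer 1 gives (pprob, npprob): the probability that
# the configuration item is / is not performance-related; layer 2 gives
# [prob1..prob6] over Label1..Label6.

def predict_intent(pprob, npprob, probs):
    """Return 'Label1'..'Label7' for one configuration item."""
    if pprob < npprob:          # more likely NOT performance-related
        return "Label7"
    # performance-related: argmax over the six side-effect labels
    ci = max(range(len(probs)), key=lambda i: probs[i])
    return f"Label{ci + 1}"

print(predict_intent(0.2, 0.8, [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]))  # Label7
print(predict_intent(0.9, 0.1, [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]))  # Label3
```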
5.3 Classify the configuration items according to their intent labels to obtain the sets of configuration items with the same intent category, as follows:
5.3.1 Initialize the configuration item sets corresponding to the 7 intent labels as empty sets, i.e. let C_Label1 = ∅, …, C_Labeli = ∅, …, C_Label7 = ∅, where C_Labeli is the configuration item set corresponding to the i-th intent label;
5.3.2 Initialize the loop index variable a = 1;
5.3.3 According to the intent label oa of the a-th configuration item, add the name dtca of the a-th configuration item to the corresponding configuration item set C_oa;
5.3.4 If a < A, let a = a + 1 and go to 5.3.3. If a = A, the classification of all configuration items in DT is complete, yielding the preselected configuration item set C = {C_Label1, …, C_Labeli, …, C_Label7}, where C_Labeli = {dtc_{i,1}, …, dtc_{i,j}, …, dtc_{i,Ji}} is the set of configuration items whose intent label predicted by the trained preselection model RF is Labeli, and Ji is the total number of configuration items whose predicted intent label is Labeli.
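The grouping in step 5.3 is a straightforward bucketing of configuration item names by predicted label. A minimal sketch (function name and the sample configuration item names are ours, for illustration only):

```python
# Sketch of step 5.3: bucket target-software configuration item names by
# their predicted intent labels O, yielding one set per label category.

def group_by_label(names, labels):
    """names: [dtc_1..dtc_A]; labels: predicted list O, same length."""
    groups = {f"Label{i}": set() for i in range(1, 8)}  # 7 empty sets
    for name, label in zip(names, labels):
        groups[label].add(name)
    return groups

C = group_by_label(["innodb_buffer_pool_size", "ssl_cipher", "log_error"],
                   ["Label6", "Label2", "Label7"])
print(C["Label6"])  # {'innodb_buffer_pool_size'}
```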
So far, the trained configuration item pre-selection module completes the goal of pre-selecting configuration items according to the intention categories.
The user can select the configuration item set corresponding to the appropriate intent category for tuning according to the intent of the performance tuning. For example, when the user needs to guarantee the reliability of the software during performance tuning: the configuration items in C_Label1 can bring performance improvement but reduce software reliability, contrary to the user's intent of guaranteeing reliability; the configuration items in C_Label7 are irrelevant to software performance, and adjusting them has no influence on performance, contrary to the intent of performance tuning; therefore, the user can perform performance tuning on the configuration items in the preselected sets C_Label2, …, C_Label6 to meet the performance tuning requirement.
Compared with the prior art, the invention can achieve the following beneficial effects:
1. With the invention, configuration item preselection can be performed on target software according to its configuration documents, without performance-testing the configuration items; the result is independent of the performance test load, so preselection is more lightweight, the time consumed by performance tuning is reduced, and the problem of incomplete preselected configuration items caused by the load limitations of the prior art is greatly alleviated.
2. With the invention, configuration items vital to performance can be preselected while also considering the diverse intents of users during performance tuning, preselecting corresponding configuration items for different intent categories. Compared with the background-art method ("Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs", published by Konstantinos Kanellis et al. at HotStorage 2020), which only concerns performance and ignores the multi-intent characteristics of users, the invention comprehensively considers the potential influence on other intents brought by performance tuning, so that users tuning with the preselected configuration items can satisfy their intent regarding performance.
3. The invention can be used for automatically amplifying the data set. The third step of the invention provides a method for mining sequence patterns from labeled data and amplifying unlabeled data, which can effectively reduce the manpower and time consumed in the data labeling process. Experiments prove that when the labeled data accounts for 20% of the total data (s = 0.2) and the confidence threshold is set to 0.85 (threshold = 0.85), 59.4% of the unlabeled data can be amplified with an accuracy of 86.4%, greatly reducing the labor and time consumed in data labeling and improving labeling efficiency. Compared with the prior art, the method can greatly reduce the dependence of model training on labeled data while keeping the accuracy at the same level as the prior art.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a logical block diagram of a first step of the present invention to build a multiple intent sensitive software configuration item preselection system;
FIG. 3 is a flow chart of the third step of the present invention, in which the configuration item intent data automatic amplification module preprocesses the labeled configuration item set D1, iteratively labels the unlabeled configuration items in the unlabeled configuration item set D2, and amplifies D1 with the newly labeled configuration items to obtain the amplified labeled configuration item set D.
Detailed Description
The present invention will be described with reference to the accompanying drawings.
As shown in fig. 1, the present invention comprises the steps of:
First, construct the software configuration item preselection system oriented to performance tuning. As shown in fig. 2, the system consists of a configuration item intent data automatic amplification module and a configuration item preselection module.
The automatic configuration item intention data amplification module is connected with the configuration item preselection module and is also connected with data set source software. The data set source software comprises two parts: a set of annotated configuration items and a set of unlabeled configuration items. The annotation configuration item set refers to a data set constructed by performing intention type annotation on configuration items according to each configuration item document in a manual annotation mode. The configuration item intention data automatic amplification module preprocesses a labeling configuration item set, labels the unlabeled configuration items in the unlabeled configuration item set, adds newly labeled data into the labeling configuration item set from the unlabeled configuration item set until the number of configuration items in the labeling configuration item set is not changed any more, obtains an amplified labeling configuration item set, and sends the amplified labeling configuration item set to the configuration item preselection module.
The configuration item preselection module is connected with the configuration item intention data automatic amplification module, and receives the amplified annotation configuration item set from the configuration item intention data automatic amplification module. The configuration item preselection module comprises a TF-IDF coder encoder and a configuration item preselection model RF. The encoder encodes sentences in the configuration item documents to obtain vectors corresponding to the sentences; and the RF is a random forest model with a two-layer structure, and the model is trained by using the amplified label configuration item set to obtain parameters of the random forest model. The configuration item pre-selection module classifies the configuration items of the target software according to the configuration data of the target software, pre-selects the configuration items corresponding to different intention categories, and obtains a pre-selected configuration item set.
Second, randomly select some configuration items from the configuration item set D0 of the data set source software and label their intents, obtaining the labeled configuration item set D1:
2.1 The data set source software comprises 13 types of software: MySQL, Cassandra, MariaDB, Apache-Httpd, Nginx, Hadoop-Common, MapReduce, Apache-Flink, HDFS, Keystone, Nova, GCC and Clang. Configuration items are selected from the data set source software according to the following conditions: 1) the software belongs to server-side software, which generally has higher requirements on performance, reliability, security and the like, and is conducive to studying the influence of configuration items on the software; 2) the software has a large user base and more than 2,000 stars on GitHub, the world's largest code hosting platform, so labeling its configuration items has greater impact; 3) the software has more than 100 configuration items, so performance tuning is more needed. From the configuration item set D0, consisting of more than 7 thousand configuration items of software satisfying all 3 conditions simultaneously, configuration items are manually selected at random in proportion s (where s ≥ 0.2). The total number of configuration items is denoted S and the number of randomly selected configuration items is N, where N = S × s rounded to an integer.
2.2 According to the official document descriptions of the selected configuration items, label the intents of the N configuration items to obtain the labeled configuration item set D1. The method is: according to the document description of a configuration item, if adjusting the configuration item can improve software performance but the improvement simultaneously reduces software reliability, the intent label of the configuration item is Label1; if adjusting it can improve performance but the improvement reduces software security, its intent label is Label2; if adjusting it can improve performance but the improvement degrades software functionality, its intent label is Label3; if adjusting it can improve performance but the improvement increases the cost of using the software, its intent label is Label4; if adjusting it can improve performance but the improvement degrades performance for other users of the software, its intent label is Label5; if adjusting it can improve software performance without causing any of the first five side effects, its intent label is Label6; if adjusting it does not affect software performance, its intent label is Label7.
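The seven intent categories of step 2.2 can be summarized as a small lookup table; the wording below paraphrases the patent's definitions:

```python
# The 7 intent label categories of step 2.2: Label1-Label5 trade a
# performance gain against a specific side effect, Label6 is a gain with
# none of those side effects, Label7 is performance-irrelevant.
INTENT_LABELS = {
    "Label1": "improves performance but reduces reliability",
    "Label2": "improves performance but reduces security",
    "Label3": "improves performance but degrades functionality",
    "Label4": "improves performance but increases usage cost",
    "Label5": "improves performance but degrades it for other users",
    "Label6": "improves performance with none of the above side effects",
    "Label7": "does not affect performance",
}
print(len(INTENT_LABELS))  # 7
```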
2.3 The labeled configuration item set is D1 = {<(cn, dn), labeln> | 1 ≤ n ≤ N, labeln ∈ Labels}, where cn is the name of the n-th configuration item in D1 and dn is the document of configuration item cn; dn can be expressed as the word sequence dn = (word_{n,1}, …, word_{n,wn}, …, word_{n,Wn}), where Wn is the total number of words in dn; labeln is the intent category of configuration item cn, and Labels = {Labeli | 1 ≤ i ≤ 7} is the set of intent label categories.
Denote the set of the T = S − N configuration items not selected in step 2.1 as the unlabeled configuration item set D2 = {<(cct, ddt)> | 1 ≤ t ≤ T}, where cct is the name of the t-th configuration item in D2 and ddt is the document of configuration item cct; ddt can be expressed as the word sequence ddt = (word_{t,1}, …, word_{t,ut}, …, word_{t,Ut}), where Ut is the total number of words in ddt.
Third, the configuration item intent data automatic amplification module preprocesses the labeled configuration item set D1, iteratively labels the unlabeled configuration items in the unlabeled configuration item set D2, and amplifies D1 with the newly labeled configuration items to obtain the amplified labeled configuration item set D, with the method shown in fig. 3:
3.1 The configuration item intent data automatic amplification module preprocesses D1, as follows:
3.1.1 Define the dictionary-type variable flabel for encoding intent label categories, satisfying flabel[Label1] = 1, …, flabel[Labeli] = i, …, flabel[Label7] = 7 (1 ≤ i ≤ 7);
3.1.2 Initialize the maximum word-mapping index: index = 8;
3.1.3 Define the dictionary-type variable ftoken for encoding words; initialize ftoken as an empty dictionary, i.e. its key set is an empty set; in the subsequent steps, <part of speech, root word> tuples will be added to the key set step by step, encoding words according to their part of speech and root;
3.1.4 Encode words and build ftoken step by step, as follows:
3.1.4.1 Initialize the variable n = 1;
3.1.4.2 Encode each of the Wn words in dn to obtain the encoded d′n, as follows:
3.1.4.2.1 Initialize the word index variable wn = 1;
3.1.4.2.2 Convert word_{n,wn} into the tuple <pos_{n,wn}, root_{n,wn}>, where pos_{n,wn} is the part of speech of word_{n,wn} and root_{n,wn} is its root word.
3.1.4.2.3 Judge whether <pos_{n,wn}, root_{n,wn}> is in ftoken. If not, encode <pos_{n,wn}, root_{n,wn}> as index, add the key-value pair <<pos_{n,wn}, root_{n,wn}>, index> to ftoken, let token_{n,wn} = index, and go to 3.1.4.2.4. If so, encode <pos_{n,wn}, root_{n,wn}> as the value corresponding to the key <pos_{n,wn}, root_{n,wn}>, i.e. let token_{n,wn} = ftoken[<pos_{n,wn}, root_{n,wn}>], and go to 3.1.4.2.5;
3.1.4.2.4 Let index = index + 1;
3.1.4.2.5 If wn = Wn, the encoding of each word in dn is complete, and the encoded d′n = (token_{n,1}, …, token_{n,wn}, …, token_{n,Wn}) is obtained; go to 3.1.4.3. If wn < Wn, go to 3.1.4.2.6;
3.1.4.2.6 Let wn = wn + 1 and go to 3.1.4.2.2;
3.1.4.3 If n = N, replace each dn in D1 by its encoding d′n to obtain the preprocessed labeled configuration item set D′1 = {<(cn, d′n), labeln> | 1 ≤ n ≤ N, labeln ∈ Labels}, and go to 3.2. If n < N, go to 3.1.4.4;
3.1.4.4 Let n = n + 1 and go to 3.1.4.2.
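The word encoding of step 3.1.4 (and the fallback-to-0 lookup used later in step 3.3) can be sketched as follows. In practice the part of speech and root word would come from an NLP toolkit; the analyze stub below is a crude stand-in we invented for illustration:

```python
# Sketch of step 3.1.4: encode words as <part of speech, root> tuples,
# assigning fresh codes starting at index = 8 (codes 1..7 are reserved
# for the intent labels via f_label).

def analyze(word):
    # Stand-in for a real POS tagger + stemmer: crude suffix stripping.
    pos = "verb" if word.endswith("ing") else "noun"
    root = word[:-3] if word.endswith("ing") else word
    return (pos, root)

def encode_docs(docs, f_token, index=8):
    """Encode labeled documents, growing f_token (step 3.1.4)."""
    encoded = []
    for doc in docs:
        out = []
        for word in doc:
            key = analyze(word)
            if key not in f_token:      # 3.1.4.2.3: new tuple -> new code
                f_token[key] = index
                index += 1
            out.append(f_token[key])
        encoded.append(out)
    return encoded, index

def encode_unlabeled(doc, f_token):
    """Step 3.3: unknown <pos, root> tuples are encoded as 0."""
    return [f_token.get(analyze(w), 0) for w in doc]

f_token = {}
d_enc, index = encode_docs([["caching", "improves", "throughput"]], f_token)
print(d_enc)                                              # [[8, 9, 10]]
print(encode_unlabeled(["caching", "latency"], f_token))  # [8, 0]
```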
3.2 The configuration item intent data automatic amplification module mines sequence patterns from D′1 to obtain the sequence pattern set SP, as follows:
3.2.1 Use D′1 to construct the sequence set SeqDB = {seq1, …, seqn, …, seqN}, where seqn is the sequence formed by concatenating d′n, the encoding of the document dn of configuration item cn, with flabel(labeln), the code of the intent label labeln of cn, i.e. seqn = (token_{n,1}, …, token_{n,wn}, …, token_{n,Wn}, flabel(labeln));
3.2.2 Perform sequence pattern mining on SeqDB using the FEAT algorithm from "Efficient mining of frequent sequence generators" published by Chuancong Gao et al. at WWW 2008, obtaining the sequence set P = {p1, …, pm, …, pM}, where M is the total number of sequence patterns and pm is a frequently occurring sequence in SeqDB. pm = (pp1, …, ppx, …, ppX) corresponds to common expressions in configuration documents, such as frequently occurring words and phrases; X is the number of items of pm, computed by the FEAT algorithm; ppx, the x-th item of pm, is the code corresponding to a word or an intent label and satisfies 1 ≤ ppx < index. Specifically, 1 ≤ ppx ≤ 7 means that ppx is the flabel mapping of an intent label Labeli, and 8 ≤ ppx < index means that ppx is the ftoken mapping of some tuple <pos, root> obtained by the conversion of step 3.1.4.2.2, i.e. ppx = ftoken[<pos, root>];
3.2.3 Process P, retaining the sequences related to intent categories, and compute the support and confidence of each retained sequence to obtain the sequence pattern set SP, as follows:
3.2.3.1 Initialize the sequence pattern set SP as an empty set;
3.2.3.2 Initialize the sequence traversal variable m = 1;
3.2.3.3 Initialize the sequence pattern count variable m′ = 0;
3.2.3.4 Judge whether the last item ppX of pm satisfies 1 ≤ ppX ≤ 7. If so, ppX is the code of an intent category and pm is relevant to determining the intent categories of unlabeled configuration items; go to 3.2.3.5. Otherwise, pm is irrelevant to determining the unlabeled configuration item intent category; go directly to 3.2.3.6;
3.2.3.5 Let m′ = m′ + 1 and let pm′ = pm. Compute the confidence of pm′ and add the processed sequence pattern to the sequence pattern set SP, as follows:
3.2.3.5.1 Initialize the configuration item index loop variable n = 1;
3.2.3.5.2 Initialize the support variable support_m′ = 0;
3.2.3.5.3 Initialize the match-count variable matched_m′ = 0, which counts the number of configuration items matched by the pattern;
3.2.3.5.4 Let l_m′ be the intent category corresponding to pm′, i.e. the intent label whose flabel code is the last item ppX, and let the l_m′-related sequence pattern reflected by pm′ be pattern_m′ = (pp1, …, ppx, …, ppX−1);
3.2.3.5.5 Judge whether pattern_m′ is a subsequence of d′n. If so, a matching sequence is found: let matched_m′ = matched_m′ + 1 and go to 3.2.3.5.6. If not, go to 3.2.3.5.7;
3.2.3.5.6 If l_m′ = labeln, the intent label is matched correctly along with the sequence: let support_m′ = support_m′ + 1 and go to 3.2.3.5.7. If l_m′ ≠ labeln, the sequence matches but its corresponding intent label does not; go to 3.2.3.5.7;
3.2.3.5.7 If n = N, go to 3.2.3.5.9; if n < N, go to 3.2.3.5.8;
3.2.3.5.8 Let n = n + 1 and go to 3.2.3.5.5;
3.2.3.5.9 Compute the confidence of pm′: confidence_m′ = support_m′ / matched_m′ (the FEAT algorithm guarantees that pm′ is a subsequence of at least one sequence in SeqDB, i.e. matched_m′ ≥ 1 always holds). Record the processed sequence pattern as Pattern_m′ = (pattern_m′, l_m′, confidence_m′) and add Pattern_m′ to the sequence pattern set SP;
3.2.3.6 If m = M, the sequence pattern set SP = {Pattern_m′ | 1 ≤ m′ ≤ M′} is obtained, where M′ (M′ ≤ M) is the total number of patterns in SP; go to 3.3. If m < M, let m = m + 1 and go to 3.2.3.4;
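The support/confidence computation of step 3.2.3.5 reduces to counting, over the labeled sequences, how often a pattern matches a document and how often the pattern's label also matches. A minimal sketch (FEAT itself, which generates the candidate patterns, is not reproduced; the candidate pattern and toy data below are hand-picked for illustration):

```python
# Sketch of step 3.2.3.5: given a candidate pattern and its intent-label
# code, confidence = (#docs matching pattern AND label) / (#docs matching
# pattern). A pattern matches a document if it is an order-preserving,
# not necessarily contiguous, subsequence of it.

def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(any(p == x for x in it) for p in pattern)

def pattern_confidence(pattern, label, labeled_docs):
    """labeled_docs: list of (encoded_doc, label_code) pairs."""
    matched = sum(1 for d, _ in labeled_docs if is_subsequence(pattern, d))
    support = sum(1 for d, l in labeled_docs
                  if is_subsequence(pattern, d) and l == label)
    return support / matched if matched else 0.0

seqdb = [([8, 9, 12], 6), ([8, 11, 9], 6), ([8, 9, 13], 7)]
print(pattern_confidence([8, 9], 6, seqdb))  # 2 of 3 matches -> 0.666...
```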
3.3 The configuration item intent data automatic amplification module encodes D2, as follows:
3.3.1 Initialize the variable t = 1;
3.3.2 Encode the Ut words in ddt, as follows:
3.3.2.1 Initialize the word index variable ut = 1;
3.3.2.2 Convert word_{t,ut} into the tuple <pos_{t,ut}, root_{t,ut}>, where pos_{t,ut} is the part of speech of word_{t,ut} (such as noun, verb, adjective or adverb) and root_{t,ut} is its root word.
3.3.2.3 Judge whether <pos_{t,ut}, root_{t,ut}> is in ftoken. If so, encode word_{t,ut} as ftoken[<pos_{t,ut}, root_{t,ut}>] and go to 3.3.2.4. If not, word_{t,ut} cannot be encoded with ftoken; encode it directly as 0 and go to 3.3.2.4;
3.3.2.4 If ut = Ut, the encoding of ddt is complete, and the encoded dd′t = (token_{t,1}, …, token_{t,ut}, …, token_{t,Ut}) is obtained; go to 3.3.3. If not, let ut = ut + 1 and go to 3.3.2.2;
3.3.3 If t = T, use the tuple (cct, dd′t) in place of <(cct, ddt)> in D2 to obtain the encoded unlabeled configuration item set D′2 = {(cct, dd′t) | 1 ≤ t ≤ T}, and go to 3.4. If t < T, go to 3.3.4;
3.3.4 Let t = t + 1 and go to 3.3.2;
3.4 The configuration item intent data automatic amplification module labels D′2 using SP, as follows:
3.4.1 Set the confidence threshold such that 0 < threshold ≤ 1; preferably, 0.7 < threshold ≤ 1;
3.4.2 Initialize the variable t = 1;
3.4.3 Initialize the labeled configuration item set R1 as an empty set;
3.4.4 Initialize the unlabeled configuration item set R2 as an empty set;
3.4.5 Initialize the dictionary-type variable selector, used to select an intent label for the t-th unlabeled configuration item: let selector[Label1] = 0, …, selector[Labeli] = 0, …, selector[Label7] = 0, where selector[Labeli] denotes the confidence that the t-th unlabeled configuration item is labeled Labeli;
3.4.6 Update selector according to the sequence pattern set SP obtained in 3.2, as follows:
3.4.6.1 Initialize the variable m′ = 1;
3.4.6.2 Read confidence_m′, l_m′ and pattern_m′ from the sequence pattern Pattern_m′. If confidence_m′ ≥ threshold, go to 3.4.6.3 to judge whether the pattern matches; if confidence_m′ < threshold, Pattern_m′ does not meet the confidence requirement, go to 3.4.6.5;
3.4.6.3 If pattern_m′ is a subsequence of dd′t, the pattern matches; go to 3.4.6.4. If not, go to 3.4.6.5;
3.4.6.4 If confidence_m′ > selector[l_m′], update selector[l_m′], i.e. let selector[l_m′] = confidence_m′, and go to 3.4.6.5; otherwise, go directly to 3.4.6.5;
3.4.6.5 If m′ = M′, all sequence patterns have been traversed and the update of selector is complete; go to 3.4.7. If m′ < M′, let m′ = m′ + 1 and go to 3.4.6.2;
3.4.7 Select a label for dd′t according to selector, as follows:
3.4.7.1 Initialize the candidate label LCt = Label1;
3.4.7.2 Initialize the label index variable i = 2;
3.4.7.3 If selector[Labeli] > selector[LCt], the confidence of choosing Labeli as the label is higher than that of choosing LCt, so let LCt = Labeli and go to 3.4.7.4; if selector[Labeli] ≤ selector[LCt], go directly to 3.4.7.4;
3.4.7.4 If i = 7, go to 3.4.7.5; if i < 7, let i = i + 1 and go to 3.4.7.3;
3.4.7.5 If selector[LCt] > 0, take LCt as the intent label of the t-th unlabeled configuration item, add <(cct, ddt), LCt> to R1, and go to 3.4.8. If selector[LCt] = 0, no pattern matching dd′t was found in SP, so no intent label is selected for the t-th unlabeled configuration item; add <(cct, ddt)> to R2 and go to 3.4.8;
3.4.8 If t = T, the labeling of the unlabeled configuration item set D2 is complete, yielding R1 and R2; go to 3.4.10. If t < T, go to 3.4.9;
3.4.9 Let t = t + 1 and go to 3.4.5;
3.4.10 Judge whether R1 is an empty set. If so, the iterative amplification of D1 terminates and the amplified labeled configuration item set is obtained; go to 3.4.12. If not, go to 3.4.11;
3.4.11 Let D1 = D1 + R1 and D2 = R2, then go to 3.1;
3.4.12 Record the labeled configuration item set D1 at this step as the amplified labeled configuration item set D = {<(cn′, dn′), labeln′> | 1 ≤ n′ ≤ N′, labeln′ ∈ Labels}, where dn′ is the document of configuration item cn′ and labeln′ is the intent label of configuration item cn′; N′ is the number of configuration items in the amplified labeled configuration item set D, and N′ ≥ N.
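The labeling loop of steps 3.4.5 to 3.4.7 can be sketched as: for one encoded unlabeled document, keep, per label, the highest confidence among matching patterns that pass the threshold, then pick the best label, or none if nothing matched. A minimal illustration (function names and toy data are ours):

```python
# Sketch of steps 3.4.5-3.4.7: select an intent label for one encoded
# unlabeled document dd using the mined sequence pattern set SP.

def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(any(p == x for x in it) for p in pattern)

def select_label(dd, sp, threshold=0.85):
    """sp: list of (pattern, label, confidence) triples (the set SP).
    Returns (label, confidence) for R1, or None for R2."""
    selector = {}
    for pattern, label, conf in sp:
        if conf >= threshold and is_subsequence(pattern, dd):
            if conf > selector.get(label, 0.0):   # 3.4.6.4: keep the max
                selector[label] = conf
    if not selector:
        return None                               # no match -> R2
    best = max(selector, key=selector.get)        # 3.4.7: argmax label
    return best, selector[best]

SP = [([8, 9], "Label6", 0.9), ([8, 13], "Label7", 0.95)]
print(select_label([8, 9, 12], SP))  # ('Label6', 0.9)
print(select_label([10, 11], SP))    # None
```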
And fourthly, training a configuration item preselection module of the software configuration item preselection system oriented to performance tuning by using the amplified labeling configuration item set D. The method for training the configuration item pre-selection module comprises the following steps:
4.1 use N' Profile D in D1,…,dn′,…,dN′As a training set, a TF-IDF method is utilized, a TF-IDF encoder in a training configuration item preselection module is used for encoding a configuration item document, the encoder inputs sentences, and the encoder outputs vectors corresponding to the sentences;
4.2 Encode the N' documents in D with the encoder to obtain the encoded vector set V', as follows:
4.2.1 Initialize the vector set V' as an empty set;
4.2.2 Initialize the loop subscript variable n' = 1;
4.2.3 Use the encoder to encode d_n' as the n'-th vector v_n';
4.2.4 Add v_n' to V';
4.2.5 If n' = N', the encoding of the N' configuration item documents in D is complete and the encoded vector set V' is obtained; go to 4.3. If n' < N', let n' = n' + 1 and go to 4.2.3;
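A minimal sketch of the TF-IDF encoding in 4.1–4.2, using the common raw-frequency tf and log idf weighting; the patent names the TF-IDF method but does not fix the exact variant, so this weighting is an assumption:

```python
import math
from collections import Counter

def fit_tfidf(docs):
    """Build the vocabulary and IDF table from the N' training documents (4.1).
    Each document is a list of word tokens."""
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    # idf(w) = log(N' / document frequency of w)
    idf = {w: math.log(n / sum(1 for doc in docs if w in doc)) for w in vocab}
    return vocab, idf

def encode(doc, vocab, idf):
    """Encode one configuration item document as a TF-IDF vector (4.2.3)."""
    counts = Counter(doc)
    total = max(len(doc), 1)
    return [counts[w] / total * idf[w] for w in vocab]
```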
4.3 Use the training set {<v_n', label_n'> | 1 ≤ n' ≤ N'} and the hierarchical random forest algorithm to train the configuration item preselection model RF, obtaining the configuration item preselection model parameters.
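The two-layer structure of the model RF described in 4.3 can be sketched generically: the first layer separates performance-unrelated items (Label_7) from performance-related ones, and the second layer distinguishes Label_1 through Label_6. A toy nearest-neighbor learner stands in here for the random forests, which are the patent's actual base learners; the factory interface is an assumption of this sketch.

```python
class NN1:
    """Stand-in base learner for illustration: 1-nearest-neighbor."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, x):
        return min(zip(self.X, self.y),
                   key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

class TwoLayerClassifier:
    """Sketch of the two-layer model RF (4.3): the first layer decides
    performance-related vs not (Label_7); the second layer picks one of
    Label_1..Label_6 for performance-related items."""
    def __init__(self, make_clf):
        self.first = make_clf()
        self.second = make_clf()

    def fit(self, X, y):                       # y holds label indices 1..7
        y_bin = [1 if lab == 7 else 0 for lab in y]
        self.first.fit(X, y_bin)
        perf = [(x, lab) for x, lab in zip(X, y) if lab != 7]
        self.second.fit([x for x, _ in perf], [lab for _, lab in perf])
        return self

    def predict(self, x):
        if self.first.predict(x) == 1:         # not performance-related
            return 7
        return self.second.predict(x)          # one of 1..6
```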
Fifth, the trained configuration item preselection module preselects configuration items according to the target software configuration items to obtain a preselected configuration item set. Denote the target software configuration item data set DT = {<dtc_a, dt_a> | 1 ≤ a ≤ A}, where A is the number of configuration items in the target software, dtc_a is the name of the a-th configuration item, and dt_a is the document of the a-th configuration item. The method is:
5.1 Use the encoder obtained by the training in 4.1 to encode the A configuration item documents of the target software, and denote the encoded vector set of the target software as V_dt, as follows:
5.1.1 Initialize the vector set V_dt of the target software as an empty set;
5.1.2 Initialize the loop subscript variable a = 1;
5.1.3 Use the encoder to encode dt_a as vv_a, the a-th vector of the target software;
5.1.4 Add vv_a to V_dt;
5.1.5 If a = A, the encoding of the A configuration item documents in DT is complete and the encoded vector set V_dt of the target software is obtained; go to 5.2. If a < A, let a = a + 1 and go to 5.1.3;
5.2 The trained configuration item preselection module generates a corresponding intention label from the vector of each configuration item in V_dt to obtain a predicted intention label list O, as follows:
5.2.1 Initialize the predicted intention label list O as an empty list;
5.2.2 Initialize the loop subscript variable a = 1;
5.2.3 Input vv_a into the model RF of the trained configuration item preselection module to obtain o_a, the predicted intention label of the a-th configuration item of the target software, as follows:
5.2.3.1 Initialize o_a, the candidate intention label of the a-th configuration item; let o_a = Label_7;
5.2.3.2 Input vv_a into the model RF of the trained configuration item preselection module to obtain the first-layer output [pprob, npprob] and the second-layer output [prob_1, prob_2, prob_3, prob_4, prob_5, prob_6], where pprob is the probability that the configuration item to be predicted is related to performance, npprob is the probability that the configuration item to be predicted is not related to performance, and prob_i is the probability that the intention label of the configuration item to be predicted is Label_i;
5.2.3.3 If pprob < npprob, RF predicts that the probability that the configuration item is not related to performance is greater than the probability that it is related to performance; let o_a = Label_7 and go to 5.2.4. If pprob > npprob, RF predicts that the probability that the configuration item is related to performance is greater than the probability that it is not, i.e. the configuration item is a performance-related configuration item; go to 5.2.3.4 to further determine which other user intention the configuration item affects besides performance, i.e. determine which of Label_1, …, Label_i, …, Label_6 is the intention label of the configuration item;
5.2.3.4 Determine the intention label of the performance-related configuration item, as follows:
5.2.3.4.1 Initialize the candidate intention label subscript ci = 1;
5.2.3.4.2 Initialize the loop subscript variable i = 1;
5.2.3.4.3 If prob_i > prob_ci, let ci = i and go to 5.2.3.4.4; otherwise go directly to 5.2.3.4.4;
5.2.3.4.4 If i = 6, the traversal of the RF second-layer output is complete; let o_a = Label_ci and go to 5.2.4. If i < 6, let i = i + 1 and go to 5.2.3.4.3;
5.2.4 Add o_a to the predicted intention label list O;
5.2.5 If a = A, the prediction for all configuration items in DT is complete and the predicted intention label list O is obtained; go to 5.3. If a < A, let a = a + 1 and go to 5.2.3;
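The decision rule of 5.2.3.2 through 5.2.3.4 reduces to a two-way comparison followed by an argmax; a small sketch, with integer indices 1–7 standing for Label_1–Label_7:

```python
def predict_intent(first_out, second_out):
    """Map the two-layer RF outputs to a label index (5.2.3.3-5.2.3.4).
    first_out  = [pprob, npprob]  (related / not related to performance)
    second_out = [prob_1, ..., prob_6]."""
    pprob, npprob = first_out
    if pprob < npprob:                    # performance-unrelated branch
        return 7                          # Label_7
    # performance-related: argmax over Label_1..Label_6 (5.2.3.4)
    return max(range(6), key=lambda i: second_out[i]) + 1
```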
5.3 Classify the configuration items according to the intention labels to obtain sets composed of configuration items with the same intention category, as follows:
5.3.1 Initialize the configuration item set corresponding to each of the 7 kinds of intention labels as an empty set; the i-th such set is the configuration item set corresponding to the i-th intention label;
5.3.2 Initialize the loop subscript variable a = 1;
5.3.3 According to o_a, the intention label of the a-th configuration item, add dtc_a, the name of the a-th configuration item, to the corresponding configuration item set;
5.3.4 If a < A, let a = a + 1 and go to 5.3.3. If a = A, the classification of all configuration items in DT is complete and the preselected configuration item set, namely the 7 per-label configuration item sets, is obtained, where the j-th configuration item of the i-th set is a configuration item whose intention label the trained configuration item preselection model RF predicts to be Label_i, and J_i is the total number of configuration items whose intention label RF predicts to be Label_i.
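Step 5.3 is a straightforward bucketing of configuration item names by predicted label; a minimal sketch:

```python
from collections import defaultdict

def group_by_intent(names, labels):
    """Step 5.3: bucket configuration item names by predicted intention
    label (indices 1..7), yielding the preselected configuration item sets."""
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return dict(groups)
```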
To verify the effect of the invention, a comparative experiment between the invention and the background art was carried out on a computer with the Ubuntu 18.04 operating system, a 48-core Intel Xeon 2.2 GHz CPU, a Tesla V100 GPU, and 64 GB of memory. The primary coding language is Python 3.8.6. The training process was carried out according to the steps in the specification, with PostgreSQL and Cassandra used as the target software for testing; PostgreSQL and Cassandra yield 252 and 117 configuration item documents respectively. Since the first background art does not disclose its source code or experimental results, the comparison is made only with background art II. As shown in Table 1, the experiment shows that when the labeled data accounts for 20% of the total data (s = 0.2) and the confidence threshold is set to 0.85 (threshold = 0.85), 59.4% of the unlabeled data can be amplified using the method of the present invention with an accuracy of 86.4%, greatly reducing the manpower and time consumed in the data labeling process and improving the efficiency of data labeling. The invention can recommend performance-related configuration items more comprehensively while greatly reducing overhead. Meanwhile, configuration items can be recommended for user intentions other than performance, assisting the user in tuning and satisfying a variety of user intentions.
TABLE 1 Comparison of the software configuration item preselection method of the present invention and background art II
The performance-tuning-oriented software configuration item preselection method provided by the invention has been described in detail above. The principles and embodiments of the present invention are explained herein; the above description is intended to assist in understanding the core concepts of the invention. It should be noted that those skilled in the art can make various improvements and modifications to the invention without departing from its principles, and such improvements and modifications also fall within the scope of the claims of the present invention.

Claims (9)

1. A software configuration item preselection method oriented to performance tuning, characterized by comprising the following steps:
First, construct a performance-tuning-oriented software configuration item preselection system, which is composed of a configuration item intention data automatic amplification module and a configuration item preselection module;
the configuration item intention data automatic amplification module is connected with the configuration item preselection module and is also connected with the data set source software; the data set source software comprises two parts: a labeled configuration item set and an unlabeled configuration item set; the labeled configuration item set refers to a data set constructed by performing intention category labeling on configuration items according to each configuration item's document; the configuration item intention data automatic amplification module preprocesses the labeled configuration item set, labels the unlabeled configuration items in the unlabeled configuration item set, and adds newly labeled data from the unlabeled configuration item set to the labeled configuration item set until the number of configuration items in the labeled configuration item set no longer changes, obtaining the amplified labeled configuration item set, which it sends to the configuration item preselection module;
the configuration item preselection module is connected with the configuration item intention data automatic amplification module and receives the amplified labeled configuration item set from it; the configuration item preselection module comprises a TF-IDF encoder and a configuration item preselection model RF; the encoder encodes the sentences in the configuration item documents to obtain the vectors corresponding to the sentences; RF is a random forest model with a two-layer structure, trained with the amplified labeled configuration item set to obtain the random forest model parameters; the configuration item preselection module classifies the configuration items of the target software according to the configuration data of the target software and preselects the configuration items corresponding to different intention categories, obtaining a preselected configuration item set;
Second, randomly select some configuration items from the configuration item set D_0 of the data set source software and label their intentions to obtain the labeled configuration item set D_1, as follows:
2.1 Select partial configuration items from the data set source software according to the following conditions: 1) the software belongs to the server side; 2) the software has a large number of users, i.e. more than 2,000 stars on the code hosting platform GitHub; 3) the software has more than 100 configuration items. From the configuration item set D_0, composed of more than 7 thousand configuration items of software satisfying these 3 conditions simultaneously, randomly select configuration items in proportion s; denote the total number of configuration items as S and the number of randomly selected configuration items as N, N = S × s rounded to an integer;
2.2 According to the official document descriptions of the selected configuration items, perform intention labeling on the N configuration items to obtain the labeled configuration item set D_1; the intention labels of configuration items are of seven kinds in total: Label_1, Label_2, Label_3, Label_4, Label_5, Label_6, Label_7;
2.3 The labeled configuration item set D_1 = {<(c_n, d_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, where c_n is the name of the n-th configuration item in D_1 and d_n is the document of configuration item c_n; d_n is expressed as a sequence of words, where W_n is the total number of words in d_n; label_n is the intention category of configuration item c_n, and Labels = {Label_i | 1 ≤ i ≤ 7} is the set formed by the intention label categories;
Note that the set formed by the T = S − N configuration items not selected in step 2.1 is denoted the unlabeled configuration item set D_2, D_2 = {<(cc_t, dd_t)> | 1 ≤ t ≤ T}, where cc_t is the name of the t-th configuration item in D_2 and dd_t is the document of configuration item cc_t; dd_t is expressed as a sequence of words, where U_t is the total number of words in dd_t;
Third, the configuration item intention data automatic amplification module preprocesses the labeled configuration item set D_1, iteratively labels the unlabeled configuration items in the unlabeled configuration item set D_2, and amplifies D_1 with the newly labeled configuration items to obtain the amplified labeled configuration item set D, as follows:
3.1 The configuration item intention data automatic amplification module preprocesses D_1, as follows:
3.1.1 Define a dictionary-type variable f_label for encoding the intention label categories, satisfying f_label[Label_1] = 1, …, f_label[Label_i] = i, …, f_label[Label_7] = 7, 1 ≤ i ≤ 7;
3.1.2 Initialize the word-mapping maximum index: index = 8;
3.1.3 Define a dictionary-type variable f_token for encoding words; initialize f_token as an empty dictionary, i.e. the key set of f_token is an empty set; in the subsequent steps, <part of speech, root> two-tuples are gradually added to the key set, and words are encoded according to their part of speech and root;
3.1.4 Encode the words and build f_token step by step, as follows:
3.1.4.1 Initialize variable n = 1;
3.1.4.2 Encode each of the W_n words in d_n to obtain d'_n, the encoded form of d_n;
3.1.4.3 If n = N, replace each d_n in D_1 with its encoded d'_n to obtain the preprocessed labeled configuration item set D'_1 = {<(c_n, d'_n), label_n> | 1 ≤ n ≤ N, label_n ∈ Labels}, and go to 3.2; if n < N, go to 3.1.4.4;
3.1.4.4 Let n = n + 1 and go to 3.1.4.2;
3.2 The configuration item intention data automatic amplification module mines sequence patterns from D'_1 to obtain a sequence pattern set SP, as follows:
3.2.1 Use D'_1 to construct the sequence set SeqDB = {seq_1, …, seq_n, …, seq_N}, where seq_n is the sequence formed by concatenating d'_n, the encoded form of the document d_n of configuration item c_n, with f_label(label_n), the code corresponding to c_n's intention label label_n;
3.2.2 Perform sequence pattern mining on the sequence set SeqDB with the FEAT algorithm to obtain a sequence set P = {p_1, …, p_m, …, p_M}, where M is the total number of sequence patterns and p_m is a frequently occurring sequence in SeqDB, p_m = (pp_1, …, pp_x, …, pp_X); X is the length of p_m as computed by the FEAT algorithm, and pp_x, the x-th item of p_m, is the code corresponding to a word or an intention label and satisfies 1 ≤ pp_x < index; 1 ≤ pp_x ≤ 7 means that pp_x is the f_label mapping of an intention label, and 8 ≤ pp_x < index means that pp_x is the f_token code of the <part of speech, root> two-tuple into which a word was transformed by step 3.1.4.2;
3.2.3 Process P, retaining the sequences related to the intention categories, and compute the support and confidence corresponding to each sequence to obtain the sequence pattern set SP, as follows:
3.2.3.1 Initialize the sequence pattern set SP as an empty set;
3.2.3.2 Initialize the sequence traversal variable m = 1;
3.2.3.3 Initialize the sequence pattern count variable m' = 0;
3.2.3.4 Determine whether the last item pp_X of p_m satisfies 1 ≤ pp_X ≤ 7; if so, pp_X is the code of an intention category and p_m is related to determining unlabeled configuration item intention categories, go to 3.2.3.5; otherwise p_m is unrelated to determining unlabeled configuration item intention categories, go to 3.2.3.6;
3.2.3.5 Let m' = m' + 1 and p_m' = p_m; compute confidence_m', the confidence of p_m', and add the processed sequence pattern Pattern_m' to the sequence pattern set SP, Pattern_m' = (pattern_m', l_m', confidence_m'), where pattern_m' is the sequence reflected by p_m' that is related to l_m', and l_m' is the intention category corresponding to p_m';
3.2.3.6 If m = M, the sequence pattern set SP = {Pattern_m' | 1 ≤ m' ≤ M'} is obtained, where M' is the total number of patterns in SP and M' ≤ M; go to 3.3. If not, let m = m + 1 and go to 3.2.3.4;
3.3 The configuration item intention data automatic amplification module encodes D_2, as follows:
3.3.1 Initialize variable t = 1;
3.3.2 Encode each of the U_t words in dd_t to obtain dd'_t, the encoded form of dd_t;
3.3.3 If t = T, take the two-tuple (cc_t, dd'_t) as the encoding of <(cc_t, dd_t)> in D_2, obtaining the encoded unlabeled configuration item set D'_2 = {(cc_t, dd'_t) | 1 ≤ t ≤ T}, and go to 3.4; if t < T, go to 3.3.4;
3.3.4 Let t = t + 1 and go to 3.3.2;
3.4 The configuration item intention data automatic amplification module labels D'_2 using SP, as follows:
3.4.1 Set a confidence threshold threshold, 0 < threshold ≤ 1;
3.4.2 Initialize variable t = 1;
3.4.3 Initialize the set R_1 of configuration items with labels as an empty set;
3.4.4 Initialize the set R_2 of configuration items without labels as an empty set;
3.4.5 Initialize the dictionary-type variable selector used to select an intention label for the t-th unlabeled configuration item: let selector[Label_1] = 0, …, selector[Label_i] = 0, …, selector[Label_7] = 0, where selector[Label_i] is the confidence of labeling the t-th unlabeled configuration item as Label_i;
3.4.6 Update selector according to the pattern set SP obtained in 3.2, as follows:
3.4.6.1 Initialize variable m' = 1;
3.4.6.2 Read confidence_m', l_m', pattern_m' from the sequence pattern Pattern_m'; if confidence_m' ≥ threshold, go to 3.4.6.3 to determine whether pattern matching is possible; if confidence_m' < threshold, Pattern_m' does not meet the confidence requirement, go to 3.4.6.5;
3.4.6.3 If pattern_m' is a subsequence of dd'_t, the pattern matches, go to 3.4.6.4; if not, go to 3.4.6.5;
3.4.6.4 If confidence_m' > selector[l_m'], update selector[l_m'], i.e. let selector[l_m'] = confidence_m', and go to 3.4.6.5; otherwise go directly to 3.4.6.5;
3.4.6.5 If m' = M', all sequence patterns have been traversed and the update of selector is complete, go to 3.4.7; if m' < M', let m' = m' + 1 and go to 3.4.6.2;
3.4.7 Select a label for dd'_t according to selector, as follows:
3.4.7.1 Initialize the candidate label LC_t = Label_1;
3.4.7.2 Initialize the label subscript variable i = 2;
3.4.7.3 If selector[Label_i] > selector[LC_t], the confidence of selecting Label_i as the label is higher than that of selecting LC_t, so let LC_t = Label_i and go to 3.4.7.4; if selector[Label_i] ≤ selector[LC_t], go directly to 3.4.7.4;
3.4.7.4 If i = 7, go to 3.4.7.5; if i < 7, let i = i + 1 and go to 3.4.7.3;
3.4.7.5 If selector[LC_t] > 0, take LC_t as the intention label of the t-th unlabeled configuration item, add <(cc_t, dd_t), LC_t> to R_1, and go to 3.4.8; if selector[LC_t] = 0, no pattern in SP matches dd'_t, so no intention label is selected for the t-th unlabeled configuration item; add <(cc_t, dd_t)> to R_2 and go to 3.4.8;
3.4.8 If t = T, the labeling of the unlabeled configuration item set D_2 is complete, yielding R_1 and R_2; go to 3.4.10. If t < T, go to 3.4.9;
3.4.9 Let t = t + 1 and go to 3.4.5;
3.4.10 Determine whether R_1 is an empty set. If so, the iterative amplification of D_1 terminates and the amplified labeled configuration item set is obtained; go to 3.4.12. If not, go to 3.4.11;
3.4.11 Let D_1 = D_1 + R_1 and D_2 = R_2, then go to 3.1;
3.4.12 The labeled configuration item set D_1 at the time this step is reached is the amplified labeled configuration item set, denoted D = {<(c_n', d_n'), label_n'> | 1 ≤ n' ≤ N', label_n' ∈ Labels}, where d_n' is the document of configuration item c_n', label_n' is the intention category of configuration item c_n', and N' is the number of configuration items in the amplified labeled configuration item set D; N' ≥ N;
Fourth, train the configuration item preselection module of the performance-tuning-oriented software configuration item preselection system using the amplified labeled configuration item set D, as follows:
4.1 Use the N' configuration item documents d_1, …, d_n', …, d_N' in D as a training set and, using the TF-IDF method, train the TF-IDF encoder in the configuration item preselection module to encode configuration item documents; the encoder's input is a sentence and its output is the vector corresponding to that sentence;
4.2 Encode the N' documents in D with the encoder to obtain the encoded vector set V'; V' contains N' encoded vectors, and the n'-th vector v_n' is the vector obtained by encoding d_n' with the encoder;
4.3 Use the training set {<v_n', label_n'> | 1 ≤ n' ≤ N'} and the hierarchical random forest algorithm to train the configuration item preselection model RF, obtaining the configuration item preselection model parameters;
Fifth, the trained configuration item preselection module preselects configuration items according to the target software configuration items to obtain a preselected configuration item set. Denote the target software configuration item data set DT = {<dtc_a, dt_a> | 1 ≤ a ≤ A}, where A is the number of configuration items in the target software, dtc_a is the name of the a-th configuration item, and dt_a is the document of the a-th configuration item. The method is:
5.1 Use the encoder obtained by the training in 4.1 to encode the A configuration item documents of the target software, and denote the encoded vector set of the target software as V_dt, as follows:
5.1.1 Initialize the vector set V_dt of the target software as an empty set;
5.1.2 Initialize the loop subscript variable a = 1;
5.1.3 Use the encoder to encode dt_a as vv_a, the a-th vector of the target software;
5.1.4 Add vv_a to V_dt;
5.1.5 If a = A, the encoding of the A configuration item documents in DT is complete and the encoded vector set V_dt of the target software is obtained; go to 5.2. If a < A, let a = a + 1 and go to 5.1.3;
5.2 The trained configuration item preselection module generates a corresponding intention label from the vector of each configuration item in V_dt to obtain a predicted intention label list O, as follows:
5.2.1 Initialize the predicted intention label list O as an empty list;
5.2.2 Initialize the loop subscript variable a = 1;
5.2.3 Input vv_a into the model RF of the trained configuration item preselection module to obtain o_a, the predicted intention label of the a-th configuration item of the target software;
5.2.4 Add o_a to the predicted intention label list O;
5.2.5 If a = A, the prediction for all configuration items in DT is complete and the predicted intention label list O is obtained; go to 5.3. If a < A, let a = a + 1 and go to 5.2.3;
5.3 Classify the configuration items according to the intention labels to obtain sets composed of configuration items with the same intention category, as follows:
5.3.1 Initialize the configuration item set corresponding to each of the 7 intention labels as an empty set; the i-th such set is the configuration item set corresponding to the i-th intention label;
5.3.2 Initialize the loop subscript variable a = 1;
5.3.3 According to o_a, the intention label of the a-th configuration item, add dtc_a, the name of the a-th configuration item, to the corresponding configuration item set;
5.3.4 If a < A, let a = a + 1 and go to 5.3.3. If a = A, the classification of all configuration items in DT is complete and the preselected configuration item set, namely the 7 per-label configuration item sets, is obtained, where the j-th configuration item of the i-th set is a configuration item whose intention label the trained configuration item preselection model RF predicts to be Label_i, and J_i is the total number of configuration items whose intention label RF predicts to be Label_i.
2. The performance-tuning-oriented software configuration item preselection method of claim 1, wherein the data set source software of the second step comprises 13 kinds of software: MySQL, Cassandra, MariaDB, Apache-Httpd, Nginx, Hadoop-Common, MapReduce, Apache-Flink, HDFS, Keystone, Nova, GCC, and Clang.
3. The performance-tuning-oriented software configuration item preselection method of claim 1, wherein the proportion s in step 2.1 satisfies 1 ≥ s ≥ 0.2, and the confidence threshold in step 3.4.1 satisfies 0.7 < threshold ≤ 1.
4. The method of claim 1, wherein the method of performing intention labeling on the N configuration items in step 2.2 is: according to the document description of a configuration item, if adjusting the configuration item can improve software performance but the performance improvement reduces software reliability, the intention label of the configuration item is Label_1; if adjusting the configuration item can improve software performance but the performance improvement reduces software security, the intention label of the configuration item is Label_2; if adjusting the configuration item can improve software performance but the performance improvement degrades software functionality, the intention label of the configuration item is Label_3; if adjusting the configuration item can improve software performance but the performance improvement increases the cost of using the software, the intention label of the configuration item is Label_4; if adjusting the configuration item can improve software performance but the performance improvement degrades performance for other users of the software, the intention label of the configuration item is Label_5; if adjusting the configuration item can improve software performance without causing the first five side effects, the intention label of the configuration item is Label_6; if adjusting the configuration item does not affect software performance, the intention label of the configuration item is Label_7.
5. The method of claim 1, wherein the step 3.1.4.2 of encoding each of the W_n words in d_n to obtain d'_n, the encoded form of d_n, comprises:
3.1.4.2.1 Initialize the word subscript variable w_n = 1;
3.1.4.2.2 Convert the w_n-th word of d_n into a two-tuple consisting of the word's part of speech and the word's root;
3.1.4.2.3 Determine whether this two-tuple is among the keys of f_token; if not, encode the word as index, add the key-value pair <two-tuple, index> to f_token, and go to 3.1.4.2.4; if so, encode the word as the value corresponding to this two-tuple key in f_token, and go to 3.1.4.2.5;
3.1.4.2.4 Let index = index + 1;
3.1.4.2.5 If w_n = W_n, the encoding of each word in d_n is complete and d'_n, the encoded form of d_n, is obtained; end. If w_n < W_n, go to 3.1.4.2.6;
3.1.4.2.6 Let w_n = w_n + 1 and go to 3.1.4.2.2.
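The encoding procedure of claim 5 can be sketched as follows. `pos_root` is a hypothetical helper standing in for whatever part-of-speech tagger and stemmer produce the <part of speech, root> two-tuple; the patent does not name a specific tool.

```python
def encode_words(words, pos_root, f_token, index):
    """Claim 5 (3.1.4.2): encode each word by its <part of speech, root>
    two-tuple, assigning a fresh code via `index` when the tuple is new.
    `pos_root` is an assumed helper mapping a word to that two-tuple."""
    encoded = []
    for w in words:
        key = pos_root(w)
        if key not in f_token:      # 3.1.4.2.3: unseen two-tuple
            f_token[key] = index
            index += 1              # 3.1.4.2.4
        encoded.append(f_token[key])
    return encoded, index
```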
6. The method of claim 1, wherein the step 3.2.3.5 of computing confidence_m', the confidence of p_m', and adding the processed sequence pattern to the sequence pattern set SP comprises:
3.2.3.5.1 Initialize the configuration item subscript loop variable n = 1, and let m' = m' − 1;
3.2.3.5.2 Let m' = m' + 1;
3.2.3.5.3 Initialize the support variable support_m' = 0;
3.2.3.5.4 Initialize the match count variable matched_m' = 0, used to count the number of configuration items matching the pattern;
3.2.3.5.5 Let l_m', the intention category corresponding to p_m', be the intention label whose f_label code is pp_X; let pattern_m' = (pp_1, …, pp_x, …, pp_{X−1}) be the sequence reflected by p_m' that is related to l_m';
3.2.3.5.6 Determine whether pattern_m' is a subsequence of d'_n; if so, a matching sequence is found, let matched_m' = matched_m' + 1 and go to 3.2.3.5.7; if not, go to 3.2.3.5.8;
3.2.3.5.7 If l_m' = label_n, the intention label is matched correctly at the same time as the sequence, so let support_m' = support_m' + 1 and go to 3.2.3.5.8; if l_m' ≠ label_n, the sequence matches but the intention label corresponding to the sequence does not, go to 3.2.3.5.8;
3.2.3.5.8 If n = N, go to 3.2.3.5.10; if n < N, go to 3.2.3.5.9;
3.2.3.5.9 Let n = n + 1 and go to 3.2.3.5.6;
3.2.3.5.10 Compute confidence_m', the confidence of p_m': confidence_m' = support_m' / matched_m'; the processed sequence pattern is Pattern_m' = (pattern_m', l_m', confidence_m'); add Pattern_m' to the sequence pattern set SP.
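The support/matched computation of claim 6 can be sketched as follows; a minimal illustration (with in-order subsequence matching), not the patented implementation:

```python
def pattern_confidence(pattern, label, encoded_docs, labels):
    """Claim 6 (3.2.3.5): confidence = support / matched, where matched
    counts labeled documents containing the pattern as a subsequence and
    support counts those whose intention label also agrees."""
    def is_subseq(p, doc):
        it = iter(doc)
        return all(x in it for x in p)
    matched = support = 0
    for doc, lab in zip(encoded_docs, labels):
        if is_subseq(pattern, doc):
            matched += 1
            if lab == label:
                support += 1
    return support / matched if matched else 0.0
```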
7. The performance-tuning-oriented software configuration item preselection method according to claim 1, wherein the method in step 3.3.2 for encoding each of the U_t words in dd_t comprises:
3.3.2.1 initialize the word index variable u_t = 1;
3.3.2.2 convert the u_t-th word word_t^{u_t} into the triple (word_t^{u_t}, pos_t^{u_t}, stem_t^{u_t}), where pos_t^{u_t} is the part of speech of word_t^{u_t} and stem_t^{u_t} is the root (stem) of word_t^{u_t};
3.3.2.3 judge whether (word_t^{u_t}, pos_t^{u_t}, stem_t^{u_t}) is in f_token; if so, encode word_t^{u_t} as its code in f_token and go to 3.3.2.4; if not, f_token cannot encode word_t^{u_t}, so encode word_t^{u_t} as 0 directly and go to 3.3.2.4;
3.3.2.4 if u_t = U_t, the encoding of dd_t is completed and dd_t is encoded as dd'_t; end. If u_t < U_t, let u_t = u_t + 1 and go to 3.3.2.2.
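A minimal sketch of the per-word encoding in steps 3.3.2.1 through 3.3.2.4, assuming the token dictionary f_token maps (word, part-of-speech, stem) triples to positive integer codes; the helper name `encode_description` and the dictionary layout are illustrative, not taken from the claim.

```python
def encode_description(tagged_words, f_token):
    """tagged_words: list of (word, pos, stem) triples for one configuration
    item description; f_token: dict mapping such triples to positive codes.
    Triples absent from f_token are encoded as 0 (step 3.3.2.3)."""
    return [f_token.get(triple, 0) for triple in tagged_words]
```

Unknown words therefore all collapse to the same out-of-vocabulary code 0, which keeps the encoded document dd'_t the same length as dd_t.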
8. The method according to claim 1, wherein in step 4.2 the encoder is used to encode the N' documents in D into the encoded vector set V' as follows:
4.2.1 initialize the vector set V' as an empty set;
4.2.2 initialize the loop index variable n' = 1;
4.2.3 use the encoder to encode d_n' as the n'-th vector v_n';
4.2.4 add v_n' to V';
4.2.5 if n' = N', the encoding of the N' configuration item documents in D is completed, yielding the encoded vector set V'; end. If n' < N', let n' = n' + 1 and go to 4.2.3.
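Steps 4.2.1 through 4.2.5 amount to mapping the trained encoder over the document set in order; a hedged one-function sketch (the names `encode_corpus` and `encoder` are illustrative):

```python
def encode_corpus(encoder, documents):
    """Apply the (already trained) encoder to each of the N' documents in D,
    collecting the resulting vectors into V' in order (steps 4.2.1-4.2.5)."""
    vectors = []                      # 4.2.1: V' starts empty
    for doc in documents:             # 4.2.2 / 4.2.5: loop n' = 1..N'
        vectors.append(encoder(doc))  # 4.2.3 / 4.2.4: encode and collect
    return vectors
```

Any callable that maps one document to one vector can stand in for the encoder here.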
9. The performance-tuning-oriented software configuration item preselection method according to claim 1, wherein step 5.2.3, inputting vv_a into the trained model RF of the configuration item preselection module to obtain the predicted intention label o_a of the a-th configuration item of the target software, comprises:
5.2.3.1 initialize the candidate intention label o_a of the a-th configuration item: let o_a = Label_7;
5.2.3.2 input vv_a into the trained model RF of the configuration item preselection module to obtain the first-layer output [pprob, npprob] and the second-layer output [prob_1, prob_2, prob_3, prob_4, prob_5, prob_6], where pprob is the probability that the configuration item to be predicted is performance-related, npprob is the probability that it is not performance-related, and prob_i is the probability that the intention label of the configuration item to be predicted is Label_i;
5.2.3.3 if pprob < npprob, RF predicts that the configuration item is more likely to be performance-independent than performance-related: let o_a = Label_7 and end. If pprob > npprob, RF predicts that the configuration item is more likely to be performance-related, i.e. it is a performance-related configuration item; go to 5.2.3.4 to further determine which other user intention the configuration item affects besides performance, i.e. which of Label_1, ..., Label_i, ..., Label_6 is its intention label;
5.2.3.4 determine the intention label of the performance-related configuration item as follows:
5.2.3.4.1 initialize the candidate intention label index ci = 1;
5.2.3.4.2 initialize the loop index variable i = 1;
5.2.3.4.3 if prob_i > prob_ci, let ci = i and go to 5.2.3.4.4; otherwise go directly to 5.2.3.4.4;
5.2.3.4.4 if i = 6, the traversal of the RF second-layer output is completed: let o_a = Label_ci and end; if i < 6, let i = i + 1 and go to 5.2.3.4.3.
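The two-layer decision of steps 5.2.3.1 through 5.2.3.4.4 can be sketched as follows; the list-based interface and the label values are illustrative assumptions about the RF model's outputs, not the patented implementation itself.

```python
def predict_intention(first_layer, second_layer, labels, not_related_label):
    """first_layer: [pprob, npprob], the probabilities that the configuration
    item is / is not performance-related; second_layer: [prob_1, ..., prob_6]
    over the six performance intention labels; labels: the six label names."""
    pprob, npprob = first_layer
    if pprob < npprob:                 # 5.2.3.3: performance-independent wins
        return not_related_label       # o_a = Label_7
    ci = 0                             # 5.2.3.4.1: candidate label index
    for i in range(1, len(second_layer)):   # 5.2.3.4.2-5.2.3.4.4: argmax scan
        if second_layer[i] > second_layer[ci]:
            ci = i
    return labels[ci]
```

The inner scan is just an argmax over the second-layer probabilities; it only runs when the first layer judges the item performance-related.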
CN202210450353.6A 2022-04-26 2022-04-26 Software configuration item preselection method oriented to performance tuning Active CN114780411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210450353.6A CN114780411B (en) 2022-04-26 2022-04-26 Software configuration item preselection method oriented to performance tuning

Publications (2)

Publication Number Publication Date
CN114780411A true CN114780411A (en) 2022-07-22
CN114780411B CN114780411B (en) 2023-04-07

Family

ID=82432902

Country Status (1)

Country Link
CN (1) CN114780411B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116225965A (en) * 2023-04-11 2023-06-06 中国人民解放军国防科技大学 IO size-oriented database performance problem detection method
CN116561002A (en) * 2023-05-16 2023-08-08 中国人民解放军国防科技大学 Database performance problem detection method for I/O concurrency

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108804136A (en) * 2018-05-31 2018-11-13 中国人民解放军国防科技大学 Configuration item type constraint inference method based on name semantics
CN111611177A (en) * 2020-06-29 2020-09-01 中国人民解放军国防科技大学 Software performance defect detection method based on configuration item performance expectation

Non-Patent Citations (1)

Title
SHANSHAN LI ET AL.: "Detecting Performance Bottlenecks Guided by Resource Usage" *


Similar Documents

Publication Publication Date Title
CN111310438B (en) Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model
CN110490946B (en) Text image generation method based on cross-modal similarity and antagonism network generation
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN110020438A (en) Enterprise or tissue Chinese entity disambiguation method and device based on recognition sequence
CN112329465A (en) Named entity identification method and device and computer readable storage medium
US11954435B2 (en) Text generation apparatus, text generation learning apparatus, text generation method, text generation learning method and program
CN114780411B (en) Software configuration item preselection method oriented to performance tuning
CN111581973A (en) Entity disambiguation method and system
CN111782961B (en) Answer recommendation method oriented to machine reading understanding
CN112800776A (en) Bidirectional GRU relation extraction data processing method, system, terminal and medium
CN115422369B (en) Knowledge graph completion method and device based on improved TextRank
CN117648469A (en) Cross double-tower structure answer selection method based on contrast learning
CN113807079A (en) End-to-end entity and relation combined extraction method based on sequence-to-sequence
CN117453861A (en) Code search recommendation method and system based on comparison learning and pre-training technology
CN117610562B (en) Relation extraction method combining combined category grammar and multi-task learning
CN117932066A Pre-training-based 'extraction-generation' answer generation model and method
CN111309849B (en) Fine-grained value information extraction method based on joint learning model
CN115408506B (en) NL2SQL method combining semantic analysis and semantic component matching
CN116029261B (en) Chinese text grammar error correction method and related equipment
CN114969279A (en) Table text question-answering method based on hierarchical graph neural network
CN113204679B (en) Code query model generation method and computer equipment
CN114548090A (en) Fast relation extraction method based on convolutional neural network and improved cascade labeling
CN117371447A (en) Named entity recognition model training method, device and storage medium
Kishimoto et al. MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network
CN113239192B (en) Text structuring technology based on sliding window and random discrete sampling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant