CN116383277A - Root cause mining method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116383277A
Authority
CN
China
Prior art keywords
value
dimension
combination
values
potential
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310370164.2A
Other languages
Chinese (zh)
Inventor
黄彦博
刘刚
袁典飘
张炜林
杨帆
于连照
张钋
王轶凡
潘峰
郑星宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu com Times Technology Beijing Co Ltd
Original Assignee
Baidu com Times Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu com Times Technology Beijing Co Ltd
Priority to CN202310370164.2A
Publication of CN116383277A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a root cause mining method and device, an electronic device, and a storage medium. It relates to the technical field of data processing, and in particular to the technical fields of intelligent search and big data. The specific implementation scheme is as follows: obtaining an experimental group value and a control group value corresponding to each dimension value combination under a preset dimension combination, and taking each dimension value combination, together with its corresponding experimental group value and control group value, as an initial leaf node; for each category of initial leaf nodes, constructing data sets from a first level to a preset-number level based on the initial leaf nodes of that category; and, for each category, sequentially traversing the data sets from the first level to the preset-number level of the category to calculate the generalized potential score of each dimension value combination at those levels, and mining the root cause of the abnormal change in the experimental group value from the dimension value combinations whose generalized potential scores satisfy a root cause condition. The root cause can thereby be located quickly and accurately.

Description

Root cause mining method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to the field of intelligent searching and big data technologies.
Background
In a small-flow experiment conducted during product iteration or strategy iteration, when an experiment index shows a significant abnormal change, the root cause dimension responsible for that change needs to be located, so that the root cause under that dimension can be examined and the product or strategy adjusted accordingly. In this way, an iteration scheme for the product or strategy can be formulated quickly, and small-flow experiments can drive product improvement.
Disclosure of Invention
The disclosure provides a root cause mining method and device, an electronic device, and a storage medium, specifically as follows:
In a first aspect, an embodiment of the present disclosure provides a root cause mining method, including:
obtaining an experimental group value and a control group value corresponding to each dimension value combination under a preset dimension combination, and taking each dimension value combination, together with its corresponding experimental group value and control group value, as an initial leaf node;
for each category of initial leaf nodes, constructing data sets from a first level to a preset-number level based on the initial leaf nodes of that category, wherein the data set of the Nth level comprises the potential leaf nodes corresponding to each dimension value combination under each combination of N dimensions, and the potential leaf nodes corresponding to one dimension value combination comprise: each combination obtained by combining that dimension value combination with each dimension value of each other single dimension, together with the corresponding experimental group value and control group value;
and, for each category, sequentially traversing the data sets from the first level to the preset-number level of the category to calculate the generalized potential score of each dimension value combination at those levels, and mining the root cause of the abnormal change in the experimental group value from the dimension value combinations whose generalized potential scores satisfy a root cause condition.
In a second aspect, embodiments of the present disclosure provide a root cause mining device, the device comprising:
an acquisition module, configured to obtain an experimental group value and a control group value corresponding to each dimension value combination under a preset dimension combination, and to take each dimension value combination, together with its corresponding experimental group value and control group value, as an initial leaf node;
a construction module, configured to construct, for each category of initial leaf nodes, data sets from a first level to a preset-number level based on the initial leaf nodes of that category, wherein the data set of the Nth level comprises the potential leaf nodes corresponding to each dimension value combination under each combination of N dimensions, and the potential leaf nodes corresponding to one dimension value combination comprise: each combination obtained by combining that dimension value combination with each dimension value of each other single dimension, together with the corresponding experimental group value and control group value;
a calculation module, configured to, for each category, sequentially traverse the data sets from the first level to the preset-number level of the category to calculate the generalized potential score of each dimension value combination at those levels, and to mine the root cause of the abnormal change in the experimental group value from the dimension value combinations whose generalized potential scores satisfy a root cause condition.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect described above.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the method of the first aspect described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a root cause mining method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart of another root cause mining method provided by embodiments of the present disclosure;
FIG. 3 is a flow chart of yet another root cause mining method provided by an embodiment of the present disclosure;
FIG. 4 is an exemplary flow chart of a root cause mining method provided by embodiments of the present disclosure;
FIG. 5 is an exemplary schematic diagram comparing the root cause mining method of an embodiment of the present disclosure with a conventional root cause mining algorithm;
FIG. 6 is an exemplary schematic diagram comparing the root cause mining method of another embodiment of the present disclosure with a conventional root cause mining algorithm;
FIG. 7 is a schematic diagram of a root cause mining device according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of an electronic device for implementing the root cause mining method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The embodiments of the disclosure can be applied to small-flow experiments in a product iteration or strategy iteration process, in scenarios where an experiment index shows a significant abnormal change. Taking product iteration as an example, an experiment can be run on the iterated product to obtain the experimental group values and control group values generated during the experiment. A significance test is then performed on each experimental group value, and if an experimental group value shows a significant abnormal change, root cause mining is performed on it to locate the root cause of that change.
For example, for a small-flow experiment on a search engine, if a significance test shows that the experimental group value of the repeated-search page views (PV) has changed abnormally to a significant degree, the root cause of that change in the repeated-search PV needs to be mined.
In order to locate the root cause of the abnormal change, an embodiment of the disclosure provides a root cause mining method, which can be applied to an electronic device and, as shown in FIG. 1, includes:
S101, obtaining an experimental group value and a control group value corresponding to each dimension value combination under a preset dimension combination, and taking each dimension value combination, together with its corresponding experimental group value and control group value, as an initial leaf node.
The preset dimension combination comprises dimensions suspected to cause remarkable abnormal change of the experimental group value.
After the experimental group value with the significant abnormal change is determined, the dimensions that may have caused the change can be sorted out manually. The electronic device then forms a cross table from the enumerated values of all the sorted dimensions to obtain every dimension value combination at the finest granularity, where each dimension value combination includes one dimension value from each sorted dimension.
Wherein, for a dimension, the enumerated value of the dimension refers to all the enumerated dimension values of the dimension.
For example, the dimension values of the operating system dimension may include android systems and iOS systems, and the dimension values of the network type dimension may include WiFi networks, 4G networks, 5G networks, and the like.
The electronic device can then obtain the experimental group value and the control group value corresponding to each dimension value combination to form a two-dimensional table, which is used as the basic data.
Each row of the two-dimensional table comprises a dimension value combination and the experimental group value and control group value corresponding to that combination. Accordingly, each row of the two-dimensional table other than the header row can be used as an initial leaf node. As an example, the two-dimensional table is shown in Table 1:
TABLE 1
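[Table 1 appears as an image (BDA0004168885830000041) in the original publication and is not reproduced here. Its columns are: search type, network type, browser, search classification, whether logged in, operating system, experimental group value, control group value.]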
Table 1 shows an exemplary preset dimension combination comprising 6 dimensions, which are listed in the header row: search type, network type, browser, search classification, whether logged in, and operating system. Starting from the second row of Table 1, the first 6 columns of each row form a dimension value combination, and the last two columns are the experimental group value and the control group value corresponding to that combination. Table 1 shows, by way of example, the experimental group value and control group value of the repeated-search PV for each dimension value combination.
It should be noted that Table 1 is only an example for ease of understanding. Not all dimension value combinations are shown, and the amount of basic data in an actual implementation is not limited to this.
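The construction of initial leaf nodes in S101 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the dimension names, the `build_cross_table` and `build_initial_leaf_nodes` helpers, and the `metric_lookup` callable are all hypothetical, and the enumerated values are borrowed from the three-dimension example used later in the description.

```python
from itertools import product

# Hypothetical dimensions and enumerated values (illustrative only).
dimensions = {
    "search_type": ["sug", "se", "inp"],
    "logged_in": ["1", "0"],
    "os": ["android", "iOS"],
}


def build_cross_table(dimensions):
    """Return every finest-granularity dimension value combination."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


def build_initial_leaf_nodes(dimensions, metric_lookup):
    """Attach an experimental and a control group value to each combination.

    metric_lookup is a hypothetical callable that returns the pair
    (experimental_value, control_value) for one combination.
    """
    nodes = []
    for combo in build_cross_table(dimensions):
        experimental, control = metric_lookup(combo)
        nodes.append(
            {"combo": combo, "experimental": experimental, "control": control}
        )
    return nodes


combos = build_cross_table(dimensions)
print(len(combos))  # 3 * 2 * 2 = 12 combinations
```

Each returned dictionary plays the role of one non-header row of the two-dimensional table.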
S102, for each category of initial leaf nodes, constructing data sets from a first level to a preset-number level based on the initial leaf nodes of that category.
The data set of the Nth level includes the potential leaf nodes corresponding to each dimension value combination under each combination of N dimensions, and the potential leaf nodes corresponding to one dimension value combination include: each combination obtained by combining that dimension value combination with each dimension value of each other single dimension, together with the corresponding experimental group value and control group value.
The preset number is the number of dimensions included in the preset dimension combination minus 1, and the value range of N is 1 to the preset number.
As an example, if the preset dimension combination includes 3 dimensions, data sets for level 1 and level 2 are constructed.
Wherein, for the first hierarchy, the data set of the first hierarchy includes potential leaf nodes corresponding to each dimension value in 1 dimension. I.e., potential leaf nodes corresponding to the respective dimension values in each dimension need to be constructed separately. When constructing a potential leaf node corresponding to one dimension value, the dimension value can be respectively combined with each dimension value of other single dimensions, and an experimental group value and a control group value of each newly obtained dimension value combination are calculated.
For example, suppose the 3 dimensions are search type, whether logged in, and operating system, where the dimension values of the search type dimension include sug, se and inp, the dimension values of the whether-logged-in dimension include 1 and 0, and the dimension values of the operating system dimension include android and iOS.
For the dimension value sug in the search type dimension, combining sug with each dimension value in the whether-logged-in dimension gives the dimension value combinations sug+1 and sug+0; combining sug with each dimension value in the operating system dimension gives the dimension value combinations sug+android and sug+iOS. That is, the potential leaf nodes corresponding to the dimension value sug in the search type dimension include: sug+1 and its experimental and control group values, sug+0 and its experimental and control group values, sug+android and its experimental and control group values, and sug+iOS and its experimental and control group values.
For the dimension value se in the search type dimension, combining se with each dimension value in the whether-logged-in dimension gives the dimension value combinations se+0 and se+1; combining se with each dimension value in the operating system dimension gives the dimension value combinations se+android and se+iOS. That is, the potential leaf nodes corresponding to the dimension value se in the search type dimension include: se+0 and its experimental and control group values, se+1 and its experimental and control group values, se+android and its experimental and control group values, and se+iOS and its experimental and control group values.
Similarly, for inp in the dimension of the search type and for each dimension value in other dimensions, a plurality of potential leaf nodes corresponding to one dimension value can be obtained in the above manner, which is not described herein.
For the second level, the data set of the second level includes the potential leaf nodes corresponding to each dimension value combination in 2 dimensions. That is, the potential leaf nodes corresponding to each dimension value combination under every pair of dimensions need to be constructed separately. When constructing the potential leaf nodes corresponding to one dimension value combination, the combination can be combined with each dimension value of each other single dimension, and the experimental group value and control group value of each newly obtained dimension value combination are calculated.
For example, for a combination of a dimension value sug in a search type dimension and a dimension value 1 in a login dimension, a resulting dimension value combination sug+1 may be combined with each dimension value of an operating system dimension, where the resulting dimension value combination includes: sug+1+android and sug+1+ios. That is, the potential leaf nodes corresponding to the dimension value combination sug+1 include: sug+1+android and corresponding experimental and control values, sug+1+ios and corresponding experimental and control values.
The method of constructing potential leaf nodes for other combinations of dimension values is the same and is not exemplified here.
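The level-by-level construction above can be sketched as follows. This is an illustrative sketch only: the `build_level_dataset` helper and the data layout are hypothetical, and it assumes the metric is additive (like a page view count), so the experimental and control group values of a coarser combination are obtained by summing the matching finest-granularity leaf nodes. The source does not state how these values are calculated.

```python
from collections import defaultdict
from itertools import combinations, product


def build_level_dataset(leaf_nodes, dim_names, level):
    """Map each level-N dimension value combination to its potential leaf nodes.

    leaf_nodes: list of (combo_dict, experimental, control) at finest granularity.
    A potential leaf node is the base combination plus one value of one other
    dimension; its values are summed here (assumption: additive metric).
    """
    dataset = defaultdict(dict)
    for base_dims in combinations(dim_names, level):
        other_dims = [d for d in dim_names if d not in base_dims]
        for combo, experimental, control in leaf_nodes:
            key = tuple((d, combo[d]) for d in base_dims)
            for extra in other_dims:
                child = key + ((extra, combo[extra]),)
                exp_sum, ctl_sum = dataset[key].get(child, (0.0, 0.0))
                dataset[key][child] = (exp_sum + experimental, ctl_sum + control)
    return dict(dataset)


# Toy data: the three dimensions from the example, every leaf node valued (1, 2).
dims = {
    "search_type": ["sug", "se", "inp"],
    "logged_in": ["1", "0"],
    "os": ["android", "iOS"],
}
names = list(dims)
leaf_nodes = [(dict(zip(names, vals)), 1.0, 2.0) for vals in product(*dims.values())]

level1 = build_level_dataset(leaf_nodes, names, 1)
level2 = build_level_dataset(leaf_nodes, names, 2)

sug = (("search_type", "sug"),)
for child in sorted(level1[sug]):
    print(child, level1[sug][child])
```

With 3 dimensions, the preset number is 2, so only `level1` and `level2` are built. In this toy data the value `sug` has four potential leaf nodes (sug+1, sug+0, sug+android, sug+iOS), matching the enumeration in the text.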
S103, for each category, according to the data sets from the first level to the preset number level of the category, sequentially traversing and calculating generalized potential scores of all dimension value combinations from the first level to the preset number level, and mining the root cause of abnormal change of the experimental group value from the dimension value combinations of which the generalized potential scores meet the root cause condition.
Wherein, for each dimension value combination, the generalized potential score of the dimension value combination can be calculated by all potential leaf nodes corresponding to the dimension value combination.
In connection with the example in S102 above, the generalized potential scores for the dimension values sug, se, inp, 1, 0, android, and iOS, respectively, are calculated for the first hierarchy.
Taking the dimension value sug as an example, a generalized potential score for the dimension value sug may be calculated based on all potential leaf nodes for which the dimension value sug listed above corresponds.
By adopting the method, the experimental group value and the control group value corresponding to each dimension value combination under the preset dimension combination are obtained, yielding a plurality of initial leaf nodes. The electronic device then constructs data sets from the first level to the preset-number level based on the initial leaf nodes of each category, where the data set of the Nth level comprises the potential leaf nodes corresponding to each dimension value combination under each combination of N dimensions, and the potential leaf nodes corresponding to one dimension value combination comprise each combination obtained by combining that dimension value combination with each dimension value of each other single dimension, together with the corresponding experimental group value and control group value. This amounts to constructing potential leaf nodes by adding one dimension to a dimension value combination, which avoids the problem of the experimental group values and control group values of the potential leaf nodes being too sparse. The generalized potential scores of the dimension value combinations, calculated by traversing the data sets from the first level to the preset-number level of the category, are therefore more stable and accurate. As a result, the root cause of the abnormal change in the experimental group value can be located more accurately among the dimension value combinations whose generalized potential scores satisfy the root cause condition.
In some embodiments of the present disclosure, before step S102, the initial leaf nodes are further filtered and clustered. As shown in FIG. 2, the method includes steps S201-S206.
Wherein S201 is the same as S101, S205-S206 are the same as S102-S103, and reference is made to the related description in the above embodiment, which is not repeated here.
S202, calculating a difference value of each initial leaf node, wherein the difference value is used for representing the difference between the experimental group value and the control group value of the initial leaf node.
Wherein, based on the experimental group value and the control group value included in each initial leaf node, a difference value of each initial leaf node is calculated, the difference value can be represented by a displacement score (ds 1), and a ds1 calculation formula of the initial leaf node is as follows:
$$ds1 = \frac{2\,\bigl(v(e) - f(e)\bigr)}{v(e) + f(e)}$$
where v (e) is the experimental set value for the initial leaf node and f (e) is the control set value for the initial leaf node.
For example, if the experimental group value of initial leaf node 1 is 1111 and the control group value is 2323, then by the above formula ds1 = 2 × (1111 − 2323) / (1111 + 2323) ≈ −0.7 for initial leaf node 1.
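The worked example can be checked with a short Python sketch. The function name `ds1` follows the text; the handling of a zero denominator is an assumption added here for robustness and is not described in the source.

```python
def ds1(experimental, control):
    """Difference value of one leaf node: 2 * (v - f) / (v + f).

    experimental is v(e), control is f(e).
    """
    total = experimental + control
    if total == 0:
        # Assumption: a node with no traffic in either group shows no difference.
        return 0.0
    return 2 * (experimental - control) / total


print(round(ds1(1111, 2323), 1))  # -0.7
```

Because the difference is divided by the sum of the two values, the score stays within [-2, 2] for non-negative values.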
S203, if the experimental group value shows an abnormal rising trend, deleting the initial leaf nodes whose difference values are smaller than the difference value average; or, if the experimental group value shows an abnormal falling trend, deleting the initial leaf nodes whose difference values are larger than the difference value average.
The difference value average is the mean of the difference values of all the initial leaf nodes.
It can be understood that if the experimental group value rises abnormally, the initial leaf nodes whose difference values are smaller than the difference value average are unlikely to have caused the rise, and the probability of mining a root cause from them is small, so these initial leaf nodes are deleted. Similarly, if the experimental group value falls abnormally, the initial leaf nodes whose difference values are larger than the difference value average are unlikely to have caused the fall, and the probability of mining a root cause from them is small, so these initial leaf nodes are deleted.
For example, if the experimental group value is a failure rate that shows an abnormal rising trend, the initial leaf nodes whose difference values are larger than the difference value average are the ones with higher failure rates. The root cause is more likely to be found among them, so the initial leaf nodes whose difference values are smaller than the difference value average are eliminated.
As another example, if the experimental group value is a qualification rate that shows an abnormal falling trend, the initial leaf nodes whose difference values are smaller than the difference value average are the ones with lower qualification rates. The root cause is more likely to be found among them, so the initial leaf nodes whose difference values are larger than the difference value average are eliminated.
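The filtering rule in S203 can be sketched as below. The `filter_leaf_nodes` helper, the `trend` argument and the node layout are hypothetical names chosen for illustration; nodes whose difference value equals the average are kept, since the text only deletes values strictly smaller (or strictly larger) than the average.

```python
from statistics import mean


def filter_leaf_nodes(nodes, trend):
    """Keep the leaf nodes that may contain the root cause.

    nodes: list of dicts that each carry a precomputed 'ds1' difference value.
    trend: 'up' if the experimental group value rose abnormally, 'down' if it fell.
    """
    average = mean(node["ds1"] for node in nodes)
    if trend == "up":
        # Delete nodes whose difference value is below the average.
        return [node for node in nodes if node["ds1"] >= average]
    if trend == "down":
        # Delete nodes whose difference value is above the average.
        return [node for node in nodes if node["ds1"] <= average]
    raise ValueError("trend must be 'up' or 'down'")


sample = [{"id": i, "ds1": d} for i, d in enumerate([-0.5, 0.0, 0.2, 0.7])]
print([n["id"] for n in filter_leaf_nodes(sample, "up")])    # [2, 3]
print([n["id"] for n in filter_leaf_nodes(sample, "down")])  # [0, 1]
```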
S204, clustering the initial leaf nodes based on the difference values of the remaining initial leaf nodes.
Specifically, one-dimensional k-means clustering may be performed on the difference values of the remaining initial leaf nodes, so that the remaining initial leaf nodes are divided into a plurality of categories, each of which comprises a plurality of initial leaf nodes.
By adopting the method, the difference values of the initial leaf nodes are used to retain only the initial leaf nodes that may contain a root cause, which narrows the search range, reduces the number of initial leaf nodes to be processed subsequently, and reduces the amount of calculation. The remaining initial leaf nodes are then clustered based on their difference values, so that the root cause can be located within each category. The root cause can thus be located quickly and accurately.
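One-dimensional k-means on the difference values can be sketched in plain Python as below. This is a minimal version of Lloyd's algorithm, not the patent's implementation: the number of clusters `k` and the deterministic initialisation (centres spread across the sorted values) are assumptions, since the source does not specify how either is chosen.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values with Lloyd's algorithm; return one label per value."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Assumption: start from centres spread evenly across the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels


difference_values = [0.10, 0.12, 0.11, 0.90, 0.95, 0.92]
print(kmeans_1d(difference_values, k=2))  # [0, 0, 0, 1, 1, 1]
```

Leaf nodes that share a label form one category, and the later steps are then run once per category.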
In some embodiments of the present disclosure, as shown in FIG. 3, the operation in S103 of sequentially traversing the data sets from the first level to the preset-number level of the category to calculate the generalized potential score of each dimension value combination at those levels, and mining the root cause of the abnormal change in the experimental group value from the dimension value combinations whose generalized potential scores satisfy the root cause condition, may specifically include the following steps:
S1031, starting from the first hierarchy, for each dimension value combination included in each hierarchy of the category, calculating a generalized potential score of the dimension value combination according to potential leaf nodes corresponding to the dimension value combination in a data set of the hierarchy.
The generalized potential score characterizes how likely the corresponding dimension value combination is to be a root cause. The specific method for calculating the generalized potential score of a dimension value combination from the data set is described in detail in the following embodiments.
S1032, when the calculated generalized potential score of the dimension value combination is greater than a first preset threshold, adding the dimension value combination to the candidate root cause set, and pruning the potential leaf nodes subordinate to the dimension value combination.
It will be appreciated that if the generalized potential score of a dimension value combination is greater than the first preset threshold, the dimension value combination may be a root cause, so it is added to the candidate root cause set. To avoid traversing the potential leaf nodes subordinate to this dimension value combination again when other dimension value combinations are calculated later, those potential leaf nodes can be pruned, which improves performance. As an example, the first preset threshold may be 0.85.
A potential leaf node subordinate to a dimension value combination is a potential leaf node that includes the dimension value combination and has a finer granularity. For example, if a dimension value combination is search type sug + network type WiFi, one subordinate potential leaf node is search type sug + network type WiFi + browser oppo, together with its experimental group value and control group value.
S1033, after traversing the category, mining root causes which cause abnormal change of the experimental group value from the candidate root cause set of the category.
With this method, the generalized potential score measures the likelihood that the corresponding dimension value combination is a root cause. When the generalized potential score of a dimension value combination is greater than the first preset threshold, the combination is likely to be a root cause, and pruning the potential leaf nodes subordinate to it avoids unnecessary calculation and improves root cause mining efficiency. In addition, the other dimension value combinations that are unrelated to this combination can still be traversed to deeper levels, so that other dimension value combinations that may be root causes are mined accurately and finer-grained dimension value combinations in subsequent levels are not skipped. The root cause can therefore be mined efficiently and accurately.
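The traversal of S1031 to S1033 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the disclosure: a dimension value combination is represented as a frozenset of (dimension, value) pairs, `score_fn` is a hypothetical callable standing in for the generalized potential score calculation, and pruning is modeled by skipping every finer-grained combination that contains an already accepted combination. Only the 0.85 threshold is taken from the text.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical representation: a combination is a set of (dimension, value) pairs.
Combo = FrozenSet[Tuple[str, str]]


def mine_candidates(
    levels: Iterable[Iterable[Combo]],
    score_fn: Callable[[Combo], float],
    threshold: float = 0.85,
) -> List[Combo]:
    """Traverse levels from coarse to fine and collect candidate root causes."""
    candidates: List[Combo] = []
    for level in levels:
        for combo in level:
            # Pruning (simplified): skip anything subordinate to an accepted candidate.
            if any(accepted <= combo for accepted in candidates):
                continue
            # Score only the combinations that survived pruning.
            if score_fn(combo) > threshold:
                candidates.append(combo)
    return candidates
```

In this sketch, combinations unrelated to an accepted candidate are still scored at deeper levels, which mirrors the behavior described above.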
In the embodiment of the present disclosure, S1033 may be specifically implemented as steps A to C.
Step A, for each dimension value combination included in the candidate root cause set of the category, determining the similar leaf node ratio of the dimension value combination, where the similar leaf node ratio is the ratio of the number of potential leaf nodes corresponding to the dimension value combination in the category to the total number of potential leaf nodes corresponding to the dimension value combination in all categories.
Specifically, for each dimension value combination, the similar leaf node ratio (nodes score) is calculated as follows:
$$\text{nodes score} = \frac{leaf_{ele}}{leaf_{total}}$$
where $leaf_{ele}$ is the number of potential leaf nodes corresponding to the dimension value combination in the category, and $leaf_{total}$ is the total number of potential leaf nodes corresponding to the dimension value combination in all categories.
For example, suppose the dimension value combination is search type 1 and network type WiFi and belongs to category A. If the number of potential leaf nodes corresponding to the dimension value combination in category A is 70 and the number of potential leaf nodes corresponding to it in all categories is 100, the similar leaf node ratio of the dimension value combination is 0.7.
Step B, if the similar leaf node ratio is smaller than a second preset threshold, deleting the dimension value combination from the candidate root cause set.
It can be appreciated that if the similar leaf node ratio is smaller than the second preset threshold, only a small part of the potential leaf nodes of the dimension value combination belong to the category. The sample size may then be too small, so that the generalized potential score of the dimension value combination is inaccurate and the combination was added to the candidate root cause set by mistake. The likelihood that the dimension value combination is the root cause of the category is therefore small, and it can be deleted from the candidate root cause set. As an example, the second preset threshold may be 0.5.
Step C, selecting, from the remaining dimension value combinations in the candidate root cause set of the category, the dimension value combination with the greatest influence on the overall experimental group value as the root cause of the category.
The influence of a dimension value combination on the overall experimental group value can be represented by an influence parameter, and the root cause of the category is then determined according to the influence parameter as follows: the influence parameter of each of the remaining dimension value combinations in the candidate root cause set of the category is calculated, and the dimension value combination with the largest influence parameter is taken as the root cause of the category.
The influence parameter is the difference between a first variation amplitude and a second variation amplitude. The first variation amplitude is the difference between the overall experimental group value and the overall control group value; the second variation amplitude is the difference between the overall experimental group value and the overall control group value of the dimension value combinations that remain after the dimension value combination is removed. The influence parameter may specifically be $d(S_1)$, whose calculation is described in the following embodiments.
With this method, for each dimension value combination in the candidate root cause set of the category, if its similar leaf node ratio is smaller than the second preset threshold, the combination is treated as a false positive and deleted. This addresses the case in which a small sample size makes the generalized potential score inaccurate and causes a false positive dimension value combination to be added to the candidate root cause set by mistake. Deleting false positives further narrows the screening range, so the root cause can be located quickly, and it also improves the accuracy of root cause mining. Furthermore, the influence parameter characterizes the influence of each remaining dimension value combination in the candidate root cause set on the overall experimental group value, so the remaining combination with the largest influence parameter is the most likely root cause of the category and is taken as that root cause. The root cause can therefore be located quickly and accurately.
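Steps A to C can be sketched in Python as follows. This is an illustrative sketch only: the function names, the dictionary-based inputs, and the choice to return `None` when no candidate survives are assumptions made for the example. The default threshold of 0.5 is the example value given above.

```python
from typing import Dict, Hashable, List, Optional, Tuple


def leaf_ratio(leaf_ele: int, leaf_total: int) -> float:
    """Similar leaf node ratio: share of a combination's leaf nodes in this category."""
    return leaf_ele / leaf_total if leaf_total else 0.0


def pick_root_cause(
    candidates: List[Hashable],
    leaf_counts: Dict[Hashable, Tuple[int, int]],  # combo -> (leaf_ele, leaf_total)
    influence: Dict[Hashable, float],              # combo -> influence parameter
    min_ratio: float = 0.5,
) -> Optional[Hashable]:
    # Step A and step B: drop combinations whose ratio is below the threshold.
    kept = [c for c in candidates if leaf_ratio(*leaf_counts[c]) >= min_ratio]
    if not kept:
        return None
    # Step C: keep the combination with the largest influence parameter.
    return max(kept, key=lambda c: influence[c])
```

With counts of 70 out of 100 and 20 out of 100, the first combination is kept and the second is removed before the influence parameters are compared.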
In some embodiments of the present disclosure, computing a generalized potential score for the combination of dimension values from the potential leaf nodes corresponding to the combination of dimension values in the data set of the hierarchy includes:
In the case where the experimental group value is a proportional index, the generalized potential score (gps) of the dimension value combination is calculated according to the following formula:
Figure BDA0004168885830000111
As an example, as shown in table 2, the experimental group values and control group values corresponding to the dimension value combinations in each row of table 2 are proportional indexes, so the gps of each dimension value combination in table 2 can be calculated from table 2 and the above gps calculation formula.
TABLE 2
Figure BDA0004168885830000112
Tables 2 and 1 show the same preset dimension combinations, except that the experimental group value and the control group value corresponding to each dimension value combination in table 2 are proportional indexes.
It should be noted that table 2 is only an example for ease of understanding; not all dimension value combinations are shown, and the amount of basic data in an actual implementation is not limited thereto.
In the case where the experimental group value is a non-proportional index, the generalized potential score of the dimension value combination is calculated according to the following formula:
Figure BDA0004168885830000121
where $v(S_{1leaf})$ is the experimental group value of a potential leaf node corresponding to the dimension value combination in the data set of the hierarchy,
Figure BDA0004168885830000122
is the control group value of that potential leaf node, $v(S_{1total})$ is the sum of the experimental group values of the potential leaf nodes corresponding to the dimension value combination in the data set of the hierarchy, and $f(S_{1total})$ is the sum of the control group values of those potential leaf nodes. wavg() denotes the weighted average, over the potential leaf nodes, of the value calculated inside the brackets.
In the case where the experimental group value is a proportional index, $d(S_1) = (v(S) - f(S)) - (v(S_2) - f(S_2))$, where $v(S)$ is the overall experimental group value of all potential leaf nodes in the data set of the hierarchy, that is, the sum of the experimental group values of all potential leaf nodes in the data set of the hierarchy; $f(S)$ is the overall control group value of all potential leaf nodes in the data set of the hierarchy, that is, the sum of their control group values; $v(S_2)$ is the overall experimental group value of the potential leaf nodes in the data set of the hierarchy other than those of the dimension value combination, that is, the sum of their experimental group values; and $f(S_2)$ is the overall control group value of the potential leaf nodes in the data set of the hierarchy other than those of the dimension value combination, that is, the sum of their control group values.
In the case where the experimental group value is a non-proportional index,
Figure BDA0004168885830000123
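For the proportional-index case, whose definition of $d(S_1)$ is given in text above, the calculation can be sketched in Python as follows. The tuple layout of a potential leaf node and the function name are assumptions made for this example; the sketch does not cover the non-proportional case.

```python
from typing import Iterable, Tuple

# Hypothetical layout: (belongs to the combination S1?, experimental value, control value)
Leaf = Tuple[bool, float, float]


def influence_parameter(leaves: Iterable[Leaf]) -> float:
    """d(S1) = (v(S) - f(S)) - (v(S2) - f(S2)) for the proportional-index case."""
    rows = list(leaves)  # materialize so the data can be scanned several times
    v_all = sum(v for _, v, _ in rows)                       # v(S)
    f_all = sum(f for _, _, f in rows)                       # f(S)
    v_rest = sum(v for in_s1, v, _ in rows if not in_s1)     # v(S2)
    f_rest = sum(f for in_s1, _, f in rows if not in_s1)     # f(S2)
    return (v_all - f_all) - (v_rest - f_rest)
```

Because the overall values are sums, the result equals the experimental-minus-control gap contributed by the leaf nodes of the combination itself.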
With this method, the embodiment of the present disclosure improves the gps calculation formula in three respects. First, the averaging is changed from an arithmetic average over the nodes to a weighted average (wavg) over the potential leaf nodes, which avoids the loss of gps accuracy caused by overly sparse experimental group values and control group values of the potential leaf nodes. Second, in the case where the experimental group value is a non-proportional index, the absolute difference used in the gps formula for proportional indexes is replaced by a relative difference, which unifies the scales of non-proportional experimental group values and improves the stability of the gps result. Third, in both gps formulas, the sum of the differences between the experimental group values and control group values of the dimension value combinations other than the specified one is removed from the numerator, which reduces the numerator accordingly and makes gps more sensitive when there are multiple root causes. As a result, the root cause can be located more accurately.
The following describes a complete flow of the root cause mining method provided by the embodiment of the present disclosure with reference to fig. 4, and as shown in fig. 4, the method includes:
S401, acquiring data based on a preset dimension combination, and generating a two-dimensional table.
Each dimension in the preset dimension combination is a manually selected dimension corresponding to an experimental index that shows a significant abnormal change. The preset dimension combination includes a plurality of dimension values; the electronic device forms a cross table from all dimension values in the preset dimension combination, and each row of the cross table is a dimension value combination of the finest granularity.
And obtaining an experimental group value and a comparison group value corresponding to each dimension value combination, and generating a two-dimensional table.
S402, taking each row in the two-dimensional table as an initial leaf node, and calculating a difference value for each initial leaf node.
Specifically, the difference value ds1 is calculated by the formula described in the above embodiment.
Taking the initial leaf node corresponding to the last row in table 1 as an example, and assuming that the experimental group value with the significant abnormal change is the repeated search PV, the difference value of this initial leaf node is ds1 = 2 × (2 - 23) / (2 + 23) = -1.68.
S403, if the experimental group value shows an abnormal rising trend, deleting the initial leaf nodes whose difference values are smaller than the average of the difference values.
S404, if the experimental group value shows an abnormal falling trend, deleting the initial leaf nodes whose difference values are larger than the average of the difference values.
S405, the remaining initial leaf nodes fall into the alternative dataset.
S406, clustering the initial leaf nodes in the alternative data set.
S407, constructing a data set from a first level to a preset number of levels for the initial leaf nodes in each category.
The method of constructing the data set has been described in the above embodiments, and will not be described here again.
S408, based on the data sets from the first level to the preset number level, the gps of each dimension value combination from the first level to the preset number level is sequentially traversed and calculated.
Each time a gps is calculated, S409 is performed.
S409, judging whether the gps is greater than 0.85.
If yes, S410 is executed; if no, the process returns to S408.
S410, pruning the potential leaf nodes subordinate to the dimension value combination whose gps is greater than 0.85, and continuing to traverse the other dimension value combinations.
S411, after the traversal is completed, storing the dimension value combinations whose gps is greater than 0.85 in the alternative root cause set, and screening the dimension value combinations in the alternative root cause set.
Taking table 1 as an example, suppose the dimension value combination search type sug + browser oppo is in the candidate root cause set of category B. The number of potential leaf nodes of this combination in category B is 70, and the total number of potential leaf nodes corresponding to this combination in the data set is 100, so its similar leaf node ratio is 0.7. This ratio indicates that the combination is likely to be a root cause of category B, so it can be retained in the candidate root cause set.
Conversely, suppose the dimension value combination search type sug + network type 4 belongs to the same candidate root cause set as search type sug + browser oppo. The number of potential leaf nodes of search type sug + network type 4 in category B is 20, and the total number of potential leaf nodes corresponding to it is 100, so its similar leaf node ratio is 0.2. This ratio indicates a low probability that the combination is a root cause of category B, so it can be deleted from the candidate root cause set.
S412, sorting the dimension value combinations in the alternative root cause set in descending order of their influence parameters, and taking the first one as the root cause.
Specifically, the dimension value combination with the largest influence parameter is determined from the influence parameters of the dimension value combinations and is output as the root cause.
The influence parameter is $d(S_1)$.
S413, judging whether all categories have been traversed.
If yes, S414 is executed, and if no, S407 is executed.
S414, outputting root causes of all classes.
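The screening in S402 to S405 can be sketched in Python as follows. The ds1 formula used here, 2(v - f)/(v + f), is inferred from the worked example for the last row of table 1 and is an assumption rather than a quotation of the formula in the earlier embodiment; the node layout and function names are likewise illustrative.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical layout: (dimension value combination, experimental value, control value)
Node = Tuple[str, float, float]


def ds1(v: float, f: float) -> float:
    """Difference value inferred from the worked example; assumes v + f != 0."""
    return 2 * (v - f) / (v + f)


def screen(nodes: List[Node], rising: bool) -> List[Node]:
    """Keep the initial leaf nodes on the anomalous side of the mean difference value."""
    diffs = [ds1(v, f) for _, v, f in nodes]
    avg = mean(diffs)
    if rising:
        # S403: delete nodes whose difference value is smaller than the mean.
        return [n for n, d in zip(nodes, diffs) if d >= avg]
    # S404: delete nodes whose difference value is larger than the mean.
    return [n for n, d in zip(nodes, diffs) if d <= avg]
```

The returned list plays the role of the alternative data set in S405, which is then clustered in S406.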
With this method, the initial leaf nodes are screened using the difference values, which reduces the number of initial leaf nodes to be processed and thus the amount of calculation. A data set from the first level to the preset number level is constructed for the initial leaf nodes of each category, and the gps of each dimension value combination is calculated by traversal using the potential leaf nodes corresponding to that combination, which improves the stability of gps. Furthermore, the above gps calculation can be applied to finer-grained dimension value combinations, so the method is also applicable when the number of dimensions explodes. After the traversal, the dimension value combinations whose gps is greater than 0.85 are stored in the candidate root cause set, false positive combinations are filtered out using the similar leaf node ratio to further narrow the screening range, and the dimension value combination with the largest $d(S_1)$ is output as the root cause. The root cause can thus be located accurately.
Based on the above embodiment, in a small-traffic experiment scenario, the gps results output by the gps calculation formula provided by the embodiment of the present disclosure are compared with the gps results output by the conventional Squeeze method, which is a root cause mining algorithm. As shown in fig. 5, the abscissa is the relative difference and the ordinate is gps. The relative difference characterizes the influence of a specified root cause on the overall change of the experimental group value: relative difference = (overall experimental group value under the specified data set - overall control group value under the specified data set) / overall control group value under the specified data set. The specified data set is the data set used in the small-traffic experiment.
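The relative difference on the abscissa can be computed as in the short Python sketch below. Treating the overall value as the sum over the data set follows the definition of overall values given for $d(S_1)$ and is an assumption here; the function name is illustrative.

```python
from typing import Sequence


def relative_difference(experimental: Sequence[float], control: Sequence[float]) -> float:
    """(overall experimental value - overall control value) / overall control value."""
    overall_experimental = sum(experimental)
    overall_control = sum(control)
    # Assumes the overall control group value is non-zero.
    return (overall_experimental - overall_control) / overall_control
```

For instance, an overall experimental value of 101 against an overall control value of 100 gives a relative difference of 1%.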
In fig. 5, the experimental group value is a non-proportional index. Using the gps calculation formula provided by the embodiment of the present disclosure, 100 gps results are calculated for relative differences in the range of 0% to 5%, and the 100 results are connected to form the gps result curve of that formula. The gps result curve of the conventional Squeeze method is obtained in the same way.
The upper graph of fig. 5 is the gps result curve obtained with the gps calculation formula provided by the embodiment of the present disclosure, and the lower graph of fig. 5 is the gps result curve obtained with the conventional Squeeze method. It can be seen that, in the case where the experimental group value is a non-proportional index, once the relative difference exceeds 1.14%, the gps result obtained with the gps calculation formula provided by the embodiment of the present disclosure stably stays above 0.85, whereas the gps result curve of the conventional Squeeze method in fig. 5 never reaches 0.85.
Similarly, as shown in fig. 6, the abscissa is the relative difference and the ordinate is gps. In fig. 6, the experimental group value is a proportional index. The gps result curve obtained with the gps calculation formula provided by the embodiment of the present disclosure is the upper curve in fig. 6, and the gps result curve obtained with the conventional Squeeze method is the lower curve in fig. 6. Once the relative difference exceeds 0.96%, the gps result obtained with the gps calculation formula provided by the embodiment of the present disclosure stably stays above 0.85, whereas the gps result curve of the conventional Squeeze method fluctuates around 0 throughout.
Therefore, the gps calculation formula provided by the embodiment of the disclosure is higher in sensitivity, and root causes can be effectively excavated by adopting the root cause excavation method provided by the embodiment of the disclosure.
It should be noted that fig. 5 and fig. 6 are only examples of verification, and the gps calculated by the root cause mining method provided by the embodiments of the present disclosure is not limited thereto in practical application.
Based on the same concept, the embodiment of the present disclosure provides a root cause mining device, as shown in fig. 7, comprising:
the obtaining module 701 is configured to obtain an experimental group value and a control group value corresponding to each dimension value combination under a preset dimension combination, and take each dimension value combination and the experimental group value and the control group value corresponding to the dimension value combination as an initial leaf node.
A construction module 702, configured to construct, for each category of initial leaf nodes, a data set from a first level to a preset number level based on the initial leaf nodes of the category, where the data set of the N-th level includes the potential leaf nodes corresponding to each dimension value combination under combinations of N dimensions, and the potential leaf nodes corresponding to a dimension value combination include: the experimental group values and control group values corresponding to the combinations formed by combining the dimension value combination with each dimension value of another single dimension.
The calculating module 703 is configured to, for each category, sequentially traverse and calculate the generalized potential score of each dimension value combination of the first level to the preset number level according to the data set of the first level to the preset number level of the category, and mine the root cause that causes the abnormal change of the experimental group value from the dimension value combinations of which the generalized potential score meets the root cause condition.
Optionally, the device further includes a deletion module and a clustering module:
The calculating module 703 is further configured to calculate a difference value of each initial leaf node, where the difference value represents the difference between the experimental group value and the control group value of the initial leaf node.
The deleting module is configured to delete the initial leaf nodes whose difference values are smaller than the average of the difference values if the experimental group value shows an abnormal rising trend, or to delete the initial leaf nodes whose difference values are larger than the average of the difference values if the experimental group value shows an abnormal falling trend.
The clustering module is configured to cluster the remaining initial leaf nodes based on their difference values.
Optionally, the computing module 703 is specifically configured to:
Starting from the first hierarchy, for each dimension value combination included in each hierarchy of the category, calculate the generalized potential score of the dimension value combination according to the potential leaf nodes corresponding to the dimension value combination in the data set of the hierarchy.
Whenever the calculated generalized potential score of a dimension value combination is greater than a first preset threshold, add the dimension value combination to the candidate root cause set, and prune the potential leaf nodes subordinate to the dimension value combination.
After the traversal of the category is completed, mine the root cause that causes the abnormal change of the experimental group value from the candidate root cause set of the category.
Optionally, the computing module 703 is specifically configured to:
For each dimension value combination included in the candidate root cause set of the category, determine the similar leaf node ratio of the dimension value combination, where the similar leaf node ratio is the ratio of the number of potential leaf nodes corresponding to the dimension value combination in the category to the total number of potential leaf nodes corresponding to the dimension value combination in all categories.
If the similar leaf node ratio is smaller than a second preset threshold, delete the dimension value combination from the candidate root cause set.
Select, from the remaining dimension value combinations in the candidate root cause set of the category, the dimension value combination with the greatest influence on the overall experimental group value as the root cause of the category.
Optionally, the computing module 703 is specifically configured to:
Calculate the influence parameter of each of the remaining dimension value combinations in the candidate root cause set of the category, where the influence parameter is the difference between a first variation amplitude and a second variation amplitude, the first variation amplitude is the difference between the overall experimental group value and the overall control group value, and the second variation amplitude is the difference between the overall experimental group value and the overall control group value of the dimension value combinations that remain after the dimension value combination is removed.
Take the dimension value combination with the largest influence parameter as the root cause of the category.
Optionally, the computing module 703 is specifically configured to:
In the case where the experimental group value is a proportional index, calculate the generalized potential score of the dimension value combination according to the following formula:
Figure BDA0004168885830000181
Alternatively, in the case where the experimental group value is a non-proportional index, calculate the generalized potential score of the dimension value combination according to the following formula:
Figure BDA0004168885830000182
where $v(S_{1leaf})$ is the experimental group value of a potential leaf node corresponding to the dimension value combination in the data set of the hierarchy,
Figure BDA0004168885830000183
is the control group value of that potential leaf node, $v(S_{1total})$ is the sum of the experimental group values of the potential leaf nodes corresponding to the dimension value combination in the data set of the hierarchy, and $f(S_{1total})$ is the sum of the control group values of those potential leaf nodes. wavg() denotes the weighted average, over the potential leaf nodes, of the value calculated inside the brackets.
In the case where the experimental group value is a proportional index, $d(S_1) = (v(S) - f(S)) - (v(S_2) - f(S_2))$, where $v(S)$ is the overall experimental group value of all potential leaf nodes in the data set of the hierarchy, $f(S)$ is the overall control group value of all potential leaf nodes in the data set of the hierarchy, $v(S_2)$ is the overall experimental group value of the potential leaf nodes in the data set of the hierarchy other than those of the dimension value combination, and $f(S_2)$ is the overall control group value of the potential leaf nodes in the data set of the hierarchy other than those of the dimension value combination.
In the case where the experimental group value is a non-proportional index,
Figure BDA0004168885830000184
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as root cause mining methods. For example, in some embodiments, the root cause mining method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of the root cause mining method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the root cause mining method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed herein are achieved; no limitation is imposed in this respect.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (15)

1. A root cause mining method, comprising:
obtaining an experimental group value and a control group value corresponding to each dimension value combination under a preset dimension combination, and taking each dimension value combination and the experimental group value and the control group value corresponding to the dimension value combination as an initial leaf node;
for each category of initial leaf nodes, constructing data sets of a first level through a preset number of levels based on the initial leaf nodes of that category, wherein the data set of the Nth level comprises potential leaf nodes corresponding to each dimension value combination under a combination of N dimensions, and the potential leaf nodes corresponding to one dimension value combination comprise: the experimental group values and control group values corresponding to the combinations formed by combining that dimension value combination with each dimension value of each other single dimension;
and for each category, sequentially traversing, according to the data sets of the first level through the preset number of levels of the category, the first level through the preset number of levels and calculating a generalized potential score of each dimension value combination, and mining the root cause of the abnormal change of the experimental group value from the dimension value combinations whose generalized potential scores satisfy a root cause condition.
2. The method of claim 1, wherein, before constructing, for each category of initial leaf nodes, the data sets of the first level through the preset number of levels based on the initial leaf nodes of that category, the method further comprises:
calculating a difference value of each initial leaf node, wherein the difference value is used for representing the difference between an experimental group value and a control group value of the initial leaf node;
if the experimental group value shows an abnormal upward trend, deleting the initial leaf nodes whose difference value is smaller than the average of the difference values; or, if the experimental group value shows an abnormal downward trend, deleting the initial leaf nodes whose difference value is larger than the average of the difference values;
and clustering the initial leaf nodes based on the difference values of the remaining initial leaf nodes.
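As a non-limiting illustration, the filtering step of claim 2 may be sketched in Python as follows. The function name `filter_leaves_by_trend`, the dictionary keys `experiment` and `control`, and the choice of "experimental value minus control value" as the difference value are assumptions made for this sketch; the claim only requires that the difference value represent the difference between the two values. The clustering step is omitted because the claim does not name a clustering algorithm.

```python
from statistics import mean


def filter_leaves_by_trend(leaves, trend):
    """Drop initial leaf nodes that lie on the non-anomalous side of the mean.

    leaves: list of dicts with hypothetical keys "experiment" and "control".
    trend:  "up" for an abnormal upward trend, "down" for a downward one.
    """
    # Assumption: difference value = experimental value - control value.
    diffs = [leaf["experiment"] - leaf["control"] for leaf in leaves]
    avg = mean(diffs)

    if trend == "up":
        # Delete leaves whose difference is smaller than the mean difference.
        return [leaf for leaf, d in zip(leaves, diffs) if d >= avg]
    if trend == "down":
        # Delete leaves whose difference is larger than the mean difference.
        return [leaf for leaf, d in zip(leaves, diffs) if d <= avg]
    raise ValueError("trend must be 'up' or 'down'")


leaves = [
    {"experiment": 10, "control": 5},  # difference 5
    {"experiment": 6, "control": 5},   # difference 1
    {"experiment": 5, "control": 5},   # difference 0
]
kept_up = filter_leaves_by_trend(leaves, "up")
kept_down = filter_leaves_by_trend(leaves, "down")
```

With a mean difference of 2, the upward case keeps only the first leaf, and the downward case keeps the other two.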
3. The method of claim 1, wherein sequentially traversing, according to the data sets of the first level through the preset number of levels of the category, the first level through the preset number of levels and calculating the generalized potential score of each dimension value combination, and mining the root cause of the abnormal change of the experimental group value from the dimension value combinations whose generalized potential scores satisfy the root cause condition, comprises:
starting from the first hierarchy, for each dimension value combination included in each hierarchy of the category, calculating a generalized potential score of the dimension value combination according to the potential leaf nodes corresponding to the dimension value combination in the data set of the hierarchy;
when the calculated generalized potential score of the dimension value combination is greater than a first preset threshold, adding the dimension value combination to a candidate root cause set and pruning the potential leaf nodes subordinate to the dimension value combination;
after the traversal of the category is completed, mining the root cause of the abnormal change of the experimental group value from the candidate root cause set of the category.
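As a non-limiting illustration, the level-by-level traversal with pruning recited in claim 3 may be sketched in Python as follows. The names `collect_candidates` and `score_fn` are hypothetical. The generalized potential score itself is passed in by the caller, since its formulas are given only as figures; pruning is interpreted here, as an assumption, as skipping every deeper combination that contains an already accepted combination.

```python
def collect_candidates(levels, score_fn, threshold):
    """Traverse levels in order and collect candidate root causes.

    levels:    list of dicts, level 1 first; each dict maps a dimension value
               combination (a frozenset of (dimension, value) pairs) to the
               potential leaf nodes of that combination.
    score_fn:  caller-supplied function returning the generalized potential
               score for a list of potential leaf nodes (hypothetical).
    threshold: the first preset threshold.
    """
    candidates = []
    for dataset in levels:
        for combo, potential_leaves in dataset.items():
            # Pruning (assumed reading): a combination that refines an
            # accepted candidate is not evaluated again.
            if any(accepted <= combo for accepted in candidates):
                continue
            if score_fn(potential_leaves) > threshold:
                candidates.append(combo)
    return candidates


ios = frozenset({("os", "ios")})
android = frozenset({("os", "android")})
ios_bj = frozenset({("os", "ios"), ("city", "bj")})
android_bj = frozenset({("os", "android"), ("city", "bj")})

# Toy data: each "leaf list" holds a single precomputed score.
levels = [
    {ios: [0.9], android: [0.1]},
    {ios_bj: [0.95], android_bj: [0.8]},
]
found = collect_candidates(levels, lambda leaves: leaves[0], 0.5)
```

In this toy run, `ios` is accepted at the first level, so `ios_bj` is pruned at the second level, while `android_bj` is still evaluated and accepted.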
4. The method of claim 3, wherein mining the root cause of the abnormal change of the experimental group value from the candidate root cause set of the category comprises:
for each dimension value combination included in the candidate root cause set of the category, determining a same-class leaf node ratio of the dimension value combination, wherein the same-class leaf node ratio is the ratio of the number of potential leaf nodes corresponding to the dimension value combination in the category to the total number of potential leaf nodes corresponding to the dimension value combination in all categories;
if the same-class leaf node ratio is smaller than a second preset threshold, deleting the dimension value combination from the candidate root cause set;
and selecting, from the remaining dimension value combinations in the candidate root cause set of the category, the dimension value combination with the greatest influence on the overall experimental group value as the root cause of the category.
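As a non-limiting illustration, the same-class leaf node ratio filter of claim 4 may be sketched in Python as follows. The function name `filter_by_same_class_ratio` and the layout of `leaf_counts` (combination to per-category counts) are assumptions made for this sketch.

```python
def filter_by_same_class_ratio(candidates, leaf_counts, category, min_ratio):
    """Keep candidates whose potential leaf nodes mostly fall in `category`.

    candidates:  dimension value combinations in the candidate root cause set.
    leaf_counts: hypothetical mapping combination -> {category: number of
                 potential leaf nodes of that combination in the category}.
    min_ratio:   the second preset threshold.
    """
    kept = []
    for combo in candidates:
        per_category = leaf_counts[combo]
        total = sum(per_category.values())
        # Ratio of leaves in this category to leaves in all categories.
        ratio = per_category.get(category, 0) / total if total else 0.0
        # Combinations whose ratio is below the threshold are deleted.
        if ratio >= min_ratio:
            kept.append(combo)
    return kept


leaf_counts = {
    "os=ios": {0: 8, 1: 2},      # 80% of its leaves are in category 0
    "os=android": {0: 1, 1: 9},  # 10% of its leaves are in category 0
}
remaining = filter_by_same_class_ratio(
    ["os=ios", "os=android"], leaf_counts, category=0, min_ratio=0.5
)
```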
5. The method of claim 4, wherein selecting, from the remaining dimension value combinations in the candidate root cause set of the category, the dimension value combination with the greatest influence on the overall experimental group value as the root cause of the category comprises:
calculating an influence parameter for each of the remaining dimension value combinations in the candidate root cause set of the category, wherein the influence parameter is the difference between a first variation amplitude and a second variation amplitude, the first variation amplitude is the difference between the overall experimental group value and the overall control group value, and the second variation amplitude is the difference between the overall experimental group value and the overall control group value of the dimension value combinations that remain after the dimension value combination is removed;
and taking the dimension value combination with the largest influence parameter as the root cause of the category.
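As a non-limiting illustration, the influence parameter of claim 5 may be sketched in Python as follows. The sketch assumes an additive indicator, so that an overall value is simply the sum of the leaf values; the row layout and the names `influence` and `pick_root_cause` are hypothetical.

```python
def influence(rows, combo):
    """Influence parameter = first variation amplitude - second variation amplitude.

    rows:  list of (dims, experiment, control) tuples, where dims is a dict
           mapping dimension name to dimension value (hypothetical layout).
    combo: dict mapping dimension name to value, the combination under test.
    """
    def matches(dims):
        return all(dims.get(k) == v for k, v in combo.items())

    # First variation amplitude: overall experimental value - overall control value.
    first = sum(e for _, e, _ in rows) - sum(c for _, _, c in rows)
    # Second variation amplitude: the same difference after removing the combination.
    rest = [(e, c) for d, e, c in rows if not matches(d)]
    second = sum(e for e, _ in rest) - sum(c for _, c in rest)
    return first - second


def pick_root_cause(rows, candidates):
    """Return the candidate combination with the largest influence parameter."""
    return max(candidates, key=lambda combo: influence(rows, combo))


rows = [
    ({"os": "ios"}, 10, 5),
    ({"os": "android"}, 6, 5),
]
ios_influence = influence(rows, {"os": "ios"})          # (16 - 10) - (6 - 5)
android_influence = influence(rows, {"os": "android"})  # (16 - 10) - (10 - 5)
root = pick_root_cause(rows, [{"os": "ios"}, {"os": "android"}])
```

For a proportional index the overall values would not be plain sums, so the two aggregation lines would need to be replaced accordingly.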
6. The method of any of claims 3-5, wherein calculating the generalized potential score of the dimension value combination according to the potential leaf nodes corresponding to the dimension value combination in the data set of the hierarchy comprises:
in the case where the experimental group value is a proportional index, calculating the generalized potential score of the dimension value combination according to the following formula:
Figure FDA0004168885820000031
or, in the case where the experimental group value is a non-proportional index, calculating the generalized potential score of the dimension value combination according to the following formula:
Figure FDA0004168885820000032
wherein v(S_1leaf) is the experimental group value of a potential leaf node corresponding to the dimension value combination in the data set of the hierarchy,
Figure FDA0004168885820000033
f(S_1leaf) is the control group value of a potential leaf node corresponding to the dimension value combination in the data set of the hierarchy, v(S_1total) is the sum of the experimental group values of the potential leaf nodes corresponding to the dimension value combination in the data set of the hierarchy, and f(S_1total) is the sum of the control group values of the potential leaf nodes corresponding to the dimension value combination in the data set of the hierarchy; wavg() denotes taking a weighted average of the value computed inside the parentheses for each potential leaf node;
in the case where the experimental group value is a proportional index, d(S_1) = (v(S) - f(S)) - (v(S_2) - f(S_2)), where v(S) is the overall experimental group value of all potential leaf nodes in the data set of the hierarchy, f(S) is the overall control group value of all potential leaf nodes in the data set of the hierarchy, v(S_2) is the overall experimental group value of the potential leaf nodes in the data set of the hierarchy other than those of the dimension value combination, and f(S_2) is the overall control group value of the potential leaf nodes in the data set of the hierarchy other than those of the dimension value combination;
in the case where the experimental group value is a non-proportional index,
Figure FDA0004168885820000034
7. A root cause mining apparatus, the apparatus comprising:
an acquisition module, configured to obtain an experimental group value and a control group value corresponding to each dimension value combination under a preset dimension combination, and to take each dimension value combination and the experimental group value and the control group value corresponding to the dimension value combination as an initial leaf node;
a construction module, configured to construct, for each category of initial leaf nodes, data sets of a first level through a preset number of levels based on the initial leaf nodes of that category, wherein the data set of the Nth level comprises potential leaf nodes corresponding to each dimension value combination under a combination of N dimensions, and the potential leaf nodes corresponding to one dimension value combination comprise: the experimental group values and control group values corresponding to the combinations formed by combining that dimension value combination with each dimension value of each other single dimension;
a calculation module, configured to, for each category, sequentially traverse, according to the data sets of the first level through the preset number of levels of the category, the first level through the preset number of levels and calculate a generalized potential score of each dimension value combination, and to mine the root cause of the abnormal change of the experimental group value from the dimension value combinations whose generalized potential scores satisfy a root cause condition.
8. The apparatus of claim 7, further comprising a deletion module and a clustering module, wherein:
the calculation module is further configured to calculate a difference value of each initial leaf node, wherein the difference value represents the difference between the experimental group value and the control group value of the initial leaf node;
the deletion module is configured to delete the initial leaf nodes whose difference value is smaller than the average of the difference values if the experimental group value shows an abnormal upward trend, or to delete the initial leaf nodes whose difference value is larger than the average of the difference values if the experimental group value shows an abnormal downward trend;
and the clustering module is used for clustering the initial leaf nodes based on the difference values of the rest initial leaf nodes.
9. The apparatus of claim 7, wherein the computing module is specifically configured to:
starting from the first hierarchy, for each dimension value combination included in each hierarchy of the category, calculating a generalized potential score of the dimension value combination according to potential leaf nodes corresponding to the dimension value combination in a dataset of the hierarchy;
when the calculated generalized potential score of the dimension value combination is greater than a first preset threshold, adding the dimension value combination to a candidate root cause set and pruning the potential leaf nodes subordinate to the dimension value combination;
after the traversal of the category is completed, mining the root cause of the abnormal change of the experimental group value from the candidate root cause set of the category.
10. The apparatus of claim 9, wherein the computing module is specifically configured to:
for each dimension value combination included in the candidate root cause set of the category, determining a same-class leaf node ratio of the dimension value combination, wherein the same-class leaf node ratio is the ratio of the number of potential leaf nodes corresponding to the dimension value combination in the category to the total number of potential leaf nodes corresponding to the dimension value combination in all categories;
if the same-class leaf node ratio is smaller than a second preset threshold, deleting the dimension value combination from the candidate root cause set;
and selecting, from the remaining dimension value combinations in the candidate root cause set of the category, the dimension value combination with the greatest influence on the overall experimental group value as the root cause of the category.
11. The apparatus of claim 10, wherein the computing module is specifically configured to:
calculating an influence parameter for each of the remaining dimension value combinations in the candidate root cause set of the category, wherein the influence parameter is the difference between a first variation amplitude and a second variation amplitude, the first variation amplitude is the difference between the overall experimental group value and the overall control group value, and the second variation amplitude is the difference between the overall experimental group value and the overall control group value of the dimension value combinations that remain after the dimension value combination is removed;
and taking the dimension value combination with the largest influence parameter as the root cause of the category.
12. The apparatus according to any of claims 9-11, wherein the computing module is specifically configured to:
in the case where the experimental group value is a proportional index, calculating the generalized potential score of the dimension value combination according to the following formula:
Figure FDA0004168885820000051
or, in the case where the experimental group value is a non-proportional index, calculating the generalized potential score of the dimension value combination according to the following formula:
Figure FDA0004168885820000052
wherein v(S_1leaf) is the experimental group value of a potential leaf node corresponding to the dimension value combination in the data set of the hierarchy,
Figure FDA0004168885820000053
f(S_1leaf) is the control group value of a potential leaf node corresponding to the dimension value combination in the data set of the hierarchy, v(S_1total) is the sum of the experimental group values of the potential leaf nodes corresponding to the dimension value combination in the data set of the hierarchy, and f(S_1total) is the sum of the control group values of the potential leaf nodes corresponding to the dimension value combination in the data set of the hierarchy; wavg() denotes taking a weighted average of the value computed inside the parentheses for each potential leaf node;
in the case where the experimental group value is a proportional index, d(S_1) = (v(S) - f(S)) - (v(S_2) - f(S_2)), where v(S) is the overall experimental group value of all potential leaf nodes in the data set of the hierarchy, f(S) is the overall control group value of all potential leaf nodes in the data set of the hierarchy, v(S_2) is the overall experimental group value of the potential leaf nodes in the data set of the hierarchy other than those of the dimension value combination, and f(S_2) is the overall control group value of the potential leaf nodes in the data set of the hierarchy other than those of the dimension value combination;
in the case where the experimental group value is a non-proportional index,
Figure FDA0004168885820000054
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
CN202310370164.2A 2023-04-07 2023-04-07 Root cause mining method and device, electronic equipment and storage medium Pending CN116383277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310370164.2A CN116383277A (en) 2023-04-07 2023-04-07 Root cause mining method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116383277A true CN116383277A (en) 2023-07-04

Family

ID=86976547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310370164.2A Pending CN116383277A (en) 2023-04-07 2023-04-07 Root cause mining method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116383277A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination