EP3612980A1 - Automatic feature selection in machine learning - Google Patents

Automatic feature selection in machine learning

Info

Publication number
EP3612980A1
Authority
EP
European Patent Office
Prior art keywords
analysis
factor
training data
features
rules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP17731111.5A
Other languages
German (de)
French (fr)
Inventor
Janakiraman THIYAGARAJAH
Peter Valeryevich Bazanov
Peng Lv
Luca De Matteis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of EP3612980A1 publication Critical patent/EP3612980A1/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Definitions

  • the present disclosure relates to machine learning.
  • the present disclosure relates to automatic feature selection in machine learning.
  • Feature selection aims to solve this problem by selecting only a subset of relevant features from a large set of available features. By removing redundant or irrelevant features, feature selection may help reduce the dimensionality of the data, speed up the learning process, simplify the learnt model, and/or increase the performance.
  • a system comprising a learning module to extract rules from training data, and a feature selection module to determine features of the training data to be used for extracting the rules, wherein the feature selection module is to receive context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains.
  • module refers to software, hardware, or a combination of software and hardware.
  • the feature selection module may automatically select the features based on the context data specifying the area of analysis in which the extracted rules are to be used, and the domain information indicating the one or more technical environments to which the training data pertains. Accordingly, the feature selection module may be enabled to map the area of analysis to the features which are relevant, while taking into account the technical environment in which the features were produced.
  • the system comprises an analytics module, the analytics module to provide a plurality of services, wherein the services are directed at different areas of analysis, wherein the context data is to specify one area of analysis of the different areas of analysis at which the services are directed.
  • the term "service” as used throughout the description and claims in particular refers to the provision of data in response to a request.
  • the analytics module may be directed at data mining and provide data in response to a request to identify a pattern in live data.
  • the context data is to further specify a technique to be applied by the learning module.
  • the technique comprises one or more of classification, regression, clustering, prediction, and anomaly detection.
  • the different areas of analysis comprise one or more of a root cause analysis, a service impact analysis, a fault prediction analysis, a traffic prediction analysis, a security/threat analysis, a service/resource optimization analysis, and a service/application performance analysis.
  • the one or more technical environments include one or more of application management, server management, telecommunications networks, wide area networks, data center network operations, cloud operations, and security operations.
  • the feature selection module is to assign different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
  • a factor vector may indicate a relevance of factors to an area of analysis.
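By way of illustration only (not part of the claimed subject matter), one possible encoding of such factor vectors is a mapping from each area of analysis to per-factor congruence values; the factor names and numeric values below are invented for the example:

```python
# Hypothetical factor vectors: one entry per factor, each value being a
# congruence metric between that factor and the area of analysis.
FACTORS = ["reliability", "load", "security", "migration"]

FACTOR_VECTORS = {
    "root_cause_analysis": [0.9, 0.6, 0.2, 0.3],
    "traffic_prediction":  [0.3, 0.9, 0.1, 0.2],
    "security_threat":     [0.2, 0.3, 0.9, 0.1],
}

def most_relevant_factors(area, top_k=2):
    """Rank factors by their congruence with the given area of analysis."""
    values = FACTOR_VECTORS[area]
    ranked = sorted(zip(FACTORS, values), key=lambda t: t[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]

print(most_relevant_factors("root_cause_analysis"))  # → ['reliability', 'load']
```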
  • the feature selection module is to determine a relationship between features and factors, wherein a congruence between a feature and a factor is to be determined based on fuzzification.
  • a feature may be processed based on a fuzzy logic normalization and a factor analysis may be performed to determine factors which identify the context.
  • the feature selection module is to assign different attribute vectors to the different areas of analysis, wherein an attribute vector comprises a subset of the factors, wherein the subset is selected based on a relevance score of the factors in view of the area of analysis.
  • an attribute vector may indicate which factors are particularly relevant to an area of analysis.
  • the feature selection module is to assign scores to the features of the training data based on the attribute vector corresponding to the area of analysis.
  • the features having scores above a threshold may be selected and used for training of the learning module.
  • a method of training data feature selection for extracting rules from the training data comprising receiving context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains, selecting features of the training data based on the context data and the domain information, and feeding a machine learning module with the training data and information on the selected features.
  • the method may automatically select the features based on the context data specifying the area of analysis in which the extracted rules are to be used, and the domain information indicating the one or more technical environments to which the training data pertains. Accordingly, the method may map the area of analysis to the features which are relevant, while taking into account the technical environment in which the features were produced.
  • the different areas of analysis comprise one or more of a root cause analysis, a service impact analysis, a fault prediction analysis, a traffic prediction analysis, a security/threat analysis, a service/resource optimization analysis, and a service/application performance analysis.
  • the one or more technical environments include one or more of application management, server management, telecommunications networks, wide area networks, data center network operations, cloud operations, and security operations.
  • the method comprises assigning different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
  • a factor vector may indicate a relevance of factors to an area of analysis.
  • the method comprises determining a relationship between features and factors, wherein determining a congruence between a feature and a factor is based on fuzzification.
  • a feature may be processed based on a fuzzy logic normalization and a factor analysis may be performed to determine factors which identify the context.
  • Fig. 1 shows a block diagram of an exemplary system
  • Fig. 2 shows a flow-chart of a machine learning process
  • Fig. 3 shows another flow-chart of the machine learning process of Fig. 2
  • Fig. 4 shows examples of domain information
  • Fig. 5 shows examples of context data
  • Fig. 6 shows an exemplary process of assigning factors to a context
  • Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis
  • Fig. 8 shows a process for fusing context factors and attribute factors
  • Fig. 9 shows a process of selecting features
  • Fig. 10 shows an overview of the steps of the feature selection process.
  • the following exemplary system and method relate to unsupervised machine learning for addressing challenges faced in the operation of complex systems, such as cloud computing systems involving a plurality of interoperating computing devices, although the system and method are not limited to cloud computing systems.
  • the exemplary system and method are directed at optimizing the feature selection prior to machine learning, e.g., in the area of operational analytics, and may improve the usability and accuracy as compared to feature selection by experts.
  • Fig. 1 shows a block diagram of an exemplary system 10.
  • the system 10 which may be a computing system comprising one or more interoperating computing devices may comprise a feature selection module 12 and a machine learning module 14.
  • the feature selection module 12 may be provided with training data 16a, 16b.
  • the training data 16a, 16b may be collected from a single source or multiple sources and may comprise a plurality of data fields 18.
  • Each data field 18 may include one or more features 20.
  • a feature 20 may refer to an alarm serial number, an alarm type, a (first) occurrence time, a clearance time, a location, etc.
  • the training data 16a, 16b may be obtained from multiple sources wherein some training data 16a is sparse and some training data 16b is dense.
  • multiple indicators may be extracted from the data distribution and a normalization and feature scaling may be performed, e.g., using softmax, sigmoid functions, etc.
  • the feature selection module 12 may carry out a factor analysis to extract initial group components and then improve the grouping according to relevance, based on context semantic reward functions and rules.
  • the factor analysis may produce a decorrelation of features and an independent group extraction.
  • a relevance mechanism may evaluate the factor groups and associate the factor groups with domain knowledge.
  • the features 20 remaining after feature selection within the filtered training data 22 may then be used to train the machine learning module 14 for purposes such as pattern classification, regression, clustering, prediction, anomaly detection, etc. This may reduce the computational cost and infrastructure needed to select the features, while allowing the whole set of features relevant to the context, including the scope of the learnt model/rules, to be considered; it also provides a semantic approach to complex problems, which may be fused with the factor analysis.
  • the trained machine learning module 14 may be validated using test data. Once validated, rules may be extracted from the machine learning module 14 and used to analyze a system.
  • selecting the features 20 may be based on domain information (such as the domain information shown in Fig. 4) indicating the source of the training data 16a, 16b. Furthermore, as also indicated in Fig. 3, selecting the features 20 may also be based on context data indicating the scope of application of the learnt model/rules as well as a type of machine learning algorithm/strategy employed by the machine learning module 14, as exemplarily illustrated in Fig. 5, where the broken line indicates an example of a chosen combination of a scope of application and a machine learning algorithm/strategy employed by the machine learning module 14.
  • Raw context may be transformed to "feature space”.
  • Cloud operation fuzzy functions may encode factors of reliability, availability of services and data, shared resources data, number of active clients, security, complexity, energy consumption and costs, regulations and legal issues, performance, migration, reversion, the lack of standards, limited customization, issues of privacy, etc.:
  • S_entropy = -Σ_i p_i log p_i, where p_i is the count statistic (prior probability) that the i-th cell-id occurs in serving cell-id #1, or that the next neighbor cell-id #2 occurred, within a timeslot (2 minutes / 1 minute / 30 seconds).
  • N is the cardinality (the power of the alphabet of unique cells, e.g. "334-23799", "334-11277") and represents the maximum entropy log(N).
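The snapshot entropy rate described above can be sketched as follows (a minimal reconstruction for illustration; the cell-id observations are invented):

```python
import math
from collections import Counter

def snapshot_entropy_rate(events):
    """Shannon entropy of the observed cell-id counts, normalized by the
    maximum entropy log(N), where N is the alphabet cardinality (the
    number of unique cell-ids seen in the timeslot)."""
    counts = Counter(events)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    n = len(counts)
    max_entropy = math.log(n) if n > 1 else 1.0
    return entropy / max_entropy

# Hypothetical cell-id observations within one timeslot:
print(round(snapshot_entropy_rate(["334-23799"] * 8 + ["334-11277"] * 2), 3))  # → 0.722
```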
  • a factor analysis may be applied to construct factor groups in unsupervised mode.
  • the initial groups may be decomposed into 4 categories of the cloud state:
  • Fig. 6 shows an exemplary process of assigning factors to a context.
  • a context may be represented by a function of defined input factors that impact the system under consideration.
  • contexts may be represented by numerical vectors. This may involve fuzzification of the input and basic features to numerical values and initial groups that represent stronger factors.
  • the fuzzy functions could be sigmoid functions, softmax transform, tanh, logsig, etc.
  • the input may be normalized and de-noised, and should follow a normal distribution.
  • features may be considered in aggregation and additional fuzzification may be employed using expert rules.
  • Factor analysis may be regarded as a statistical method used to describe variability among observed variables in terms of fewer unobserved variables called factors.
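A minimal sketch of such a factor analysis, using a simple principal-factor extraction over the correlation matrix (one of several possible methods, not necessarily the one used by the feature selection module 12; the data are synthetic):

```python
import numpy as np

def extract_factors(X, n_factors=2):
    """Return factor loadings via eigendecomposition of the correlation
    matrix: observed variables are described in terms of the n_factors
    strongest common directions (a simplified principal-factor method)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)      # correlation matrix of the features
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]
    # Loadings: eigenvectors scaled by the square root of their eigenvalue.
    return eigvecs[:, order] * np.sqrt(eigvals[order])

rng = np.random.default_rng(1)
g = rng.normal(size=500)  # one hidden common factor
X = np.column_stack([g + 0.1 * rng.normal(size=500) for _ in range(4)])
loadings = extract_factors(X, n_factors=1)
print(np.round(np.abs(loadings.ravel()), 2))  # all four observed variables load on the factor
```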
  • a common group factor snapshot picture may, for example, be:
  • Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis.
  • the attribute vector may be created from domain information, i.e., for each domain, an attribute factor (AF) may be generated as a function of the context and the attributes of the domain: context-driven factors/weights for the attributes/properties of the relevant Managed Object types, based on their relevance to the context.
  • AF attribute factor
  • a typical implementation of attribute vector generation may use a factor analysis that allows selecting independent feature components, and coefficients in a new, reduced feature space, creating an attribute vector by unsupervised learning.
  • factor analysis may operate on the variance and covariance matrix and hence be sensitive to fuzzification and normalization. Weighting of group factors may additionally be used to increase the confidence and robustness.
  • For an analysis like a root cause analysis (RCA), the unsupervised factor analysis groups may be associated with standard common factors.
  • the factor analysis groups may be checked and improved using expert rules from domain context. Also, there may be a tradeoff between unsupervised factor analysis groups and context domain driven factor groups.
  • As output, there may be a dynamic group-factor snapshot for each major object resource in a cloud:
  • A factor is usually a common group of features, or a type of external or internal force. Some factors may be basic and evaluated as simple unique features.
  • Fig. 8 shows a process for fusing context factors and attribute factors.
  • Feature factors may be generated for the set of all features given as input, as a function of the attribute factor and the set of all features given as input.
  • FeatureFactor FF_i = f3(x_i, AF), where x_i ∈ X, and X represents the set of all features given as input for learning, with AF as the attribute factor, which may be obtained from domain and context.
  • a fused concatenation of the features from context factor generation and from attribute factor generation may be used, and factor analysis may be applied in order to find common group correlations between some of the features.
  • These features may be fused from the 'static context' indicator snapshot, 'dynamic' attributes of each entity, object, and shared resources in cloud. The features from selected groups may be controlled and evaluated.
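A toy sketch of this fusion step, assuming concatenation of hypothetical context and attribute feature matrices followed by a correlation check to find common groups (all data synthetic):

```python
import numpy as np

# Hypothetical context-factor features (static snapshot) and attribute
# features (dynamic, per entity); names and shapes are illustrative only.
rng = np.random.default_rng(2)
context_features = rng.normal(size=(200, 2))  # e.g. reliability, availability
attribute_features = np.column_stack([
    context_features[:, 0] * 0.9 + 0.1 * rng.normal(size=200),  # shares a group
    rng.normal(size=200),                                        # independent
])

# Fusion by concatenation, then a correlation matrix over the fused set
# to detect which context and attribute features form a common group.
fused = np.hstack([context_features, attribute_features])
R = np.corrcoef(fused, rowvar=False)
print(R[0, 2] > 0.9)  # context feature 0 and attribute feature 0 group together
```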
  • Fig. 9 shows a process of selecting features.
  • Each feature 20 of the training data 16a, 16b may be assessed using the feature factor, and the features may be selected as a function of the input features and the feature factor, which is generated using the properties of the MO (App/Service/Resource/…) and the context.
  • the features 20 may be selected under a certain limitation regarding a confidence threshold.
  • Fig. 10 shows an overview of the steps of the feature selection process.
  • s_i could be basic fuzzified features (e.g., based on softmax, hyperbolic tangent, or sigmoid functions) and f1 could be the mapping of the normalized features to factor components:

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided is a learning module to extract rules from training data and a feature selection module to determine features of the training data to be used for extracting the rules. The feature selection module is to receive context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains.

Description

AUTOMATIC FEATURE SELECTION IN MACHINE LEARNING
FIELD
The present disclosure relates to machine learning. In particular, the present disclosure relates to automatic feature selection in machine learning.
BACKGROUND
In machine learning, real-world problems often involve data with a large number of features. However, not all features may be essential, as features may be redundant or even irrelevant. Taking into account redundant or irrelevant features may reduce the performance of an algorithm. Feature selection aims to solve this problem by selecting only a subset of relevant features from a large set of available features. By removing redundant or irrelevant features, feature selection may help reduce the dimensionality of the data, speed up the learning process, simplify the learnt model, and/or increase the performance.
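By way of a generic illustration of this idea (not the specific method of the present disclosure), a simple filter can drop near-constant (irrelevant) and highly correlated (redundant) features; all names and thresholds below are assumptions for the example:

```python
import numpy as np

def filter_features(X, var_threshold=1e-3, corr_threshold=0.95):
    """Drop near-constant (irrelevant) and highly correlated (redundant)
    columns; return the indices of the columns that are kept."""
    X = np.asarray(X, dtype=float)
    # Variance filter: near-constant columns carry little information.
    keep = [j for j in range(X.shape[1]) if X[:, j].var() > var_threshold]
    selected = []
    for j in keep:
        # Keep column j only if it is not strongly correlated with a
        # column that was already selected.
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_threshold
               for k in selected):
            selected.append(j)
    return selected

rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a,                                       # informative
                     a * 2.0 + 1e-6 * rng.normal(size=100),   # redundant copy
                     np.full(100, 3.0),                       # constant, irrelevant
                     rng.normal(size=100)])                   # informative
print(filter_features(X))  # → [0, 3]
```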
SUMMARY
According to a first aspect of the present invention, there is provided a system comprising a learning module to extract rules from training data, and a feature selection module to determine features of the training data to be used for extracting the rules, wherein the feature selection module is to receive context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains.
In this regard, it is noted that the term "module" as used throughout the description and claims in particular refers to software, hardware, or a combination of software and hardware.
Hence, the feature selection module may automatically select the features based on the context data specifying the area of analysis in which the extracted rules are to be used, and the domain information indicating the one or more technical environments to which the training data pertains. Accordingly, the feature selection module may be enabled to map the area of analysis to the features which are relevant, while taking into account the technical environment in which the features were produced.
In a first possible implementation form of the first aspect, the system comprises an analytics module, the analytics module to provide a plurality of services, wherein the services are directed at different areas of analysis, wherein the context data is to specify one area of analysis of the different areas of analysis at which the services are directed.
In this regard, it is noted that the term "service" as used throughout the description and claims in particular refers to the provision of data in response to a request.
For example, the analytics module may be directed at data mining and provide data in response to a request to identify a pattern in live data.
In a second possible implementation form of the first aspect, the context data is to further specify a technique to be applied by the learning module.
Hence, the selection of relevant features may be particularly focused on features which lend themselves to application of a particular machine learning algorithm.
In a third possible implementation form of the first aspect, the technique comprises one or more of classification, regression, clustering, prediction, and anomaly detection.
In a fourth possible implementation form of the first aspect, the different areas of analysis comprise one or more of a root cause analysis, a service impact analysis, a fault prediction analysis, a traffic prediction analysis, a security/threat analysis, a service/resource optimization analysis, and a service/application performance analysis.
In a fifth possible implementation form of the first aspect, the one or more technical environments include one or more of application management, server management, telecommunications networks, wide area networks, data center network operations, cloud operations, and security operations.
In a sixth possible implementation form of the first aspect, the feature selection module is to assign different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
Accordingly, a factor vector may indicate a relevance of factors to an area of analysis.
In a seventh possible implementation form of the first aspect, the feature selection module is to determine a relationship between features and factors, wherein a congruence between a feature and a factor is to be determined based on fuzzification.
For instance, a feature may be processed based on a fuzzy logic normalization and a factor analysis may be performed to determine factors which identify the context.
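Such a fuzzy logic normalization could be sketched as follows (the sigmoid centering on the sample mean and the latency feature are assumptions for the example, not prescribed by the disclosure):

```python
import numpy as np

def fuzzify_sigmoid(x, center=None, scale=None):
    """Map a raw feature onto [0, 1] with a logistic (sigmoid) membership,
    centered by default on the sample mean and scaled by the sample std."""
    x = np.asarray(x, dtype=float)
    center = x.mean() if center is None else center
    scale = x.std() if scale is None else scale
    return 1.0 / (1.0 + np.exp(-(x - center) / scale))

def fuzzify_softmax(x):
    """Map a raw feature vector onto a probability-like distribution."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

latency_ms = np.array([10.0, 12.0, 11.0, 250.0])  # hypothetical raw feature
print(fuzzify_sigmoid(latency_ms).round(3))
print(fuzzify_softmax(latency_ms).round(3))
```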
In an eighth possible implementation form of the first aspect, the feature selection module is to assign different attribute vectors to the different areas of analysis, wherein an attribute vector comprises a subset of the factors, wherein the subset is selected based on a relevance score of the factors in view of the area of analysis.
Hence, an attribute vector may indicate which factors are particularly relevant to an area of analysis.
In a ninth possible implementation form of the first aspect, the feature selection module is to assign scores to the features of the training data based on the attribute vector corresponding to the area of analysis.
Thus, the features having scores above a threshold may be selected and used for training of the learning module.
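A minimal sketch of such score-based selection, where the feature names, factor loadings, attribute vector, and threshold are all invented for illustration:

```python
import numpy as np

def score_features(loadings, attribute_vector):
    """Score each feature as the congruence between its factor loadings
    and the attribute vector of the chosen area of analysis."""
    L = np.asarray(loadings, dtype=float)          # (n_features, n_factors)
    a = np.asarray(attribute_vector, dtype=float)  # (n_factors,)
    return L @ a

def select_features(names, loadings, attribute_vector, threshold=0.5):
    """Keep only the features whose score exceeds the threshold."""
    scores = score_features(loadings, attribute_vector)
    return [n for n, s in zip(names, scores) if s > threshold]

# Hypothetical: 3 features, 2 factors ("reliability", "load").
names = ["alarm_rate", "cpu_load", "room_temperature"]
loadings = [[0.9, 0.1],
            [0.2, 0.8],
            [0.05, 0.05]]
rca_attribute_vector = [0.8, 0.3]  # root cause analysis weights reliability highly
print(select_features(names, loadings, rca_attribute_vector))  # → ['alarm_rate']
```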
According to a second aspect of the present invention, there is provided a method of training data feature selection for extracting rules from the training data, comprising receiving context data of the rules to be extracted and domain information on the training data, the context data to specify an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains, selecting features of the training data based on the context data and the domain information, and feeding a machine learning module with the training data and information on the selected features.
Hence, the method may automatically select the features based on the context data specifying the area of analysis in which the extracted rules are to be used, and the domain information indicating the one or more technical environments to which the training data pertains. Accordingly, the method may map the area of analysis to the features which are relevant, while taking into account the technical environment in which the features were produced.
In a first possible implementation form of the second aspect, the different areas of analysis comprise one or more of a root cause analysis, a service impact analysis, a fault prediction analysis, a traffic prediction analysis, a security/threat analysis, a service/resource optimization analysis, and a service/application performance analysis.
In a second possible implementation form of the second aspect, the one or more technical environments include one or more of application management, server management, telecommunications networks, wide area networks, data center network operations, cloud operations, and security operations.
In a third possible implementation form of the second aspect, the method comprises assigning different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
Accordingly, as indicated above, a factor vector may indicate a relevance of factors to an area of analysis.
In a fourth possible implementation form of the second aspect, the method comprises determining a relationship between features and factors, wherein determining a congruence between a feature and a factor is based on fuzzification.
Hence, as indicated above, a feature may be processed based on a fuzzy logic normalization and a factor analysis may be performed to determine factors which identify the context.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows a block diagram of an exemplary system;
Fig. 2 shows a flow-chart of a machine learning process;
Fig. 3 shows another flow-chart of the machine learning process of Fig. 2;
Fig. 4 shows examples of domain information;
Fig. 5 shows examples of context data;
Fig. 6 shows an exemplary process of assigning factors to a context;
Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis;
Fig. 8 shows a process for fusing context factors and attribute factors;
Fig. 9 shows a process of selecting features; and
Fig. 10 shows an overview of the steps of the feature selection process.
DETAILED DESCRIPTION
The following exemplary system and method relate to unsupervised machine learning for addressing challenges faced in the operation of complex systems, such as cloud computing systems involving a plurality of interoperating computing devices, although the system and method are not limited to cloud computing systems. In particular, the exemplary system and method are directed at optimizing the feature selection prior to machine learning, e.g., in the area of operational analytics, and may improve the usability and accuracy as compared to feature selection by experts.
Fig. 1 shows a block diagram of an exemplary system 10. The system 10 which may be a computing system comprising one or more interoperating computing devices may comprise a feature selection module 12 and a machine learning module 14. The feature selection module 12 may be provided with training data 16a, 16b. The training data 16a, 16b may be collected from a single source or multiple sources and may comprise a plurality of data fields 18. Each data field 18 may include one or more features 20.
For instance, a feature 20 may refer to an alarm serial number, an alarm type, a (first) occurrence time, a clearance time, a location, etc. For example, the training data 16a, 16b may be obtained from multiple sources wherein some training data 16a is sparse and some training data 16b is dense. Hence, in a pre-processing step multiple indicators may be extracted from the data distribution and a normalization and feature scaling may be performed, e.g., using softmax, sigmoid functions, etc.
The feature selection module 12 may carry out a factor analysis to extract initial group components and then improve the grouping according to relevance, based on context semantic reward functions and rules. The factor analysis may produce a decorrelation of features and an independent group extraction. A relevance mechanism may evaluate the factor groups and associate the factor groups with domain knowledge. The features 20 remaining after feature selection within the filtered training data 22 may then be used to train the machine learning module 14 for purposes such as pattern classification, regression, clustering, prediction, anomaly detection, etc. This may reduce the computational cost and infrastructure needed to select the features, while allowing the whole set of features relevant to the context, including the scope of the learnt model/rules, to be considered; it also provides a semantic approach to complex problems, which may be fused with the factor analysis. As shown in Fig. 2, the trained machine learning module 14 may be validated using test data. Once validated, rules may be extracted from the machine learning module 14 and used to analyze a system.
As indicated in Fig. 3, selecting the features 20 may be based on domain information (such as the domain information shown in Fig. 4) indicating the source of the training data 16a, 16b. Furthermore, as also indicated in Fig. 3, selecting the features 20 may also be based on context data indicating the scope of application of the learnt model/rules as well as a type of machine learning algorithm/strategy employed by the machine learning module 14, as exemplarily illustrated in Fig. 5, where the broken line indicates an example of a chosen combination of a scope of application and a machine learning algorithm/strategy employed by the machine learning module 14. A context may be defined as a function of input factors that are assumed to impact the system under consideration. For example, a context may be expressed by C = f1(s_i), which may represent adaptive context modes of environment factors that influence the feature selection, such as:
Cloud Operations
Multiple context indicators may be extracted. Raw context may be transformed to "feature space".
Cloud operation fuzzy functions may encode factors of reliability, availability of services and data, shared resources data, number of active clients, security, complexity, energy consumption and costs, regulations and legal issues, performance, migration, reversion, the lack of standards, limited customization, issues of privacy, etc.:
1. Control Reliability Factor

s1 = stddev()
s2 = mean()
s3 = snapshot entropy rate = entropy(current state snapshot) / max entropy(all states):

S_entropy = -Σ p_i log p_i,

where p_i is the count statistic (prior probability) that the i-th cell-id occurs as serving cell-id#1 or next-neighbor cell-id#2 within a timeslot (2 minutes / 1 minute / 30 sec). In this encoding there is also N, the cardinality (power of the alphabet), which determines the maximum entropy log(N). N is the total number of unique cells (an alphabet such as 334-23799 'A', 334-11277 'B'). Hence:

S_rate_entropy = S_entropy / max_entropy

s4 = presence of 'error type-1'

2. Availability of services and data

s5 = presence of service

Application Management

1. Number of clients and servers
2. Number of VMs
3. Type of interaction (async/sync)

Server Management

1. Resource infrastructure factor
2. Runtime operation factor
3. Memory fragmentation factor
4. Memory free factor
5. Resource synchronization factor
6. Number of processes and their complexity

Network Operations

1. Migration stability factor
2. Network infrastructure stability factor

Security Operations

1. Access rights snapshots
2. Process and resource dependencies snapshot

A factor analysis may be applied to construct factor groups in unsupervised mode. The initial groups may be decomposed into four categories of the cloud state:

• Common factor components detected
• Unexplained factors (groups may be compared to the previous state)
• New common factors established
• Tracked factors and factor age
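The snapshot entropy rate S3 described above can be computed directly from cell-id counts. A small sketch follows; the timeslot contents and alphabet size are illustrative values, not taken from the patent.

```python
from collections import Counter
from math import log

def snapshot_entropy_rate(cell_ids, alphabet_size):
    """Entropy of the observed cell-id distribution, normalised by
    the maximum entropy log(N) over the alphabet of N unique cells."""
    counts = Counter(cell_ids)
    total = sum(counts.values())
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(alphabet_size)

# Cell-ids observed in one 2-minute timeslot (alphabet of N=4 unique cells).
slot = ["334-23799", "334-23799", "334-11277", "334-11277", "334-11277"]
rate = snapshot_entropy_rate(slot, alphabet_size=4)
print(round(rate, 3))   # lies between 0 (deterministic) and 1 (uniform over N cells)
```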
Fig. 6 shows an exemplary process of assigning factors to a context. A context may be represented by a function of defined input factors that impact the system under consideration. Hence, contexts may be represented by numerical vectors. This may involve fuzzification of the input and basic features to numerical values and initial groups that represent stronger factors. As indicated above, the fuzzy functions could be sigmoid functions, the softmax transform, tanh, logsig, etc. For better accuracy, the input may be normalized, de-noised and follow the normal distribution. In the case of partially sparse data, features may be considered in aggregation and additional fuzzification may be employed using expert rules.
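The fuzzification step mentioned above can be sketched with the named fuzzy functions applied to normalized inputs; the raw values here are illustrative.

```python
import numpy as np

def sigmoid(x):
    """Map raw values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Map raw values onto a probability simplex (entries sum to 1)."""
    e = np.exp(x - np.max(x))    # shift for numerical stability
    return e / e.sum()

raw = np.array([0.0, 1.0, -2.0, 3.0])
normalized = (raw - raw.mean()) / raw.std()   # normalize before fuzzifying

print(sigmoid(normalized))    # values in (0, 1)
print(softmax(normalized))    # non-negative, sums to 1
print(np.tanh(normalized))    # values in (-1, 1)
```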
Hence, a context may be given by C = f1(si), where si ∈ R^m and si represents the factors considered for identifying a context; these may be represented by a context factor vector comprising common factors, new common factors, and unique factors.
Factor analysis may be regarded as a statistical method used to describe variability among observed variables in terms of fewer unobserved variables called factors. The observed variables may be modeled as linear combinations of the factors plus an error value:

x = Λ · F + μ + z

where x is the vector of observed variables, μ is the constant vector of means, Λ is the N×M matrix of factor loadings, F is the vector of common factors and z is the vector of independently distributed errors, which leads to:

x_i = λ_i1 · F_1 + ... + λ_iM · F_M + μ_i + z_i
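The factor model can be checked numerically: data generated as x = Λ·F + μ + z with standard-normal factors has covariance Λ·Λᵀ + Ψ, where Ψ is the diagonal covariance of the errors z. A sketch with assumed toy loadings:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, samples = 4, 2, 5000
Lambda = np.array([[0.9, 0.0],     # N x M matrix of factor loadings
                   [0.8, 0.1],
                   [0.0, 0.7],
                   [0.1, 0.9]])
mu = np.array([1.0, -1.0, 0.5, 0.0])   # constant vector of means

F = rng.normal(size=(samples, M))        # common factors
z = 0.1 * rng.normal(size=(samples, N))  # independent errors
X = F @ Lambda.T + mu + z                # x_i = λ_i1·F_1 + ... + λ_iM·F_M + μ_i + z_i

# Implied covariance: Λ·Λᵀ plus the diagonal error variance (0.1² = 0.01).
implied = Lambda @ Lambda.T + np.diag([0.01] * N)
print(np.allclose(np.cov(X.T), implied, atol=0.1))
```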
In factor analysis, two main types of rotation may be used: orthogonal when the new axes are also orthogonal to each other and oblique when the new axes are not required to be orthogonal to each other. Because the rotations are always performed in a subspace (the so-called factor space), the new axes will always exhibit less variance than the original factors.
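An orthogonal rotation of the kind described above can be sketched with scikit-learn's FactorAnalysis, which supports varimax rotation (in scikit-learn 0.24 and later); the loadings used to generate the data are assumed toy values.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
F = rng.normal(size=(500, 2))
Lambda = np.array([[0.9, 0.0], [0.8, 0.1], [0.0, 0.8], [0.1, 0.9]])
X = F @ Lambda.T + 0.1 * rng.normal(size=(500, 4))

unrotated = FactorAnalysis(n_components=2, random_state=0).fit(X)
rotated = FactorAnalysis(n_components=2, rotation="varimax",
                         random_state=0).fit(X)

# Varimax (an orthogonal rotation) drives each observed variable toward
# a single dominant factor, making the factor groups easier to interpret.
print(np.round(rotated.components_, 1))
```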
As output, a common group factor snapshot picture may, for example, be:
• Cloud operation
• Application management
• Server management
• Security operations
Fig. 7 shows an exemplary process of assigning an attribute vector to an area of analysis. In particular, the attribute vector may be created from domain information. That is, for each domain, an attribute factor (AF) may be generated as a function of the context and the attributes of the domain: context-driven factors/weights for the attributes/properties of the relevant Managed Object Types, based on their relevance to the context.
In this regard, each object and the shared resources in the system/cloud may provide an additional specification of the factors: AttrFactor AF_i = f2(c_i, a_i), where c_i ∈ C and a_i ∈ A, wherein A represents the attributes/properties of the relevant Managed Object Types of the domain model.
A typical implementation of attribute vector generation may use a factor analysis that selects independent feature components and coefficients in a new, reduced feature space, creating an attribute vector by unsupervised learning. Using factor analysis, common factor groups and very unique (or sparse) features without a common factor may be defined. Factor analysis operates on the variance and covariance matrix and is hence sensitive to fuzzification and normalization. Weighting of group factors may additionally be used to increase confidence and robustness. An analysis like an RCA (root-cause analysis) may operate with standard common factors that are defined from the domain context. The unsupervised factor analysis groups may be associated with the standard common factors, and may be checked and improved using expert rules from the domain context. Also, there may be a tradeoff between unsupervised factor analysis groups and context domain driven factor groups. As output, there may be a dynamic group-factor snapshot picture for each major object resource in a cloud:
• Factors selected from principal Objects of Cloud operation indicators, according to the timeframe, dynamic features and background context.
• Factors selected from principal Objects in Application Management & Context.
• Factors from principal Objects in Server Management & Context.
• Factors representing principal Objects in Security Operations.
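The association of features with common factor groups, with weakly loaded features kept as unique (sparse) features without a common factor, might be sketched as follows; the group labels, loadings, and the 0.5 threshold are illustrative assumptions standing in for the expert rules from the domain context.

```python
import numpy as np

loadings = np.array([[0.9, 0.1],   # feature 0: strong on factor 0
                     [0.8, 0.0],   # feature 1: strong on factor 0
                     [0.1, 0.7],   # feature 2: strong on factor 1
                     [0.1, 0.1]])  # feature 3: no common factor
group_labels = ["cloud_operation", "server_management"]  # assumed domain labels

groups = {name: [] for name in group_labels}
unique_features = []
for i, row in enumerate(loadings):
    k = int(np.argmax(np.abs(row)))
    if abs(row[k]) >= 0.5:                 # expert-rule threshold (assumed)
        groups[group_labels[k]].append(i)  # join the common factor group
    else:
        unique_features.append(i)          # sparse feature, no common factor

print(groups)            # {'cloud_operation': [0, 1], 'server_management': [2]}
print(unique_features)   # [3]
```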
A factor can usually be a common group of features and a type of external or internal force. Some factors may be basic and evaluated as simple unique features.
Fig. 8 shows a process for fusing context factors and attribute factors. Feature factors may be generated for the set of all features given as input, as a function of the attribute factor and the set of all features given as input.
FeatureFactor FF_i = f3(x_i, AF), where x_i ∈ X and X represents the set of all features given as input for learning, with AF as the attribute factor, which may be obtained from domain and context.
Hence, a fusion concatenation of the features from context factor generation and the features from attribute factor generation may be used, and factor analysis may be applied in order to find common group correlations between some of the features. These features may be fused from the 'static context' indicator snapshot and the 'dynamic' attributes of each entity, object, and shared resource in the cloud. The features from the selected groups may be controlled and evaluated.
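The fusion step might be sketched as follows, with factor analysis applied to the concatenation of (synthetic) context features and attribute features; the data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
shared = rng.normal(size=(300, 1))   # latent cause common to both sources

# 'Static context' indicator snapshot and 'dynamic' object attributes.
context_feats = shared + 0.1 * rng.normal(size=(300, 3))
attribute_feats = shared + 0.1 * rng.normal(size=(300, 2))

# Fusion concatenation, then factor analysis on the fused matrix.
fused = np.hstack([context_feats, attribute_feats])
fa = FactorAnalysis(n_components=1, random_state=0).fit(fused)

# A single common factor loading strongly on features from both sources
# exposes the cross-source group correlation the fusion step looks for.
print(np.abs(fa.components_[0]) > 0.5)
```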
Fig. 9 shows a process of selecting features. Each feature 20 of the training data 16a, 16b may be assessed using the feature factor, and the features may be selected as a function of the input features and the feature factor, which is generated using the properties of the MO (App/Service/Resource/...) and the context.
The selected feature set may thus be represented as X' = f4(X, FF), where X represents the set of all features 20 given as input and FF represents the feature factor obtained from the domain and context. The features 20 may be selected subject to a certain limitation regarding a confidence threshold. Fig. 10 shows an overview of the steps of the feature selection process. In particular, si could be basic fuzzified features (based on softmax, the hyperbolic tangent, a sigmoid function, etc.) and fi could be the mapping of the normalized features to factor components:
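The selection step X' = f4(X, FF) under a confidence threshold might be sketched as follows; the threshold value and the feature-factor scores are assumptions for illustration, not values from the patent.

```python
import numpy as np

X = np.arange(12.0).reshape(3, 4)      # 3 samples, 4 candidate features
FF = np.array([0.9, 0.2, 0.7, 0.4])    # feature factors from domain/context

confidence_threshold = 0.5             # assumed confidence limitation
selected = FF >= confidence_threshold  # keep sufficiently confident features
X_selected = X[:, selected]            # X' = f4(X, FF)

print(selected)           # [ True False  True False]
print(X_selected.shape)   # (3, 2)
```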

Claims

1. A system comprising: a learning module configured to extract rules from training data; and a feature selection module configured to determine features of the training data to be used for extracting the rules; wherein the feature selection module is configured to receive context data of the rules to be extracted and domain information on the training data, wherein the context data specify an area of analysis in which the extracted rules are to be used, and the domain information indicates one or more technical environments to which the training data pertains.
2. The system of claim 1, comprising: an analytics module configured to provide a plurality of services, wherein the services are directed at different areas of analysis, wherein the context data specify one area of analysis of the different areas of analysis at which the services are directed.
3. The system of claim 1 or 2, wherein the context data further specify a technique to be applied by the learning module.
4. The system of claim 3, wherein the technique comprises one or more of: classification; and clustering.
5. The system of any one of claims 1 to 4, wherein the different areas of analysis comprise one or more of: a root cause analysis; a service impact analysis; a fault prediction analysis; a traffic prediction analysis; a security/threat analysis; a service/resource optimization analysis; and a service/application performance analysis.
6. The system of any one of claims 1 to 5, wherein the one or more technical environments include one or more of: application management; server management; telecommunications networks; wide area networks; data center network operations; cloud operations; and security operations.
7. The system of any one of claims 1 to 6, wherein the feature selection module is configured to assign different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
8. The system of claim 7, wherein the feature selection module is configured to determine a relationship between features and factors, wherein a congruence between a feature and a factor is to be determined based on fuzzification.
9. The system of any one of claims 7 or 8, wherein the feature selection module is configured to assign different attribute vectors to the different areas of analysis, wherein an attribute vector comprises a subset of the factors, wherein the subset is selected based on a relevance score of the factors in view of the area of analysis.
10. The system of claim 9, wherein the feature selection module is configured to assign scores to the features of the training data based on the attribute vector corresponding to the area of analysis.
11. A method of training data feature selection for extracting rules from the training data, comprising: receiving context data of the rules to be extracted and domain information on the training data, the context data specifying an area of analysis in which the extracted rules are to be used, and the domain information indicating one or more technical environments to which the training data pertains; selecting features of the training data based on the context data and the domain information; and feeding a machine learning module with the training data and information on the selected features.
12. The method of claim 11, wherein the different areas of analysis comprise one or more of: a root cause analysis; a service impact analysis; a fault prediction analysis; a traffic prediction analysis; a security/threat analysis; a service/resource optimization analysis; and a service/application performance analysis.
13. The method of claim 11 or 12, wherein the one or more technical environments include one or more of: application management; server management; telecommunications networks; wide area networks; data center network operations; cloud operations; and security operations.
14. The method of any one of claims 11 to 13, comprising: assigning different factor vectors to different areas of analysis, wherein a factor vector comprises a plurality of entries, a value of an entry being a metric for the congruence between a factor and an area of analysis.
15. The method of claim 14, comprising: determining a relationship between features and factors, wherein determining a congruence between a feature and a factor is based on fuzzification.
EP17731111.5A 2017-06-12 2017-06-12 Automatic feature selection in machine learning Pending EP3612980A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/064317 WO2018228667A1 (en) 2017-06-12 2017-06-12 Automatic feature selection in machine learning

Publications (1)

Publication Number Publication Date
EP3612980A1 true EP3612980A1 (en) 2020-02-26

Family

ID=59078048

Family Applications (1)

Application Number Title Priority Date Filing Date
EP17731111.5A Pending EP3612980A1 (en) 2017-06-12 2017-06-12 Automatic feature selection in machine learning

Country Status (2)

Country Link
EP (1) EP3612980A1 (en)
WO (1) WO2018228667A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598760B (en) * 2019-08-26 2023-10-24 华北电力大学(保定) Unsupervised feature selection method for vibration data of transformer

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7505948B2 (en) * 2003-11-18 2009-03-17 Aureon Laboratories, Inc. Support vector regression for censored data
GB0809443D0 (en) * 2008-05-23 2008-07-02 Wivenhoe Technology Ltd A Type-2 fuzzy based system for handling group decisions
AU2013100982A4 (en) * 2013-07-19 2013-08-15 Huaiyin Institute Of Technology, China Feature Selection Method in a Learning Machine
US9276951B2 (en) * 2013-08-23 2016-03-01 The Boeing Company System and method for discovering optimal network attack paths

Also Published As

Publication number Publication date
WO2018228667A1 (en) 2018-12-20


Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20191119

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20210525

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS