CN117056834A - Big data analysis method based on decision tree - Google Patents
Big data analysis method based on decision tree
- Publication number
- CN117056834A
- CN202311050733.1A
- Authority
- CN
- China
- Prior art keywords
- data
- decision tree
- model
- algorithm
- result
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of big data analysis, and in particular to a big data analysis method based on a decision tree. By integrating multi-modal data and unstructured data and extracting features with deep learning, the invention makes fuller use of different types of data. The SMOTE sampling method is used to process unbalanced data, which improves classification precision for minority-class samples. A decision tree algorithm is used for feature selection, which improves model precision. A decision tree and a deep learning model are fused to combine the advantages of both and build a stronger, more efficient and more robust model. An abnormality detection algorithm based on the decision tree plays an important role in preventing and finding problems, and an interpretability tool is used to provide easy-to-understand model interpretation.
Description
Technical Field
The invention relates to the technical field of big data analysis methods, in particular to a big data analysis method based on a decision tree.
Background
Big data analysis methods are processes for extracting, processing, analyzing, and understanding large-scale data sets with a range of techniques and tools, including data preprocessing, data visualization, statistical analysis, machine learning, natural language processing, data mining, and time-series analysis. These methods play an important role in today's rapidly developing information age. With the spread of the internet, sensor technology, social media, and other big data sources, the volume of large-scale data sets has grown explosively. Such data contains a great deal of information and knowledge and, when analyzed and used correctly, can bring considerable commercial value to organizations and enterprises. Applying big data analysis methods together can reveal hidden information in the data, provide business insights, optimize decisions and strategies, and promote innovation and development.
In practice, traditional data analysis methods process only a single type of data or structured data, and handle multi-modal and unstructured data poorly. When processing unbalanced data, conventional methods often ignore minority-class samples, so the classification result is biased towards majority-class samples. Feature selection relies mainly on human experience and may miss important features. Traditional decision tree models may be unable to process complex, high-dimensional data and overfit easily. Conventional approaches typically provide only model results without any explanation, which can reduce the user's understanding of and confidence in the results.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art, and provides a big data analysis method based on a decision tree.
In order to achieve the above purpose, the present invention adopts the following technical scheme: the big data analysis method based on the decision tree comprises the following steps:
integrating multi-modal data, cleaning and normalizing, and extracting features of the multi-modal data by adopting a deep learning convolutional neural network and a natural language processing technology to obtain a preprocessing data set;
integrating unstructured data, and analyzing the unstructured data by adopting an NLP algorithm and a clustering algorithm to obtain an unstructured analysis result;
using an SMOTE sampling method, identifying and processing unbalanced data based on the preprocessed data set and an unstructured analysis result, and obtaining a balanced data set;
selecting features from the balanced dataset by using a decision tree algorithm comprising information gain and Gini coefficients, and obtaining a selected feature set;
constructing a basic decision tree model based on the selection feature set by using a CART algorithm;
using a fusion learning method that integrates a random forest and a deep neural network, fusing the basic decision tree model with the deep learning model to obtain a fused decision tree model;
in the big data analysis process, an online decision tree algorithm is adopted to analyze newly generated data in real time, and an online analysis result is generated;
based on an abnormality detection algorithm of the fusion decision tree model, performing abnormality detection on the online analysis result to generate an abnormality report;
and using an interpretability tool, specifically SHAP, to visually display the exception report and the fusion decision tree model, and simultaneously providing interpretation of the fusion decision tree model, and integrating to generate a final report.
As a further aspect of the present invention, the multimodal data includes image data, audio data, text data;
the step of integrating the multi-modal data, performing cleaning and normalization, and extracting features of the multi-modal data by adopting a deep learning convolutional neural network and a natural language processing technology to obtain the preprocessing data set specifically comprises:
collecting the multi-modal data, and aligning each modal data in the multi-modal data with other modal data in time and space in the data integration process;
performing data cleaning on the integrated multi-mode data, including outlier detection, data filling and data denoising;
normalizing each mode data in the multi-mode data to a unified interval;
performing feature extraction on the image data by adopting the convolutional neural network, performing feature extraction on the text data by adopting the natural language processing technology, performing feature extraction on the audio data by adopting an MFCC, and acquiring a feature vector based on the feature extraction;
and merging the feature vectors of different modes by using a multi-mode fusion technology to acquire the preprocessing data set.
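The normalization and fusion sub-steps above can be sketched as follows. This is a minimal illustration, assuming min-max scaling to [0, 1] and fusion by simple concatenation; the source names neither choice. The function names and the sample vectors are hypothetical, and a real pipeline would normally scale each feature across many samples rather than within one vector.

```python
from typing import Dict, List


def min_max_normalize(values: List[float], low: float = 0.0, high: float = 1.0) -> List[float]:
    """Rescale values linearly into [low, high]; a constant input maps to low."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]


def fuse_by_concatenation(features: Dict[str, List[float]]) -> List[float]:
    """Early fusion: normalize each modality's vector, then concatenate in sorted key order."""
    fused: List[float] = []
    for modality in sorted(features):
        fused.extend(min_max_normalize(features[modality]))
    return fused


# Hypothetical per-modality feature vectors for one sample (assumed values).
sample = {
    "image": [0.0, 128.0, 255.0],   # stand-in for CNN features
    "text": [2.0, 4.0],             # stand-in for NLP features
    "audio": [-10.0, 0.0, 10.0],    # stand-in for MFCC features
}
fused_vector = fuse_by_concatenation(sample)
```

Concatenation keeps every modality's features side by side in one vector, which is the simplest form of multi-modal fusion; other fusion schemes would replace only `fuse_by_concatenation`.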
As a further scheme of the invention, the integrated unstructured data is analyzed by adopting an NLP algorithm and a clustering algorithm, and the step of obtaining the unstructured analysis result comprises the following steps:
collecting the unstructured data, and aligning the unstructured data in time and space in the data integration process;
in the analysis process of the NLP algorithm, based on text word segmentation, named entity recognition, emotion analysis, topic modeling and text classification operation, classifying texts in unstructured data into predefined categories, and acquiring word segmentation results, emotion tendencies, topic recognition and classification;
adopting a clustering algorithm, specifically k-means, to obtain clustering results including text clustering, image clustering and audio clustering;
and integrating the results of the NLP algorithm and the clustering algorithm to obtain a clustering analysis result of the unstructured data, and taking the clustering analysis result as the unstructured analysis result.
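The k-means clustering sub-step can be sketched in a few lines. This is an illustrative version of Lloyd's algorithm, assuming two-dimensional feature vectors and deterministic seeding with the first k points; the data and the seeding rule are assumptions, not details from the source.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], k: int, iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm; centroids are seeded with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical feature vectors, e.g. text embeddings reduced to two dimensions.
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
centroids, labels = k_means(features, k=2)
```

The same routine applies to text, image, or audio feature vectors once each item has been converted to a fixed-length vector.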
As a further scheme of the present invention, the step of identifying and processing unbalanced data based on the preprocessed data set and the unstructured analysis result by using the SMOTE sampling method, and obtaining a balanced data set specifically includes:
counting the number of samples of each category in the preprocessed data set and the unstructured analysis result to obtain a data category statistical result;
setting the categories to be enhanced and an enhancement strategy based on the data category statistical result, and generating enhancement strategy details;
based on the enhancement strategy details, applying the SMOTE algorithm to the categories to be enhanced, finding the K nearest neighbors of samples in those categories and generating synthetic samples from them as a synthetic sample set;
combining the preprocessed data set and the unstructured analysis result with the synthetic sample set to form a preliminary balanced data set;
based on the preliminary balanced data set, repeating the above steps until the number of samples in each category reaches the balance target, and obtaining a final balanced data set.
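The core of the SMOTE step can be sketched as follows: each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its K nearest minority-class neighbors. This is a minimal sketch, assuming Euclidean distance, K = 2, and a balance target equal to the majority-class count; the data and parameter values are hypothetical.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Point], n_synthetic: int, k: int = 2, seed: int = 0) -> List[List[float]]:
    """Generate n_synthetic points by interpolating towards random k-nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        index = rng.randrange(len(minority))
        base = minority[index]
        others = [p for i, p in enumerate(minority) if i != index]
        neighbours = sorted(others, key=lambda p: squared_distance(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Assumed toy data: six majority-class samples and three minority-class samples.
majority = [(5.0, 5.0), (5.5, 4.5), (6.0, 5.2), (4.8, 5.9), (5.3, 5.1), (6.1, 4.7)]
minority = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3)]
synthetic = smote(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = [list(p) for p in minority] + synthetic
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region the minority class already occupies.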
As a further aspect of the present invention, the step of selecting features from the balanced dataset by using a decision tree algorithm including information gain and Gini coefficients, and obtaining a selected feature set specifically includes:
retrieving the balanced data set, calculating summary statistics of all features, including mean values and standard deviations, and generating a feature statistical summary;
calculating the information gain of each feature based on the feature statistical summary, and obtaining an information gain result;
calculating the Gini coefficient of each feature based on the feature statistical summary, and obtaining a Gini coefficient result;
and combining the information gain result and the Gini coefficient result to generate the selected feature set.
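The two selection criteria can be sketched as follows. The example assumes each feature is split at its mean value (one way to use the summary statistics mentioned above), computes the information gain and the decrease in Gini impurity for that split, and keeps a feature only if both scores clear a threshold. The thresholds, the combination rule, and the data are hypothetical; the source does not specify how the two results are combined.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[int]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (information gain, Gini decrease) for a split at the feature mean."""
    threshold = sum(values) / len(values)
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    info_gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    gini_decrease = gini(labels) - (w_left * gini(left) + w_right * gini(right))
    return info_gain, gini_decrease


def select_features(features: Dict[str, List[float]], labels: Sequence[int],
                    min_gain: float = 0.1, min_gini: float = 0.05) -> List[str]:
    """Keep features whose scores both exceed the (assumed) thresholds."""
    selected = []
    for name, values in features.items():
        gain, decrease = split_scores(values, labels)
        if gain >= min_gain and decrease >= min_gini:
            selected.append(name)
    return selected


labels = [0, 0, 1, 1]
features = {"signal": [1.0, 2.0, 8.0, 9.0], "noise": [1.0, 2.0, 1.0, 2.0]}
scores = {name: split_scores(values, labels) for name, values in features.items()}
selected = select_features(features, labels)
```

Here `signal` separates the classes perfectly and is kept, while `noise` leaves both child nodes as mixed as the parent and is dropped.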
As a further scheme of the present invention, the step of constructing a basic decision tree model based on the selection feature set using CART algorithm specifically includes:
splitting the balanced data set corresponding to the selected feature set into a training set and a test set, which together serve as training test data;
training on the training set of the training test data by using the CART algorithm to obtain a CART model;
and verifying the CART model on the test set of the training test data to obtain a CART verification result.
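The split, train, and verify sequence can be sketched with a small CART-style classifier that chooses binary splits by weighted Gini impurity. This is an illustrative sketch only: the hold-out rule (every fourth row), the depth limit, and the data are assumptions, and pruning is omitted.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[List[float], int]  # (feature vector, integer class label)


def gini(rows: List[Row]) -> float:
    n = len(rows)
    if n == 0:
        return 0.0
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows: List[Row]):
    """Return (weighted Gini, feature index, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(rows: List[Row], depth: int = 0, max_depth: int = 3):
    """Grow the tree; a leaf is a class label, an inner node is (feature, threshold, left, right)."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    split = best_split(rows) if depth < max_depth and gini(rows) > 0.0 else None
    if split is None:
        return majority
    _, f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t, build(left, depth + 1, max_depth), build(right, depth + 1, max_depth))


def predict(tree, x: List[float]) -> int:
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Assumed toy data; every fourth row is held out as the test set.
rows: List[Row] = [([1, 7], 0), ([6, 2], 1), ([2, 3], 0), ([4, 5], 0),
                   ([7, 8], 1), ([3, 9], 0), ([8, 4], 1), ([9, 1], 1)]
train = [r for i, r in enumerate(rows) if i % 4 != 3]
test = [r for i, r in enumerate(rows) if i % 4 == 3]
tree = build(train)
accuracy = sum(predict(tree, x) == y for x, y in test) / len(test)
```

The accuracy on the held-out rows plays the role of the CART verification result.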
As a further scheme of the invention, the step of fusing the basic decision tree model with the deep learning model by using the fusion learning method that integrates a random forest and a deep neural network, and obtaining the fused decision tree model, comprises the following steps:
training a model by using a random forest algorithm based on the selected feature set to obtain a random forest model;
constructing and training a deep neural network model based on the selected feature set;
and fusing the CART model, the random forest model and the deep neural network model by adopting a fusion algorithm to obtain a fused decision tree model.
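The source does not name the fusion algorithm. One common option is weighted soft voting, in which each model outputs class probabilities and the fused prediction is their weighted average. The sketch below assumes that approach; the model names, probabilities, and weights are hypothetical.

```python
from typing import Dict, List


def fuse_probabilities(model_probs: Dict[str, List[float]],
                       weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-model class probabilities (soft voting)."""
    total_weight = sum(weights[name] for name in model_probs)
    n_classes = len(next(iter(model_probs.values())))
    return [
        sum(weights[name] * probs[c] for name, probs in model_probs.items()) / total_weight
        for c in range(n_classes)
    ]


# Assumed class-probability outputs of the three models for one sample.
model_probs = {
    "cart": [1.0, 0.0],
    "random_forest": [0.4, 0.6],
    "deep_network": [0.2, 0.8],
}
# Assumed weights, e.g. derived from each model's validation accuracy.
weights = {"cart": 1.0, "random_forest": 2.0, "deep_network": 2.0}

fused = fuse_probabilities(model_probs, weights)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Stacking, where a further model learns how to combine the three outputs, would be an alternative; soft voting is shown because it needs no extra training.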
As a further scheme of the invention, in the big data analysis process, the online decision tree algorithm is adopted to analyze the newly generated data in real time, and the step of generating the online analysis result comprises the following steps:
in the big data analysis process, receiving real-time newly generated data as a real-time data stream;
cleaning, normalizing and extracting features of the real-time data stream to obtain a preprocessed data stream;
real-time analysis is carried out on the preprocessed data stream by using an online decision tree algorithm, and an online decision tree analysis result is obtained;
and comparing the analysis result of the online decision tree with a real data label, and evaluating the real-time performance of the model to obtain an online performance evaluation result.
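The final evaluation sub-step can be sketched as a sliding-window accuracy computed over the stream: each prediction is compared with the true label as it arrives. The sketch covers only this comparison. The online decision tree itself is replaced by a fixed threshold rule as a stand-in, and the window size and stream contents are assumptions.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


def sliding_accuracy(stream: Iterable[Tuple[List[float], int]],
                     predict: Callable[[List[float]], int],
                     window: int = 3) -> List[float]:
    """Accuracy over the most recent `window` records, updated after every record."""
    recent: Deque[bool] = deque(maxlen=window)
    history: List[float] = []
    for features, true_label in stream:
        recent.append(predict(features) == true_label)
        history.append(sum(recent) / len(recent))
    return history


def stand_in_model(features: List[float]) -> int:
    """Placeholder for the online decision tree: a fixed threshold rule (assumption)."""
    return int(features[0] > 0.5)


# Assumed preprocessed data stream of (features, true label) pairs.
stream = [([0.9], 1), ([0.1], 0), ([0.7], 0), ([0.2], 0), ([0.8], 1)]
history = sliding_accuracy(stream, stand_in_model)
```

A falling windowed accuracy is a simple signal that the stream has drifted away from what the model learned.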
As a further scheme of the present invention, the abnormality detection algorithm based on the fused decision tree model performs abnormality detection on the online analysis result, and the step of generating an abnormality report specifically includes:
loading the pre-trained fusion decision tree model to be used as a pre-loading fusion model;
performing anomaly detection on the analysis result of the online decision tree by using the preloaded fusion model to obtain a preliminary anomaly detection result;
and marking and classifying abnormal data points in the preliminary abnormal detection result, obtaining marked abnormal data, and generating an abnormal report.
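The source does not state the detection rule. As one hedged possibility, a record could be flagged when the pre-loaded fusion model assigns low probability to the label produced by the online analysis. The sketch below assumes that rule; the record fields, the threshold, and the category name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OnlineResult:
    record_id: str
    online_label: str              # label produced by the online decision tree
    fused_probs: Dict[str, float]  # fusion model's class probabilities for the same record


def build_anomaly_report(results: List[OnlineResult], min_support: float = 0.2) -> List[dict]:
    """Flag records whose online label receives little support from the fusion model."""
    report = []
    for r in results:
        support = r.fused_probs.get(r.online_label, 0.0)
        if support < min_support:
            report.append({
                "record_id": r.record_id,
                "online_label": r.online_label,
                "fused_label": max(r.fused_probs, key=r.fused_probs.get),
                "support": support,
                "category": "label_conflict",  # hypothetical anomaly category
            })
    return report


# Assumed online analysis results with the fusion model's probabilities attached.
results = [
    OnlineResult("r1", "normal", {"normal": 0.90, "fault": 0.10}),
    OnlineResult("r2", "normal", {"normal": 0.05, "fault": 0.95}),
]
report = build_anomaly_report(results)
```

Each report entry carries both labels and the support value, so a reviewer can see why the record was marked.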
As a further scheme of the present invention, the step of using an interpretable tool, specifically SHAP, to visually display the exception report and the fused decision tree model, and simultaneously providing an interpretation of the fused decision tree model, and integrating and generating a final report specifically includes:
loading a SHAP library and dependent resources thereof as a SHAP resource set;
generating an explanation for the fusion decision tree model by using the SHAP resource set, and acquiring a fusion model explanation;
using the SHAP resource set to carry out visual display on the abnormal report as an abnormal data visual result;
and integrating the fusion model interpretation and the abnormal data visualization result to obtain a comprehensive analysis report.
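SHAP attributes a prediction to input features using Shapley values. The sketch below does not call the SHAP library. It computes exact Shapley values by enumerating feature subsets, which is feasible only for a handful of features, and it assumes that an absent feature is replaced by a baseline value. The model function, instance, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   instance: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(instance)

    def value(coalition: set) -> float:
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_model(x: List[float]) -> float:
    """Stand-in for the fused model's score (assumption): one additive term, one interaction."""
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to the difference between the model output for the instance and for the baseline, which is the property that makes per-feature bar charts of these values easy to read.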
Compared with the prior art, the invention has the advantages and positive effects that:
In the present invention, integrating multi-modal data and unstructured data and extracting features with deep learning makes fuller use of different types of data and extracts richer information. Processing unbalanced data with the SMOTE sampling method reduces model bias and improves classification precision for minority-class samples. Selecting features with the decision tree algorithm identifies the key features affecting the result more accurately and improves model precision. Fusing the decision tree with the deep learning model combines the advantages of both to build a more powerful, efficient and robust model. Real-time analysis with the online decision tree algorithm responds quickly to newly generated data and improves the timeliness of the analysis. The abnormality detection algorithm based on the decision tree identifies abnormal data accurately and plays an important role in preventing and finding problems. The SHAP interpretability tool provides users with easy-to-understand model interpretation, which strengthens their understanding of and trust in the results.
Drawings
FIG. 1 is a schematic diagram showing the main steps of a big data analysis method based on decision tree according to the present invention;
FIG. 2 is a detailed schematic diagram of step 1 of the big data analysis method based on decision tree;
FIG. 3 is a detailed schematic diagram of step 2 of the big data analysis method based on decision tree according to the present invention;
FIG. 4 is a detailed schematic diagram of step 3 of the big data analysis method based on decision tree according to the present invention;
FIG. 5 is a detailed schematic diagram of step 4 of the big data analysis method based on decision tree according to the present invention;
FIG. 6 is a detailed schematic diagram of step 5 of the big data analysis method based on decision tree according to the present invention;
FIG. 7 is a detailed schematic diagram of step 6 of the big data analysis method based on decision tree according to the present invention;
FIG. 8 is a detailed schematic diagram of step 7 of the big data analysis method based on decision tree according to the present invention;
FIG. 9 is a detailed schematic diagram of step 8 of the big data analysis method based on decision tree according to the present invention;
FIG. 10 is a detailed schematic diagram of step 9 of the big data analysis method based on decision tree according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the description of the present invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Furthermore, in the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Example 1
Referring to fig. 1, the present invention provides a technical solution: the big data analysis method based on the decision tree comprises the following steps:
integrating multi-modal data, cleaning and normalizing, and extracting features of the multi-modal data by adopting a deep learning convolutional neural network and a natural language processing technology to obtain a preprocessing data set;
integrating unstructured data, and analyzing the unstructured data by adopting an NLP algorithm and a clustering algorithm to obtain an unstructured analysis result;
using an SMOTE sampling method, identifying and processing unbalanced data based on a preprocessed data set and an unstructured analysis result, and obtaining a balanced data set;
selecting features from the balanced dataset by using a decision tree algorithm comprising information gain and Gini coefficients, and acquiring a selected feature set;
constructing a basic decision tree model based on the selection feature set by using a CART algorithm;
using a fusion learning method that integrates a random forest and a deep neural network, fusing the basic decision tree model with the deep learning model to obtain a fused decision tree model;
in the big data analysis process, an online decision tree algorithm is adopted to analyze newly generated data in real time, and an online analysis result is generated;
based on an abnormality detection algorithm fused with the decision tree model, performing abnormality detection on an online analysis result to generate an abnormality report;
and using an interpretability tool, specifically SHAP, performing visual display on the exception report and the fusion decision tree model, simultaneously providing interpretation of the fusion decision tree model, and integrating to generate a final report.
The cleaning and normalization steps reduce noise and inconsistency in the multi-modal data and improve data quality and consistency. Feature extraction through the convolutional neural network and the natural language processing technology draws rich, representative feature information from the multi-modal data, so that the information in each modality is fully used. Analyzing the unstructured data with the NLP algorithm and the clustering algorithm extracts useful information and patterns from data such as text and images, and supplies supplementary features to the subsequent decision tree model. Processing the unbalanced data with the SMOTE sampling method balances the sample class distribution, improves the classification performance of the model on minority classes, and supports the robustness and accuracy of the model. The decision tree algorithm based on information gain and the Gini coefficient selects the most representative features from the balanced data set, which reduces feature dimensionality, improves model efficiency and helps explain the decision process of the model. Fusion learning of the basic decision tree model and the deep learning model combines the advantages of both and improves the generalization capability and accuracy of the model. In addition, an interpretability tool such as SHAP can interpret the prediction results of the fused decision tree model, which improves the interpretability and credibility of the model.
The online decision tree algorithm analyzes newly generated data in real time, so the analysis keeps pace with incoming data, and the abnormality detection algorithm based on the fused decision tree model allows abnormal situations to be found and reported quickly. The visualization tool displays the abnormality report and the fused decision tree model, which helps users intuitively understand the analysis results and the model's decision process, and yields a comprehensive final report that supports decision making.
Referring to fig. 2, the multimodal data includes image data, audio data, text data;
integrating multi-modal data, cleaning and normalizing, and performing feature extraction on the multi-modal data by adopting a deep learning convolutional neural network and a natural language processing technology, wherein the step of acquiring a preprocessing data set comprises the following steps:
collecting multi-modal data, and aligning each modal data in the multi-modal data with other modal data in time and space in the data integration process;
the integrated multi-mode data is subjected to data cleaning, including abnormal value detection, data filling and data denoising;
normalizing each mode data in the multi-mode data to a unified interval;
performing feature extraction on the image data by adopting a convolutional neural network, performing feature extraction on the text data by adopting a natural language processing technology, performing feature extraction on the audio data by adopting an MFCC, and acquiring feature vectors based on feature extraction;
and merging the feature vectors of different modes by using a multi-mode fusion technology to acquire a preprocessing data set.
Referring to fig. 3, the step of integrating unstructured data and analyzing it by adopting an NLP algorithm and a clustering algorithm to obtain the unstructured analysis result specifically comprises the following steps:
collecting the unstructured data, and aligning it in time and space during data integration;
in the NLP analysis, performing text word segmentation, named entity recognition, sentiment analysis, topic modeling, and text classification, classifying the texts in the unstructured data into predefined categories, and obtaining the word segmentation results, sentiment tendencies, identified topics, and classifications;
adopting a clustering algorithm, specifically k-means, to obtain clustering results including text clusters, image clusters, and audio clusters;
and integrating the results of the NLP algorithm and the clustering algorithm to obtain a cluster analysis result for the unstructured data, which is taken as the unstructured analysis result.
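The k-means clustering step listed above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation of the invention: the function names `sq_dist` and `kmeans`, the toy two-dimensional vectors standing in for text, image, or audio features, and the deterministic farthest-point initialisation are all hypothetical choices made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on a list of numeric tuples."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Toy feature vectors standing in for extracted text/image/audio features.
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans(features, k=2)
print(labels)
```

In practice the cluster labels would be stored next to the NLP outputs for each record so that both can be merged into the unstructured analysis result.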
First, during data integration, the multi-modal data are aligned so that they are consistent in time and space. Second, data cleaning, including outlier detection, data filling, and denoising, improves data quality and reduces interference from outliers. The data of each modality are then normalized to a unified interval, which removes differences in scale and range and makes the modalities comparable. Next, representative feature vectors are extracted from the images, text, and audio by the convolutional neural network, natural language processing, and audio feature extraction, respectively. Finally, the per-modality feature vectors are merged by a multi-modal fusion technology to obtain a comprehensive preprocessed data set. This flow makes full use of the information in the multi-modal data, improves data quality and consistency, and provides a more accurate and complete preprocessed data set for subsequent analysis and model construction.
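The normalization and fusion steps can be illustrated with a short sketch. It assumes that feature vectors have already been extracted for each modality, and it uses min-max scaling to [0, 1] followed by simple concatenation (early fusion); the text does not specify the interval or the fusion technique, so both are assumptions, as are the names `min_max_scale` and `fuse` and the toy values.

```python
def min_max_scale(rows):
    """Scale each column of a list of feature rows to the interval [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        # Constant columns carry no information and are mapped to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def fuse(*modalities):
    """Early fusion: concatenate the scaled per-sample vectors of every modality."""
    scaled = [min_max_scale(m) for m in modalities]
    return [sum(parts, []) for parts in zip(*scaled)]

# Hypothetical per-sample features, one row per aligned sample.
image_feats = [[10.0, 200.0], [20.0, 100.0], [30.0, 300.0]]
text_feats = [[0.5], [1.5], [2.5]]
audio_feats = [[-3.0, 7.0], [0.0, 7.0], [3.0, 7.0]]

dataset = fuse(image_feats, text_feats, audio_feats)
print(dataset[0])
```

Concatenation is only one possible fusion choice; it keeps every modality on the same scale so that no single modality dominates distance-based steps later in the pipeline.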
Referring to fig. 4, the step of using the SMOTE sampling method to identify and process unbalanced data based on the preprocessed data set and the unstructured analysis result, and obtaining the balanced data set, specifically comprises the following steps:
counting the number of samples of each category in the preprocessed data set and the unstructured analysis result to obtain a data category statistical result;
setting the categories to be enhanced and the enhancement strategy based on the data category statistical result, and generating enhancement strategy details;
based on the enhancement strategy details, applying the SMOTE algorithm to the categories to be enhanced, finding the K nearest neighbors of samples in those categories, and generating synthetic samples from them as a synthetic sample set;
combining the preprocessed data set and the unstructured analysis result with the synthetic sample set to form a preliminary balanced data set;
based on the preliminary balanced data set, repeating the above steps until the number of samples in each category reaches the balance target, and obtaining the final balanced data set.
First, the preprocessed data set and the unstructured analysis result are counted to obtain the number of samples in each category. The categories to be enhanced and the corresponding enhancement strategy are set according to these statistics. The SMOTE algorithm is then applied to each category to be enhanced: the K nearest neighbors of samples in that category are found and used to generate a synthetic sample set. Next, the preprocessed data set, the unstructured analysis result, and the synthetic sample set are combined to form a preliminary balanced data set. These steps are repeated until the number of samples in each category reaches the balance target, which yields the final balanced data set. In this way unbalanced data are handled effectively: the sample distribution across categories is evened out, the model learns minority categories better, and the influence of class imbalance on model training and performance evaluation is reduced. The resulting balanced data set can improve the robustness, accuracy, and overall predictive ability of the model.
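A minimal sketch of the counting and oversampling steps is given below. It follows the usual formulation of SMOTE, in which a synthetic sample is placed at a random position on the line segment between a minority sample and one of its K nearest minority neighbors. The helper names `smote` and `balance`, the toy data, and the choice of the largest class size as the balance target are assumptions for illustration, and the loop reaches the target in a single pass rather than by repeated rounds.

```python
import random
from collections import Counter

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating towards nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # K nearest minority neighbours of x (excluding x itself).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def balance(samples, labels, k=2, seed=0):
    """Oversample every class up to the size of the largest class."""
    counts = Counter(labels)            # data category statistics
    target = max(counts.values())       # assumed balance target
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        if n < target:
            minority = [s for s, l in zip(samples, labels) if l == cls]
            new = smote(minority, target - n, k, seed)
            out_x += new
            out_y += [cls] * len(new)
    return out_x, out_y

samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (5.0, 5.0), (5.4, 5.2)]
labels = [0, 0, 0, 0, 0, 1, 1]
balanced_x, balanced_y = balance(samples, labels)
print(Counter(balanced_y))
```

Because synthetic points are interpolated rather than copied, the minority class gains variety without leaving the region spanned by its original samples.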
Referring to fig. 5, the step of selecting features from the balanced data set by using a decision tree algorithm including information gain and Gini coefficients, and obtaining the selected feature set, specifically comprises the following steps:
invoking the balanced data set, calculating a statistical summary of every feature, including the mean and standard deviation, and generating a feature statistical summary;
calculating the information gain of each feature based on the feature statistical summary to obtain an information gain result;
calculating the Gini coefficient of each feature based on the feature statistical summary to obtain a Gini coefficient result;
and combining the information gain result and the Gini coefficient result to generate the selected feature set.
Calculating a statistical summary of every feature in the balanced data set, including its mean and standard deviation, gives descriptive statistics that serve as the basis for feature selection. Based on the feature statistical summary, the information gain of each feature is calculated to estimate how much the feature contributes to predicting the target variable. Features with a higher information gain have stronger predictive ability and help build a decision tree with better classification ability. The Gini coefficient of each feature is also calculated; it measures the impurity of the data after splitting on that feature, so selecting features with a lower Gini coefficient can improve the classification accuracy of the decision tree. The information gain result and the Gini coefficient result are then analyzed together to generate the selected feature set, which determines the features to be used as decision nodes when the decision tree model is constructed.
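The two selection measures can be made concrete with a small example. The sketch below computes the information gain and the weighted Gini value (commonly called Gini impurity in the decision tree literature) for a binary split of one feature at a threshold. The function names, the toy features `a` and `b`, the threshold of 5, and the cut-off of 0.5 on information gain are illustrative assumptions; the text does not state how the two results are combined.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_scores(values, labels, threshold):
    """Return (information gain, weighted Gini) for the split value <= threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)

    def weighted(measure):
        return len(left) / n * measure(left) + len(right) / n * measure(right)

    return entropy(labels) - weighted(entropy), weighted(gini)

labels = [0, 0, 1, 1]
features = {"a": [1, 2, 8, 9], "b": [1, 8, 2, 9]}

scores = {name: split_scores(vals, labels, threshold=5) for name, vals in features.items()}
# Keep features whose split is informative (cut-off chosen for illustration only).
selected = [name for name, (gain, _) in scores.items() if gain > 0.5]
print(scores, selected)
```

Feature `a` separates the two classes perfectly, so it has the maximum information gain and zero impurity, while feature `b` leaves both halves evenly mixed and is dropped.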
Referring to fig. 6, the step of constructing the basic decision tree model based on the selected feature set by using the CART algorithm specifically comprises the following steps:
splitting the balanced data set corresponding to the selected feature set into a training set and a test set, which serve as training test data;
training on the training set in the training test data by using the CART algorithm to obtain a CART model;
and verifying the CART model on the test set in the training test data to obtain a CART verification result.
First, the balanced data set corresponding to the selected feature set is split into a training set and a test set for building and verifying the model. The CART algorithm is then trained on the training set: it recursively partitions the data on the features and generates a decision tree made up of internal nodes and leaf nodes. At each node, the CART algorithm selects the optimal split according to the data and establishes a decision rule, so that the tree can classify the data. The trained CART model is then used to predict on the test set, and the predictions are compared with the true labels to obtain the CART verification result. The verification result shows how the model performs on unseen data and therefore indicates its generalization ability and classification accuracy. Because CART splits optimally on the selected feature set, it is a simple and effective method for both classification and regression.
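The train, split, and verify cycle can be sketched with a very small CART-style classifier. The code below is an illustrative pure-Python version that chooses binary splits by minimum weighted Gini impurity; the nested-tuple tree representation, the depth limit, the function names, and the toy data are assumptions, and a production system would more likely rely on an established library implementation.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Exhaustively search (feature, threshold) pairs for the lowest weighted Gini."""
    best = None  # (score, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:  # skip max so right side is non-empty
            left = [l for row, l in zip(X, y) if row[f] <= t]
            right = [l for row, l in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(X, y, depth=0, max_depth=3):
    """Recursively grow the tree; leaves hold the majority class label."""
    split = best_split(X, y) if len(set(y)) > 1 and depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    left = [(r, l) for r, l in zip(X, y) if r[f] <= t]
    right = [(r, l) for r, l in zip(X, y) if r[f] > t]
    return (f, t,
            build([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

def predict(node, row):
    # Internal nodes are tuples; anything else is a leaf label.
    while isinstance(node, tuple):
        f, t, low, high = node
        node = low if row[f] <= t else high
    return node

X = [(1, 5), (2, 4), (3, 6), (7, 5), (8, 4), (9, 6), (2, 5), (8, 5)]
y = [0, 0, 0, 1, 1, 1, 0, 1]
X_train, y_train, X_test, y_test = X[:6], y[:6], X[6:], y[6:]

tree = build(X_train, y_train)
accuracy = sum(predict(tree, r) == l for r, l in zip(X_test, y_test)) / len(y_test)
print(tree, accuracy)
```

The accuracy on the held-out rows plays the role of the CART verification result described above.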
Referring to fig. 7, the step of fusing the basic decision tree model with the deep learning model by a fusion learning method that integrates a random forest and a deep neural network, and obtaining the fused decision tree model, specifically comprises the following steps:
training a model by using a random forest algorithm based on the selected feature set to obtain a random forest model;
constructing and training a deep neural network model based on the selected feature set;
and fusing the CART model, the random forest model, and the deep neural network model by adopting a fusion algorithm to obtain the fused decision tree model.
Based on the selected feature set, a model is first trained with the random forest algorithm. A random forest is an ensemble learning method that builds multiple decision trees from randomly selected subsets of the features and the data, and combines them by voting or averaging. The trained random forest model aggregates the predictions of its trees, which improves classification accuracy and robustness. Next, a deep neural network model is constructed and trained on the selected feature set. Deep neural networks can learn higher-level abstract features from data and perform complex classification or regression tasks; with a suitable network structure and optimization algorithm, the trained network has strong pattern recognition and generalization ability. Finally, a fusion algorithm fuses the basic decision tree model, the random forest model, and the deep neural network model. Following the idea of ensemble learning, the fusion algorithm combines the predictions of the individual models to obtain the fused decision tree model. Common fusion approaches include voting, weighted averaging, and stacking.
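Two of the fusion approaches named above, voting and weighted averaging, can be sketched as follows. The example assumes each of the three models outputs class probabilities for one sample; the probability values, the weights, and the function names `soft_vote` and `hard_vote` are hypothetical, and the text does not say which fusion rule the method actually uses.

```python
from collections import Counter

def soft_vote(prob_lists, weights=None):
    """Weighted average of class probabilities; returns (label, fused probabilities)."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    return fused.index(max(fused)), fused

def hard_vote(prob_lists):
    """Majority vote over each model's most probable class."""
    votes = [p.index(max(p)) for p in prob_lists]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical class probabilities for one sample from the three models.
cart_p, forest_p, dnn_p = [1.0, 0.0], [0.4, 0.6], [0.3, 0.7]
models = [cart_p, forest_p, dnn_p]

soft_label, soft_probs = soft_vote(models)                       # equal weights
hard_label = hard_vote(models)                                   # majority vote
weighted_label, _ = soft_vote(models, weights=[1.0, 2.0, 2.0])   # trust ensembles more
print(soft_label, hard_label, weighted_label)
```

The example shows that the fusion rule matters: a single very confident model can outweigh two mildly confident ones under equal-weight averaging, while majority voting or different weights can reverse the decision.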
Referring to fig. 8, the step of analyzing newly generated data in real time by adopting an online decision tree algorithm during big data analysis, and generating the online analysis result, specifically comprises the following steps:
during big data analysis, receiving newly generated data in real time as a real-time data stream;
cleaning, normalizing, and extracting features from the real-time data stream to obtain a preprocessed data stream;
analyzing the preprocessed data stream in real time by using the online decision tree algorithm to obtain an online decision tree analysis result;
and comparing the online decision tree analysis result with the real data labels, and evaluating the real-time performance of the model to obtain an online performance evaluation result.
In big data analysis, newly generated data must first be received in real time, which may be implemented with a data stream processing framework or a streaming data processing system. The received data carry the latest information and can be analyzed and acted on in real time. The real-time data stream then goes through preprocessing steps such as cleaning, standardization, and feature extraction: cleaning removes noise and outliers, standardization converts the data to a common specification, and feature extraction derives meaningful features from the raw data as input to the online decision tree algorithm. The preprocessed data stream is analyzed in real time by the online decision tree algorithm, which adapts to the data stream by dynamically updating and adjusting the decision tree as new data arrive, and which processes large-scale data and real-time data streams efficiently. The online decision tree analysis result is compared with the real data labels to evaluate the real-time performance of the model. This helps verify the accuracy and reliability of the model in a real-time environment, as well as its ability to adapt quickly to new data. The real-time performance evaluation result allows problems with the model to be found promptly so that it can be adjusted and improved.
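The predict, compare, and update loop can be sketched as below. The text does not name a specific online decision tree algorithm, so this example substitutes a one-node decision stump (the simplest possible tree) that is updated one sample at a time, and it evaluates it prequentially, meaning each sample is first used for testing and then for training. The class `OnlineStump`, the function `prequential`, and the toy stream are all assumptions for illustration.

```python
class OnlineStump:
    """Single-feature threshold classifier for labels 0/1, updated incrementally."""

    def __init__(self):
        self.count = {0: 0, 1: 0}
        self.mean = {0: 0.0, 1: 0.0}

    def predict(self, x):
        if 0 in (self.count[0], self.count[1]):
            return 0  # default until both classes have been observed
        threshold = (self.mean[0] + self.mean[1]) / 2
        upper = 1 if self.mean[1] >= self.mean[0] else 0
        return upper if x > threshold else 1 - upper

    def update(self, x, label):
        # Running mean per class, so no past samples need to be stored.
        self.count[label] += 1
        self.mean[label] += (x - self.mean[label]) / self.count[label]

def prequential(stream, model):
    """Test-then-train evaluation: yield the running accuracy after each sample."""
    correct = 0
    for n, (x, label) in enumerate(stream, start=1):
        correct += model.predict(x) == label   # compare with the real label first
        model.update(x, label)                 # then learn from the sample
        yield correct / n

# Hypothetical preprocessed stream of (feature value, true label) pairs.
stream = [(1.0, 0), (9.0, 1), (2.0, 0), (8.0, 1), (1.5, 0), (8.5, 1)]
accuracies = list(prequential(stream, OnlineStump()))
print(accuracies[-1])
```

The running accuracy corresponds to the online performance evaluation result: a sustained drop would signal that the model no longer fits the incoming data.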
Referring to fig. 9, the step of performing anomaly detection on the online analysis result with the anomaly detection algorithm based on the fused decision tree model, and generating the anomaly report, specifically comprises the following steps:
loading the pre-trained fused decision tree model as a preloaded fusion model;
performing anomaly detection on the online decision tree analysis result by using the preloaded fusion model to obtain a preliminary anomaly detection result;
and labeling and classifying the abnormal data points in the preliminary anomaly detection result, obtaining labeled abnormal data, and generating the anomaly report.
First, the pre-trained fused decision tree model is loaded as the preloaded model, so that it is ready for the subsequent anomaly detection. The preloaded fused decision tree model then performs anomaly detection on the online analysis result: the online analysis result is input into the fusion model, and whether a data point is abnormal is judged from the model's prediction. Potential abnormal data points are detected by comparing the online analysis result with the prediction of the fusion model. Next, the abnormal data points in the preliminary anomaly detection result are labeled and classified. Depending on the application, the abnormal data points may be labeled, sorted, and grouped according to their characteristics and context information, such as anomaly type and severity, for use in the anomaly report and in further processing. Finally, the anomaly report is generated from the labeled abnormal data points. The report may include details of each abnormal data point, such as its data values, timestamp, and anomaly type, together with related statistics and analysis results. It helps the user quickly understand the abnormal condition and take corresponding countermeasures.
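One possible reading of this comparison step is sketched below: a data point is flagged when the online result disagrees with the fused model's prediction, and the fused model's confidence is used to grade severity. The record layout, the stand-in `toy_fused_predict` function, the 0.9 confidence cut-off, and the report fields are all hypothetical; the text does not define the exact anomaly criterion.

```python
def detect_anomalies(records, fused_predict, high_confidence=0.9):
    """Flag records whose online label disagrees with the fused model's prediction."""
    report = []
    for rec in records:
        expected, confidence = fused_predict(rec["features"])
        if expected != rec["online_label"]:
            report.append({
                "timestamp": rec["timestamp"],
                "value": rec["features"],
                "online_label": rec["online_label"],
                "expected_label": expected,
                # Severity labelling rule is an assumption of this sketch.
                "severity": "high" if confidence >= high_confidence else "low",
            })
    return report

def toy_fused_predict(features):
    """Stand-in for the preloaded fused model: returns (label, confidence)."""
    score = features[0]
    label = 1 if score > 0.5 else 0
    return label, max(score, 1.0 - score)

records = [
    {"timestamp": 1, "features": (0.95,), "online_label": 0},
    {"timestamp": 2, "features": (0.20,), "online_label": 0},
    {"timestamp": 3, "features": (0.60,), "online_label": 0},
]

report = detect_anomalies(records, toy_fused_predict)
for entry in report:
    print(entry["timestamp"], entry["severity"])
```

Each report entry carries the data value, timestamp, and a severity label, matching the kind of detail the anomaly report is described as containing.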
Referring to fig. 10, the step of using an interpretability tool, specifically SHAP, to visually display the anomaly report and the fused decision tree model, provide an interpretation of the fused decision tree model, and integrate them to generate the final report specifically comprises the following steps:
loading the SHAP library and its dependent resources as a SHAP resource set;
generating an interpretation of the fused decision tree model by using the SHAP resource set to obtain a fusion model interpretation;
visually displaying the anomaly report by using the SHAP resource set to obtain an abnormal data visualization result;
and integrating the fusion model interpretation and the abnormal data visualization result to obtain a comprehensive analysis report.
First, the SHAP library and its dependent resources are loaded so that SHAP's functions and tools can be used. This includes installing the SHAP library and loading the TreeExplainer interpreter and any other dependencies required for processing. The prepared SHAP resource set is then used to generate an interpretation of the fused decision tree model. SHAP explains a model by calculating the importance of each feature and its contribution to the model's predictions, which shows how much each feature influences the fusion model and why the model makes a given prediction. The SHAP resource set is also used to display the anomaly report visually: abnormal data points, feature values, and other related information are presented graphically so that the user can understand and analyze the anomalies intuitively. Finally, the fusion model interpretation and the abnormal data visualization result are integrated. Combining the two yields a more comprehensive and accurate analysis report that helps the user understand and interpret abnormal conditions in depth.
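The idea behind SHAP's feature contributions can be shown without the library itself. The sketch below does not call SHAP; it computes exact Shapley values by brute force over all feature orderings for a hypothetical three-feature model, with absent features replaced by a baseline input. The model `toy_model`, the input, and the all-zero baseline are assumptions for illustration, and this enumeration is only practical for a handful of features.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orders."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)       # start with every feature "absent"
        previous = model(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            value = model(current)
            phi[i] += value - previous  # marginal contribution of feature i
            previous = value
    return [p / len(orders) for p in phi]

def toy_model(v):
    """Hypothetical stand-in for the fused model's score on one sample."""
    return 2 * v[0] + v[1] * v[2]

x = [1, 2, 3]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions add up to the difference between the model output for the sample and for the baseline, and the interaction between the second and third features is shared equally between them; these per-feature values are what a SHAP plot visualizes.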
Working principle: Data integration and preprocessing is one of the key stages of the data analysis. Its goal is to collect multi-modal data, such as image data, audio data, and text data, and to ensure that the modalities are aligned in time and space during integration. The integrated multi-modal data are then cleaned and normalized to improve data quality and consistency. Feature extraction is another important preprocessing step: convolutional neural networks and natural language processing techniques (e.g., word embedding and text feature extraction) extract useful features from the multi-modal data to obtain the preprocessed data set. At the same time, unstructured data are collected, integrated, and cleaned, and then analyzed with the NLP algorithm and the clustering algorithm to obtain the unstructured analysis result. To handle unbalanced data, the SMOTE sampling method identifies and processes the imbalance on the basis of the preprocessed data set and the unstructured analysis result, generating a balanced data set. On the balanced data set, feature selection is performed with the information gain and Gini coefficient measures of the decision tree algorithm to obtain the selected feature set. The basic decision tree model is constructed from the selected feature set with the CART algorithm: the data are split into a training set and a test set, the model is trained on the training set, and its performance is verified on the test set. In the construction stage of the fused decision tree model, a random forest model and a deep neural network model are trained. The basic decision tree model, the random forest model, and the deep neural network model are then combined by fusion learning to obtain the fused decision tree model.
During online analysis and anomaly detection, data generated in real time are processed as a data stream. The data stream undergoes preprocessing steps such as cleaning, standardization, and feature extraction to obtain a preprocessed data stream, which the online decision tree algorithm analyzes in real time to generate the online analysis result. The anomaly detection algorithm based on the fused decision tree model then performs anomaly detection on the online analysis result and generates the anomaly report. Finally, the results are visualized and explained with a dedicated interpretability tool (e.g., SHAP). The SHAP library and its dependent resources are loaded as the SHAP resource set, which is used to generate an interpretation of the fused decision tree model and to display the anomaly report visually, giving the abnormal data visualization result. The fusion model interpretation and the abnormal data visualization result are integrated to generate the comprehensive analysis report.
The present invention is not limited to the above embodiments. The technical content disclosed above may be changed or modified into equivalent embodiments and applied to other fields; however, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the present invention still falls within the scope of the technical solution of the present invention.
Claims (10)
1. The big data analysis method based on the decision tree is characterized by comprising the following steps:
integrating multi-modal data, cleaning and normalizing, and extracting features of the multi-modal data by adopting a deep learning convolutional neural network and a natural language processing technology to obtain a preprocessing data set;
integrating unstructured data, and analyzing the unstructured data by adopting an NLP algorithm and a clustering algorithm to obtain an unstructured analysis result;
using an SMOTE sampling method, identifying and processing unbalanced data based on the preprocessed data set and an unstructured analysis result, and obtaining a balanced data set;
selecting features from the balanced dataset by using a decision tree algorithm comprising information gain and Gini coefficients, and obtaining a selected feature set;
constructing a basic decision tree model based on the selection feature set by using a CART algorithm;
fusing the basic decision tree model with the deep learning model by a fusion learning method that integrates a random forest and a deep neural network, to obtain a fused decision tree model;
in the big data analysis process, an online decision tree algorithm is adopted to analyze newly generated data in real time, and an online analysis result is generated;
performing anomaly detection on the online analysis result with an anomaly detection algorithm based on the fused decision tree model, to generate an anomaly report;
and using an interpretability tool, specifically SHAP, to visually display the anomaly report and the fused decision tree model while providing an interpretation of the fused decision tree model, and integrating them to generate a final report.
2. The decision tree based big data analysis method of claim 1, wherein the multi-modal data comprises image data, audio data, text data;
the integrated multi-modal data is subjected to cleaning and normalization, and the multi-modal data is subjected to feature extraction by adopting a deep learning convolutional neural network and a natural language processing technology, so that a preprocessing data set is obtained specifically by the following steps:
collecting the multi-modal data, and aligning each modal data in the multi-modal data with other modal data in time and space in the data integration process;
performing data cleaning on the integrated multi-mode data, including outlier detection, data filling and data denoising;
normalizing each mode data in the multi-mode data to a unified interval;
performing feature extraction on the image data by adopting the convolutional neural network, performing feature extraction on the text data by adopting the natural language processing technology, performing feature extraction on the audio data by adopting an MFCC, and acquiring a feature vector based on the feature extraction;
and merging the feature vectors of different modes by using a multi-mode fusion technology to acquire the preprocessing data set.
3. The big data analysis method based on decision tree according to claim 1, wherein the step of integrating unstructured data and analyzing the unstructured data by using NLP algorithm and clustering algorithm to obtain unstructured analysis results specifically comprises the following steps:
collecting the unstructured data, and aligning the unstructured data in time and space in the data integration process;
in the NLP analysis, performing text word segmentation, named entity recognition, sentiment analysis, topic modeling, and text classification, classifying the texts in the unstructured data into predefined categories, and obtaining the word segmentation results, sentiment tendencies, identified topics, and classifications;
adopting a clustering algorithm, specifically k-means, to obtain clustering results including text clustering, image clustering and audio clustering;
and integrating the results of the NLP algorithm and the clustering algorithm to obtain a clustering analysis result of the unstructured data, and taking the clustering analysis result as the unstructured analysis result.
4. The big data analysis method based on decision tree according to claim 1, wherein the step of using SMOTE sampling method to identify and process unbalanced data based on the preprocessed data set and the unstructured analysis result, and to obtain balanced data set is specifically:
counting the number of samples of each category in the preprocessed data set and the unstructured analysis result to obtain a data category statistical result;
setting the categories to be enhanced and an enhancement strategy based on the data category statistical result, and generating enhancement strategy details;
based on the enhancement strategy details, applying the SMOTE algorithm to the categories to be enhanced, finding the K nearest neighbors of samples in those categories, and generating synthetic samples from them as a synthetic sample set;
combining the preprocessed data set and the unstructured analysis result with the synthetic sample set to form a preliminary balanced data set;
based on the preliminary balanced data set, repeating the above steps until the number of samples in each category reaches the balance target, and obtaining a final balanced data set.
5. The big data analysis method based on decision tree according to claim 1, wherein the step of selecting features from the balanced dataset using a decision tree algorithm including information gain, gini coefficients, and obtaining a selected feature set is specifically:
invoking the balanced data set, calculating a statistical summary of every feature, including the mean and standard deviation, and generating a feature statistical summary;
calculating the information gain of each feature based on the feature statistical summary to obtain an information gain result;
calculating the Gini coefficient of each feature based on the feature statistical summary to obtain a Gini coefficient result;
and combining the information gain result and the Gini coefficient result to generate the selected feature set.
6. The big data analysis method based on decision tree according to claim 1, wherein the step of constructing a basic decision tree model based on the selected feature set using CART algorithm specifically comprises:
splitting the balance data set corresponding to the selected feature set into a training set and a test set as training test data;
training a training set in training test data by using a CART algorithm to obtain a CART model;
and verifying on a test set in training test data by using the CART model to obtain a CART verification result.
7. The big data analysis method based on decision tree according to claim 1, wherein the method for fusion learning of the integrated random forest and the deep neural network fuses the basic decision tree model and the deep learning model, and the step of obtaining the fused decision tree model specifically comprises:
training a model by using a random forest algorithm based on the selected feature set to obtain a random forest model;
constructing and training a deep neural network model based on the selected feature set;
and fusing the CART model, the random forest model and the deep neural network model by adopting a fusion algorithm to obtain a fused decision tree model.
8. The big data analysis method based on decision tree according to claim 1, wherein in the big data analysis process, the step of adopting an online decision tree algorithm to analyze the newly generated data in real time and generating an online analysis result specifically comprises the following steps:
in the big data analysis process, receiving real-time newly generated data as a real-time data stream;
cleaning, normalizing and extracting features of the real-time data stream to obtain a preprocessed data stream;
real-time analysis is carried out on the preprocessed data stream by using an online decision tree algorithm, and an online decision tree analysis result is obtained;
and comparing the analysis result of the online decision tree with a real data label, and evaluating the real-time performance of the model to obtain an online performance evaluation result.
9. The big data analysis method based on decision tree according to claim 1, wherein the step of generating an anomaly report by performing anomaly detection on the online analysis result by the anomaly detection algorithm based on the fused decision tree model specifically comprises:
loading the pre-trained fusion decision tree model to be used as a pre-loading fusion model;
performing anomaly detection on the analysis result of the online decision tree by using the preloaded fusion model to obtain a preliminary anomaly detection result;
and marking and classifying abnormal data points in the preliminary abnormal detection result, obtaining marked abnormal data, and generating an abnormal report.
10. The big data analysis method based on decision tree according to claim 1, wherein the step of using an interpretability tool, specifically SHAP, to visually display the anomaly report and the fused decision tree model while providing an interpretation of the fused decision tree model, and integrating them to generate a final report, specifically comprises:
loading a SHAP library and dependent resources thereof as a SHAP resource set;
generating an explanation for the fusion decision tree model by using the SHAP resource set, and acquiring a fusion model explanation;
using the SHAP resource set to carry out visual display on the abnormal report as an abnormal data visual result;
and integrating the fusion model interpretation and the abnormal data visualization result to obtain a comprehensive analysis report.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311050733.1A CN117056834A (en) | 2023-08-18 | 2023-08-18 | Big data analysis method based on decision tree |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311050733.1A CN117056834A (en) | 2023-08-18 | 2023-08-18 | Big data analysis method based on decision tree |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117056834A true CN117056834A (en) | 2023-11-14 |
Family
ID=88662283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311050733.1A Withdrawn CN117056834A (en) | 2023-08-18 | 2023-08-18 | Big data analysis method based on decision tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117056834A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117273670A (en) * | 2023-11-23 | 2023-12-22 | 深圳市云图华祥科技有限公司 | Engineering data management system with learning function |
CN117349782A (en) * | 2023-12-06 | 2024-01-05 | 湖南嘉创信息科技发展有限公司 | Intelligent data early warning decision tree analysis method and system |
CN117873837A (en) * | 2024-03-11 | 2024-04-12 | 国网四川省电力公司信息通信公司 | Analysis method for capacity depletion trend of storage device |
CN118314379A (en) * | 2024-03-29 | 2024-07-09 | 深圳市心研医疗科技有限公司 | Scatter diagram classification device |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117273670A (en) * | 2023-11-23 | 2023-12-22 | 深圳市云图华祥科技有限公司 | Engineering data management system with learning function |
CN117273670B (en) * | 2023-11-23 | 2024-03-12 | 深圳市云图华祥科技有限公司 | Engineering data management system with learning function |
CN117349782A (en) * | 2023-12-06 | 2024-01-05 | 湖南嘉创信息科技发展有限公司 | Intelligent data early warning decision tree analysis method and system |
CN117349782B (en) * | 2023-12-06 | 2024-02-20 | 湖南嘉创信息科技发展有限公司 | Intelligent data early warning decision tree analysis method and system |
CN117873837A (en) * | 2024-03-11 | 2024-04-12 | 国网四川省电力公司信息通信公司 | Analysis method for capacity depletion trend of storage device |
CN118314379A (en) * | 2024-03-29 | 2024-07-09 | 深圳市心研医疗科技有限公司 | Scatter diagram classification device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11816078B2 (en) | Automatic entity resolution with rules detection and generation system | |
CN117056834A (en) | Big data analysis method based on decision tree | |
CN112756759B (en) | Spot welding robot workstation fault judgment method | |
CN110019074A (en) | Analysis method, device, equipment and the medium of access path | |
CN110780965B (en) | Vision-based process automation method, equipment and readable storage medium | |
CN110442523B (en) | Cross-project software defect prediction method | |
CN112069069A (en) | Defect automatic positioning analysis method, device and readable storage medium | |
CN116662817B (en) | Asset identification method and system of Internet of things equipment | |
CN114218998A (en) | Power system abnormal behavior analysis method based on hidden Markov model | |
CN110717090A (en) | Network public praise evaluation method and system for scenic spots and electronic equipment | |
CN109002810A (en) | Model evaluation method, Radar Signal Recognition method and corresponding intrument | |
CN107016416A (en) | The data classification Forecasting Methodology merged based on neighborhood rough set and PCA | |
CN107908807B (en) | Small subsample reliability evaluation method based on Bayesian theory | |
CN113722719A (en) | Information generation method and artificial intelligence system for security interception big data analysis | |
Soukup et al. | Towards evaluating quality of datasets for network traffic domain | |
CN115577357A (en) | Android malicious software detection method based on stacking integration technology | |
CN118296164A (en) | Automatic agricultural product information acquisition and updating method and system based on knowledge graph | |
CN110956543A (en) | Method for detecting abnormal transaction | |
CN111967501B (en) | Method and system for judging load state driven by telemetering original data | |
CN102103502A (en) | Method and system for analyzing a legacy system based on trails through the legacy system | |
CN110879821A (en) | Method, device, equipment and storage medium for generating rating card model derivative label | |
CN111896609A (en) | Method for analyzing mass spectrum data based on artificial intelligence | |
CN115455407A (en) | Machine learning-based GitHub sensitive information leakage monitoring method | |
CN113722230B (en) | Integrated evaluation method and device for vulnerability mining capability of fuzzy test tool | |
CN111382191A (en) | Machine learning identification method based on deep learning |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20231114 |