CN111612624A - Method and system for analyzing importance of data features - Google Patents
Method and system for analyzing importance of data features
- Publication number
- CN111612624A
- Authority
- CN
- China
- Prior art keywords
- data
- bond
- sample
- importance
- random forest
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Finance (AREA)
- Marketing (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Educational Administration (AREA)
- Technology Law (AREA)
- Game Theory and Decision Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Tourism & Hospitality (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An embodiment of the invention provides a method and a system for analyzing the importance of data features. The method comprises: obtaining the bond samples required for bond risk assessment, determining the time granularity of the sample points in the bond samples and labeling them, and taking the result as an original data set; preprocessing the original data set to obtain a balanced data set; constructing a random forest model for analyzing the importance of the data features; and inputting the balanced data set into the random forest for calculation to obtain the importance ranking of the data features. After the original data set related to bond risk assessment is collected, preprocessing balances it, which addresses the problem that data features cannot be analyzed effectively when the proportions of positive and negative samples are uneven. The balanced data is then analyzed with a random forest algorithm, so that the data features needed for assessing bond risk are identified more scientifically and accurately than the features summarized by traditional assessment methods.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a system for analyzing the importance of data characteristics.
Background
In the past two years, bond defaults have occurred frequently as policies have changed, and bond default is expected to become a common risk event.
In the conventional approach to bond risk assessment, related data such as the operating conditions and financial conditions of the assessed entity and the market conditions of its industry are collected manually, and the data features in this data are analyzed on the basis of accumulated experience in order to assess the risk of the bonds.
However, because of the frequent policy changes in recent years, many bond risk situations have appeared that occurred rarely or not at all in the past, so data features selected from manually summarized experience have become unreliable for analyzing bond risk.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide an analysis method of importance of data features and a corresponding analysis system of importance of data features that overcome or at least partially solve the above problems.
In order to solve the above problem, an embodiment of the present invention discloses a method for analyzing importance of data features, including:
obtaining the bond samples required for bond risk assessment, determining the time granularity of the sample points in the bond samples and labeling them, and taking the result as an original data set;
preprocessing the original data set to obtain a balanced data set;
constructing a random forest model for analyzing the importance of the data characteristics;
and inputting the balanced data set into the random forest for calculation, and analyzing the importance ranking of each data characteristic.
Further, the bond samples required for bond risk assessment are specifically as follows:
the time granularity of the sample points of the bond samples is one quarter;
the sample points of the bond samples are labeled as follows:
each sample point is labeled as a positive or negative sample according to whether the corresponding bond experienced a default or a major risk event in the quarter.
Further, the positive and negative samples are specifically:
when the bond experienced a default or a major risk event in the quarter, the sample point is labeled as a negative sample;
when the bond experienced no default or major risk event in the quarter, the sample point is labeled as a positive sample.
Further, preprocessing the original data set uses two methods: undersampling and oversampling.
Further, the random forest model specifically includes:
using the bootstrap sampling method, repeatedly drawing K samples at random with replacement from the preprocessed balanced data set N to generate new data sample sets;
generating T classification trees from the new data sample sets to form a random forest;
and building a decision tree on each sample set obtained by the bootstrap sampling method, so that a plurality of decision trees are formed for prediction, and obtaining the final prediction result by voting.
Further, the decision tree specifically includes:
each decision tree is generated from a training sample X of size K and a random vector $\theta_k$;
the random vector sequence $\{\theta_k, k = 1, 2, \ldots, K\}$ is independent and identically distributed;
the random forest is the set of all decision trees $\{h(X, \theta_k), k = 1, 2, \ldots, K\}$;
each decision tree model $h(X, \theta_k)$ has one vote for selecting the classification result of the input variable x:
$$H(x) = \arg\max_{Y} \sum_{i=1}^{K} I\left(h_i(x) = Y\right)$$
wherein H(x) represents the random forest classification result, $h_i(x)$ is the classification result of a single decision tree, Y represents the classification target, and I(·) is the indicator function.
Further, the data features are input into the random forest for calculation, specifically:
for each decision tree in the random forest, calculating the prediction error on the corresponding out-of-bag (OOB) data, denoted errOOB1;
randomly adding noise to feature X in all samples of the out-of-bag data, and calculating the out-of-bag error again, denoted errOOB2;
assuming that there are N trees in the random forest, the importance of feature X = Σ(errOOB2 − errOOB1)/N.
The embodiment of the invention discloses a system for analyzing the importance of data features, which comprises:
the acquisition module is used for obtaining the bond samples required for bond risk assessment, determining the time granularity of the sample points in the bond samples and labeling them, and taking the result as an original data set;
the preprocessing module is used for preprocessing the original data set to obtain a balanced data set;
the modeling module is used for constructing a random forest model for analyzing the importance of the data characteristics;
and the calculation module is used for inputting the balanced data set into the random forest for calculation and analyzing the importance ranking of each data characteristic.
The embodiment of the invention discloses electronic equipment, which comprises a processor, a memory and a computer program which is stored on the memory and can run on the processor, wherein when the computer program is executed by the processor, the steps of the analysis method for the importance of the data characteristics are realized.
The embodiment of the invention discloses a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the data feature importance analysis method are realized.
The embodiment of the invention has the following advantages: after an original data set related to bond risk assessment is collected, data equalization is carried out on the original data set through preprocessing, so that the problem that effective analysis on data characteristics cannot be carried out due to the fact that the proportion of positive samples and negative samples in the data is not uniform is solved, and corresponding data characteristic analysis is carried out on the equalized data through a random forest algorithm, so that more scientific and accurate related data characteristics needed for assessing bond risk compared with those summarized through a traditional assessment method are found out.
Drawings
FIG. 1 is a flow chart of the steps of an embodiment of a method for analyzing the importance of data features of the present invention;
FIG. 2 is a block diagram of an embodiment of the system for analyzing the importance of data features according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
One of the core concepts of the embodiments of the invention is to provide a method for analyzing the importance of data features, comprising: obtaining the bond samples required for bond risk assessment, determining the time granularity of the sample points in the bond samples and labeling them, and taking the result as an original data set; preprocessing the original data set to obtain a balanced data set; constructing a random forest model for analyzing the importance of the data features; and inputting the balanced data set into the random forest for calculation to obtain the importance ranking of the data features. After the original data set related to bond risk assessment is collected, preprocessing balances it, which addresses the problem that data features cannot be analyzed effectively when the proportions of positive and negative samples are uneven. The balanced data is then analyzed with a random forest algorithm, so that the data features needed for assessing bond risk are identified more scientifically and accurately than the features summarized by traditional assessment methods.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a method for analyzing importance of data features of the present invention is shown, which may specifically include the following steps:
S100, obtaining the bond samples required for bond risk assessment, determining the time granularity of the sample points in the bond samples and labeling them, and taking the result as an original data set;
S200, preprocessing the original data set to obtain a balanced data set;
S300, constructing a random forest model for analyzing the importance of the data features;
S400, inputting the balanced data set into the random forest for calculation, and analyzing the importance ranking of each data feature.
As described in step S100, the bond samples required for bond risk assessment are obtained, the time granularity of the sample points in the bond samples is determined and the sample points are labeled, and the result is taken as the original data set. In this embodiment, the original data set includes bonds that defaulted, bonds that were ultimately repaid normally after a risk event, and bonds that were repaid normally. After the bond samples to be collected are determined, they are labeled accordingly. First, the time granularity of the sample points is determined. In this embodiment the time granularity is one quarter, that is, one "fiscal quarter" during the period in which a bond is outstanding is used as one sample point. Because the data set contains a large number of finance-related data features, the availability of a financial statement is an important factor in establishing a sample point. Second, each sample point is labeled according to whether the corresponding bond experienced a default or a major risk event in the quarter. When the bond experienced a default or a major risk event in the quarter, the sample point is labeled as a negative sample; when the bond experienced no default or major risk event in the quarter, the sample point is labeled as a positive sample.
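The labeling rule of step S100 can be illustrated with a minimal Python sketch. The record type `QuarterRecord`, its field names, the helper `build_original_dataset`, and the integer encoding (1 for a positive sample, 0 for a negative sample) are assumptions made for illustration only and do not appear in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuarterRecord:
    """One sample point: one bond in one fiscal quarter (hypothetical layout)."""
    bond_id: str
    quarter: str            # for example "2019Q3"
    defaulted: bool         # a default occurred in this quarter
    major_risk_event: bool  # a major risk event occurred in this quarter


def label_sample_point(record: QuarterRecord) -> int:
    """Return 0 (negative sample) on a default or major risk event, else 1 (positive)."""
    return 0 if (record.defaulted or record.major_risk_event) else 1


def build_original_dataset(records):
    """Pair every quarterly sample point with its label."""
    return [(record, label_sample_point(record)) for record in records]
```

In such a layout, the financial and operating features of each quarter would be attached to the same record, so that one row of the original data set corresponds to one bond in one quarter.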
As described in step S200, the original data set is preprocessed to obtain a balanced data set. The positive and negative samples obtained in step S100 are generally imbalanced, so the imbalanced original data set must be preprocessed; otherwise the importance ranking of the data features in the data set cannot be analyzed effectively. This embodiment mainly uses two preprocessing methods, undersampling and oversampling. Undersampling balances the samples by deleting some samples of the majority class, and oversampling balances the samples by increasing the number of samples of the minority class. For oversampling, this embodiment uses two methods. The first is simple random duplication, that is, randomly copying minority-class samples until the classes are balanced. When the ratio of positive to negative samples is very uneven, however, this method may cause overfitting, so the number of duplicated samples should be kept within a certain level to avoid overfitting. The second is the SMOTE algorithm, which balances the samples by inserting new synthetic samples between minority-class samples that lie close to one another.
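The three balancing approaches named in step S200 can be sketched as follows, using only the Python standard library. The function names, the 0/1 label encoding, and the 1:1 target ratio are assumptions for illustration; the embodiment only states that the number of copied samples should be limited. `smote_like` is a simplified SMOTE-style interpolation, not a reference implementation of SMOTE, and it assumes at least two minority-class rows with numeric features.

```python
import math
import random


def _split_by_class(labels):
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_undersample(X, y, seed=0):
    """Delete random majority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def random_oversample(X, y, seed=0):
    """Copy random minority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def smote_like(minority_rows, n_new, k=3, seed=0):
    """Create n_new synthetic rows, each interpolated between a minority row
    and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        others = [row for row in minority_rows if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic row lies on a segment between two existing minority rows, it adds variation that plain duplication does not, which is one common reason for preferring interpolation when the class ratio is very uneven.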
As described in step S300, a random forest model for analyzing the importance of the data features is constructed. Using the bootstrap sampling method, K samples are repeatedly drawn at random with replacement from the preprocessed balanced data set N to generate new data sample sets; T classification trees are generated from the new data sample sets to form a random forest; a decision tree is built on each sample set obtained by the bootstrap sampling method, so that a plurality of decision trees are formed for prediction; and the final prediction result is obtained by voting. Each decision tree is generated from a training sample X of size K and a random vector $\theta_k$. The random vector sequence $\{\theta_k, k = 1, 2, \ldots, K\}$ is independent and identically distributed, the random forest is the set of all decision trees $\{h(X, \theta_k), k = 1, 2, \ldots, K\}$, and each decision tree model $h(X, \theta_k)$ has one vote for selecting the classification result of the input variable x:
$$H(x) = \arg\max_{Y} \sum_{i=1}^{K} I\left(h_i(x) = Y\right)$$
where H(x) represents the random forest classification result, $h_i(x)$ is the classification result of a single decision tree, Y represents the classification target, and I(·) is the indicator function. The random forest classification model uses a simple voting strategy to complete the final classification.
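The bootstrap sampling and the simple voting strategy of step S300 can be sketched in a few lines of Python. The function names below are hypothetical, and the tie-breaking behaviour is an assumption, since the embodiment does not say how tied votes are resolved.

```python
import random
from collections import Counter


def bootstrap_indices(n_rows, k, rng):
    """Draw k row indices from range(n_rows) with replacement."""
    return [rng.randrange(n_rows) for _ in range(k)]


def out_of_bag_indices(n_rows, in_bag):
    """Rows never drawn into the bootstrap sample (the OOB rows of that tree)."""
    drawn = set(in_bag)
    return [i for i in range(n_rows) if i not in drawn]


def majority_vote(tree_predictions):
    """Each tree casts one vote and the class with the most votes wins.
    Ties go to the class that was seen first (an assumption of this sketch)."""
    return Counter(tree_predictions).most_common(1)[0][0]
```

For example, `majority_vote([1, 0, 1])` selects class 1 because two of the three trees voted for it. The rows returned by `out_of_bag_indices` are the ones a given tree never saw during training, which is what makes them usable for the error estimates in step S400.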
As described in step S400, the balanced data set is input into the random forest for calculation, and the importance ranking of the data features is analyzed. For each decision tree in the random forest, the prediction error on the corresponding out-of-bag (OOB) data is calculated and denoted errOOB1. Noise is then randomly added to feature X in all samples of the out-of-bag data, for example by replacing the value of feature X in each sample with a random number, and the out-of-bag error is calculated again and denoted errOOB2. Finally, assuming that there are N trees in the random forest, the importance of feature X is
Σ(errOOB2 − errOOB1)/N.
This expression is used as the measure of the importance of the corresponding feature. The central idea is that if the out-of-bag accuracy drops sharply after noise is randomly added to a feature, the feature has a strong influence on the classification result of the samples, that is, its importance is relatively high.
In this embodiment, the random forest algorithm is applied to data feature analysis: each feature is replaced in turn by random numbers, and the more the model's performance degrades, the more important the feature is.
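The out-of-bag importance measure of step S400 can be illustrated with a small, self-contained Python sketch. To keep it short, each "tree" here is a one-split decision stump, which is a simplification and not the classification tree of the embodiment. The function names, the 0/1 labels, and the choice of uniform random numbers within each feature's observed range as the noise are all assumptions made for illustration.

```python
import random


def fit_stump(X, y):
    """Fit a one-split tree: keep the (feature, threshold, left label)
    with the fewest training errors. Stands in for a full decision tree."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left in (0, 1):
                errors = sum((left if row[j] <= t else 1 - left) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left)
    _, j, t, left = best
    return lambda row: left if row[j] <= t else 1 - left


def error_rate(tree, X, y):
    """Fraction of rows that the tree misclassifies."""
    return sum(tree(row) != label for row, label in zip(X, y)) / len(y)


def oob_noise_importance(X, y, n_trees=25, seed=0):
    """Average of (errOOB2 - errOOB1) over the trees, for every feature."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    bounds = [(min(r[j] for r in X), max(r[j] for r in X))
              for j in range(n_features)]
    totals = [0.0] * n_features
    used = 0
    for _ in range(n_trees):
        in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
        drawn = set(in_bag)
        oob = [i for i in range(n_rows) if i not in drawn]
        if not oob:
            continue  # no out-of-bag rows for this tree, so it is skipped
        tree = fit_stump([X[i] for i in in_bag], [y[i] for i in in_bag])
        X_oob, y_oob = [X[i] for i in oob], [y[i] for i in oob]
        err_oob1 = error_rate(tree, X_oob, y_oob)
        for j, (lo, hi) in enumerate(bounds):
            # Replace feature j with random numbers, then re-measure the error.
            noisy = [row[:j] + [rng.uniform(lo, hi)] + row[j + 1:]
                     for row in X_oob]
            totals[j] += error_rate(tree, noisy, y_oob) - err_oob1
        used += 1
    return [total / max(used, 1) for total in totals]
```

Sorting the features by the returned values in descending order gives the importance ranking. A feature that no tree splits on receives an importance of zero, because replacing it with noise cannot change any prediction.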
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 2, a block diagram of an embodiment of the system for analyzing the importance of data features of the present invention is shown. The system may specifically include the following modules:
the acquisition module 100, used for obtaining the bond samples required for bond risk assessment, determining the time granularity of the sample points in the bond samples and labeling them, and taking the result as an original data set;
a preprocessing module 200, configured to preprocess the original data set to obtain a balanced data set;
the modeling module 300 is used for constructing a random forest model for analyzing the importance of the data characteristics;
and the calculating module 400 is configured to input the balanced data set into the random forest for calculation, and analyze importance ranks of each data feature.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The embodiment of the invention discloses electronic equipment, which comprises a processor, a memory and a computer program which is stored on the memory and can run on the processor, wherein when the computer program is executed by the processor, the steps of the analysis method for the importance of the data characteristics are realized.
The embodiment of the invention discloses a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the data feature importance analysis method are realized.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create a system for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method for analyzing the importance of the data features and the system for analyzing the importance of the data features provided by the invention are described in detail, specific examples are applied in the text to explain the principle and the implementation mode of the invention, and the description of the examples is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
Claims (10)
1. A method for analyzing the importance of data features, comprising:
obtaining the bond samples required for bond risk assessment, determining the time granularity of the sample points in the bond samples and labeling them, and taking the result as an original data set;
preprocessing the original data set to obtain a balanced data set;
constructing a random forest model for analyzing the importance of the data characteristics;
and inputting the balanced data set into the random forest for calculation, and analyzing the importance ranking of each data characteristic.
2. The method according to claim 1, wherein the bond samples required for bond risk assessment are specifically: the time granularity of the sample points of the bond samples is one quarter;
the sample points of the bond samples are labeled as follows: each sample point is labeled as a positive or negative sample according to whether the corresponding bond experienced a default or a major risk event in the quarter.
3. The method according to claim 2, characterized in that the positive and negative samples are in particular:
when the bond experienced a default or a major risk event in the quarter, the sample point is labeled as a negative sample;
when the bond experienced no default or major risk event in the quarter, the sample point is labeled as a positive sample.
4. The method of claim 1, wherein the preprocessing the raw data set comprises both an under-sampling and an over-sampling method.
5. The method according to claim 1, wherein the random forest model is specifically:
using the bootstrap sampling method, repeatedly drawing K samples at random with replacement from the preprocessed balanced data set N to generate new data sample sets;
generating T classification trees from the new data sample sets to form a random forest;
and building a decision tree on each sample set obtained by the bootstrap sampling method, so that a plurality of decision trees are formed for prediction, and obtaining the final prediction result by voting.
6. The method according to claim 5, wherein the decision tree is specifically:
each decision tree is generated from a training sample X of size K and a random vector $\theta_k$;
the random vector sequence $\{\theta_k, k = 1, 2, \ldots, K\}$ is independent and identically distributed;
the random forest is the set of all decision trees $\{h(X, \theta_k), k = 1, 2, \ldots, K\}$;
each decision tree model $h(X, \theta_k)$ has one vote for selecting the classification result of the input variable x:
$$H(x) = \arg\max_{Y} \sum_{i=1}^{K} I\left(h_i(x) = Y\right)$$
wherein H(x) represents the random forest classification result, $h_i(x)$ is the classification result of a single decision tree, Y represents the classification target, and I(·) is the indicator function.
7. The method of claim 1, wherein the data features are input into the random forest for computation, specifically:
for each decision tree in the random forest, calculating the prediction error on the corresponding out-of-bag (OOB) data, denoted errOOB1;
randomly adding noise to feature X in all samples of the out-of-bag data, and calculating the out-of-bag error again, denoted errOOB2;
assuming that there are N trees in the random forest, the importance of feature X = Σ(errOOB2 − errOOB1)/N.
8. An analysis system for importance of data features, comprising:
the acquisition module is used for obtaining the bond samples required for bond risk assessment and taking the bond samples as an original data set;
the preprocessing module is used for preprocessing the original data set to obtain a balanced data set;
the modeling module is used for constructing a random forest model for analyzing the importance of the data characteristics;
and the calculation module is used for inputting the balanced data set into the random forest for calculation and analyzing the importance ranking of each data characteristic.
9. An electronic device, characterized in that it comprises a processor, a memory and a computer program stored on said memory and capable of running on said processor, said computer program, when executed by said processor, implementing the steps of the method for analyzing the importance of data features according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, carries out the steps of the method for analyzing the importance of data features according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010464925.7A CN111612624A (en) | 2020-05-28 | 2020-05-28 | Method and system for analyzing importance of data features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111612624A (en) | 2020-09-01 |
Family
ID=72199797
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010464925.7A Pending CN111612624A (en) | 2020-05-28 | 2020-05-28 | Method and system for analyzing importance of data features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111612624A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113419465A (en) * | 2021-07-13 | 2021-09-21 | 浙江菲达环保科技股份有限公司 | Data preprocessing method and system for environmental protection system of thermal power generating unit |
CN115409613A (en) * | 2022-09-13 | 2022-11-29 | 中债金科信息技术有限公司 | Bond risk detection model training method and bond risk detection method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109300545A (en) * | 2018-08-28 | 2019-02-01 | 昆明理工大学 | A kind of method for prewarning risk of the thalassemia based on RF |
CN109934420A (en) * | 2019-04-17 | 2019-06-25 | 重庆大学 | A kind of method and system for predicting labor turnover |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107220845B (en) | User re-purchase probability prediction/user quality determination method and device and electronic equipment | |
CN106909981B (en) | Model training method, sample balancing method, model training device, sample balancing device and personal credit scoring system | |
CN111612628A (en) | Method and system for classifying unbalanced data sets | |
CN109635010B (en) | User characteristic and characteristic factor extraction and query method and system | |
CN113177700B (en) | Risk assessment method, system, electronic equipment and storage medium | |
CN113139687B (en) | Method and device for predicting credit card user default | |
CN110490304B (en) | Data processing method and device | |
CN111242358A (en) | Enterprise information loss prediction method with double-layer structure | |
CN111062036A (en) | Malicious software identification model construction method, malicious software identification medium and malicious software identification equipment | |
CN111612624A (en) | Method and system for analyzing importance of data features | |
CN112017040A (en) | Credit scoring model training method, scoring system, equipment and medium | |
CN111199469A (en) | User payment model generation method and device and electronic equipment | |
CN110634060A (en) | User credit risk assessment method, system, device and storage medium | |
CN111160959A (en) | User click conversion estimation method and device | |
CN116468273A (en) | Customer risk identification method and device | |
CN114519508A (en) | Credit risk assessment method based on time sequence deep learning and legal document information | |
CN112508684B (en) | Collecting-accelerating risk rating method and system based on joint convolutional neural network | |
CN112434862B (en) | Method and device for predicting financial dilemma of marketing enterprises | |
CN116166967B (en) | Data processing method, equipment and storage medium based on meta learning and residual error network | |
CN117235633A (en) | Mechanism classification method, mechanism classification device, computer equipment and storage medium | |
CN114139636B (en) | Abnormal operation processing method and device | |
CN115271442A (en) | Modeling method and system for evaluating enterprise growth based on natural language | |
CN113918471A (en) | Test case processing method and device and computer readable storage medium | |
CN109308565B (en) | Crowd performance grade identification method and device, storage medium and computer equipment | |
CN111860642A (en) | Unbalanced sample classification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200901 |