CN116485523A - Decision tree-based data evaluation method, device, equipment and storage medium - Google Patents

Info

Publication number
CN116485523A
Authority
CN
China
Prior art keywords
decision tree
data
preset
training
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310450497.6A
Other languages
Chinese (zh)
Inventor
潘成挺
钟红义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Breeze Enterprise Technology Co ltd
Original Assignee
Hangzhou Breeze Enterprise Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Breeze Enterprise Technology Co ltd
Priority to CN202310450497.6A
Publication of CN116485523A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to intelligent decision-making and discloses a decision tree-based data evaluation method, device, equipment and storage medium. The method comprises: acquiring target data and standardizing it through a preset standardization engine to generate target data features corresponding to the target data; generating target indexes of the target data features through a preset rule engine and the data features; and comparing a preset decision tree model with the target indexes to obtain an evaluation result of the target data based on the preset decision tree model. In this way, the data are standardized after acquisition, the data indexes are determined from the extracted data features, each data index is compared with the preset indexes in the decision tree model, and the final credit-granting evaluation result is determined from the comparison, which improves the efficiency of granting credit to enterprises.

Description

Decision tree-based data evaluation method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of intelligent decision making technologies, and in particular, to a decision tree-based data evaluation method, apparatus, device, and storage medium.
Background
At present, machine learning and deep learning, which combine big data with artificial intelligence, are widely applied across industries. The decision tree is a supervised-learning classification algorithm commonly used in the field of artificial intelligence. Decision tree analysis, also called the probability analysis decision method, is a system analysis method that represents the relevant elements of a decision scheme as a tree and analyzes and selects the scheme on that basis. It is one of the most common methods for decisions under risk and is particularly useful for analyzing relatively complex problems. It compares the expected benefit values (expected values) of the different schemes and chooses among them accordingly. Its most notable characteristic is that the decision process of the whole problem across different stages can be displayed graphically, with clear logic and a clear hierarchy, which makes it very intuitive. The decision tree has the following advantages: 1. it is easy to understand and implement; 2. data preparation is simple or unnecessary, both numerical and conventional (categorical) attributes can be processed at the same time, feasible and good results can be produced on a large data source in a relatively short time, and the model can be easily evaluated through static tests.
In the bank loan scenario, a bank auditing a loan often uses invoices to analyze the business condition of an enterprise. However, invoices whose business authenticity is questionable are difficult to filter out, so the business value of the enterprise cannot be reflected truly and accurately, and the efficiency with which banks grant credit to small enterprises is low. Therefore, how to improve the efficiency of granting credit to enterprises is a technical problem to be solved.
Disclosure of Invention
The application provides a decision tree-based data evaluation method, device, equipment and storage medium, so as to improve the efficiency of granting credit to enterprises.
In a first aspect, the present application provides a decision tree-based data evaluation method, the decision tree-based data evaluation method comprising:
acquiring target data, and carrying out standardization processing on the target data through a preset standardization engine to generate target data characteristics corresponding to the target data;
generating target indexes of the target data features through a preset rule engine and the data features;
and comparing the preset decision tree model with the target index to obtain an evaluation result of the target data based on the preset decision tree model.
Further, before comparing the preset decision tree model with the target index to obtain the evaluation result of the target data based on the preset decision tree model, the method includes:
acquiring historical data as a training set;
training the training set to generate the preset decision tree model.
Further, training the training set to generate the preset decision tree model, including:
extracting training data features of the training set based on the preset rule engine and the training set;
determining at least one training index corresponding to the training data features through the preset rule engine;
and calculating the information gain of each training index through a preset information gain function, determining the node position corresponding to each training index according to the information gain, and generating the preset decision tree model.
Further, calculating the information gain of each training index through the preset information gain function includes:
dividing the training data features into a preset number of value intervals by impurity reduction, and calculating the gear entropy value corresponding to each value interval;
and calculating the total information entropy value of each training index based on the total information entropy function and the value intervals, and calculating the information gain from the total information entropy value and each gear entropy value.
Further, the total information entropy function is:
$I(X) = -\sum_{i=1}^{n} p_i \log p_i$;
wherein $I(X)$ is the total information entropy value, and $p_i$ is the proportion of samples of the i-th class in the current sample set.
Further, the preset information gain function is:
$\Delta I(X, f) = I(X) - \left(P_1 I(X_1) + \dots + P_N I(X_N)\right)$;
wherein $\Delta I(X, f)$ is the information gain, $X$ is the sample set, and $P_N$ is the proportion of the samples in $X$ that are divided into the subset $X_N$.
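For the two-class case used in the description's later example (positive and negative samples), the two functions can be written out explicitly. The following is only a restatement under the assumption that the feature $f$ splits $X$ into $N$ subsets $X_1, \dots, X_N$; the symbols $p_{+}$, $p_{-}$, $X^{+}$ and $X^{-}$ are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
I(X) &= -p_{+}\log p_{+} - p_{-}\log p_{-},
  \qquad p_{+} = \frac{|X^{+}|}{|X|},\quad p_{-} = \frac{|X^{-}|}{|X|},\\
P_k &= \frac{|X_k|}{|X|}, \qquad k = 1,\dots,N,\\
\Delta I(X, f) &= I(X) - \sum_{k=1}^{N} P_k\, I(X_k).
\end{aligned}
```

A term with $p = 0$ is taken to contribute zero, so a subset containing only one class has entropy zero.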
Further, calculating the information gain of each training index through a preset information gain function, and determining the node position corresponding to each training index according to the information gain, so as to generate the preset decision tree model, including:
and performing descending order processing on the information gains, and generating the preset decision tree model by arranging the training indexes corresponding to the information gains according to descending order.
In a second aspect, the present application further provides a decision tree-based data evaluation device, the decision tree-based data evaluation device comprising:
the data normalization module is used for acquiring target data, performing normalization processing on the target data through a preset normalization engine and generating target data characteristics corresponding to the target data;
the index generation module is used for generating target indexes of the target data features through a preset rule engine and the data features;
and the decision tree comparison module is used for comparing a preset decision tree model with the target index to obtain an evaluation result of the target data based on the preset decision tree model.
In a third aspect, the present application also provides an apparatus comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement the decision tree-based data evaluation method as described above when the computer program is executed.
In a fourth aspect, the present application also provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement a decision tree based data evaluation method as described above.
The application discloses a decision tree-based data evaluation method, device, equipment and storage medium. The method comprises: acquiring target data and standardizing it through a preset standardization engine to generate target data features corresponding to the target data; generating target indexes of the target data features through a preset rule engine and the data features; and comparing a preset decision tree model with the target indexes to obtain an evaluation result of the target data based on the preset decision tree model. In this way, the data are standardized after acquisition, the data indexes are determined from the extracted data features, each data index is compared with the preset indexes in the decision tree model, and the final credit-granting evaluation result is determined from the comparison, which improves the efficiency of granting credit to enterprises.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a decision tree-based data evaluation method according to a first embodiment of the present application;
FIG. 2 is a schematic flow chart of a decision tree-based data evaluation method according to a second embodiment of the present application;
FIG. 3 is a schematic flow chart of a decision tree-based data evaluation method according to a third embodiment of the present application;
FIG. 4 is a schematic flow chart of a decision tree-based data evaluation method according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram of a decision tree model according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of a decision tree based data evaluation apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of an apparatus according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The embodiments of the application provide a decision tree-based data evaluation method, device, equipment and storage medium. The method can be applied to a server: the data are standardized after the target data are acquired, the data indexes are determined from the extracted data features, each data index is compared with the preset indexes in the decision tree model, and the final credit-granting evaluation result is determined from the comparison, which improves the efficiency of granting credit to enterprises. The server may be an independent server or a server cluster.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a decision tree-based data evaluation method according to a first embodiment of the present application. The method can be applied to a server. It standardizes the data after the target data are acquired, determines the data indexes from the extracted data features, compares each data index with the preset indexes in the decision tree model, and determines the final credit-granting evaluation result from the comparison, which improves the efficiency of granting credit to enterprises.
As shown in fig. 1, the decision tree-based data evaluation method specifically includes steps S10 to S30.
Step S10, acquiring target data, and carrying out standardization processing on the target data through a preset standardization engine to generate target data characteristics corresponding to the target data;
Specifically, the data standardization process includes the following steps:
1) Defining the calculation indexes for the enterprise's standardized tax and invoice data;
2) Defining the features and their feature values according to the proportions of the calculated indexes.
The standardization of the tax and invoice data is accomplished as follows:
1) Acquiring the data model template corresponding to the enterprise user;
2) Inputting financial data, tax data and invoice data;
3) The standardization engine processes, merges, cleans and transforms the data according to the data model template to generate standardized data.
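The merge, clean and transform steps listed above could be sketched roughly as follows. This is an illustration only: the field names (`rev`, `tax_amt`, `inv_sum`), the `TEMPLATE` layout and the cleaning rule (blank values become 0.0, thousands separators are stripped) are all assumptions, since the application does not specify the data model template.

```python
from dataclasses import dataclass


@dataclass
class StandardizedRecord:
    enterprise_id: str
    period: str
    revenue: float
    tax_paid: float
    invoice_total: float


# Hypothetical data model template: maps raw field names of each
# data source to the standardized field names.
TEMPLATE = {
    "financial": {"rev": "revenue"},
    "tax": {"tax_amt": "tax_paid"},
    "invoice": {"inv_sum": "invoice_total"},
}


def standardize(enterprise_id, period, financial, tax, invoice, template=TEMPLATE):
    """Merge the three sources, clean the values and emit one record."""
    sources = {"financial": financial, "tax": tax, "invoice": invoice}
    merged = {}
    for source_name, raw in sources.items():
        for raw_key, std_key in template[source_name].items():
            value = raw.get(raw_key)
            if value is None or str(value).strip() == "":
                # Cleaning (assumed rule): missing or blank values become 0.0.
                merged[std_key] = 0.0
            else:
                # Transformation: strip thousands separators, convert to float.
                merged[std_key] = float(str(value).replace(",", ""))
    return StandardizedRecord(enterprise_id, period, **merged)


record = standardize(
    "E1", "2023Q1",
    financial={"rev": "1,200.50"},
    tax={"tax_amt": 80},
    invoice={"inv_sum": None},
)
print(record)
```

Keeping the template as plain data means a different enterprise user can be served by swapping the template rather than changing the engine code.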
Step S20, generating target indexes of the target data features through a preset rule engine and the data features;
Specifically, training sample data are obtained to generate a training set, wherein the training sample data comprise general indexes, tax lending indexes, ticket lending indexes, three-party indexes, credit indexes and financial indexes;
according to the characteristics of decision trees, the feature selected by calculating the information gain is taken as the root node, each non-leaf node of the decision tree is set as a decision node, and each leaf node is set as an output unit, wherein each decision node is a sample feature with its corresponding judgment value, and each leaf node corresponds to a loan prediction result.
And step S30, comparing a preset decision tree model with the target index to obtain an evaluation result of the target data based on the preset decision tree model.
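As a minimal sketch of the comparison in step S30, the target indexes can be walked down the tree from the root until a leaf is reached. The nested-dictionary representation, the index names and the example tree `TREE` below are hypothetical and are not the model of FIG. 5; they only illustrate decision nodes that hold a feature and leaves that hold a loan prediction result.

```python
def evaluate(tree, indexes):
    """Walk from the root to a leaf, choosing at each decision node the
    branch that matches the sample's gear for that node's feature."""
    node = tree
    while "result" not in node:
        gear = indexes[node["feature"]]
        node = node["branches"][gear]
    return node["result"]


# Hypothetical preset decision tree (illustration only).
TREE = {
    "feature": "ticket_lending_index",
    "branches": {
        1: {"result": "reject"},
        2: {
            "feature": "general_index",
            "branches": {
                1: {"result": "reject"},
                2: {"result": "approve"},
                3: {"result": "approve"},
            },
        },
        3: {"result": "approve"},
    },
}

print(evaluate(TREE, {"ticket_lending_index": 2, "general_index": 3}))
```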
In a specific embodiment, the decision tree generation comprises the steps of:
Step 1: regard all the data as one node, and go to step 2;
Step 2: select one data feature from all the data features to split the node, and go to step 3;
Step 3: generate a plurality of child nodes and check each child node; if the condition for stopping splitting is met, go to step 4; otherwise, return to step 2;
Step 4: set the node as a leaf node, whose output is the category with the largest number of samples in that node.
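The four steps above can be sketched as a small recursive builder. This is an ID3-style illustration, not the applicant's implementation: it assumes categorical feature values, a base-2 logarithm, and a stop condition of "node is pure or no features remain", none of which the text fixes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2, an assumption) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy of the node minus the weighted entropy of its children."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    # Step 4: stop splitting and emit a leaf with the majority class.
    if len(set(labels)) == 1 or not features:
        return {"result": Counter(labels).most_common(1)[0][0]}
    # Step 2: pick the feature with the largest information gain.
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {"feature": best, "branches": {}}
    remaining = [f for f in features if f != best]
    # Step 3: create one child per feature value and recurse.
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return node


# Step 1: all data start in one node (toy data, hypothetical).
rows = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 2, "b": 1}, {"a": 2, "b": 2}]
labels = ["no", "no", "yes", "yes"]
tree = build_tree(rows, labels, ["a", "b"])
print(tree)
```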
This embodiment discloses a decision tree-based data evaluation method, device, equipment and storage medium. The method comprises: acquiring target data and standardizing it through a preset standardization engine to generate target data features corresponding to the target data; generating target indexes of the target data features through a preset rule engine and the data features; and comparing a preset decision tree model with the target indexes to obtain an evaluation result of the target data based on the preset decision tree model. In this way, the data are standardized after acquisition, the data indexes are determined from the extracted data features, each data index is compared with the preset indexes in the decision tree model, and the final credit-granting evaluation result is determined from the comparison, which improves the efficiency of granting credit to enterprises.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a decision tree-based data evaluation method according to a second embodiment of the present application.
Based on the embodiment shown in fig. 1, in this embodiment, as shown in fig. 2, step S30 is preceded by steps S21 to S22.
Step S21, acquiring historical data as a training set;
Step S22, training on the training set to generate the preset decision tree model.
In a specific embodiment, Table 1 defines the indexes and their judgment criteria, as well as the features and the criteria that define the feature values. A feature is determined according to the passing proportion or accuracy proportion of its indexes. The indexes can be customized (added or removed) according to the user's specific business, and the rules can also be set by the user.
TABLE 1
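One way to read "the feature is determined according to the passing proportion of the index" is to evaluate a set of user-defined rules and map the share of passed rules to a gear. The sketch below is an assumption throughout: the rules, the record fields and the gear thresholds (0.5 and 0.8) are invented for illustration and are not taken from Table 1.

```python
def pass_ratio(record, rules):
    """Share of rules that the record satisfies."""
    passed = sum(1 for rule in rules if rule(record))
    return passed / len(rules)


def to_gear(ratio, thresholds=(0.5, 0.8)):
    """Map a pass ratio to gear 1, 2 or 3 (thresholds are hypothetical)."""
    low, high = thresholds
    if ratio < low:
        return 1
    if ratio < high:
        return 2
    return 3


# Hypothetical user-defined rules for one feature.
rules = [
    lambda r: r["revenue"] > 0,
    lambda r: r["tax_paid"] > 0,
    lambda r: r["invoice_total"] >= 0.5 * r["revenue"],
]

record = {"revenue": 100.0, "tax_paid": 5.0, "invoice_total": 20.0}
ratio = pass_ratio(record, rules)
print(ratio, to_gear(ratio))
```

Because the rules are ordinary callables in a list, adding or removing an index for a particular business only changes the list, which matches the customization described above.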
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a decision tree-based data evaluation method according to a third embodiment of the present application.
Based on the embodiment shown in FIG. 2, in this embodiment, as shown in FIG. 3, step S22 includes steps S221 to S223.
Step S221, extracting training data features of the training set based on the preset rule engine and the training set;
step S222, determining at least one training index corresponding to the training data features through the preset rule engine;
step S223, calculating the information gain of each training index through a preset information gain function, and determining the node position corresponding to each training index according to the information gain to generate the preset decision tree model.
In a specific embodiment, taking the data in Table 1 as an example, Table 2 is generated after the data are evaluated against the rules.
TABLE 2
The total information entropy value is calculated according to Table 2 as follows:
In this sample set, taking the feature "general index" as an example, the feature has 3 values {1st gear, 2nd gear, 3rd gear}. The subset (general index = 1st gear) contains 13 samples, of which 4 are positive and 9 are negative; the subset (general index = 2nd gear) contains 5 samples, of which 3 are positive and 2 are negative; and the subset (general index = 3rd gear) contains 2 samples, of which 2 are positive and 0 are negative.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a decision tree-based data evaluation method according to a fourth embodiment of the present application.
Based on the embodiment shown in FIG. 2, in this embodiment, as shown in FIG. 4, step S223 includes steps S2231 to S2232.
Step S2231, dividing the training data features into a preset number of value intervals by impurity reduction, and calculating the gear entropy value corresponding to each value interval;
entropy of general index, 1st gear:
entropy of general index, 2nd gear:
entropy of general index, 3rd gear:
information entropy of general index:
information gain of general index: $g_1 = I - I_1 = 0.14121$.
Step S2232, calculating the total information entropy value of each training index based on the total information entropy function and the value intervals, and calculating the information gain from the total information entropy value and each gear entropy value.
Further, the total information entropy function is:
$I(X) = -\sum_{i=1}^{n} p_i \log p_i$;
wherein $I(X)$ is the total information entropy value, and $p_i$ is the proportion of samples of the i-th class in the current sample set.
Further, the preset information gain function is:
$\Delta I(X, f) = I(X) - \left(P_1 I(X_1) + \dots + P_N I(X_N)\right)$;
wherein $\Delta I(X, f)$ is the information gain, $X$ is the sample set, and $P_N$ is the proportion of the samples in $X$ that are divided into the subset $X_N$.
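To make the two functions concrete, the sketch below applies them to the positive/negative sample counts quoted earlier for the three gears of the general index (4/9, 3/2 and 2/0). The base of the logarithm is not stated in the application; base 2 is assumed here, and Table 2 is not reproduced in the text, so the printed value depends on these assumptions and need not coincide with the figures listed in Table 3.

```python
import math


def entropy(counts):
    """Entropy of a class-count tuple; zero counts contribute nothing.
    Base 2 is an assumption, the source does not state the base."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# (positive, negative) sample counts per gear of the general index,
# as quoted in the description.
GEARS = {1: (4, 9), 2: (3, 2), 3: (2, 0)}


def information_gain(gears):
    total_pos = sum(p for p, _ in gears.values())
    total_neg = sum(n for _, n in gears.values())
    total = total_pos + total_neg
    total_entropy = entropy((total_pos, total_neg))        # I(X)
    weighted = sum(                                        # sum of P_k * I(X_k)
        (p + n) / total * entropy((p, n)) for p, n in gears.values()
    )
    return total_entropy - weighted


gain = information_gain(GEARS)
print(f"total entropy: {entropy((9, 11)):.6f}")
print(f"information gain (base 2): {gain:.6f}")
```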
Specifically, Table 3 lists the information gain calculation results.
TABLE 3
Feature                 Information gain
General index           0.14121
Tax lending index       0.138462
Ticket lending index    0.506003
Three-party index       0.13457
Credit sign index       0.072314
Financial index         0.066304
As can be seen from Table 3, the information gain of the ticket lending index is the largest, so the ticket lending index is the optimal choice of root node for classification. Taking Tables 1, 2 and 3 above as an example, the resulting decision tree model is shown in FIG. 5, which is a schematic diagram of the decision tree model in the embodiment of the present application.
Based on the embodiment shown in fig. 2, in this embodiment, step S22 includes:
and performing descending order processing on the information gains, and generating the preset decision tree model by arranging the training indexes corresponding to the information gains according to descending order.
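The descending-order arrangement can be illustrated with the information gains listed in Table 3. The dictionary `GAINS` simply copies those values; treating the sorted list as the order in which indexes are placed from the root downward is a reading of the text, not a confirmed detail of the model.

```python
# Information gains copied from Table 3.
GAINS = {
    "General index": 0.14121,
    "Tax lending index": 0.138462,
    "Ticket lending index": 0.506003,
    "Three-party index": 0.13457,
    "Credit sign index": 0.072314,
    "Financial index": 0.066304,
}

# Sort the training indexes by information gain, largest first.
ranked = sorted(GAINS.items(), key=lambda item: item[1], reverse=True)
order = [name for name, _ in ranked]

for position, (name, gain) in enumerate(ranked, start=1):
    print(f"{position}. {name}: {gain}")
```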
Referring to fig. 6, fig. 6 is a schematic block diagram of a decision tree-based data evaluation apparatus for performing the decision tree-based data evaluation method according to the embodiment of the present application. The decision tree-based data evaluation device can be configured on a server.
As shown in fig. 6, the decision tree based data evaluation apparatus 400 includes:
the data normalization module 410 is configured to obtain target data, and perform normalization processing on the target data through a preset normalization engine, so as to generate target data features corresponding to the target data;
the index generation module 420 is configured to generate a target index of the target data feature through a preset rule engine and the data feature;
and a decision tree comparison module 430, configured to compare a preset decision tree model with the target index, so as to obtain an evaluation result of the target data based on the preset decision tree model.
Further, the decision tree-based data evaluation device further includes:
the training set module is used for acquiring historical data as a training set;
and the decision tree model generation module is used for training the training set and generating the preset decision tree model.
Further, the decision tree model generation module includes:
the data feature extraction unit is used for extracting training data features of the training set based on the preset rule engine and the training set;
the training index determining unit is used for determining at least one training index corresponding to the training data characteristics through the preset rule engine;
the decision tree model generating unit is used for calculating the information gain of each training index through a preset information gain function, determining the node position corresponding to each training index according to the information gain, and generating the preset decision tree model.
Further, the decision tree model generating unit includes:
the gear entropy value calculating subunit is used for dividing the training data features into a preset number of value intervals by impurity reduction and calculating the gear entropy value corresponding to each value interval;
and the information gain calculation subunit is used for calculating the total information entropy value of each training index based on the total information entropy function and the value interval, and calculating the information gain according to the total information entropy value and each gear entropy value.
Further, the decision tree model generating unit further includes:
and the index sorting subunit is used for carrying out descending order processing on the information gains, and generating the preset decision tree model according to descending order arrangement of the training indexes corresponding to the information gains.
It should be noted that, for convenience and brevity of description, the specific working process of the apparatus and each module described above may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The apparatus described above may be implemented in the form of a computer program which is executable on a device as shown in fig. 7.
Referring to fig. 7, fig. 7 is a schematic block diagram of an apparatus according to an embodiment of the present application. The device may be a server.
Referring to fig. 7, the apparatus includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause a processor to perform any of a number of decision tree based data evaluation methods.
The processor is used to provide computing and control capabilities to support the operation of the entire device.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any of the decision tree-based data evaluation methods.
The network interface is used for network communication, such as transmitting assigned tasks. It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of the part of the structure associated with the present application and does not limit the apparatus to which the present application is applied; a particular apparatus may include more or fewer components than those shown in the drawings, may combine certain components, or may have a different arrangement of components.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In one embodiment, the processor is configured to run a computer program stored in the memory to implement the following steps:
acquiring target data, and carrying out standardization processing on the target data through a preset standardization engine to generate target data characteristics corresponding to the target data;
generating target indexes of the target data features through a preset rule engine and the target data features;
and comparing the preset decision tree model with the target index to obtain an evaluation result of the target data based on the preset decision tree model.
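By way of non-limiting illustration, the three steps above may be sketched as follows. The sketch assumes a binary tree with numeric thresholds; the names `Node`, `normalize`, `derive_indexes` and `evaluate`, the field names, the thresholds and the result labels are all hypothetical and are not taken from the present application.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Node:
    """One internal node of a hypothetical binary decision tree."""
    index: str                    # name of the target index tested here
    threshold: float              # assumed split point
    low: Union["Node", str]       # branch taken when value <= threshold
    high: Union["Node", str]      # branch taken when value > threshold


def normalize(raw: Dict[str, str]) -> Dict[str, float]:
    """Stand-in for the standardization engine: tidy keys, cast values."""
    return {key.strip().lower(): float(value) for key, value in raw.items()}


def derive_indexes(features: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for the rule engine: one made-up rule per target index."""
    return {
        "ratio": features["amount"] / features["limit"],
        "count": features["count"],
    }


def evaluate(tree: Union[Node, str], indexes: Dict[str, float]) -> str:
    """Walk the tree, comparing each node with the matching target index."""
    node = tree
    while isinstance(node, Node):
        value = indexes[node.index]
        node = node.low if value <= node.threshold else node.high
    return node


# Illustrative tree: field names, thresholds and results are invented.
EXAMPLE_TREE = Node("count", 0.0, Node("ratio", 0.5, "pass", "review"), "reject")

if __name__ == "__main__":
    raw = {" Amount ": "2000", "Limit": "10000", "count": "0"}
    print(evaluate(EXAMPLE_TREE, derive_indexes(normalize(raw))))
```

In this sketch the evaluation result is simply the leaf label reached after the last comparison; an actual embodiment may attach scores or other outputs to the leaves.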
In one embodiment, before comparing the preset decision tree model with the target index to obtain the evaluation result of the target data based on the preset decision tree model, the processor is further configured to implement:
acquiring historical data as a training set;
training the training set to generate the preset decision tree model.
In one embodiment, when training the training set to generate the preset decision tree model, the processor is configured to implement:
extracting training data features of the training set based on the preset rule engine and the training set;
determining at least one training index corresponding to the training data features through the preset rule engine;
and calculating the information gain of each training index through a preset information gain function, determining the node position corresponding to each training index according to the information gain, and generating the preset decision tree model.
In one embodiment, when calculating the information gain of each training index through the preset information gain function, the processor is configured to implement:
dividing the training data features into a preset number of value intervals by impurity reduction, and calculating a gear entropy value corresponding to each value interval;
and calculating the total information entropy value of each training index based on the total information entropy value function and the value interval, and calculating the information gain according to the total information entropy value and each gear entropy value.
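As a minimal sketch of the interval division and per-interval (gear) entropy calculation, the following example may be considered. It makes two simplifying assumptions that are not stated in the present application: the value intervals are equal-width (substituted here for the impurity-based division), and the logarithm is taken to base 2. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy of a label collection (base 2 is an assumption)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(labels).values()
    )


def assign_intervals(values: Sequence[float], n_intervals: int) -> List[int]:
    """Map each value to an equal-width interval number (an assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0   # avoid zero width when all equal
    return [min(int((v - lo) / width), n_intervals - 1) for v in values]


def interval_entropies(
    values: Sequence[float], labels: Sequence[str], n_intervals: int
) -> Dict[int, Tuple[int, float]]:
    """Return {interval: (sample count, entropy of labels in that interval)}."""
    grouped: Dict[int, List[str]] = {}
    for interval, label in zip(assign_intervals(values, n_intervals), labels):
        grouped.setdefault(interval, []).append(label)
    return {k: (len(v), entropy(v)) for k, v in sorted(grouped.items())}
```

The sample count is returned alongside each entropy value because the weighted sum in the information gain needs the proportion of samples that falls into each interval.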
In one embodiment, when calculating the information gain of each training index through the preset information gain function, determining the node position corresponding to each training index according to the information gain, and generating the preset decision tree model, the processor is configured to implement:
sorting the information gains in descending order, and generating the preset decision tree model by arranging the training indexes in the descending order of their corresponding information gains.
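The descending-order arrangement may be illustrated by the short sketch below. It assumes that the node position of a training index is its depth in the tree, so that the index with the largest information gain is tested first; this reading, the function name `order_indexes_by_gain`, and the sample gain values are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def order_indexes_by_gain(gains: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Rank training indexes by information gain, largest first.

    Returns (depth, index name, gain) triples, where depth 0 is the root.
    Python's sort is stable, so indexes with equal gain keep their input order.
    """
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return [(depth, name, gain) for depth, (name, gain) in enumerate(ranked)]


if __name__ == "__main__":
    # Made-up gains for three hypothetical training indexes.
    for depth, name, gain in order_indexes_by_gain({"a": 0.12, "b": 0.61, "c": 0.30}):
        print(depth, name, gain)
```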
Embodiments of the present application further provide a computer readable storage medium storing a computer program, where the computer program includes program instructions which, when executed by a processor, implement any of the decision tree-based data evaluation methods provided in the embodiments of the present application.
The computer readable storage medium may be an internal storage unit of the device according to the foregoing embodiment, for example, a hard disk or a memory of the device. The computer readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the device.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A decision tree-based data evaluation method, characterized in that the decision tree-based data evaluation method comprises:
acquiring target data, and carrying out standardization processing on the target data through a preset standardization engine to generate target data characteristics corresponding to the target data;
generating target indexes of the target data features through a preset rule engine and the target data features;
and comparing the preset decision tree model with the target index to obtain an evaluation result of the target data based on the preset decision tree model.
2. The decision tree-based data evaluation method according to claim 1, wherein, before the comparing the preset decision tree model with the target index to obtain the evaluation result of the target data based on the preset decision tree model, the method further comprises:
acquiring historical data as a training set;
training the training set to generate the preset decision tree model.
3. The decision tree-based data evaluation method according to claim 2, wherein the training the training set to generate the preset decision tree model comprises:
extracting training data features of the training set based on the preset rule engine and the training set;
determining at least one training index corresponding to the training data features through the preset rule engine;
and calculating the information gain of each training index through a preset information gain function, determining the node position corresponding to each training index according to the information gain, and generating the preset decision tree model.
4. The decision tree-based data evaluation method according to claim 3, wherein the calculating the information gain of each training index through a preset information gain function comprises:
dividing the training data features into a preset number of value intervals by impurity reduction, and calculating a gear entropy value corresponding to each value interval;
and calculating the total information entropy value of each training index based on the total information entropy value function and the value interval, and calculating the information gain according to the total information entropy value and each gear entropy value.
5. The decision tree based data evaluation method of claim 4, wherein the total information entropy function is:
$I(X) = -\sum_{i=1}^{n} p_i \log p_i$;
wherein $I(X)$ is the total information entropy value, and $p_i$ is the proportion of samples of the $i$-th class in the current sample set $X$.
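6. The decision tree based data evaluation method of claim 4, wherein the preset information gain function is:
$\Delta I(X,f) = I(X) - \left(P_1 I(X_1) + \dots + P_N I(X_N)\right)$;
wherein $\Delta I(X,f)$ is the information gain, $X$ is the sample set, $X_1, \dots, X_N$ are the subsets into which $X$ is divided, and $P_k$ ($k = 1, \dots, N$) is the proportion of the samples in $X$ that are divided into the subset $X_k$.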
7. The decision tree-based data evaluation method according to claim 3, wherein the calculating the information gain of each training index by a preset information gain function, and determining the node position corresponding to each training index according to the information gain, generating the preset decision tree model comprises:
sorting the information gains in descending order, and generating the preset decision tree model by arranging the training indexes in the descending order of their corresponding information gains.
8. A decision tree based data evaluation device, comprising:
the data normalization module is used for acquiring target data, performing normalization processing on the target data through a preset normalization engine and generating target data characteristics corresponding to the target data;
the index generation module is used for generating target indexes of the target data features through a preset rule engine and the target data features;
and the decision tree comparison module is used for comparing a preset decision tree model with the target index to obtain an evaluation result of the target data based on the preset decision tree model.
9. An apparatus comprising a memory and a processor;
the memory is used for storing a computer program;
the processor is used for executing the computer program and, when executing the computer program, implementing the decision tree based data evaluation method according to any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the decision tree based data evaluation method according to any one of claims 1 to 7.
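By way of a non-limiting worked example of the functions recited in claims 5 and 6, consider a sample set $X$ of 10 samples with 5 samples in each of two classes, which an index $f$ divides into $X_1$ (4 samples, all of one class) and $X_2$ (6 samples, of which 1 belongs to that class and 5 to the other). The sample counts are invented for illustration, and base-2 logarithms are assumed because the claims do not fix the base.

```latex
% Assumed data: |X| = 10 (5 per class); f splits X into X_1 (4 samples) and X_2 (6 samples).
\begin{align*}
I(X)   &= -\left(\tfrac{5}{10}\log_2\tfrac{5}{10} + \tfrac{5}{10}\log_2\tfrac{5}{10}\right) = 1 \\
I(X_1) &= 0 \quad \text{(all 4 samples in one class)} \\
I(X_2) &= -\left(\tfrac{1}{6}\log_2\tfrac{1}{6} + \tfrac{5}{6}\log_2\tfrac{5}{6}\right) \approx 0.650 \\
\Delta I(X,f) &= I(X) - \left(P_1 I(X_1) + P_2 I(X_2)\right)
               = 1 - \left(0.4 \cdot 0 + 0.6 \cdot 0.650\right) \approx 0.610
\end{align*}
```

Here $P_1 = 4/10$ and $P_2 = 6/10$ are the proportions of the samples in $X$ that fall into $X_1$ and $X_2$, respectively.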

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310450497.6A CN116485523A (en) 2023-04-20 2023-04-20 Decision tree-based data evaluation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116485523A (en) 2023-07-25

Family

ID=87220895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310450497.6A Pending CN116485523A (en) 2023-04-20 2023-04-20 Decision tree-based data evaluation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116485523A (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination