CN117725527A - Score model optimization method based on machine learning analysis rules - Google Patents

Score model optimization method based on machine learning analysis rules

Info

Publication number
CN117725527A
Authority
CN
China
Prior art keywords
rule
core
data set
analyzed
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311822048.6A
Other languages
Chinese (zh)
Inventor
裴卫杰
闫庆
胡佰庆
梁春雨
高建新
丁珂
商延辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lingyan Technology Co ltd
Original Assignee
Beijing Lingyan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lingyan Technology Co ltd filed Critical Beijing Lingyan Technology Co ltd
Priority to CN202311822048.6A priority Critical patent/CN117725527A/en
Publication of CN117725527A publication Critical patent/CN117725527A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a score model optimization method based on machine-learning analysis of rules. Taking the continuous application of artificial-intelligence algorithms to the optimization of model combination strategies as its design concept, the method optimizes the model combination strategy automatically and intelligently under the theoretical guidance of supervised learning, unsupervised learning, rough sets and granular computing. Through AI techniques it performs multi-granularity analysis of the data that trigger a model and runs learning trial calculations with several optimization strategies for different business scenarios, thereby overcoming the subjectivity and trial-calculation limitations of business experts and fully mining hidden information to obtain a better combination strategy. It assists business personnel in automating model optimization, helps the business optimize models better and faster, rapidly discovers the association relations among the rules under a model and their degrees of participation, and helps the business understand the feature recognition points of suspicious cases so as to identify suspicious cases more accurately.

Description

Score model optimization method based on machine learning analysis rules
Technical Field
The invention belongs to the technical field of artificial-intelligence model optimization, and particularly relates to a score model optimization method based on machine-learning analysis of rules.
Background
To grasp suspicious cases accurately and in time, expert-system models must be continuously updated and iterated. As large numbers of expert models are introduced and run over long periods, how to optimize them effectively so that they remain valid has become a pressing pain point. At present, a business expert analyzes results on site and proposes optimization suggestions, and staff are then assigned to adjust and implement them. This approach suffers from hysteresis, one-sidedness and low efficiency; it increases the workload, time cost and additional economic cost of the business, and has become a working difficulty and pain point for every institution.
With the digital transformation of financial institutions and the growth of electronic-fraud techniques, model optimization that relies on the traditional mode of business experts analyzing cases in the field can hardly identify the risk characteristics of the subjects flagged by suspicious-case early warnings comprehensively. It lacks timeliness and objectivity and cannot meet the rapid business-development and regulatory-inspection requirements of financial institutions. The main defects of existing model-optimization techniques are: intelligent, comprehensive analysis of suspicious-case results is lacking, and case-evidence information can be collected only manually or semi-manually, so processing efficiency and accuracy are low; the work mainly relies on business experts to analyze and judge cases by hand, which is subjective, limited and laggard, makes it difficult to distinguish the risk characteristics of early-warning subjects comprehensively, and leaves model accuracy insufficient; and models can be combed and defined only from the experience of business experts, so model updates are untimely, accuracy declines, and missed reports may even occur.
Disclosure of Invention
To address the defects of the prior art, the application provides a score model optimization method based on machine-learning analysis of rules. Taking the continuous application of artificial-intelligence algorithms to the optimization of model combination strategies as its design concept, and under the theoretical guidance of supervised learning, unsupervised learning, rough sets and granular computing, the method mines hidden information by analyzing model trigger conditions with classification algorithms, dimension-reduction algorithms, rough-set and granular-computing methods, association-rule methods and statistical analysis, and thereby optimizes the combination strategy of the expert-system model. Through the application of AI technology, financial institutions can optimize expert models faster and more accurately, identify suspicious transaction risks precisely, and reduce risks and losses in time.
The application provides a score model optimization method based on machine learning analysis rules, which comprises the following steps:
acquiring recognized case data, excluded case data and rule early warning data in an optimization period of an original model;
analyzing a core rule group and rule participation degree in the original model according to the identified case data and the excluded case data and combining a preset algorithm to obtain a machine learning rule analysis result;
Executing a score type model optimization strategy on the original model according to the machine learning rule analysis result and the rule early warning data to obtain a score type model optimization result;
and packaging and deploying the model obtained from the score model optimization result to finish the optimization of the original model.
In some optional implementations of some embodiments, the analyzing the core rule set and the rule engagement in the original model according to the identified case data and the excluded case data and in combination with a preset algorithm to obtain a machine learning rule analysis result includes:
combining the identified case data and the excluded case data into a first data set;
preprocessing the first data set and discretizing based on a rule dummy variable to obtain a first data set to be analyzed;
respectively carrying out core rule set analysis on the first data set to be analyzed by adopting a preset algorithm and a recursive feature elimination algorithm to obtain a core rule set analysis result of a type corresponding to the preset algorithm;
carrying out rule participation analysis on the first data set to be analyzed by adopting a random forest classification algorithm to obtain a rule participation analysis result;
and the core rule analysis result and the rule participation analysis result jointly form the machine learning rule analysis result.
In some optional implementations of some embodiments, the performing, by using a preset algorithm and a recursive feature elimination algorithm, core rule set analysis on the first data set to be analyzed to obtain a core rule set analysis result corresponding to the preset algorithm includes:
the method comprises the steps of adopting a random forest classification algorithm to analyze a core rule group to obtain a first type of core rule group analysis result, wherein the steps are as follows:
step A1: let the first data set to be analyzed be XN and A be the number of samples in XN; the number of input samples of a single decision tree is A; randomly draw A training samples from the first data set to be analyzed XN with replacement (bootstrap sampling);
step A2: let M be the total number of features of the first data set to be analyzed and m = ⌊√M⌋; when splitting at each node of each decision tree, randomly select m of the M input features as the attribute set, each attribute being denoted a; calculate the Gini index of each attribute through the Gini-index formula and split on the attribute with the smallest Gini index, the calculation formula being:
Gini_index(D, a) = Σ_{v=1}^{V} (|D^v| / |D|) · Gini(D^v),  with Gini(D) = 1 − Σ_{k=1}^{y} p_k²,
wherein D represents the first data set to be analyzed XN, a represents an attribute in XN, v represents a value of attribute a, V represents the number of such values, D^v represents the data of D whose value on attribute a is a^v (the data contained in the v-th branch node), p_k represents the proportion of the k-th class of samples in D, and y represents the total number of classes;
step A3: repeating the step A1 and the step A2, enabling each decision tree to be split continuously, stopping splitting until the splitting stopping condition is met, generating a corresponding number of first decision trees, and taking the category with the largest single tree classification result from the first decision trees as a first random forest classification result through a voting method;
step A4: calculate the expected contribution rate of each decision tree in the first random forest classification result to obtain a first expected contribution rate; average-normalize the first expected contribution rate to obtain the feature importance; sort the feature importances in descending order and, taking features from front to back, stop once the accumulated feature importance exceeds 99%; the features taken constitute the first-type core rule group analysis result.
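The cumulative cutoff of step A4 can be sketched in a few lines of Python. This is a minimal illustration only: the rule names and importance values below are hypothetical, standing in for the averaged, normalized expected contribution rates of a trained forest.

```python
def core_rule_group(importances, threshold=0.99):
    """Average-normalize importances, sort them in descending order, and
    keep features until the accumulated importance exceeds `threshold`."""
    total = sum(importances.values())
    normalized = {f: v / total for f, v in importances.items()}
    ranked = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for feature, imp in ranked:
        selected.append(feature)
        cumulative += imp
        if cumulative > threshold:
            break
    return selected

# Hypothetical importances for five rules of the model under analysis.
imps = {"rule_a": 0.70, "rule_b": 0.20, "rule_c": 0.095,
        "rule_d": 0.004, "rule_e": 0.001}
group = core_rule_group(imps)   # rules a, b, c already cover > 99%
```

Rules whose importance lies in the long tail past the 99% mark are excluded from the core rule group, which is how the method separates core from peripheral rules.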
In some optional implementations of some embodiments, the performing, by using a preset algorithm and a recursive feature elimination algorithm, core rule set analysis on the first data set to be analyzed to obtain a core rule set analysis result corresponding to the preset algorithm, further includes:
And carrying out core rule group analysis by adopting an association rule algorithm to obtain a second type of core rule group analysis result, wherein the method comprises the following steps of:
step B1: traversing all the features in the first data set to be analyzed to obtain a set of feature pairwise combinations in the first data set to be analyzed, and recording the set as a frequent A item set LA;
step B2: traversing all the features in the first data set to be analyzed, searching features combined with the set LA in the first data set to be analyzed, combining the features into a new item set, and recording the new item set as a frequent B item set LB;
step B3: repeatedly executing the step B2 until the frequent k item sets cannot be found, and calculating the support, confidence and lifting degree of each frequent item set to obtain a frequent item set calculation result;
step B4: and selecting the frequent item set which is simultaneously larger than the minimum support degree, the minimum confidence degree and the minimum lifting degree from the frequent item set calculation result as a second type core rule group.
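Steps B1–B4 follow the classic Apriori pattern of frequent-itemset mining with support, confidence and lift. The sketch below uses brute-force candidate enumeration instead of the level-wise growth of steps B1–B3 for brevity; the case data (each entry is the set of rules one case triggered) and the metric thresholds are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, max_size=3):
    """Support of every candidate itemset: the fraction of
    transactions that contain the whole itemset."""
    n = len(transactions)
    support = {}
    items = sorted({i for t in transactions for i in t})
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            count = sum(1 for t in transactions if set(cand) <= t)
            if count:
                support[cand] = count / n
    return support

def confidence_and_lift(support, antecedent, consequent):
    """Confidence and lift of the rule antecedent -> consequent."""
    joint = support[tuple(sorted(antecedent + consequent))]
    conf = joint / support[tuple(sorted(antecedent))]
    lift = conf / support[tuple(sorted(consequent))]
    return conf, lift

# Hypothetical data: rules triggered together by four identified cases.
cases = [{"r1", "r2"}, {"r1", "r2", "r3"}, {"r2", "r3"}, {"r1", "r2"}]
sup = frequent_itemsets(cases)
conf, lift = confidence_and_lift(sup, ("r1",), ("r2",))
```

Itemsets whose support, confidence and lift all exceed the preset minimums would then be kept as the second-type core rule group, as step B4 describes.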
In some optional implementations of some embodiments, the performing, by using a preset algorithm and a recursive feature elimination algorithm, core rule set analysis on the first data set to be analyzed to obtain a core rule set analysis result corresponding to the preset algorithm, further includes:
And adopting a minimum attribute reduction algorithm to analyze the core rule group to obtain a third type of core rule group analysis result, wherein the method comprises the following steps of:
step C1: constructing a decision table according to the first data set to be analyzed;
step C2: binary-encode the initial population, construct a binary discernibility matrix, calculate the core of the decision table from the matrix, and add the core and the necessary attributes to the initial population;
step C3: let n be the number of samples and h the number of features of the first data set to be analyzed; one piece of data represents one individual, denoted x; calculate the fitness value of every individual in the current generation through the fitness function formula, which is expressed as:
f(x) = α · (h − y(x)) / h + (1 − α) · γ(x, D)
wherein h is the number of conditional attributes, y(x) is the number of conditional attributes equal to 1 in individual x, i.e. the number of conditional attributes retained by x, γ(x, D) is the dependence of the conditional attributes contained in individual x on the decision attribute D, and α is a proportionality coefficient;
step C4: calculate the probability of each individual being selected, the calculation formula being:
P(x_i) = f(x_i) / Σ_{j=1}^{n} f(x_j)
and select individuals to produce offspring using the roulette-wheel selection algorithm;
step C5: perform single-point crossover with crossover probability p_c and, with mutation probability p_m, select individuals x for mutation (flipping selected bits from 1 to 0) to obtain the next population; a maximum number of iterations is preset, and while the iteration count is smaller than the maximum, return to step C3 to continue iterating; the population at the last iteration is the final population;
step C6: and obtaining an attribute reduction result according to the individual with the largest fitness in the final population, wherein the attribute reduction result is a third type core rule group.
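The fitness evaluation and roulette-wheel selection of steps C3–C4 can be sketched as follows. This is an assumption-laden illustration: the fitness form balances the fraction of attributes dropped against the dependency degree γ(x, D), which is supplied here as precomputed hypothetical values, and α = 0.5 is an arbitrary choice of the proportionality coefficient.

```python
import random

def fitness(individual, gamma, alpha=0.5):
    """alpha * (h - y(x)) / h + (1 - alpha) * gamma: reward dropping
    conditional attributes while keeping dependency on D high."""
    h = len(individual)
    y = sum(individual)            # conditional attributes kept (bits equal to 1)
    return alpha * (h - y) / h + (1 - alpha) * gamma

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.random() * total
    acc = 0.0
    for ind, f in zip(population, fitnesses):
        acc += f
        if acc >= pick:
            return ind
    return population[-1]

# Hypothetical population of 3 individuals over h = 4 conditional attributes,
# with assumed dependency degrees gamma(x, D) for each individual.
pop = [(1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 0)]
gammas = [1.0, 1.0, 0.6]
fits = [fitness(x, g) for x, g in zip(pop, gammas)]
parent = roulette_select(pop, fits, random.Random(0))
```

Note that the second individual scores highest: it keeps the dependency at 1.0 while dropping half the attributes, which is exactly the reduction behaviour the algorithm rewards.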
In some optional implementations of some embodiments, the performing, by using a preset algorithm and a recursive feature elimination algorithm, core rule set analysis on the first data set to be analyzed to obtain a core rule set analysis result corresponding to the preset algorithm, further includes:
and adopting a recursive feature elimination algorithm to analyze the core rule group to obtain a fourth type of analysis result of the core rule group, wherein the method comprises the following steps:
step D1: selecting a random forest classification algorithm as a basic model and setting a preset number of target features as feature sets;
step D2: training the first data set to be analyzed through the basic model, and obtaining a feature importance score after average normalization of the expected contribution rate of all decision trees through calculation of the expected contribution rate of each decision tree;
Step D3: sorting the target features according to the feature importance scores, and selecting the features with the lowest scores as features to be removed;
step D4: remove the features to be removed from the feature set; if the number of remaining features reaches the threshold (or every feature has been eliminated), take the reduced feature subset and jump to step D5; otherwise jump to step D2 and perform feature-importance assessment and feature elimination on the new feature subset again;
step D5: and taking the characteristic subset after the elimination as a final characteristic subset, wherein the final characteristic subset is a fourth type of core rule set.
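Steps D1–D5 reduce to a generic elimination loop: score, drop the worst, repeat. In the sketch below the scoring function is a hypothetical stand-in for the retrained random-forest importances of step D2, and the rule names are invented for illustration.

```python
def recursive_feature_elimination(features, score_fn, n_keep):
    """Repeatedly score the remaining features and drop the
    lowest-scoring one until only `n_keep` features remain."""
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = score_fn(remaining)              # importance per remaining feature
        worst = min(remaining, key=scores.get)    # lowest importance -> eliminate
        remaining.remove(worst)
    return remaining

# Hypothetical fixed importances standing in for refitted forest scores.
BASE = {"r1": 0.40, "r2": 0.30, "r3": 0.20, "r4": 0.10}
subset = recursive_feature_elimination(
    BASE, lambda fs: {f: BASE[f] for f in fs}, n_keep=2)
```

In a real run the score function would refit the base model on the surviving features at every pass, which is what distinguishes recursive elimination from a one-shot importance ranking.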
In some optional implementations of some embodiments, the performing, by using a random forest classification algorithm, rule engagement analysis on the first data set to be analyzed to obtain a rule engagement analysis result includes:
step E1.1: let the first data set to be analyzed be XE and E be the number of samples in XE; the number of input samples of a single decision tree is E; randomly draw E training samples from the first data set to be analyzed XE with replacement;
step E1.2: let G be the total number of features of the first data set to be analyzed and g = ⌊√G⌋; when splitting at each node of each decision tree, randomly select g of the G input features as the attribute set, calculate the Gini index of every attribute in the attribute set through the Gini-index formula, and select the attribute with the smallest Gini index for splitting;
step E1.3: repeating the step E1.1 and the step E1.2, enabling each decision tree to be split continuously until the splitting stopping condition is met, stopping splitting, generating a corresponding number of second decision trees, and taking the category with the largest single tree classification result from the second decision trees as a second random forest classification result through a voting method;
step E1.4: respectively calculating the expected contribution rate of each decision tree in the second random forest classification result to obtain a second expected contribution rate, and carrying out average normalization on the second expected contribution rate to obtain rule importance;
step E2: obtaining a rule contribution rate according to the recognized case data, wherein the rule contribution rate represents the ratio of the rule trigger quantity in the recognized case to the recognized case number, and the formula is as follows:
rule_contribute=r1/ra
wherein rule_control represents rule contribution rate, r1 represents rule trigger quantity in the identified cases, and ra represents the number of the identified cases;
Step E3: the rule similarity represents the similarity and the inclusion condition of triggered clients among rules in the model, the rule similarity is calculated by adopting a pearson similarity function, and the calculation formula is as follows:
wherein rule_similarity represents rule similarity, rc1 represents a discretization array of the client triggered by rule 1, and rc2 represents a discretization array of the client triggered by rule 2;
step E4: obtaining rule separation rate according to the recognized case data and the excluded case data, wherein the rule separation rate represents the ratio between the triggering quantity of the rule in the recognized case and the triggering quantity in the excluded case, and the formula is as follows:
rule_difference=r1/r2
wherein rule_difference represents the rule separation rate, r1 represents the rule trigger quantity in the identified cases, and r2 represents the rule trigger quantity in the excluded cases;
step E5: and the rule importance, the rule contribution rate, the rule similarity and the rule separation rate jointly form the rule participation degree analysis result.
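The engagement metrics of steps E2–E4 are direct ratios and a correlation, and can be sketched as below. The per-client 0/1 trigger arrays and the counts are hypothetical illustrations of the quantities the text defines.

```python
import math

def rule_contribute(r1, ra):
    """Step E2: share of identified cases in which the rule fired, r1 / ra."""
    return r1 / ra

def rule_difference(r1, r2):
    """Step E4: separation rate, triggers in identified vs. excluded cases."""
    return r1 / r2

def pearson_similarity(rc1, rc2):
    """Step E3: Pearson correlation of two discretized client-trigger arrays."""
    n = len(rc1)
    m1, m2 = sum(rc1) / n, sum(rc2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(rc1, rc2))
    den = (math.sqrt(sum((a - m1) ** 2 for a in rc1))
           * math.sqrt(sum((b - m2) ** 2 for b in rc2)))
    return num / den

# Hypothetical per-client trigger indicators for two rules.
rc1 = [1, 1, 0, 0, 1]
rc2 = [1, 1, 0, 0, 1]
sim = pearson_similarity(rc1, rc2)   # identical trigger sets -> similarity 1.0
```

A similarity near 1 flags redundant rules that fire on the same clients, while a high separation rate flags rules that discriminate identified from excluded cases.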
In some optional implementations of some embodiments, a score model optimization strategy is executed on the original model according to the machine learning rule analysis result and the rule early warning data to obtain a score model optimization result, wherein the score model optimization strategy comprises a score-model original-model optimization step, a score-model add-rule optimization step and a score-model reconstruction optimization step; the score-model original-model optimization step is:
Step X1: merging the identified case data and the excluded case data into a second data set;
step X2: preprocessing the second data set and discretizing based on a rule dummy variable to obtain a second data set to be analyzed;
step X3: acquiring a second type core rule group and rule importance in the machine learning rule analysis result;
step X4: calculating the rule contribution rate;
step X5: and weighting the rule importance and the rule contribution rate to obtain a rule classification score, wherein the formula is as follows:
rule_grading_score=0.5*rule_importance+0.5*rule_contribute
wherein rule_grading_score represents rule grading scores, rule_importance represents rule importance;
standardize the rule grading score to obtain a rule grading standard value; a rule is first grade when its standard value is ≥ 0.8, second grade when the value is ≥ 0.3 and < 0.8, and third grade when the value is < 0.3;
step X6: analyzing rule grading standard values and carrying out rule score grading on rules in the second class core rule group:
when the rule contribution rate of the second class core rule group is 1, the corresponding rule in the second class core rule group is adjusted to be a first-grade rule;
When the rule grading standard value of the rule in the second class core rule group is the third grade, the corresponding rule in the second class core rule group is adjusted to be the second grade rule;
when the rule grading standard value of a rule in the second-class core rule group is first or second grade but the rule is not a core rule, the corresponding rule in the second-class core rule group is adjusted to a third-grade rule;
corresponding scores are given to the first gear rule, the second gear rule and the third gear rule;
step X7: perform score accumulation for the second-class core rule group on the second data set to be analyzed according to the rule score grading result of step X6, and obtain the reporting rate corresponding to the second-class core rule group from the calculation result;
step X8: and selecting the remaining core rule groups, repeatedly executing the step X7 to obtain the reporting rate corresponding to the remaining core rule groups, and selecting the core rule group with the highest reporting rate to optimize the original model.
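The weighting and banding of steps X5–X6 can be sketched directly. The 0.5/0.5 weights and the 0.8/0.3 grade thresholds are the ones stated above; the input importance and contribution values are hypothetical.

```python
def rule_grading_score(importance, contribute):
    """Step X5: rule_grading_score = 0.5 * rule_importance + 0.5 * rule_contribute."""
    return 0.5 * importance + 0.5 * contribute

def grade(standard_value):
    """Map a standardized grading value onto the three grades of step X5."""
    if standard_value >= 0.8:
        return 1   # first grade
    if standard_value >= 0.3:
        return 2   # second grade
    return 3       # third grade

# Hypothetical rule with importance 0.9 and contribution rate 0.8.
score = rule_grading_score(0.9, 0.8)
band = grade(score)
```

Step X6 then overrides these bands with the special cases given above (e.g. any rule with a contribution rate of 1 is promoted to first grade) before the accumulated scores are compared across core rule groups in steps X7–X8.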
In some alternative implementations of some embodiments, the score-model add-rule optimization step is:
step Y1: combining the identified case data, the excluded case data, and the rule pre-warning data into a third data set;
step Y2: preprocessing the third data set and discretizing based on a rule dummy variable to obtain a third data set to be analyzed;
step Y3: select the out-of-model rules jointly triggered by the identified-case clients and calculate their rule separation rates to obtain the out-of-model rule separation rates;
step Y4: select the rules whose out-of-model separation rate is larger than a preset threshold and combine them to obtain a plurality of separation rule groups;
step Y5: select the rules governed by the original model and the feature data corresponding to the separation rule groups from the third data set to be analyzed to obtain a fourth data set;
step Y6: calculate the rule importance and the rule contribution rate on the fourth data set, and obtain the rule grading standard values based on them;
step Y7: analyze the rule grading standard values and perform rule score grading on the rules in each separation rule group;
step Y8: perform score accumulation for the separation rule groups on the fourth data set according to the rule score grading result of step Y7, obtain the reporting rate corresponding to each separation rule group from the calculation result, and select the separation rule group with the highest reporting rate to optimize the original model.
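Steps Y3–Y4 reduce to computing separation rates for out-of-model rules and filtering by a threshold. The rule names, trigger counts and threshold below are hypothetical.

```python
def separation_rates(trigger_counts):
    """Map each rule to r1 / r2 from its (identified, excluded) trigger counts."""
    return {rule: r1 / r2 for rule, (r1, r2) in trigger_counts.items()}

def select_rules(rates, threshold):
    """Keep only out-of-model rules whose separation rate exceeds the threshold."""
    return sorted(rule for rule, rate in rates.items() if rate > threshold)

# Hypothetical (identified, excluded) trigger counts for out-of-model rules.
counts = {"ext1": (9, 3), "ext2": (2, 8), "ext3": (5, 1)}
chosen = select_rules(separation_rates(counts), threshold=2.0)
```

The surviving rules are then combined into candidate groups, and the group that yields the highest reporting rate after the grading of steps Y5–Y8 is the one added to the original model.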
In some optional implementations of some embodiments, the score-type reconstruction model optimization step is:
step Z1: combining the identified case data, the excluded case data, and the rule pre-warning data into a fifth data set;
Step Z2: preprocessing the fifth data set and discretizing the fifth data set based on a rule dummy variable to obtain a fifth data set to be analyzed;
step Z3: obtain the first-type, third-type and fourth-type core rule groups, deduplicate their union to obtain the final core rule group, and take the feature data corresponding to the final core rule group from the fifth data set to be analyzed to obtain a sixth data set;
step Z4: calculate the rule importance and the rule contribution rate on the sixth data set, and obtain the rule grading standard values based on them;
step Z5: analyze the rule grading standard values and perform rule score grading on the rules in the final core rule group;
step Z6: perform score accumulation for the final core rule group on the sixth data set according to the rule score grading result of step Z5, obtain the corresponding reporting rate from the calculation result, and select the final core rule group with the highest reporting rate to optimize the original model.
The invention has the beneficial effects that:
according to the invention, an artificial intelligent algorithm is used as a design concept in the continuous optimization application of the model combination strategy, the model combination strategy is automatically and intelligently optimized under the theoretical guidance of supervised learning, unsupervised learning, rough set and grain calculation, and the like, the related data of the trigger model can be subjected to multi-granularity analysis through an AI technology, different optimization strategies are generated based on different service requirements, and the auxiliary service can better optimize the model more quickly. Through the application of the AI technology, the financial institution can optimize the expert model more quickly and accurately, discover the association relation of rules under the model and the participation degree thereof quickly, assist the business to understand the feature recognition points of the suspicious cases, further better recognize the suspicious cases, further accurately recognize the suspicious transaction risk, and reduce the risk and loss in time.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is an algorithm flow chart of the score-type raw model optimization step.
FIG. 3 is an algorithm flow chart of the score type increment rule optimization step.
FIG. 4 is an algorithm flow chart of the score-type reconstruction model optimization step.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The application provides a score model optimization method based on machine learning analysis rules, which is shown in fig. 1 and comprises the following steps:
s100: acquiring recognized case data, excluded case data and rule early warning data in an optimization period of an original model;
the identified-case data mainly comprise the client information, account information, case dates and rule sets of the model to be optimized that were triggered by identified cases within the optimization period;
the excluded-case data mainly comprise the client information, account information, case dates and rule sets of the model to be optimized that were triggered by excluded cases within the optimization period;
the rule early-warning data mainly comprise all clients that triggered rules within the optimization period and all rule sets within the daily review period.
S200: analyzing a core rule group and rule participation degree in the original model according to the identified case data and the excluded case data and combining a preset algorithm to obtain a machine learning rule analysis result;
in some optional implementations of some embodiments, the analyzing the core rule set and the rule engagement in the original model according to the identified case data and the excluded case data and in combination with a preset algorithm to obtain a machine learning rule analysis result includes:
combining the identified case data and the excluded case data into a first data set;
preprocessing the first data set and discretizing based on a rule dummy variable to obtain a first data set to be analyzed;
respectively carrying out core rule set analysis on the first data set to be analyzed by adopting a preset algorithm and a recursive feature elimination algorithm to obtain a core rule set analysis result of a type corresponding to the preset algorithm;
Carrying out rule participation analysis on the first data set to be analyzed by adopting a random forest classification algorithm to obtain a rule participation analysis result;
and the core rule analysis result and the rule participation analysis result jointly form the machine learning rule analysis result.
The core rule group is the main body of a model combination strategy and the point that distinguishes models with different functions, so analyzing the core rule group of a model is necessary and important. The methods used to analyze the core rule group comprise a random forest classification algorithm, an association rule algorithm, a minimum attribute reduction algorithm and a recursive feature elimination algorithm;
in some optional implementations of some embodiments, the performing, by using a preset algorithm and a recursive feature elimination algorithm, core rule set analysis on the first data set to be analyzed to obtain a core rule set analysis result corresponding to the preset algorithm includes:
and carrying out core rule group analysis by adopting a random forest classification algorithm to obtain a first-class core rule group analysis result. The random forest classification algorithm combines a number of weak decision-tree classifiers, obtains the model result by a final vote, and at the same time evaluates every feature to obtain its feature importance. The data set XN is learned with a random forest classification algorithm whose parameters are chosen as: number of decision trees (n_estimators) = 100, maximum number of features per split (max_features) = "sqrt", splitting criterion (criterion) = "gini", class weight (class_weight) = "balanced";
The method comprises the following steps:
step A1: let the first data set to be analyzed be XN and A be the number of samples in XN; the number of input samples of a single decision tree is A; randomly draw A training samples from the first data set to be analyzed XN with replacement (bootstrap sampling);
step A2: let M be the total number of features of the first data set to be analyzed and m = ⌊√M⌋. When splitting at each node of each decision tree, randomly select m of the M input features as the attribute set, each attribute being denoted a; calculate the Gini index of each attribute through the Gini index formula and select the attribute with the smallest Gini index for splitting. The Gini index calculation formula is:

Gini_index(D, a) = Σ_{v=1}^{V} (|D^v| / |D|) · Gini(D^v),  where Gini(D) = 1 − Σ_{k=1}^{y} p_k²

wherein D represents the first data set to be analyzed XN, a represents an attribute in XN, v indexes the values of attribute a, V is the number of distinct values, D^v denotes the subset of D taking the value a^v on attribute a, p_k is the proportion of samples of the k-th class in D, and y is the total number of classes;
step A3: repeating the step A1 and the step A2, enabling each decision tree to be split continuously, stopping splitting until the splitting stopping condition is met, generating a corresponding number of first decision trees, and taking the category with the largest single tree classification result from the first decision trees as a first random forest classification result through a voting method;
Further, the stop splitting condition is:
(1) The current node contains data that is all of one category,
(2) The current set of attributes is empty, or all data has the same value on all attributes,
(3) The data set of the current node is empty and cannot be divided;
step A4: and respectively calculating the expected contribution rate of each decision tree in the first random forest classification result to obtain a first expected contribution rate, carrying out average normalization on the first expected contribution rate to obtain feature importance, arranging the feature importance in a descending order, and sequentially taking features with the accumulated feature importance of more than 99% from front to back according to the arrangement result as a first type core rule group analysis result.
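Steps A1–A4 correspond closely to fitting a random forest and keeping features by cumulative importance. A minimal sketch using scikit-learn with the parameters listed above (n_estimators=100, max_features="sqrt", criterion="gini"); the class_weight value, synthetic data, and rule names are illustrative assumptions, not from the patent:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def core_rule_group_rf(X, y, feature_names, cum_threshold=0.99):
    # Steps A1-A4: fit a random forest (bootstrap sampling and sqrt-feature
    # splits are handled by sklearn), then keep the smallest prefix of
    # descending-importance features whose cumulative importance reaches 99%.
    clf = RandomForestClassifier(
        n_estimators=100, max_features="sqrt", criterion="gini",
        class_weight="balanced",  # assumed; the patent's stated value is garbled
        random_state=0,
    )
    clf.fit(X, y)
    order = np.argsort(clf.feature_importances_)[::-1]   # descending importance
    cum = np.cumsum(clf.feature_importances_[order])
    keep = order[: int(np.searchsorted(cum, cum_threshold)) + 1]
    return [feature_names[i] for i in keep]

# Synthetic stand-in for the first data set to be analyzed (illustrative).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)
core = core_rule_group_rf(X, y, [f"rule_{i}" for i in range(10)])
```

The returned list is the first type core rule group analysis result of step A4.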
In some optional implementations of some embodiments, the performing, by using a preset algorithm and a recursive feature elimination algorithm, core rule set analysis on the first data set to be analyzed to obtain a core rule set analysis result corresponding to the preset algorithm, further includes:
and carrying out core rule group analysis by adopting an association rule algorithm to obtain a second type of core rule group analysis result, wherein the method comprises the following steps of:
step B1: traversing all the features in the first data set to be analyzed to obtain a set of feature pairwise combinations in the first data set to be analyzed, and recording the set as a frequent A item set LA;
Step B2: traversing all the features in the first data set to be analyzed, searching features combined with the set LA in the first data set to be analyzed, combining the features into a new item set, and recording the new item set as a frequent B item set LB;
step B3: repeatedly execute step B2 until no frequent k-item set can be found, and calculate the support, confidence and lift of each frequent item set to obtain the frequent item set calculation result;
step B4: from the frequent item set calculation result, select the frequent item sets whose support, confidence and lift simultaneously exceed the minimum support, minimum confidence and minimum lift as the second type core rule group.
The association rule algorithm measures the association between items by calculating indexes such as the probability that items co-occur in the data set and the probability that the consequent item occurs given that the antecedent item set occurs, wherein the items are the features in the data set;
Support represents the frequency of occurrence of an item set, i.e. the ratio of the number of transactions containing the item set to the total number of transactions; e.g. P(A) represents the proportion of item set A, and P(A ∩ B) the proportion of transactions containing both item set A and item set B;
Confidence: the frequency with which item B occurs when item A occurs, denoted {A → B}, i.e. the ratio of the number of transactions containing both A and B to the number of transactions containing A, where P(B|A) = P(A ∩ B) / P(A);
Lift: the confidence of {A → B} divided by the frequency of occurrence of B, with formula P(B|A) / P(B) = P(A ∩ B) / (P(A) × P(B)). The data set XN is learned by adopting the association rule algorithm, the items above being exactly the features in XN, with algorithm parameters selected as follows: minimum support (min_support) = 0.001, minimum confidence (min_conf) = 0.5, minimum lift = 1.
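The three association-rule indexes can be sketched directly from their definitions; the toy transactions and rule names below are illustrative assumptions, with each transaction being the set of rules a client triggers:

```python
def support(transactions, itemset):
    # Fraction of transactions that contain every item of the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # P(B|A) = P(A and B together) / P(A) for the rule {A -> B}.
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    # P(B|A) / P(B); values above 1 indicate positive association.
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / (
        support(transactions, antecedent) * support(transactions, consequent))

# Illustrative transactions: the set of rules each client triggered.
tx = [{"r1", "r2"}, {"r1", "r2", "r3"}, {"r2"}, {"r1"}, {"r2", "r3"}]
```

With these transactions, support({r1, r2}) = 0.4 and the rule {r1 → r2} has confidence 2/3.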
In some optional implementations of some embodiments, the performing, by using a preset algorithm and a recursive feature elimination algorithm, core rule set analysis on the first data set to be analyzed to obtain a core rule set analysis result corresponding to the preset algorithm, further includes:
and adopting a minimum attribute reduction algorithm to analyze the core rule group to obtain a third type of core rule group analysis result, wherein the method comprises the following steps of:
step C1: constructing a decision table according to the first data set to be analyzed;
step C2: binary-encode the initial population and construct a binary discernibility matrix; calculate the core of the decision table from the binary discernibility matrix, and add the core and the necessary attributes to the initial population;
step C3: let n be the number of samples of the first data set to be analyzed and h the number of features of the first data set to be analyzed; one piece of data represents one individual, denoted x. The fitness value of each individual in the current generation is calculated through the fitness function formula:

f(x) = α · (h − y(x)) / h + (1 − α) · γ(x, D)

wherein h is the number of conditional attributes, y(x) is the number of bits equal to 1 in individual x, representing the number of conditional attributes selected by individual x, γ(x, D) is the dependence of the conditional attributes contained in individual x on the decision attribute D, and α is a proportionality coefficient;
step C4: calculate the probability of each individual being selected, with calculation formula:

P(x_i) = f(x_i) / Σ_{j=1}^{n} f(x_j)

and select individuals to produce offspring using a roulette-wheel selection algorithm;
step C5: perform single-point crossover with crossover probability p_c, and select individuals x for mutation with mutation probability p_m, mutating bits of the individuals from 1 to 0, to obtain the final population; a maximum number of iterations is preset, and while the iteration count is smaller than the maximum, return to step C3 to continue iterating;
step C6: and obtaining an attribute reduction result according to the individual with the largest fitness in the final population, wherein the attribute reduction result is a third type core rule group.
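Steps C3–C4 amount to a fitness evaluation plus roulette-wheel selection. The exact fitness weighting is not recoverable from the patent text, so the alpha-weighted combination of reduct size and dependency below is an assumption, as are the toy population and dependency function:

```python
import random

def fitness(x, gamma, alpha=0.5):
    # Sketch of the step-C3 fitness: reward small reducts (few selected
    # attributes) and high dependency gamma(x, D). The weighting is assumed,
    # with alpha as the proportionality coefficient.
    h = len(x)            # number of conditional attributes
    y = sum(x)            # attributes selected by individual x (bits equal to 1)
    return alpha * (h - y) / h + (1 - alpha) * gamma(x)

def roulette_select(population, fitnesses, rng):
    # Step C4: pick one individual with probability f(x_i) / sum_j f(x_j).
    total = sum(fitnesses)
    r = rng.uniform(0, total)
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return individual
    return population[-1]

rng = random.Random(0)
pop = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
# Toy dependency function (illustrative): attribute 0 alone determines the decision.
gamma = lambda x: 1.0 if x[0] == 1 else 0.5
fits = [fitness(x, gamma) for x in pop]
picked = roulette_select(pop, fits, rng)
```

Under this toy dependency, the single-attribute reduct [1, 0, 0, 0] scores highest, matching step C6's preference for the fittest individual.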
In some optional implementations of some embodiments, the performing, by using a preset algorithm and a recursive feature elimination algorithm, core rule set analysis on the first data set to be analyzed to obtain a core rule set analysis result corresponding to the preset algorithm, further includes:
and adopting a recursive feature elimination algorithm to analyze the core rule group to obtain a fourth type of analysis result of the core rule group, wherein the method comprises the following steps:
Step D1: selecting a random forest classification algorithm as a basic model and setting a preset number of target features as feature sets;
step D2: training the first data set to be analyzed through the basic model, and obtaining a feature importance score after average normalization of the expected contribution rate of all decision trees through calculation of the expected contribution rate of each decision tree;
step D3: sorting the target features according to the feature importance scores, and selecting the features with the lowest scores as features to be removed;
step D4: remove the features to be removed from the feature set; if the number of remaining features reaches the threshold or all candidate features have been eliminated, take the remaining features as the post-elimination feature subset and jump to step D5; otherwise jump to step D2 and perform feature importance assessment and feature elimination on the new feature subset again;
step D5: and taking the characteristic subset after the elimination as a final characteristic subset, wherein the final characteristic subset is a fourth type of core rule set.
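Steps D1–D5 match scikit-learn's recursive feature elimination with a random forest base model. A minimal sketch, where the target feature count, step size, and synthetic data are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the first data set to be analyzed (illustrative).
X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=0)

selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),  # step D1 base model
    n_features_to_select=3,  # assumed remaining-feature threshold of step D4
    step=1,                  # drop the single lowest-importance feature per round (step D3)
)
selector.fit(X, y)           # steps D2-D4: train, rank by importance, eliminate, repeat

# Step D5: the surviving features form the fourth type core rule group.
final_subset = [i for i, kept in enumerate(selector.support_) if kept]
```

`selector.ranking_` additionally records the elimination order, which can help inspect which rules were dropped first.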
In some optional implementations of some embodiments, the performing, by using a random forest classification algorithm, rule engagement analysis on the first data set to be analyzed to obtain a rule engagement analysis result includes:
step E1.1: let the first data set to be analyzed be XE and E the number of samples in XE; the number of input samples of a single decision tree is E. Randomly extract E training samples from the first data set to be analyzed XE with replacement;
step E1.2: let G be the total number of features of the first data set to be analyzed and g = ⌊√G⌋. When splitting at each node of each decision tree, randomly select g of the G input features as the attribute set, calculate the Gini index of each attribute in the attribute set through the Gini index formula, and select the attribute with the smallest Gini index for splitting;
step E1.3: repeating the step E1.1 and the step E1.2, enabling each decision tree to be split continuously until the splitting stopping condition is met, stopping splitting, generating a corresponding number of second decision trees, and taking the category with the largest single tree classification result from the second decision trees as a second random forest classification result through a voting method;
step E1.4: respectively calculating the expected contribution rate of each decision tree in the second random forest classification result to obtain a second expected contribution rate, and carrying out average normalization on the second expected contribution rate to obtain rule importance;
step E2: obtain the rule contribution rate from the identified case data, wherein the rule contribution rate represents the ratio of the rule's trigger count among identified cases to the number of identified cases, with formula:
rule_contribute = r1 / ra
wherein rule_contribute represents the rule contribution rate, r1 the rule's trigger count among identified cases, and ra the number of identified cases;
step E3: the rule similarity represents the similarity and inclusion relation of triggered clients between rules in the model; it is calculated with the Pearson similarity function:

rule_similarity = Σ_i (rc1_i − mean(rc1)) (rc2_i − mean(rc2)) / ( √Σ_i (rc1_i − mean(rc1))² · √Σ_i (rc2_i − mean(rc2))² )

wherein rule_similarity represents the rule similarity, rc1 the discretized (0/1) array of clients triggered by rule 1, and rc2 the discretized array of clients triggered by rule 2;
step E4: obtain the rule separation rate from the identified case data and the excluded case data, wherein the rule separation rate represents the ratio between the rule's trigger count among identified cases and its trigger count among excluded cases, with formula:
rule_difference = r1 / r2
wherein rule_difference represents the rule separation rate, r1 the rule's trigger count among identified cases, and r2 its trigger count among excluded cases;
step E5: and the rule importance, the rule contribution rate, the rule similarity and the rule separation rate jointly form the rule participation degree analysis result.
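The three closed-form participation indexes of steps E2–E4 can be sketched directly; the Pearson similarity uses numpy's corrcoef, and the example trigger arrays and counts are illustrative:

```python
import numpy as np

def rule_contribute(r1, ra):
    # Step E2: rule triggers among identified cases / number of identified cases.
    return r1 / ra

def rule_similarity(rc1, rc2):
    # Step E3: Pearson correlation between the 0/1 trigger arrays of two rules.
    rc1 = np.asarray(rc1, dtype=float)
    rc2 = np.asarray(rc2, dtype=float)
    return float(np.corrcoef(rc1, rc2)[0, 1])

def rule_difference(r1, r2):
    # Step E4: rule triggers among identified cases / triggers among excluded cases.
    return r1 / r2

# Illustrative 0/1 arrays: which of five clients each rule triggered.
sim = rule_similarity([1, 0, 1, 1, 0], [1, 0, 1, 0, 0])
```

Together with the rule importance of step E1.4, these values form the rule participation analysis result of step E5.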
S300: executing a score type model optimization strategy on the original model according to the machine learning rule analysis result and the rule early warning data to obtain a score type model optimization result;
in some optional implementations of some embodiments, the executing, according to the machine learning rule analysis result and the rule early warning data, a score model optimization strategy on the original model to obtain a score model optimization result includes a score model original-model optimization step, a score model rule-addition optimization step, and a score model reconstruction optimization step, where the score model original-model optimization step is:
step X1: merging the identified case data and the excluded case data into a second data set;
step X2: preprocessing the second data set and discretizing based on a rule dummy variable to obtain a second data set to be analyzed;
step X3: acquiring a second type core rule group and rule importance in the machine learning rule analysis result;
step X4: calculating the rule contribution rate;
step X5: and weighting the rule importance and the rule contribution rate to obtain a rule classification score, wherein the formula is as follows:
rule_grading_score=0.5*rule_importance+0.5*rule_contribute
Wherein rule_grading_score represents rule grading scores, rule_importance represents rule importance;
the rule grading score is standardized to obtain a rule grading standard value; the rule grading standard value is the first grade when it is greater than or equal to 0.8, the second grade when it is greater than or equal to 0.3 and less than 0.8, and the third grade when it is less than 0.3;
step X6: analyzing rule grading standard values and carrying out rule score grading on rules in the second class core rule group:
when the rule contribution rate of a rule in the second class core rule group is 1, the corresponding rule in the second class core rule group is adjusted to be a first-gear rule;
when the rule grading standard value of a rule in the second class core rule group is the third grade, the corresponding rule in the second class core rule group is adjusted to be a second-gear rule;
when the rule grading standard value of a rule in the second class core rule group is the first or second grade and the rule is not a core rule, the corresponding rule in the second class core rule group is adjusted to be a third-gear rule;
corresponding scores are given to the first gear rule, the second gear rule and the third gear rule;
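Steps X5–X6 reduce to a weighted score and a three-band grading; a minimal sketch of rule_grading_score = 0.5 * rule_importance + 0.5 * rule_contribute and the 0.8 / 0.3 bands:

```python
def rule_grading_score(rule_importance, rule_contribute):
    # Step X5: equal-weight combination of rule importance and contribution rate.
    return 0.5 * rule_importance + 0.5 * rule_contribute

def grade(standard_value):
    # Banding of the standardized score: >= 0.8 first grade,
    # [0.3, 0.8) second grade, < 0.3 third grade.
    if standard_value >= 0.8:
        return 1
    if standard_value >= 0.3:
        return 2
    return 3
```

The gear adjustments of step X6 (e.g. promoting rules whose contribution rate is 1) are then applied on top of these grades.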
step X7: perform score accumulation calculation for the second type core rule group on the second data set to be analyzed according to the rule score grading result of step X6, and obtain the reporting rate corresponding to the second type core rule group from the calculation result;
the core rule group is trial-calculated on the second data set to be analyzed: the scores of the rules triggered by each client are accumulated, and if the total score is larger than a threshold value the client generates a case and is deemed a suspicious client; the number of clients deemed suspicious is recorded as count_r, and the total number of clients in the second data set to be analyzed as count_all. The reporting rate is the ratio of the number of clients deemed suspicious to the total number of clients, with formula:
report_per = count_r / count_all
step X8: and selecting the remaining core rule groups, repeatedly executing the step X7 to obtain the reporting rate corresponding to the remaining core rule groups, and selecting the core rule group with the highest reporting rate to optimize the original model.
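Step X7's trial calculation accumulates the scores of each client's triggered rules and computes report_per = count_r / count_all; a minimal sketch with an illustrative score matrix and threshold:

```python
def report_rate(client_total_scores, threshold):
    # Clients whose accumulated score exceeds the threshold generate cases
    # (suspicious clients); report_per = count_r / count_all.
    count_r = sum(score > threshold for score in client_total_scores)
    return count_r / len(client_total_scores)

# Illustrative matrix: client_rule_scores[c][r] is the score of rule r for
# client c if triggered, else 0 (both the scores and the threshold are assumptions).
client_rule_scores = [[3, 0, 2], [1, 0, 0], [3, 2, 2]]
totals = [sum(rs) for rs in client_rule_scores]  # score accumulation per client
per = report_rate(totals, threshold=4)
```

Repeating this over each candidate core rule group, as step X8 does, then picks the group with the highest reporting rate.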
Further, the optimization of the score model first needs to determine whether there is a rule adjustment, different types of rule adjustment adopt different optimization strategies, and the algorithm flow chart of the optimization step of the score model original model is shown in fig. 2.
In some alternative implementations of some embodiments, the score-type rule-addition optimization step is:
step Y1: combining the identified case data, the excluded case data, and the rule pre-warning data into a third data set;
step Y2: preprocessing the third data set and discretizing based on a rule dummy variable to obtain a third data set to be analyzed;
step Y3: select the out-of-model rules jointly triggered by the identified-case clients and calculate their rule separation rates to obtain the out-of-model rule separation rates;
step Y4: select the rules whose out-of-model rule separation rate is larger than a preset threshold and combine them to obtain a plurality of respective rule groups;
step Y5: selecting rules governed by an original model and characteristic data corresponding to respective rule sets from the third data set to be analyzed to obtain a fourth data set; and according to the score original model optimizing step, the fourth data set is subjected to report rate calculation to obtain report rates corresponding to the respective rule groups, and the respective rule group with the highest report rate is selected to optimize the original model.
The specific steps for realizing model optimization by using the original model optimization step comprise the following steps:
step Y6: calculating the rule importance and the rule contribution rate of the fourth data set, and obtaining a rule grading standard value based on the rule importance and the rule contribution rate;
step Y7: analyzing rule grading standard values and carrying out rule score grading on rules in the rule groups respectively;
step Y8: perform score accumulation calculation for the respective rule groups on the fourth data set according to the rule score grading result of step Y7, obtain the reporting rates corresponding to the respective rule groups from the calculation result, and select the respective rule group with the highest reporting rate to optimize the original model.
The score-type rule-addition strategy only adds rules on the basis of the original model's rules, expanding the model's identification points; the algorithm flow chart of the score-type rule-addition optimization step is shown in fig. 3.
In some optional implementations of some embodiments, the score-type reconstruction model optimization step is:
step Z1: combining the identified case data, the excluded case data, and the rule pre-warning data into a fifth data set;
step Z2: preprocessing the fifth data set and discretizing the fifth data set based on a rule dummy variable to obtain a fifth data set to be analyzed;
step Z3: the first type core rule group, the third type core rule group and the fourth type core rule group are obtained and deduplicated to obtain the final core rule groups, and the feature data corresponding to the final core rule groups are obtained from the fifth data set to be analyzed to obtain a sixth data set; according to the score-type original model optimization step, the reporting rate is calculated on the sixth data set to obtain the reporting rate corresponding to each final core rule group, and the final core rule group with the highest reporting rate is selected to optimize the original model.
The specific steps for realizing model optimization by using the original model optimization step comprise the following steps:
Step Z4: calculating rule importance and rule contribution rate of the sixth data set, and obtaining rule standard values based on the rule importance and rule contribution rate;
step Z5: analyze the rule grading standard values and carry out rule score grading on the rules in the final core rule group;
step Z6: perform score accumulation calculation for the final core rule group on the sixth data set according to the rule score grading result of step Z5, obtain the reporting rate corresponding to the final core rule group from the calculation result, and select the final core rule group with the highest reporting rate to optimize the original model.
The score type reconstruction model refers to that rules under the original model are completely abandoned, the rules are reselected from a rule early-warning pool to be combined, and an algorithm flow chart of the score type reconstruction model optimization step is shown in fig. 4.
S400: and packaging and deploying the model obtained from the score model optimization result to finish the optimization of the original model.
According to the scheme, the core rule group of the model is analyzed with methods such as random forest classification and association rules, effectively helping service personnel lock onto the model's core feature identification points; rule participation is evaluated with indexes such as rule importance and rule contribution rate, effectively helping service personnel analyze how rules participate; service personnel are thus helped to understand the current model's service condition more deeply from the data angle, providing data support for model optimization. For different business scenarios, learning and trial calculation through multiple optimization strategies can overcome the subjectivity and trial-calculation limitations of business experts, mine hidden information more fully, obtain a better combination strategy, and assist business staff in realizing automatic model optimization. On model accuracy, through multiple rounds of strategy adjustment, optimization calculation and model verification, the reporting rate is effectively improved and the case forming rate is reduced on the basis of guaranteeing the capture of identified cases.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and improvements made by those skilled in the art without departing from the present technical solution shall be considered as falling within the scope of the claims.

Claims (10)

1. A score model optimization method based on machine learning analysis rules is characterized in that: the method comprises the following steps:
acquiring recognized case data, excluded case data and rule early warning data in an optimization period of an original model;
analyzing a core rule group and rule participation degree in the original model according to the identified case data and the excluded case data and combining a preset algorithm to obtain a machine learning rule analysis result;
executing a score type model optimization strategy on the original model according to the machine learning rule analysis result and the rule early warning data to obtain a score type model optimization result;
and packaging and deploying the model obtained from the score model optimization result to finish the optimization of the original model.
2. The method according to claim 1, characterized in that: analyzing the core rule group and the rule participation degree in the original model according to the identified case data and the excluded case data and combining a preset algorithm to obtain a machine learning rule analysis result, wherein the machine learning rule analysis result comprises the following steps:
Combining the identified case data and the excluded case data into a first data set;
preprocessing the first data set and discretizing based on a rule dummy variable to obtain a first data set to be analyzed;
respectively carrying out core rule set analysis on the first data set to be analyzed by adopting a preset algorithm and a recursive feature elimination algorithm to obtain a core rule set analysis result of a type corresponding to the preset algorithm;
carrying out rule participation analysis on the first data set to be analyzed by adopting a random forest classification algorithm to obtain a rule participation analysis result;
and the core rule analysis result and the rule participation analysis result jointly form the machine learning rule analysis result.
3. The method according to claim 2, characterized in that: the step of performing core rule set analysis on the first data set to be analyzed by adopting a preset algorithm and a recursive feature elimination algorithm to obtain a core rule set analysis result of a type corresponding to the preset algorithm, includes:
the method comprises the steps of adopting a random forest classification algorithm to analyze a core rule group to obtain a first type of core rule group analysis result, wherein the steps are as follows:
step A1: let the first data set to be analyzed be XN and A be the number of samples in XN; the number of input samples of a single decision tree is A. Randomly extract A training samples from the first data set to be analyzed XN with replacement (bootstrap sampling);
step A2: let M be the total number of features of the first data set to be analyzed and m = ⌊√M⌋. When splitting at each node of each decision tree, randomly select m of the M input features as the attribute set, each attribute being denoted a; calculate the Gini index of each attribute through the Gini index formula and select the attribute with the smallest Gini index for splitting, the Gini index calculation formula being:

Gini_index(D, a) = Σ_{v=1}^{V} (|D^v| / |D|) · Gini(D^v),  where Gini(D) = 1 − Σ_{k=1}^{y} p_k²

wherein D represents the first data set to be analyzed XN, a represents an attribute in XN, v indexes the values of attribute a, V is the number of distinct values, D^v denotes the subset of D taking the value a^v on attribute a, p_k is the proportion of samples of the k-th class in D, and y is the total number of classes;
step A3: repeating the step A1 and the step A2, enabling each decision tree to be split continuously, stopping splitting until the splitting stopping condition is met, generating a corresponding number of first decision trees, and taking the category with the largest single tree classification result from the first decision trees as a first random forest classification result through a voting method;
step A4: and respectively calculating the expected contribution rate of each decision tree in the first random forest classification result to obtain a first expected contribution rate, carrying out average normalization on the first expected contribution rate to obtain feature importance, arranging the feature importance in a descending order, and sequentially taking features with the accumulated feature importance of more than 99% from front to back according to the arrangement result as a first type core rule group analysis result.
4. A method according to claim 3, characterized in that: the method comprises the steps of respectively carrying out core rule set analysis on the first data set to be analyzed by adopting a preset algorithm and a recursive feature elimination algorithm to obtain a core rule set analysis result of a type corresponding to the preset algorithm, and further comprising:
and carrying out core rule group analysis by adopting an association rule algorithm to obtain a second type of core rule group analysis result, wherein the method comprises the following steps of:
step B1: traversing all the features in the first data set to be analyzed to obtain a set of feature pairwise combinations in the first data set to be analyzed, and recording the set as a frequent A item set LA;
step B2: traversing all the features in the first data set to be analyzed, searching features combined with the set LA in the first data set to be analyzed, combining the features into a new item set, and recording the new item set as a frequent B item set LB;
step B3: repeatedly execute step B2 until no frequent k-item set can be found, and calculate the support, confidence and lift of each frequent item set to obtain the frequent item set calculation result;
step B4: from the frequent item set calculation result, select the frequent item sets whose support, confidence and lift simultaneously exceed the minimum support, minimum confidence and minimum lift as the second type core rule group.
5. The method according to claim 4, wherein: the method comprises the steps of respectively carrying out core rule set analysis on the first data set to be analyzed by adopting a preset algorithm and a recursive feature elimination algorithm to obtain a core rule set analysis result of a type corresponding to the preset algorithm, and further comprising:
and adopting a minimum attribute reduction algorithm to analyze the core rule group to obtain a third type of core rule group analysis result, wherein the method comprises the following steps of:
step C1: constructing a decision table according to the first data set to be analyzed;
step C2: binary-encode the initial population and construct a binary discernibility matrix; calculate the core of the decision table from the binary discernibility matrix, and add the core and the necessary attributes to the initial population;
step C3: let n be the number of samples of the first data set to be analyzed and h the number of features of the first data set to be analyzed; one piece of data represents one individual, denoted x. The fitness value of each individual in the current generation is calculated through the fitness function formula:

f(x) = α · (h − y(x)) / h + (1 − α) · γ(x, D)

wherein h is the number of conditional attributes, y(x) is the number of bits equal to 1 in individual x, representing the number of conditional attributes selected by individual x, γ(x, D) is the dependence of the conditional attributes contained in individual x on the decision attribute D, and α is a proportionality coefficient;
step C4: calculate the probability of each individual being selected, with calculation formula:

P(x_i) = f(x_i) / Σ_{j=1}^{n} f(x_j)

and select individuals to produce offspring using a roulette-wheel selection algorithm;
step C5: perform single-point crossover with crossover probability p_c, and select individuals x for mutation with mutation probability p_m, mutating bits of the individuals from 1 to 0, to obtain the final population; a maximum number of iterations is preset, and while the iteration count is smaller than the maximum, return to step C3 to continue iterating;
step C6: and obtaining an attribute reduction result according to the individual with the largest fitness in the final population, wherein the attribute reduction result is a third type core rule group.
6. The method according to claim 5, wherein performing core rule group analysis on the first data set to be analyzed by using preset algorithms and a recursive feature elimination algorithm respectively, to obtain core rule group analysis results of the types corresponding to the preset algorithms, further comprises:
adopting the recursive feature elimination algorithm for core rule group analysis to obtain a fourth-type core rule group analysis result, comprising the following steps:
step D1: selecting a random forest classification algorithm as the basic model and setting a preset number of target features as the feature set;
Step D2: training the basic model on the first data set to be analyzed, calculating the expected contribution rate of each decision tree, and averaging and normalizing the expected contribution rates of all decision trees to obtain feature importance scores;
step D3: sorting the target features by feature importance score and selecting the feature with the lowest score as the feature to be removed;
step D4: removing the feature to be removed from the feature set; if the number of remaining features reaches the threshold or all features have been removed, taking the pruned feature subset and jumping to step D5; otherwise, jumping to step D2 to perform feature importance evaluation and feature elimination on the new feature subset again;
step D5: taking the pruned feature subset as the final feature subset, wherein the final feature subset is the fourth-type core rule group.
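A minimal sketch of the D1 to D5 loop, assuming scikit-learn's RandomForestClassifier as the basic model and its normalized `feature_importances_` as the importance score; all names and parameter values are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rfe_core_rules(X, y, feature_names, n_target):
    # Step D1: random forest as basic model, n_target features as the goal.
    names = list(feature_names)
    features = list(names)
    while len(features) > n_target:          # step D4 stop threshold
        idx = [names.index(f) for f in features]
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X[:, idx], y)              # step D2: train on current set
        scores = model.feature_importances_  # normalized importance scores
        worst = features[int(np.argmin(scores))]  # step D3: lowest score
        features.remove(worst)               # step D4: eliminate it
    return features                          # step D5: remaining core set
```

On synthetic data where one dummy-variable column determines the label, the loop strips away the noise columns and retains the informative one.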
7. The method according to claim 6, wherein performing rule participation degree analysis on the first data set to be analyzed by using a random forest classification algorithm to obtain a rule participation degree analysis result comprises:
step E1.1: letting the first data set to be analyzed be X_E and E be the number of samples in X_E, and setting the number of input samples of a single decision tree to e; randomly extracting e training samples from the first data set to be analyzed X_E;
Step E1.2: letting G be the total number of features of the first data set to be analyzed; when splitting at each node of each decision tree, randomly selecting g of the G input features as an attribute set, calculating the Gini index of each attribute in the attribute set by the Gini index formula, and selecting the attribute with the minimum Gini index for splitting;
step E1.3: repeating step E1.1 and step E1.2 so that each decision tree keeps splitting until the split-stopping condition is met; generating a corresponding number of second decision trees, and taking, by a voting method, the category receiving the most single-tree classification votes among the second decision trees as the second random forest classification result;
step E1.4: calculating the expected contribution rate of each decision tree in the second random forest classification result to obtain second expected contribution rates, and averaging and normalizing the second expected contribution rates to obtain the rule importance;
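The Gini index used to pick the split attribute in step E1.2 is the standard Gini impurity; as a minimal illustration (labels and names are hypothetical, not from the patent):

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions; the
    # attribute whose split minimizes the (weighted) Gini index is chosen.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
```

A pure node (all samples in one class) has Gini impurity 0, and an evenly split binary node has impurity 0.5, so smaller values indicate better splits.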
step E2: obtaining the rule contribution rate from the identified case data, wherein the rule contribution rate is the ratio of the number of rule triggers in identified cases to the number of identified cases, with the formula:
rule_contribute=r1/ra
wherein rule_contribute represents the rule contribution rate, r1 represents the number of rule triggers in identified cases, and ra represents the number of identified cases;
Step E3: the rule similarity represents the similarity and inclusion relation of triggered clients between rules in the model, and is calculated with the Pearson similarity function:
rule_similarity = cov(rc1, rc2) / (σ(rc1) * σ(rc2))
wherein rule_similarity represents the rule similarity, cov is the covariance, σ is the standard deviation, rc1 represents the discretized trigger array of clients triggered by rule 1, and rc2 represents the discretized trigger array of clients triggered by rule 2;
step E4: obtaining the rule separation rate from the identified case data and the excluded case data, wherein the rule separation rate is the ratio of the number of rule triggers in identified cases to the number of rule triggers in excluded cases, with the formula:
rule_difference=r1/r2
wherein rule_difference represents the rule separation rate, r1 represents the number of rule triggers in identified cases, and r2 represents the number of rule triggers in excluded cases;
step E5: the rule importance, the rule contribution rate, the rule similarity, and the rule separation rate together form the rule participation degree analysis result.
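The rule participation metrics of steps E2 to E4 reduce to a few array operations; the sketch below uses NumPy's `corrcoef` for the Pearson similarity, and all inputs are illustrative:

```python
import numpy as np

def rule_contribute(r1, ra):
    # step E2: triggers among identified cases / number of identified cases
    return r1 / ra

def rule_similarity(rc1, rc2):
    # step E3: Pearson correlation of the two rules' 0/1 trigger arrays
    return float(np.corrcoef(rc1, rc2)[0, 1])

def rule_difference(r1, r2):
    # step E4: triggers in identified cases / triggers in excluded cases
    return r1 / r2
```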
8. The method according to claim 7, wherein executing a score-type model optimization strategy on the original model according to the machine learning rule analysis result and the rule early-warning data to obtain a score-type model optimization result comprises:
Step X1: merging the identified case data and the excluded case data into a second data set;
step X2: preprocessing the second data set and discretizing it based on rule dummy variables to obtain a second data set to be analyzed;
step X3: acquiring the second-type core rule group and the rule importance from the machine learning rule analysis result;
step X4: calculating the rule contribution rate;
step X5: weighting the rule importance and the rule contribution rate to obtain the rule grading score, with the formula:
rule_grading_score=0.5*rule_importance+0.5*rule_contribute
wherein rule_grading_score represents the rule grading score and rule_importance represents the rule importance;
standardizing the rule grading score to obtain a rule grading standard value, wherein a rule grading standard value of 0.8 or above is the first grade, a value of at least 0.3 and below 0.8 is the second grade, and a value below 0.3 is the third grade;
step X6: analyzing the rule grading standard values and performing rule score grading on the rules in the second-type core rule group:
when the rule contribution rate of a rule in the second-type core rule group is 1, adjusting the corresponding rule to a first-grade rule;
when the rule grading standard value of a rule in the second-type core rule group is the third grade, adjusting the corresponding rule to a second-grade rule;
when the rule grading standard value of a rule in the second-type core rule group is the first or second grade and the rule is not a core rule, adjusting the corresponding rule to a third-grade rule;
assigning corresponding scores to the first-grade, second-grade, and third-grade rules;
step X7: performing score accumulation calculation for the second-type core rule group on the second data set to be analyzed according to the rule score grading result of step X6, and obtaining the reporting rate corresponding to the second-type core rule group from the calculation result;
step X8: selecting the remaining core rule groups, repeatedly executing step X7 to obtain the reporting rates corresponding to the remaining core rule groups, and selecting the core rule group with the highest reporting rate to optimize the original model.
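Steps X5 to X7 can be condensed into a short sketch; the 3/2/1 points per grade tier are an assumption, since the claim only says that corresponding scores are assigned:

```python
import numpy as np

GRADE_SCORE = {1: 3, 2: 2, 3: 1}  # hypothetical points per grade tier

def grading_score(importance, contribute):
    # step X5: equal-weight combination of importance and contribution
    return 0.5 * importance + 0.5 * contribute

def grade(std_value):
    # standardized grading value -> grade tier (thresholds from the claim)
    if std_value >= 0.8:
        return 1  # first grade
    if std_value >= 0.3:
        return 2  # second grade
    return 3      # third grade

def case_scores(trigger_matrix, rule_grades):
    # step X7: per-case accumulated score = sum of points of triggered rules
    pts = np.array([GRADE_SCORE[g] for g in rule_grades])
    return trigger_matrix @ pts
```

Comparing the accumulated case scores of different core rule groups against the identified cases yields the per-group reporting rates used in step X8.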
9. The method according to claim 7, wherein executing a score-type model optimization strategy on the original model according to the machine learning rule analysis result and the rule early-warning data to obtain a score-type model optimization result comprises:
Step Y1: combining the identified case data, the excluded case data, and the rule early-warning data into a third data set;
step Y2: preprocessing the third data set and discretizing it based on rule dummy variables to obtain a third data set to be analyzed;
step Y3: selecting the out-of-model rules jointly triggered by identified case clients and calculating their rule separation rates to obtain out-of-model rule separation rates;
step Y4: selecting the rules whose out-of-model rule separation rate is greater than a preset threshold and combining them to obtain a plurality of candidate rule groups;
step Y5: selecting, from the third data set to be analyzed, the rules governed by the original model and the feature data corresponding to the candidate rule groups, to obtain a fourth data set;
step Y6: calculating the rule importance and rule contribution rate on the fourth data set, and obtaining rule grading standard values based on the rule importance and rule contribution rate;
step Y7: analyzing the rule grading standard values and performing rule score grading on the rules in the candidate rule groups;
step Y8: performing score accumulation calculation for the candidate rule groups on the fourth data set according to the rule score grading result of step Y7, obtaining the reporting rate corresponding to each candidate rule group from the calculation result, and selecting the candidate rule group with the highest reporting rate to optimize the original model.
10. The method according to claim 9, wherein executing a score-type model optimization strategy on the original model according to the machine learning rule analysis result and the rule early-warning data to obtain a score-type model optimization result comprises:
step Z1: combining the identified case data, the excluded case data, and the rule early-warning data into a fifth data set;
step Z2: preprocessing the fifth data set and discretizing it based on rule dummy variables to obtain a fifth data set to be analyzed;
step Z3: obtaining the first-type core rule group, the third-type core rule group, and the fourth-type core rule group, de-duplicating overlapping rules to obtain a final core rule group, and obtaining, from the fifth data set to be analyzed, the feature data corresponding to the final core rule group to obtain a sixth data set;
step Z4: calculating the rule importance and rule contribution rate on the sixth data set, and obtaining rule grading standard values based on the rule importance and rule contribution rate;
step Z5: analyzing the rule grading standard values and performing rule score grading on the rules in the final core rule group;
step Z6: performing score accumulation calculation for the final core rule group on the sixth data set according to the rule score grading result of step Z5, obtaining the reporting rate corresponding to the final core rule group from the calculation result, and selecting the final core rule group with the highest reporting rate to optimize the original model.
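The de-overlapping of step Z3 is a simple order-preserving union of the three core rule groups; a minimal sketch with hypothetical rule identifiers:

```python
def merge_core_rules(*groups):
    # Union the rule groups, dropping duplicates, keeping first-seen order.
    seen, merged = set(), []
    for group in groups:
        for rule in group:
            if rule not in seen:
                seen.add(rule)
                merged.append(rule)
    return merged
```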
CN202311822048.6A 2023-12-27 2023-12-27 Score model optimization method based on machine learning analysis rules Pending CN117725527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311822048.6A CN117725527A (en) 2023-12-27 2023-12-27 Score model optimization method based on machine learning analysis rules


Publications (1)

Publication Number Publication Date
CN117725527A true CN117725527A (en) 2024-03-19

Family

ID=90203431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311822048.6A Pending CN117725527A (en) 2023-12-27 2023-12-27 Score model optimization method based on machine learning analysis rules

Country Status (1)

Country Link
CN (1) CN117725527A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247970A (en) * 2017-06-23 2017-10-13 国家质量监督检验检疫总局信息中心 A kind of method for digging and device of commodity qualification rate correlation rule
CN109034201A (en) * 2018-06-26 2018-12-18 阿里巴巴集团控股有限公司 Model training and rule digging method and system
CN111950738A (en) * 2020-08-10 2020-11-17 中国平安人寿保险股份有限公司 Machine learning model optimization effect evaluation method and device, terminal and storage medium
CN113469578A (en) * 2021-07-28 2021-10-01 支付宝(杭州)信息技术有限公司 Multi-objective optimization-based business strategy generation method, device and system
CN114493858A (en) * 2021-12-06 2022-05-13 度小满科技(北京)有限公司 Illegal fund transfer suspicious transaction monitoring method and related components
CN114490827A (en) * 2022-01-30 2022-05-13 中图科信数智技术(北京)有限公司 Method and device for analyzing and predicting user behavior based on data mining
WO2022133210A2 (en) * 2020-12-18 2022-06-23 Strong Force TX Portfolio 2018, LLC Market orchestration system for facilitating electronic marketplace transactions
CN115409519A (en) * 2022-10-31 2022-11-29 北京领雁科技股份有限公司 Risk prediction model optimization method and device, electronic equipment and medium
CN115577646A (en) * 2022-12-08 2023-01-06 北京领雁科技股份有限公司 Data modeling method, device, equipment and medium based on multi-source heterogeneous data
CN115795361A (en) * 2022-11-16 2023-03-14 北京工业大学 Classification method based on feature selection and model combination optimization


Non-Patent Citations (1)

Title
Kong Hao: "Research and Application of a Credit Risk Evaluation Model Based on Telecom User Data", China Master's Theses Full-text Database (Economics and Management Sciences), 15 June 2021 (2021-06-15) *

Similar Documents

Publication Publication Date Title
CN109785976B (en) Gout disease stage prediction system based on Soft-Voting
CN108898479B (en) Credit evaluation model construction method and device
CN109034264B (en) CSP-CNN model for predicting severity of traffic accident and modeling method thereof
CN111754345B (en) Bit currency address classification method based on improved random forest
Schäfer et al. Detection of gravitational-wave signals from binary neutron star mergers using machine learning
CN106874963B (en) A kind of Fault Diagnosis Method for Distribution Networks and system based on big data technology
CN108564117B (en) SVM-based poverty and life assisting identification method
CN111898689A (en) Image classification method based on neural network architecture search
CN110837523A (en) High-confidence reconstruction quality and false-transient-reduction quantitative evaluation method based on cascade neural network
CN115688024B (en) Network abnormal user prediction method based on user content characteristics and behavior characteristics
CN113344438A (en) Loan system, loan monitoring method, loan monitoring apparatus, and loan medium for monitoring loan behavior
CN112084877A (en) NSGA-NET-based remote sensing image identification method
CN111708865B (en) Technology forecasting and patent early warning analysis method based on improved XGboost algorithm
CN117350364A (en) Knowledge distillation-based code pre-training model countermeasure sample generation method and system
CN117725527A (en) Score model optimization method based on machine learning analysis rules
CN112819499A (en) Information transmission method, information transmission device, server and storage medium
Ke et al. Loan repayment behavior prediction of provident fund users using a stacking-based model
CN117787976A (en) Expression type model optimization method based on machine learning analysis rules
CN115393098A (en) Financing product information recommendation method and device
CN114372810A (en) Funding account identification and funding transaction relation network analysis method for funding person
CN110297977B (en) Personalized recommendation single-target evolution method for crowd funding platform
CN114021612A (en) Novel personal credit assessment method and system
CN112926640A (en) Cancer gene classification method and equipment based on two-stage depth feature selection and storage medium
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
CN112348257A (en) Election prediction method driven by multi-source data fusion and time sequence analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination