CN109815985A - Evaluation method, apparatus, storage medium and electronic device for variable discretization - Google Patents

Info

Publication number
CN109815985A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811574683.6A
Other languages
Chinese (zh)
Inventor
于福超
王菊
宋仕君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201811574683.6A
Publication of CN109815985A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to an evaluation method, apparatus, storage medium and electronic device for variable discretization. The method includes: discretizing a target variable to obtain a discretized variable corresponding to the target variable, where the target variable includes at least three target columns and the target columns are the reference categories corresponding to the target variable; for each discretized variable, determining the weight of evidence (WOE) of each discrete range of the discretized variable according to the WOE of each target column within that discrete range; determining the information value (IV) of each discrete range according to the WOE of that discrete range and a reference column, and determining the IV of the discretized variable according to the IVs of its discrete ranges, where the reference column is one of the target columns of the target variable; and evaluating the discretization that produced the discretized variable according to the IV of the discretized variable. The evaluation method is thereby applicable to multi-class variables, which effectively broadens its scope of application.

Description

Evaluation method, apparatus, storage medium and electronic device for variable discretization
Technical field
The present disclosure relates to the field of data processing, and in particular to an evaluation method, apparatus, storage medium and electronic device for variable discretization.
Background
A continuous variable usually needs to be discretized so that subsequent operations can be performed on the discretized data. In the prior art, discretization is usually carried out by methods such as equal-distance (equal-width) or equal-probability (equal-frequency) discretization. Equal-distance discretization divides the continuous variable into several intervals of equal length, whereas equal-probability discretization ensures that each interval contains the same amount of data after discretization.
For a continuous variable, the accuracy of the discretized data has a great influence on subsequent operations on the variable. The discretization of a continuous variable therefore needs to be evaluated to determine whether the chosen discretization preserves the accuracy of the variable's data. In the prior art, the discretization of a continuous variable is usually evaluated by combining WOE (Weight of Evidence) and IV (Information Value) to determine whether the discretization is feasible. However, this approach is designed for continuous variables associated with a binary classification and is not applicable to evaluating the discretization of continuous variables associated with a multi-class classification.
Summary of the invention
To solve the above problem, an object of the present disclosure is to provide an evaluation method, apparatus, storage medium and electronic device for variable discretization that can evaluate the discretization of multi-class variables.
To achieve the above object, according to a first aspect of the present disclosure, an evaluation method for variable discretization is provided, the method including:
discretizing a target variable to obtain a discretized variable corresponding to the target variable, where the target variable includes at least three target columns and the target columns are the reference categories corresponding to the target variable;
for each discretized variable, determining the weight of evidence (WOE) of each discrete range of the discretized variable according to the WOE of each target column within that discrete range;
determining the information value (IV) of each discrete range according to the WOE of that discrete range and a reference column, and determining the IV of the discretized variable according to the IVs of its discrete ranges, where the reference column is one of the target columns of the target variable;
evaluating the discretization corresponding to the discretized variable according to the IV of the discretized variable.
Optionally, the weight of evidence (WOE) of each discrete range of the discretized variable is determined from the WOE of each target column within that discrete range by the following formula:
$$WOE = \sum_{i=1}^{NumLabel} \frac{Label_i}{Label_{sum}} \cdot \ln\left(\frac{Label_i / SumLabel_i}{Label_{rest} / SumLabel_{rest}}\right)$$
where WOE denotes the weight of evidence of the target variable for the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
Label_sum denotes the number of samples of all target columns of the target variable within the current discrete range;
Label_i denotes the number of samples of the i-th target column of the target variable within the current discrete range;
Label_rest denotes the number of samples of the target columns other than the i-th target column within the current discrete range;
SumLabel_i denotes the number of samples of the i-th target column of the target variable over all discrete ranges;
SumLabel_rest denotes the number of samples of the target columns other than the i-th target column over all discrete ranges.
Optionally, the information value (IV) of each discrete range is determined from the weight of evidence (WOE) of that discrete range by the following formula:
$$IV = \left(\frac{Label_T}{SumLabel_T} - \frac{Label_{restT}}{SumLabel_{restT}}\right) \cdot WOE$$
where IV denotes the information value of the target variable for the current discrete range;
Label_T denotes the number of samples of reference column T of the target variable within the current discrete range;
SumLabel_T denotes the number of samples of reference column T of the target variable over all discrete ranges;
Label_restT denotes the number of samples of the target columns other than reference column T within the current discrete range;
SumLabel_restT denotes the number of samples of the target columns other than reference column T over all discrete ranges.
Optionally, evaluating the discretization corresponding to the discretized variable according to the information value (IV) of the discretized variable includes:
when the IV of the discretized variable is less than a preset first IV threshold, determining that the discretization corresponding to the discretized variable is feasible.
Optionally, the method further includes:
determining a first correlation coefficient between the target variable and the reference column;
determining a second correlation coefficient between the discretized variable corresponding to the target variable and the reference column;
and evaluating the discretization corresponding to the discretized variable according to the IV of the discretized variable includes:
evaluating the discretized variable according to the IV of the discretized variable, the first correlation coefficient and the second correlation coefficient.
Optionally, evaluating the discretized variable according to the IV of the discretized variable, the first correlation coefficient and the second correlation coefficient includes:
when the IV of the discretized variable is less than a preset second IV threshold and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold, determining that the discretization corresponding to the discretized variable is feasible.
Optionally, evaluating the discretized variable according to the IV of the discretized variable, the first correlation coefficient and the second correlation coefficient includes:
determining the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as a correlation difference;
determining the ratio of the IV of the current discretized variable to the sum of the IVs of all discretized variables of the target variable as the IV ratio of the current discretized variable;
determining the sum of the correlation difference and the IV ratio as the evaluation value of the current discretized variable;
when the evaluation value is less than a preset evaluation threshold, determining that the discretization corresponding to the discretized variable is feasible.
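The combined evaluation value described above can be sketched as follows. This is a minimal illustration, assuming the correlation coefficients and information values have already been computed; the function names, the sample numbers and the threshold of 0.5 are hypothetical and are not taken from the disclosure.

```python
from typing import Sequence


def evaluation_value(r_first: float, r_second: float,
                     iv_current: float, iv_all: Sequence[float]) -> float:
    """Correlation difference plus IV ratio for one candidate discretization."""
    correlation_difference = abs(r_second - r_first)
    iv_ratio = iv_current / sum(iv_all)
    return correlation_difference + iv_ratio


def is_feasible(value: float, evaluation_threshold: float) -> bool:
    """A discretization is treated as feasible when its value is below the threshold."""
    return value < evaluation_threshold


# Hypothetical inputs: three candidate discretizations of one target variable.
candidate_ivs = [0.082, 0.120, 0.098]
value = evaluation_value(r_first=0.62, r_second=0.58,
                         iv_current=candidate_ivs[0], iv_all=candidate_ivs)
feasible = is_feasible(value, evaluation_threshold=0.5)  # threshold is an assumption
```

Because the IV ratio is normalized by the sum over all candidates, the evaluation value of one candidate depends on which other candidates are being compared.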
According to a second aspect of the present disclosure, an evaluation apparatus for variable discretization is provided, the apparatus including:
a processing module, configured to discretize a target variable to obtain a discretized variable corresponding to the target variable, where the target variable includes at least three target columns and the target columns are the reference categories corresponding to the target variable;
a first determining module, configured to determine, for each discretized variable, the weight of evidence (WOE) of each discrete range of the discretized variable according to the WOE of each target column within that discrete range;
a second determining module, configured to determine the information value (IV) of each discrete range according to the WOE of that discrete range and a reference column, and to determine the IV of the discretized variable according to the IVs of its discrete ranges, where the reference column is one of the target columns of the target variable;
an evaluation module, configured to evaluate the discretization corresponding to the discretized variable according to the IV of the discretized variable.
Optionally, the first determining module is configured to determine the weight of evidence (WOE) of each discrete range of the discretized variable from the WOE of each target column within that discrete range by the following formula:
$$WOE = \sum_{i=1}^{NumLabel} \frac{Label_i}{Label_{sum}} \cdot \ln\left(\frac{Label_i / SumLabel_i}{Label_{rest} / SumLabel_{rest}}\right)$$
where WOE denotes the weight of evidence of the target variable for the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
Label_sum denotes the number of samples of all target columns of the target variable within the current discrete range;
Label_i denotes the number of samples of the i-th target column of the target variable within the current discrete range;
Label_rest denotes the number of samples of the target columns other than the i-th target column within the current discrete range;
SumLabel_i denotes the number of samples of the i-th target column of the target variable over all discrete ranges;
SumLabel_rest denotes the number of samples of the target columns other than the i-th target column over all discrete ranges.
Optionally, the second determining module is configured to determine the information value (IV) of each discrete range from the weight of evidence (WOE) of that discrete range by the following formula:
$$IV = \left(\frac{Label_T}{SumLabel_T} - \frac{Label_{restT}}{SumLabel_{restT}}\right) \cdot WOE$$
where IV denotes the information value of the target variable for the current discrete range;
Label_T denotes the number of samples of reference column T of the target variable within the current discrete range;
SumLabel_T denotes the number of samples of reference column T of the target variable over all discrete ranges;
Label_restT denotes the number of samples of the target columns other than reference column T within the current discrete range;
SumLabel_restT denotes the number of samples of the target columns other than reference column T over all discrete ranges.
Optionally, the evaluation module includes:
a first evaluation submodule, configured to determine that the discretization corresponding to the discretized variable is feasible when the information value (IV) of the discretized variable is less than a preset first IV threshold.
Optionally, the apparatus further includes:
a third determining module, configured to determine a first correlation coefficient between the target variable and the reference column;
a fourth determining module, configured to determine a second correlation coefficient between the discretized variable corresponding to the target variable and the reference column;
and the evaluation module includes:
a second evaluation submodule, configured to evaluate the discretized variable according to the IV of the discretized variable, the first correlation coefficient and the second correlation coefficient.
Optionally, the second evaluation submodule is configured to determine that the discretization corresponding to the discretized variable is feasible when the IV of the discretized variable is less than a preset second IV threshold and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold.
Optionally, the second evaluation submodule includes:
a first determining submodule, configured to determine the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as a correlation difference;
a second determining submodule, configured to determine the ratio of the IV of the current discretized variable to the sum of the IVs of all discretized variables of the target variable as the IV ratio of the current discretized variable;
a third determining submodule, configured to determine the sum of the correlation difference and the IV ratio as the evaluation value of the current discretized variable;
a fourth determining submodule, configured to determine that the discretization corresponding to the discretized variable is feasible when the evaluation value is less than a preset evaluation threshold.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the steps of the method of the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, an electronic device is provided, including:
a memory on which a computer program is stored;
a processor, configured to execute the computer program in the memory to implement the steps of the method of the first aspect of the present disclosure.
In the above technical solution, for a discretized variable corresponding to the target variable, the weight of evidence (WOE) of each target column of the target variable within a discrete range is calculated, and the WOE of that discrete range is determined from these values. The information value (IV) of the discretized variable can then be determined from the WOE of each discrete range, and the discretized variable can be evaluated accordingly. The evaluation method is thereby applicable to multi-class variables, which effectively broadens its scope of application. It also helps ensure the accuracy of the evaluation result for the discretized variable and improves the user experience.
Other features and advantages of the present disclosure are described in detail in the detailed description that follows.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description they serve to explain the present disclosure, but they do not limit it. In the drawings:
Fig. 1 is a flowchart of an evaluation method for variable discretization provided according to an embodiment of the present disclosure;
Fig. 2 is a block diagram of an evaluation apparatus for variable discretization provided according to an embodiment of the present disclosure;
Fig. 3 is a block diagram of an electronic device according to an exemplary embodiment;
Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to describe and explain the present disclosure and do not limit it.
Fig. 1 is a flowchart of the evaluation method for variable discretization provided according to an embodiment of the present disclosure. As shown in Fig. 1, the method includes:
In S11, a target variable is discretized to obtain a discretized variable corresponding to the target variable, where the target variable includes at least three target columns and the target columns are the reference categories corresponding to the target variable. The target variable is the variable to be discretized, and it may be discretized with respect to multiple reference categories. For example, if the target variable is "amount of the most recent purchase", the identities of different buyers may serve as the reference categories for discretization; the target columns may then be "middle school student", "university student", "graduate student", and so on.
The target variable may be discretized according to the prior art. For example, equal-distance discretization may divide the target variable into 3 discrete ranges of equal interval length, where the number of intervals can be set according to the actual usage scenario. Alternatively, the data of the target variable may be sorted in a given order and then divided into discrete ranges containing the same number of data items. Unless otherwise stated, a multi-class variable in the present disclosure refers to a variable that includes at least three target columns.
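The two prior-art discretization schemes mentioned above can be sketched as follows. This is a minimal illustration in plain Python; the function names and the sample purchase amounts are hypothetical and are not part of the disclosure.

```python
from typing import List


def equal_width_edges(values: List[float], n_bins: int) -> List[float]:
    """Return n_bins + 1 edges that split [min, max] into equal-length intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(n_bins + 1)]


def equal_frequency_bins(values: List[float], n_bins: int) -> List[List[float]]:
    """Sort the values and cut them into n_bins groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    # Rounded cut points keep group sizes within one element of each other.
    cuts = [round(k * n / n_bins) for k in range(n_bins + 1)]
    return [ordered[cuts[k]:cuts[k + 1]] for k in range(n_bins)]


# Hypothetical purchase amounts used only for illustration.
amounts = [12.0, 30.0, 45.0, 80.0, 150.0, 220.0, 300.0, 410.0, 600.0]
edges = equal_width_edges(amounts, 3)
groups = equal_frequency_bins(amounts, 3)
```

Each choice of scheme and bin count yields one candidate discretized variable, and the candidates can then be compared by the evaluation in S12 to S14.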
In S12, for each discretized variable, the weight of evidence (WOE) of each discrete range of the discretized variable is determined according to the WOE of each target column within that discrete range.
The WOE of the current discrete range characterizes the difference between the data of the current discrete range and the overall data of the discretized variable (that is, the data of all discrete ranges of the discretized variable). Therefore, to determine the WOE of the current discrete range, the WOE of each target column within that range can be determined first, and the WOE of the range can then be determined from the WOEs of the individual target columns. In this way, the influence of each target column's data within the current discrete range is taken into account, so that the method can be applied to multi-class variables.
In S13, the information value (IV) of each discrete range is determined according to the weight of evidence (WOE) of that discrete range and a reference column, and the IV of the discretized variable is determined according to the IVs of its discrete ranges, where the reference column is one of the target columns of the target variable.
When the target variable is discretized and the IV of the discretized variable is determined, one of the target columns is usually designated as the reference column so that the discretization corresponding to the discretized variable can be evaluated. The reference column can be set according to the actual usage scenario. For example, if the target variable is "amount of the most recent purchase" and the target columns are "middle school student", "university student", "graduate student", and so on, "middle school student" may be designated as the reference column, so that the discretization of the target variable "amount of the most recent purchase" is evaluated with respect to the most recent purchase amounts of middle school students.
In S14, the discretization corresponding to the discretized variable is evaluated according to the information value (IV) of the discretized variable.
Evaluating the discretized variable makes it possible to determine whether the discretization that turns the target variable into this discretized variable is feasible. When the discretization corresponding to the discretized variable is determined to be feasible, the discretized variable can represent the target variable accurately, and subsequent data processing can be carried out on the discretized variable.
In the above technical solution, for a discretized variable corresponding to the target variable, the weight of evidence (WOE) of each target column of the target variable within a discrete range is calculated, and the WOE of that discrete range is determined from these values. The IV of the discretized variable can then be determined from the WOE of each discrete range, and the discretized variable can be evaluated accordingly. The evaluation method is thereby applicable to multi-class variables, which effectively broadens its scope of application. It also helps ensure the accuracy of the evaluation result for the discretized variable and improves the user experience.
To help those skilled in the art better understand the technical solution provided by the embodiments of the present disclosure, the above steps are described in detail below.
First, the determination of the weight of evidence (WOE) of each discrete range of the discretized variable from the WOE of each target column within that discrete range is described. For example, the WOE of each discrete range may be determined by the following formula:
$$WOE = \sum_{i=1}^{NumLabel} \frac{Label_i}{Label_{sum}} \cdot \ln\left(\frac{Label_i / SumLabel_i}{Label_{rest} / SumLabel_{rest}}\right)$$
where WOE denotes the weight of evidence of the target variable for the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
Label_sum denotes the number of samples of all target columns of the target variable within the current discrete range;
Label_i denotes the number of samples of the i-th target column of the target variable within the current discrete range;
Label_rest denotes the number of samples of the target columns other than the i-th target column within the current discrete range;
SumLabel_i denotes the number of samples of the i-th target column of the target variable over all discrete ranges;
SumLabel_rest denotes the number of samples of the target columns other than the i-th target column over all discrete ranges.
In this embodiment, when the weight of evidence (WOE) of the current target column within the current discrete range is determined, the other target columns are treated together as a single target column. When the WOE of the current discrete range is then determined from the WOEs of the individual target columns, the ratio of the sample count of the current target column to the sample count of all target columns is used as the weight coefficient of that column in a weighted sum. In this way, the WOE of each target column within the current discrete range is determined separately, and the target columns are then combined so that the WOE of the current discrete range can be determined efficiently and accurately and the discretization of a multi-class variable can be evaluated.
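The one-versus-rest WOE per target column and its weighted sum over a discrete range can be sketched as follows. This is a minimal illustration of the computation as described in the text, not the disclosure's implementation. The "middle_school" counts (2500 in the range, 10000 overall, out of 50000 and 100000 samples) follow the worked example in this description; the split of the remaining samples between "university" and "graduate" is an assumption made only for illustration, and all function and key names are hypothetical.

```python
import math
from typing import Dict


def column_woe(in_range: Dict[str, int], overall: Dict[str, int], col: str) -> float:
    """WOE of one target column in one discrete range, with all other columns pooled."""
    label_i = in_range[col]
    label_rest = sum(in_range.values()) - label_i
    sum_label_i = overall[col]
    sum_label_rest = sum(overall.values()) - sum_label_i
    return math.log((label_i / sum_label_i) / (label_rest / sum_label_rest))


def range_woe(in_range: Dict[str, int], overall: Dict[str, int]) -> float:
    """Weighted sum of per-column WOEs; weight = column share of samples in the range."""
    label_sum = sum(in_range.values())
    return sum((in_range[c] / label_sum) * column_woe(in_range, overall, c)
               for c in in_range)


# Sample counts per target column within the range (0, 100) and over all ranges.
# The university/graduate split is assumed for illustration.
in_range = {"middle_school": 2500, "university": 20500, "graduate": 27000}
overall = {"middle_school": 10000, "university": 44000, "graduate": 46000}

woe_middle = column_woe(in_range, overall, "middle_school")
woe_range = range_woe(in_range, overall)
```

Note that a column with zero samples in a range would make the logarithm undefined; a practical implementation would need some smoothing or merging of ranges, which the text does not address.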
By way of example, Table 1 below shows the data for one discretization of the target variable "amount of the most recent purchase" with the corresponding target columns "middle school student", "university student" and "graduate student".
Table 1
As shown in Table 1, this discretization divides the target variable into four discrete ranges, where:
the first discrete range is (0, 100), and the weight of evidence (WOE) of this range determined by the above formula is WOE1 = 0.05 × (-0.7472) + 0.41 × (-0.1228) + 0.54 × 0.3207 = -0.232;
the second discrete range is [100, 200), and the WOE of this range determined by the above formula is WOE2 = 0.1 × 0 + 0.57 × (-0.3236) + 0.33 × 0.5328 = -0.362;
the third discrete range is [200, 500), and the WOE of this range determined by the above formula is WOE3 = 0.2 × 0.8109 + 0.33 × (-0.4520) + 0.47 × 0.0268 = 0.026;
the fourth discrete range is [500, +∞), and the WOE of this range determined by the above formula is WOE4 = 0.3 × 1.3499 + 0.3 × (-0.606) + 0.4 × (-0.2451) = 0.125.
After the weight of evidence (WOE) of each discrete range has been determined, the information value (IV) of each discrete range can be determined from the WOE of that range by the following formula:
$$IV = \left(\frac{Label_T}{SumLabel_T} - \frac{Label_{restT}}{SumLabel_{restT}}\right) \cdot WOE$$
where IV denotes the information value of the target variable for the current discrete range;
Label_T denotes the number of samples of reference column T of the target variable within the current discrete range;
SumLabel_T denotes the number of samples of reference column T of the target variable over all discrete ranges;
Label_restT denotes the number of samples of the target columns other than reference column T within the current discrete range;
SumLabel_restT denotes the number of samples of the target columns other than reference column T over all discrete ranges.
For example, suppose the target column "middle school student" is designated as the reference column, and take the first discrete range (0, 100). Within this range, the number of samples of the reference column "middle school student" is 2500; over all discrete ranges, the number of samples of the reference column is 10000; within this range, the number of samples of the target columns other than the reference column is 47500 (that is, the sum of the samples of the target columns "university student" and "graduate student" within the range); and over all discrete ranges, the number of samples of the target columns other than the reference column is 90000. The information value (IV) of the first discrete range (0, 100) is therefore IV1 = (2500/10000 - 47500/90000) × (-0.232) ≈ 0.064.
Similarly, the IV of the second discrete range [100, 200), IV2, is 0;
the IV of the third discrete range [200, 500), IV3, is 0.004;
the IV of the fourth discrete range [500, +∞), IV4, is 0.014.
Therefore, the IV of the discretized variable is 0.082 (that is, 0.064 + 0 + 0.004 + 0.014).
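The per-range IV and its sum over the ranges can be sketched as follows, using the sample counts and values stated in the example above. The function and variable names are hypothetical.

```python
def range_iv(label_t: int, sum_label_t: int,
             label_rest_t: int, sum_label_rest_t: int, woe: float) -> float:
    """IV of one discrete range, given the range's WOE and the reference-column counts."""
    share_ref = label_t / sum_label_t              # share of reference-column samples in range
    share_rest = label_rest_t / sum_label_rest_t   # share of all other samples in range
    return (share_ref - share_rest) * woe


# First discrete range (0, 100) with reference column "middle school student".
iv1 = range_iv(label_t=2500, sum_label_t=10000,
               label_rest_t=47500, sum_label_rest_t=90000, woe=-0.232)

# IV of the discretized variable: sum of the per-range values stated in the example.
iv_total = sum([0.064, 0.0, 0.004, 0.014])
```

Here `round(iv1, 3)` gives 0.064, which agrees with the first term of the sum.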
It, can be through the above technical solutions, by other targets except reference columns when target variable is more classified variables Column are comprehensive to be considered as a column, quickly and quickly and easily determines the corresponding evidence weight of various discrete range and information content, To accurately determine out the corresponding information content of discretization variable, accurate data base is provided to carry out evaluation to discretization variable The accuracy of the evaluation result of the discretization variable determined based on the information content is effectively ensured in plinth.
Optionally, described corresponding with the discretization variable according to the corresponding information content pair of the discretization variable Sliding-model control is evaluated, comprising:
When the corresponding information content of the discretization variable is less than preset first information amount threshold value, it is determining with it is described discrete It is feasible to change the corresponding sliding-model control of variable.
The first information content threshold may be set according to the actual usage scenario, and the present disclosure does not limit it. In this embodiment, whether the discretization processing corresponding to a discretization variable is feasible is determined from the information content corresponding to that variable. The approach is simple, accurate, and easy to implement, which helps ensure a good user experience.
Optionally, after the multiple discretization variables corresponding to the target variable have been evaluated, the target discretization processing for the target variable may be determined by directly selecting the discretization processing corresponding to the discretization variable with the smallest information content. The target variable is then discretized according to this target discretization processing, which helps ensure the accuracy of the discretization of the target variable and provides accurate data support for subsequent data processing.
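A minimal sketch of this threshold check and selection step follows. The candidate names, their information content values, and the threshold of 0.1 are hypothetical; the disclosure leaves the threshold to the actual usage scenario.

```python
def is_feasible(information_content, threshold):
    """A discretization is treated as feasible when its value is below the threshold."""
    return information_content < threshold


def pick_target_discretization(candidates):
    """Return the name of the candidate with the smallest information content.

    candidates maps a (hypothetical) discretization name to the information
    content of the discretization variable it produces.
    """
    return min(candidates, key=candidates.get)


# Hypothetical candidates for one target variable.
candidates = {"equal_width": 0.082, "equal_frequency": 0.120, "custom_edges": 0.045}
first_threshold = 0.1  # assumed value, for illustration only

feasible = {name: is_feasible(iv, first_threshold) for name, iv in candidates.items()}
target = pick_target_discretization(candidates)
print(feasible, target)
```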
When the information content corresponding to a discretization variable is determined, the target columns other than the current target column are combined and treated as a single reference column for the calculation. Therefore, to preserve the correlation between the multi-class variable and the target column after discretization, the present disclosure also provides the following embodiment. Specifically, the method further comprises:
determining a first correlation coefficient between the target variable and the reference column;
determining a second correlation coefficient between the discretization variable corresponding to the target variable and the reference column.
The Pearson correlation coefficient between the target variable and the reference column may be used as the first correlation coefficient, and the Pearson correlation coefficient between the discretization variable and the reference column may be used as the second correlation coefficient. The calculation of the Pearson correlation coefficient is known in the prior art and is not repeated here.
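As a hedged illustration, the two coefficients can be computed as below. The sketch makes two assumptions that the text does not state: the reference column is encoded as a 0/1 membership indicator, and the discretization variable is encoded as the index of the discrete range. The sample values are invented for the example; the range edges follow the ranges used in the worked example.

```python
import bisect
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def discretize(value, edges):
    """Index of the half-open range [edges[k-1], edges[k]) that contains value."""
    return bisect.bisect_right(edges, value)


# Hypothetical samples: raw target-variable values and a 0/1 indicator of
# membership in the reference column (the encoding is an assumption).
target_values = [50, 80, 150, 220, 480, 900]
is_reference = [1, 1, 0, 0, 0, 0]
edges = [100, 200, 500]  # ranges (0,100), [100,200), [200,500), [500,+inf)

bins = [discretize(v, edges) for v in target_values]
first_corr = pearson(target_values, is_reference)  # before discretization
second_corr = pearson(bins, is_reference)          # after discretization
print(bins, first_corr, second_corr)
```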
An example implementation of evaluating the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable is as follows, comprising:
evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient.
In this embodiment, the first correlation coefficient characterizes the correlation between the target variable and the reference column before discretization, and the second correlation coefficient characterizes the correlation between the resulting discretization variable and the reference column after discretization. That is, the two coefficients represent the correlation with the reference column before and after the target variable is discretized. The discretization variable can therefore be evaluated by combining its information content with the first and second correlation coefficients, which helps ensure the accuracy of the evaluation result.
Optionally, in one embodiment, an example implementation of evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient is as follows, comprising:
when the information content corresponding to the discretization variable is less than a preset second information content threshold and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
It should be noted that the second information content threshold and the first information content threshold may be the same or different, and the difference threshold may be set according to the actual usage scenario; the present disclosure does not limit them. When the information content corresponding to the discretization variable is less than the information content threshold, this indicates that, after the target variable undergoes the discretization processing corresponding to the discretization variable, the data of the discretization variable and the data of the target variable convey consistent information. When the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than the preset difference threshold, this indicates that the correlation between the target variable and the target column changes little after discretization. In that case, the discretization processing corresponding to the discretization variable may be determined to be feasible.
In the above technical solution, the discretization variable is evaluated with reference not only to its information content but also to the change in correlation caused by discretizing the target variable, which helps ensure the accuracy of the evaluation of the discretization processing.
Optionally, after the multiple discretization variables corresponding to the target variable have been evaluated, the target discretization processing for the target variable may be determined as follows. First, the discretization processing corresponding to any discretization variable for which the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is greater than or equal to the difference threshold is excluded; that is, the discretization processings that change the correlation of the target variable only slightly are identified first. Then, among the remaining discretization processings, the one corresponding to the discretization variable with the smallest information content is determined as the target discretization processing, and the target variable is discretized accordingly, to ensure the accuracy of the discretization processing.
After the target variable is discretized, statistics or calculations are performed on the discretized data. With the above technical solution, the discretization processing for the target variable is determined on the premise that its effect on the correlation of the target variable is small, which helps ensure the accuracy of the discretization of the target variable and provides accurate data support for subsequent data processing.
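The two-stage selection described above (exclude by correlation change, then take the smallest information content) might be sketched as follows. The candidate names, coefficients, and threshold are hypothetical values chosen for illustration.

```python
def choose_target_discretization(candidates, first_corr, diff_threshold):
    """Two-stage selection.

    1. Exclude candidates whose |second_corr - first_corr| is greater than
       or equal to diff_threshold.
    2. Among the rest, return the name with the smallest information content,
       or None when every candidate was excluded.
    """
    kept = {
        name: c
        for name, c in candidates.items()
        if abs(c["second_corr"] - first_corr) < diff_threshold
    }
    if not kept:
        return None
    return min(kept, key=lambda name: kept[name]["information_content"])


# Hypothetical candidates for one target variable.
candidates = {
    "A": {"information_content": 0.03, "second_corr": -0.40},
    "B": {"information_content": 0.05, "second_corr": -0.58},
    "C": {"information_content": 0.08, "second_corr": -0.61},
}
first_corr = -0.60  # correlation before discretization (assumed)

chosen = choose_target_discretization(candidates, first_corr, diff_threshold=0.05)
print(chosen)
```

Candidate A has the smallest information content but is excluded in the first stage because its correlation with the reference column shifts by 0.20, so the filter order matters.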
Optionally, evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient comprises:
determining the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as a correlation difference;
determining the ratio of the information content corresponding to the current discretization variable to the sum of the information contents corresponding to all discretization variables of the target variable as the information content ratio corresponding to the current discretization variable.
For example, the information content ratio may be determined according to the following formula:
Rate = IV_D / \sum_{i=1}^{N} IV_i
where IV_D denotes the information content corresponding to the current discretization variable;
Rate denotes the information content ratio;
IV_i denotes the information content corresponding to the i-th discretization variable of the target variable;
N denotes the total number of discretization variables corresponding to the target variable.
determining the sum of the correlation difference and the information content ratio as the evaluation value of the current discretization variable, where the evaluation value jointly characterizes the correlation and the information content corresponding to the discretization variable;
when the evaluation value is less than a preset evaluation threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
The correlation difference characterizes the change in correlation that results from discretizing the target variable into the discretization variable, and the information content ratio characterizes the share of the information content of the discretization variable in the total information content of all discretization variables corresponding to the target variable. In this embodiment, the discretization variable can therefore be evaluated comprehensively from the correlation difference and the information content ratio. This improves the accuracy of the evaluation value, supports the accuracy of the evaluation result, provides data support for determining the discretization processing of the target variable, and further improves the user experience.
Optionally, after the multiple discretization variables corresponding to the target variable have been evaluated, the target discretization processing for the target variable may be determined according to the evaluation values of the discretization variables. For example, the discretization processing corresponding to the discretization variable with the smallest evaluation value may be taken as the target discretization processing, and the target variable is then discretized according to it.
After the target variable is discretized, statistics or calculations are performed on the discretized data. With the above technical solution, the discretization processing for the target variable is determined on the premise that its effect on the correlation of the target variable is small, which helps ensure the accuracy of the discretization of the target variable and provides accurate data support for subsequent data processing.
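The evaluation value described in this embodiment, the correlation difference plus the information content ratio, can be sketched as below. All candidate names and numbers, including the evaluation threshold of 0.4, are hypothetical.

```python
import math


def evaluation_value(iv_current, all_ivs, first_corr, second_corr):
    """Correlation difference plus information content ratio."""
    total = sum(all_ivs)
    if total == 0:
        raise ValueError("sum of information contents must be non-zero")
    correlation_difference = abs(second_corr - first_corr)
    rate = iv_current / total
    return correlation_difference + rate


# Hypothetical discretization variables of one target variable.
first_corr = -0.60
candidates = {
    "A": {"iv": 0.082, "second_corr": -0.58},
    "B": {"iv": 0.050, "second_corr": -0.40},
    "C": {"iv": 0.068, "second_corr": -0.61},
}
all_ivs = [c["iv"] for c in candidates.values()]

values = {
    name: evaluation_value(c["iv"], all_ivs, first_corr, c["second_corr"])
    for name, c in candidates.items()
}
evaluation_threshold = 0.4  # assumed value, for illustration only
feasible = [name for name, v in values.items() if v < evaluation_threshold]
best = min(values, key=values.get)  # smallest evaluation value
print(values, feasible, best)
```

Because the ratio is taken over all candidates of the same target variable, the evaluation values are only comparable within that set of candidates.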
Fig. 2 is a block diagram of a device for evaluating variable discretization according to an embodiment of the present disclosure. The device 20 includes:
a processing module 21, configured to discretize a target variable to obtain the discretization variables corresponding to the target variable, where the target variable includes at least three target columns, and the target columns are the reference classes corresponding to the target variable;
a first determining module 22, configured to determine, for each discretization variable, the evidence weight corresponding to each discrete range of the discretization variable according to the evidence weight corresponding to each target column within each discrete range;
a second determining module 23, configured to determine the information content corresponding to each discrete range according to the evidence weight corresponding to that discrete range and a reference column, and to determine the information content corresponding to the discretization variable according to each of these information contents, where the reference column is one target column of the target variable;
an evaluation module 24, configured to evaluate the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable.
Optionally, the first determining module 22 is configured to determine, by the following formula, the evidence weight corresponding to each discrete range of the discretization variable according to the evidence weight corresponding to each target column within each discrete range:
where WOE denotes the evidence weight of the target variable corresponding to the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
Label_sum denotes the sample count corresponding to all target columns of the target variable within the current discrete range;
Label_i denotes the sample count corresponding to the i-th target column of the target variable within the current discrete range;
Label_rest denotes the sample count corresponding to the target columns of the target variable other than the i-th target column within the current discrete range;
a further quantity denotes the sample count corresponding to the i-th target column of the target variable across all discrete ranges;
and a final quantity denotes the sample count corresponding to the target columns of the target variable other than the i-th target column across all discrete ranges.
Optionally, the second determining module 23 is configured to determine, by the following formula, the information content corresponding to each discrete range according to the evidence weight corresponding to that discrete range:
where IV denotes the information content of the target variable corresponding to the current discrete range;
Label_T denotes the sample count corresponding to reference column T of the target variable within the current discrete range;
SumLabel_T denotes the sample count corresponding to reference column T of the target variable across all discrete ranges;
Label_restT denotes the sample count corresponding to the target columns of the target variable other than reference column T within the current discrete range;
and a final quantity denotes the sample count corresponding to the target columns of the target variable other than reference column T across all discrete ranges.
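As a hedged illustration only, the conventional information-value form that is consistent with the quantities defined above is shown below. The name SumLabel_{restT} for the last quantity in the list is assumed here for illustration, and the exact formula of the disclosure may differ from this form.

```latex
IV = \left( \frac{Label_{T}}{SumLabel_{T}} - \frac{Label_{restT}}{SumLabel_{restT}} \right) \times WOE
```

Under this assumed form, the bracketed factor for the first discrete range of the worked example would be 2500/10000 - 47500/90000.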
Optionally, the evaluation module 24 includes:
a first evaluation submodule, configured to determine that the discretization processing corresponding to the discretization variable is feasible when the information content corresponding to the discretization variable is less than a preset first information content threshold.
Optionally, the device 20 further includes:
a third determining module, configured to determine a first correlation coefficient between the target variable and the reference column;
a fourth determining module, configured to determine a second correlation coefficient between the discretization variable corresponding to the target variable and the reference column;
the evaluation module 24 includes:
a second evaluation submodule, configured to evaluate the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient.
Optionally, the second evaluation submodule is configured to determine that the discretization processing corresponding to the discretization variable is feasible when the information content corresponding to the discretization variable is less than a preset second information content threshold and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold.
Optionally, the second evaluation submodule includes:
a first determining submodule, configured to determine the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as a correlation difference;
a second determining submodule, configured to determine the ratio of the information content corresponding to the current discretization variable to the sum of the information contents corresponding to all discretization variables of the target variable as the information content ratio corresponding to the current discretization variable;
a third determining submodule, configured to determine the sum of the correlation difference and the information content ratio as the evaluation value of the current discretization variable;
a fourth determining submodule, configured to determine that the discretization processing corresponding to the discretization variable is feasible when the evaluation value is less than a preset evaluation threshold.
With regard to the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
Fig. 3 is a block diagram of an electronic device 700 according to an exemplary embodiment. As shown in Fig. 3, the electronic device 700 may include a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 controls the overall operation of the electronic device 700 to complete all or part of the steps of the above method for evaluating variable discretization. The memory 702 stores various types of data to support operation of the electronic device 700; such data may include, for example, instructions for any application or method operating on the electronic device 700 and application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 702 or sent through the communication component 705. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse, or buttons. The buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method for evaluating variable discretization.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when the program instructions are executed by a processor, the steps of the above method for evaluating variable discretization are implemented. For example, the computer-readable storage medium may be the above memory 702 including program instructions, and the program instructions may be executed by the processor 701 of the electronic device 700 to complete the above method for evaluating variable discretization.
Fig. 4 is a block diagram of an electronic device 1900 according to an exemplary embodiment. For example, the electronic device 1900 may be provided as a server. Referring to Fig. 4, the electronic device 1900 includes one or more processors 1922 and a memory 1932 for storing computer programs executable by the processor 1922. A computer program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processor 1922 may be configured to execute the computer program to perform the above method for evaluating variable discretization.
In addition, the electronic device 1900 may also include a power supply component 1926 and a communication component 1950. The power supply component 1926 may be configured to perform power management of the electronic device 1900, and the communication component 1950 may be configured to implement communication of the electronic device 1900, for example wired or wireless communication. The electronic device 1900 may also include an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, or Linux™.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when the program instructions are executed by a processor, the steps of the above method for evaluating variable discretization are implemented. For example, the computer-readable storage medium may be the above memory 1932 including program instructions, and the program instructions may be executed by the processor 1922 of the electronic device 1900 to complete the above method for evaluating variable discretization.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, various simple modifications may be made to the technical solution of the present disclosure, and these simple modifications all fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the present disclosure does not further describe the various possible combinations.
In addition, the various embodiments of the present disclosure may also be combined in any manner, and such combinations should likewise be regarded as content disclosed by the present disclosure as long as they do not depart from the idea of the present disclosure.

Claims (10)

1. A method for evaluating variable discretization, characterized in that the method comprises:
discretizing a target variable to obtain the discretization variables corresponding to the target variable, wherein the target variable includes at least three target columns, and the target columns are the reference classes corresponding to the target variable;
for each discretization variable, determining the evidence weight corresponding to each discrete range of the discretization variable according to the evidence weight corresponding to each target column within each discrete range;
determining the information content corresponding to each discrete range according to the evidence weight corresponding to that discrete range and a reference column, and determining the information content corresponding to the discretization variable according to each of these information contents, wherein the reference column is one target column of the target variable;
evaluating the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable.
2. The method according to claim 1, characterized in that the evidence weight corresponding to each discrete range of the discretization variable is determined, by the following formula, according to the evidence weight corresponding to each target column within each discrete range:
wherein WOE denotes the evidence weight of the target variable corresponding to the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
Label_sum denotes the sample count corresponding to all target columns of the target variable within the current discrete range;
Label_i denotes the sample count corresponding to the i-th target column of the target variable within the current discrete range;
Label_rest denotes the sample count corresponding to the target columns of the target variable other than the i-th target column within the current discrete range;
a further quantity denotes the sample count corresponding to the i-th target column of the target variable across all discrete ranges;
and a final quantity denotes the sample count corresponding to the target columns of the target variable other than the i-th target column across all discrete ranges.
3. The method according to claim 2, characterized in that the information content corresponding to each discrete range is determined, by the following formula, according to the evidence weight corresponding to that discrete range:
wherein IV denotes the information content of the target variable corresponding to the current discrete range;
Label_T denotes the sample count corresponding to reference column T of the target variable within the current discrete range;
SumLabel_T denotes the sample count corresponding to reference column T of the target variable across all discrete ranges;
Label_restT denotes the sample count corresponding to the target columns of the target variable other than reference column T within the current discrete range;
and a final quantity denotes the sample count corresponding to the target columns of the target variable other than reference column T across all discrete ranges.
4. The method according to claim 1, characterized in that evaluating the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable comprises:
when the information content corresponding to the discretization variable is less than a preset first information content threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
5. The method according to claim 1, characterized in that the method further comprises:
determining a first correlation coefficient between the target variable and the reference column;
determining a second correlation coefficient between the discretization variable corresponding to the target variable and the reference column;
wherein evaluating the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable comprises:
evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient.
6. The method according to claim 5, characterized in that evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient comprises:
when the information content corresponding to the discretization variable is less than a preset second information content threshold and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
7. The method according to claim 5, characterized in that evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient comprises:
determining the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as a correlation difference;
determining the ratio of the information content corresponding to the current discretization variable to the sum of the information contents corresponding to all discretization variables of the target variable as the information content ratio corresponding to the current discretization variable;
determining the sum of the correlation difference and the information content ratio as the evaluation value of the current discretization variable;
when the evaluation value is less than a preset evaluation threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
8. A device for evaluating variable discretization, characterized in that the device comprises:
a processing module, configured to discretize a target variable to obtain the discretization variables corresponding to the target variable, wherein the target variable includes at least three target columns, and the target columns are the reference classes corresponding to the target variable;
a first determining module, configured to determine, for each discretization variable, the evidence weight corresponding to each discrete range of the discretization variable according to the evidence weight corresponding to each target column within each discrete range;
a second determining module, configured to determine the information content corresponding to each discrete range according to the evidence weight corresponding to that discrete range and a reference column, and to determine the information content corresponding to the discretization variable according to each of these information contents, wherein the reference column is one target column of the target variable;
an evaluation module, configured to evaluate the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
10. An electronic device, characterized by comprising:
a memory having a computer program stored thereon;
a processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1-7.
CN201811574683.6A 2018-12-21 2018-12-21 Evaluation method, device, storage medium and the electronic equipment of variable discretization Pending CN109815985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811574683.6A CN109815985A (en) 2018-12-21 2018-12-21 Evaluation method, device, storage medium and the electronic equipment of variable discretization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811574683.6A CN109815985A (en) 2018-12-21 2018-12-21 Evaluation method, device, storage medium and the electronic equipment of variable discretization

Publications (1)

Publication Number Publication Date
CN109815985A true CN109815985A (en) 2019-05-28

Family

ID=66602195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811574683.6A Pending CN109815985A (en) 2018-12-21 2018-12-21 Evaluation method, device, storage medium and the electronic equipment of variable discretization

Country Status (1)

Country Link
CN (1) CN109815985A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717650A (en) * 2019-09-06 2020-01-21 平安医疗健康管理股份有限公司 Receipt data processing method and device, computer equipment and storage medium
CN110958352A (en) * 2019-11-28 2020-04-03 Tcl移动通信科技(宁波)有限公司 Network signal display method, device, storage medium and mobile terminal
CN110958352B (en) * 2019-11-28 2021-04-09 Tcl移动通信科技(宁波)有限公司 Network signal display method, device, storage medium and mobile terminal

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2019-05-28)