CN109815985A - Evaluation method, device, storage medium and the electronic equipment of variable discretization - Google Patents
- Publication number
- CN109815985A (CN201811574683.6A)
- Authority
- CN
- China
- Prior art keywords
- variable
- target
- discretization
- information content
- discrete
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure relates to an evaluation method, device, storage medium, and electronic equipment for variable discretization. The method includes: discretizing a target variable to obtain the discretized variables corresponding to the target variable, where the target variable includes at least three target columns and the target columns are the reference categories corresponding to the target variable; for each discretized variable, determining the evidence weight (WOE) of each discrete range of the discretized variable from the evidence weights of the individual target columns within that range; determining the information content (IV) of each discrete range from the evidence weight of that range and a reference column, and determining the information content of the discretized variable from the information content of its ranges, where the reference column is one of the target columns of the target variable; and evaluating the discretization processing corresponding to the discretized variable according to the information content of the discretized variable. The evaluation method can therefore be applied to multi-class variables, which effectively widens its scope of application.
Description
Technical field
The present disclosure relates to the field of data processing, and in particular to an evaluation method, device, storage medium, and electronic equipment for variable discretization.
Background
A continuous variable usually needs to be discretized so that subsequent operations can be performed on the discretized data. In the prior art, discretization is typically equidistant (equal-width) or equiprobable (equal-frequency). Equidistant discretization divides the continuous variable into several intervals of equal length, whereas equiprobable discretization ensures that each interval contains the same amount of data after discretization.
For a continuous variable, the accuracy of the discretized data has a significant effect on subsequent operations on that variable. The discretization processing of a continuous variable therefore needs to be evaluated, to determine whether the chosen discretization preserves the accuracy of the variable's data. In the prior art, the discretization of a continuous variable is usually evaluated with a combination of WOE (Weight of Evidence, evidence weight) and IV (Information Value, information content) to determine whether the discretization processing is feasible. However, that approach evaluates discretization only for continuous variables associated with two classes, and is not applicable to continuous variables associated with more than two classes.
Summary of the invention
To solve the above problem, the purpose of the present disclosure is to provide an evaluation method, device, storage medium, and electronic equipment for variable discretization that can evaluate the discretization of multi-class variables.
To achieve this purpose, according to a first aspect of the present disclosure, an evaluation method for variable discretization is provided. The method includes:
Discretizing a target variable to obtain the discretized variables corresponding to the target variable, where the target variable includes at least three target columns, and the target columns are the reference categories corresponding to the target variable;
For each discretized variable, determining the evidence weight of each discrete range of the discretized variable from the evidence weights of the individual target columns within that range;
Determining the information content of each discrete range from the evidence weight of that range and a reference column, and determining the information content of the discretized variable from the information content of its ranges, where the reference column is one target column of the target variable;
Evaluating the discretization processing corresponding to the discretized variable according to the information content of the discretized variable.
Optionally, the evidence weight of each discrete range of the discretized variable is determined from the evidence weights of the individual target columns within that range by the following formula:
$$WOE = \sum_{i=1}^{NumLabel} \frac{Label_i}{Label_{sum}} \ln\left( \frac{Label_i / SumLabel_i}{Label_{rest} / SumLabel_{rest}} \right)$$
where WOE denotes the evidence weight of the target variable in the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
Label_sum denotes the number of samples of all target columns of the target variable within the current discrete range;
Label_i denotes the number of samples of the i-th target column of the target variable within the current discrete range;
Label_rest denotes the number of samples of the target columns other than the i-th target column within the current discrete range;
SumLabel_i denotes the number of samples of the i-th target column of the target variable within all discrete ranges;
SumLabel_rest denotes the number of samples of the target columns other than the i-th target column within all discrete ranges.
Optionally, the information content of each discrete range is determined from the evidence weight of that range by the following formula:
$$IV = \left( \frac{Label_T}{SumLabel_T} - \frac{Label_{restT}}{SumLabel_{restT}} \right) \times WOE$$
where IV denotes the information content of the target variable in the current discrete range;
Label_T denotes the number of samples of the reference column T within the current discrete range;
SumLabel_T denotes the number of samples of the reference column T within all discrete ranges;
Label_restT denotes the number of samples of the target columns other than the reference column T within the current discrete range;
SumLabel_restT denotes the number of samples of the target columns other than the reference column T within all discrete ranges.
Optionally, evaluating the discretization processing corresponding to the discretized variable according to the information content of the discretized variable includes:
When the information content of the discretized variable is less than a preset first information content threshold, determining that the discretization processing corresponding to the discretized variable is feasible.
Optionally, the method further includes:
Determining a first correlation coefficient between the target variable and the reference column;
Determining a second correlation coefficient between the discretized variable corresponding to the target variable and the reference column;
Evaluating the discretization processing corresponding to the discretized variable according to the information content of the discretized variable then includes:
Evaluating the discretized variable according to the information content of the discretized variable, the first correlation coefficient, and the second correlation coefficient.
Optionally, evaluating the discretized variable according to the information content of the discretized variable, the first correlation coefficient, and the second correlation coefficient includes:
When the information content of the discretized variable is less than a preset second information content threshold, and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold, determining that the discretization processing corresponding to the discretized variable is feasible.
Optionally, evaluating the discretized variable according to the information content of the discretized variable, the first correlation coefficient, and the second correlation coefficient includes:
Determining the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as the relevance difference;
Determining the ratio of the information content of the current discretized variable to the sum of the information contents of all discretized variables of the target variable as the information content ratio of the current discretized variable;
Determining the sum of the relevance difference and the information content ratio as the evaluation value of the current discretized variable;
When the evaluation value is less than a preset evaluation threshold, determining that the discretization processing corresponding to the discretized variable is feasible.
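The combined evaluation value described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `evaluation_value` and `is_feasible`, the correlation values, the second and third information contents, and the threshold are all hypothetical and do not come from the disclosure.

```python
def evaluation_value(corr_original, corr_discretized, iv_current, iv_all):
    """Relevance difference plus information content ratio for one discretized variable.

    corr_original:    first correlation coefficient (target variable vs. reference column)
    corr_discretized: second correlation coefficient (discretized variable vs. reference column)
    iv_current:       information content of the current discretized variable
    iv_all:           information contents of all discretized variables of the target variable
    """
    relevance_diff = abs(corr_discretized - corr_original)
    iv_ratio = iv_current / sum(iv_all)
    return relevance_diff + iv_ratio


def is_feasible(value, threshold):
    """The discretization is treated as feasible when the value is below the threshold."""
    return value < threshold


# Hypothetical inputs: three candidate discretizations of one target variable.
value = evaluation_value(0.62, 0.58, 0.082, [0.082, 0.120, 0.198])
feasible = is_feasible(value, threshold=0.3)  # the threshold is an assumed setting
```

Because both terms are non-negative when the information contents are positive, a smaller evaluation value means the discretized variable stays closer to the original correlation and carries a smaller share of the total information content.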
According to a second aspect of the present disclosure, an evaluation device for variable discretization is provided. The device includes:
A processing module, configured to discretize a target variable to obtain the discretized variables corresponding to the target variable, where the target variable includes at least three target columns, and the target columns are the reference categories corresponding to the target variable;
A first determining module, configured to determine, for each discretized variable, the evidence weight of each discrete range of the discretized variable from the evidence weights of the individual target columns within that range;
A second determining module, configured to determine the information content of each discrete range from the evidence weight of that range and a reference column, and to determine the information content of the discretized variable from the information content of its ranges, where the reference column is one target column of the target variable;
An evaluation module, configured to evaluate the discretization processing corresponding to the discretized variable according to the information content of the discretized variable.
Optionally, the first determining module is configured to determine the evidence weight of each discrete range of the discretized variable from the evidence weights of the individual target columns within that range by the following formula:
$$WOE = \sum_{i=1}^{NumLabel} \frac{Label_i}{Label_{sum}} \ln\left( \frac{Label_i / SumLabel_i}{Label_{rest} / SumLabel_{rest}} \right)$$
where WOE denotes the evidence weight of the target variable in the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
Label_sum denotes the number of samples of all target columns of the target variable within the current discrete range;
Label_i denotes the number of samples of the i-th target column of the target variable within the current discrete range;
Label_rest denotes the number of samples of the target columns other than the i-th target column within the current discrete range;
SumLabel_i denotes the number of samples of the i-th target column of the target variable within all discrete ranges;
SumLabel_rest denotes the number of samples of the target columns other than the i-th target column within all discrete ranges.
Optionally, the second determining module is configured to determine the information content of each discrete range from the evidence weight of that range by the following formula:
$$IV = \left( \frac{Label_T}{SumLabel_T} - \frac{Label_{restT}}{SumLabel_{restT}} \right) \times WOE$$
where IV denotes the information content of the target variable in the current discrete range;
Label_T denotes the number of samples of the reference column T within the current discrete range;
SumLabel_T denotes the number of samples of the reference column T within all discrete ranges;
Label_restT denotes the number of samples of the target columns other than the reference column T within the current discrete range;
SumLabel_restT denotes the number of samples of the target columns other than the reference column T within all discrete ranges.
Optionally, the evaluation module includes:
A first evaluation submodule, configured to determine that the discretization processing corresponding to the discretized variable is feasible when the information content of the discretized variable is less than a preset first information content threshold.
Optionally, the device further includes:
A third determining module, configured to determine a first correlation coefficient between the target variable and the reference column;
A fourth determining module, configured to determine a second correlation coefficient between the discretized variable corresponding to the target variable and the reference column;
The evaluation module includes:
A second evaluation submodule, configured to evaluate the discretized variable according to the information content of the discretized variable, the first correlation coefficient, and the second correlation coefficient.
Optionally, the second evaluation submodule is configured to determine that the discretization processing corresponding to the discretized variable is feasible when the information content of the discretized variable is less than a preset second information content threshold and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold.
Optionally, the second evaluation submodule includes:
A first determining submodule, configured to determine the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as the relevance difference;
A second determining submodule, configured to determine the ratio of the information content of the current discretized variable to the sum of the information contents of all discretized variables of the target variable as the information content ratio of the current discretized variable;
A third determining submodule, configured to determine the sum of the relevance difference and the information content ratio as the evaluation value of the current discretized variable;
A fourth determining submodule, configured to determine that the discretization processing corresponding to the discretized variable is feasible when the evaluation value is less than a preset evaluation threshold.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored. When the program is executed by a processor, the steps of the method of the first aspect of the present disclosure are implemented.
According to a fourth aspect of the present disclosure, an electronic device is provided, including:
A memory, on which a computer program is stored;
A processor, configured to execute the computer program in the memory to implement the steps of the method of the first aspect of the present disclosure.
In the above technical solution, for a discretized variable corresponding to the target variable, the evidence weight of each target column of the target variable within a discrete range is calculated to determine the evidence weight of that range. The information content of the discretized variable is then determined from the evidence weights of its discrete ranges, and the discretized variable is evaluated accordingly. The evaluation method can therefore be applied to multi-class variables, which effectively widens its scope of application. It also helps ensure the accuracy of the evaluation result of the discretized variable and improves the user experience.
Other features and advantages of the present disclosure are described in detail in the detailed description that follows.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present disclosure and form part of the specification. Together with the detailed description below, they serve to explain the present disclosure but do not limit it. In the drawings:
Fig. 1 is a flowchart of the evaluation method for variable discretization provided according to an embodiment of the present disclosure;
Fig. 2 is a block diagram of the evaluation device for variable discretization provided according to an embodiment of the present disclosure;
Fig. 3 is a block diagram of an electronic device according to an exemplary embodiment;
Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here only describe and explain the present disclosure and do not limit it.
Fig. 1 is a flowchart of the evaluation method for variable discretization provided according to an embodiment of the present disclosure. As shown in Fig. 1, the method includes:
In S11, a target variable is discretized to obtain the discretized variables corresponding to the target variable, where the target variable includes at least three target columns, and the target columns are the reference categories corresponding to the target variable. The target variable is the variable to be discretized, and it can be discretized against multiple reference categories. For example, if the target variable is "most recent purchase amount", the identities of different buyers can serve as the reference categories for discretization; the target columns can then be "middle school student", "university student", "graduate", and so on.
The target variable can be discretized according to the prior art. For example, with equidistant discretization the target variable can be divided into 3 discrete ranges of equal length, where the number of ranges can be set according to the actual usage scenario. Alternatively, the data of the target variable can be sorted in a given order and divided into ranges that contain the same number of data points. Unless otherwise stated, a multi-class variable in the present disclosure refers to a variable with at least three target columns.
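The two prior-art discretization schemes mentioned in S11 can be sketched as follows. This is a minimal illustration rather than the procedure of the disclosure: the helper names and the sample purchase amounts are made up, and the choice of left-closed ranges is an assumption.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins ranges of equal length."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * k for k in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior cut points chosen from the sorted data so ranges hold about the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * k) // n_bins] for k in range(1, n_bins)]


def assign_bins(values, edges):
    """Map each value to the index of its range; a value equal to a cut point goes right."""
    return [bisect_right(edges, v) for v in values]


amounts = [5, 20, 40, 80, 120, 150, 260, 300, 900]  # made-up purchase amounts
width_bins = assign_bins(amounts, equal_width_edges(amounts, 3))
freq_bins = assign_bins(amounts, equal_frequency_edges(amounts, 3))
```

With skewed data such as these amounts, the equal-width scheme puts almost every sample into the first range, while the equal-frequency scheme spreads them evenly. Differences of this kind are what the evaluation in the later steps is meant to compare.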
In S12, for each discretized variable, the evidence weight of each discrete range of the discretized variable is determined from the evidence weights of the individual target columns within that range.
The evidence weight of the current discrete range characterizes the difference between the data of the current discrete range and the overall data of the discretized variable (that is, the data of all discrete ranges of the discretized variable). Therefore, to determine the evidence weight of the current discrete range, the evidence weight of each target column within that range is determined first, and the evidence weight of the range is then determined from the evidence weights of the individual target columns. In this way the effect of each target column's data within the current range is taken into account, so that the method can be applied to multi-class variables.
In S13, the information content of each discrete range is determined from the evidence weight of that range and a reference column, and the information content of the discretized variable is determined from the information content of its ranges, where the reference column is one target column of the target variable.
When the target variable is discretized and the information content of a discretized variable is determined, one of the target columns is usually designated as the reference column so that the discretization processing corresponding to the discretized variable can be evaluated. The reference column can be set according to the actual usage scenario. For example, if the target variable is "most recent purchase amount" and the target columns are "middle school student", "university student", "graduate", and so on, "middle school student" can be designated as the reference column, and the discretization processing of the target variable "most recent purchase amount" is then evaluated according to the most recent purchase amounts of middle school students.
In S14, the discretization processing corresponding to the discretized variable is evaluated according to the information content of the discretized variable.
Evaluating the discretized variable determines whether the discretization processing that turns the target variable into that discretized variable is feasible. When the discretization processing corresponding to the discretized variable is determined to be feasible, the discretized variable represents the target variable accurately, and subsequent data processing can be carried out on the discretized variable.
In the above technical solution, for a discretized variable corresponding to the target variable, the evidence weight of each target column of the target variable within a discrete range is calculated to determine the evidence weight of that range. The information content of the discretized variable is then determined from the evidence weights of its discrete ranges, and the discretized variable is evaluated accordingly. The evaluation method can therefore be applied to multi-class variables, which effectively widens its scope of application. It also helps ensure the accuracy of the evaluation result of the discretized variable and improves the user experience.
To help those skilled in the art better understand the technical solution provided by the embodiments of the present disclosure, the above steps are described in detail below.
First, the following explains how the evidence weight of each discrete range of the discretized variable is determined from the evidence weights of the individual target columns within that range. For example, the following formula can be used:
$$WOE = \sum_{i=1}^{NumLabel} \frac{Label_i}{Label_{sum}} \ln\left( \frac{Label_i / SumLabel_i}{Label_{rest} / SumLabel_{rest}} \right)$$
where WOE denotes the evidence weight of the target variable in the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
Label_sum denotes the number of samples of all target columns of the target variable within the current discrete range;
Label_i denotes the number of samples of the i-th target column of the target variable within the current discrete range;
Label_rest denotes the number of samples of the target columns other than the i-th target column within the current discrete range;
SumLabel_i denotes the number of samples of the i-th target column of the target variable within all discrete ranges;
SumLabel_rest denotes the number of samples of the target columns other than the i-th target column within all discrete ranges.
In this embodiment, when the evidence weight of the current target column within the current discrete range is determined, all other target columns are treated as a single target column. When the evidence weight of the current discrete range is then determined from the evidence weights of the individual target columns, each column's evidence weight is weighted by the ratio of that column's sample count to the sample count of all target columns, and the weighted values are summed. In this way the evidence weight of each target column within the current range is determined separately, and the columns are then combined to determine the evidence weight of the range efficiently and accurately, so that the discretization processing of a multi-class variable can be evaluated.
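The one-versus-rest weighting described above can be sketched in Python. This is one reading of the description, not a definitive implementation: it assumes the per-column evidence weight is the natural logarithm of the column's in-range share divided by the pooled share of the other columns, and the function names are hypothetical. All counts are assumed to be positive, since a zero count would make the logarithm undefined.

```python
import math


def column_woe(in_range, total, rest_in_range, rest_total):
    """Evidence weight of one target column in one range, other columns pooled as one."""
    return math.log((in_range / total) / (rest_in_range / rest_total))


def range_woe(counts, totals):
    """Evidence weight of one discrete range as a weighted sum over target columns.

    counts[i]: samples of target column i inside the current range
    totals[i]: samples of target column i over all ranges
    """
    label_sum = sum(counts)    # all samples inside the current range
    grand_total = sum(totals)  # all samples over all ranges
    woe = 0.0
    for c, t in zip(counts, totals):
        weight = c / label_sum  # the column's share of the current range
        woe += weight * column_woe(c, t, label_sum - c, grand_total - t)
    return woe


# A column with 2500 of its 10000 samples in the range, where the other
# columns together have 47500 of their 90000 samples in the range.
ref_woe = column_woe(2500, 10000, 47500, 90000)
```

Under this reading, a range whose class mix mirrors the overall class mix has an evidence weight of zero, because every per-column ratio is one and its logarithm vanishes.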
For example, Table 1 below is a data table for one discretization processing of the target variable "most recent purchase amount" with the corresponding target columns "middle school student", "university student", and "graduate".
Table 1
As shown in Table 1, this discretization processing divides the target variable into four discrete ranges, where:
The first discrete range is (0, 100), and the evidence weight WOE1 of this range is determined by the above formula as: WOE1 = 0.05 × (-0.7472) + 0.41 × (-0.1228) + 0.54 × 0.3207 = -0.232
The second discrete range is [100, 200), and the evidence weight WOE2 of this range is determined by the above formula as: WOE2 = 0.1 × 0 + 0.57 × (-0.3236) + 0.33 × 0.5328 = -0.362
The third discrete range is [200, 500), and the evidence weight WOE3 of this range is determined by the above formula as: WOE3 = 0.2 × 0.8109 + 0.33 × (-0.4520) + 0.47 × 0.0268 = 0.026
The fourth discrete range is [500, +∞), and the evidence weight WOE4 of this range is determined by the above formula as: WOE4 = 0.3 × 1.3499 + 0.3 × (-0.606) + 0.4 × (-0.2451) = 0.125
After the evidence weight of each discrete range has been determined, the information content of each discrete range can be determined from the evidence weight of that range by the following formula:
$$IV = \left( \frac{Label_T}{SumLabel_T} - \frac{Label_{restT}}{SumLabel_{restT}} \right) \times WOE$$
where IV denotes the information content of the target variable in the current discrete range;
Label_T denotes the number of samples of the reference column T within the current discrete range;
SumLabel_T denotes the number of samples of the reference column T within all discrete ranges;
Label_restT denotes the number of samples of the target columns other than the reference column T within the current discrete range;
SumLabel_restT denotes the number of samples of the target columns other than the reference column T within all discrete ranges.
For example, suppose the target column "middle school student" is designated as the reference column, and take the first discrete range (0, 100). Within the current discrete range, the reference column "middle school student" has 2500 samples; within all discrete ranges, it has 10000 samples. The target columns other than the reference column have 47500 samples within the current discrete range (that is, the sum of the sample counts of "university student" and "graduate" within that range) and 90000 samples within all discrete ranges. The information content of the first discrete range (0, 100) is therefore IV1 = (2500/10000 - 47500/90000) × (-0.232) = 0.064.
Similarly, the information content IV2 of the second discrete range [100, 200) is 0;
The information content IV3 of the third discrete range [200, 500) is 0.004;
The information content IV4 of the fourth discrete range [500, +∞) is 0.014;
Therefore, the information content of the discretized variable is 0.082 (that is, 0.064 + 0 + 0.004 + 0.014).
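The arithmetic of this example can be checked with a short Python sketch. It assumes that the per-range information content is the difference between the reference column's share and the pooled share of the other columns, multiplied by the evidence weight of the range, a form that reproduces the stated 0.064. The function name `range_iv` is hypothetical, and the evidence weight -0.232 is taken as given from the text.

```python
def range_iv(ref_in_range, ref_total, rest_in_range, rest_total, woe):
    """Information content of one discrete range relative to reference column T."""
    return (ref_in_range / ref_total - rest_in_range / rest_total) * woe


# First discrete range (0, 100) with "middle school student" as the reference column.
iv1 = range_iv(2500, 10000, 47500, 90000, woe=-0.232)

# Information content of the discretized variable: the sum over its four ranges,
# using the per-range values stated in the text.
total_iv = sum([0.064, 0.0, 0.004, 0.014])
```

The second range gives zero regardless of its evidence weight: the reference column's evidence weight there is 0, which means its share equals the pooled share of the other columns, so the bracketed difference vanishes.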
When the target variable is a multi-class variable, the above technical solution treats the target columns other than the reference column as a single column, so that the evidence weight and information content of each discrete range can be determined quickly and easily. The information content of the discretized variable can thus be determined accurately, which provides a sound data basis for evaluating the discretized variable and helps ensure the accuracy of the evaluation result determined from that information content.
Optionally, evaluating the discretization processing corresponding to the discretized variable according to the information content of the discretized variable includes:
When the information content of the discretized variable is less than a preset first information content threshold, determining that the discretization processing corresponding to the discretized variable is feasible.
The first information content threshold can be set according to the actual usage situation, and the present disclosure does not limit it. In this embodiment, whether the discretization processing corresponding to a discretized variable is feasible is determined from the information content of that discretized variable. The approach is simple, accurate, and easy to implement, which benefits the user experience.
Optionally, after the multiple discretization variables corresponding to the target variable have been evaluated, the discretization processing corresponding to the discretization variable with the smallest information content can be directly determined as the target discretization processing of the target variable. The target variable is then discretized according to the target discretization processing, which helps ensure the accuracy of the discretization of the target variable and provides accurate data support for subsequent data processing.
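The threshold check and the selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names, the scheme names used as dictionary keys, and the numeric values are hypothetical.

```python
from typing import Dict


def is_feasible(information_content: float, first_threshold: float) -> bool:
    """Return True when the information content is below the preset threshold."""
    return information_content < first_threshold


def select_target_discretization(candidates: Dict[str, float]) -> str:
    """Return the candidate whose discretization variable has the smallest
    information content.

    `candidates` maps a hypothetical scheme name to the information content
    of the discretization variable that the scheme produces.
    """
    if not candidates:
        raise ValueError("at least one candidate discretization is required")
    return min(candidates, key=candidates.get)


# Hypothetical information content values for three candidate schemes.
candidates = {"equal_width": 0.42, "equal_frequency": 0.31, "chi_merge": 0.57}
target_scheme = select_target_discretization(candidates)
```

The threshold itself is left as a parameter because the disclosure states that it is set according to the actual usage scenario.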
When the information content corresponding to the discretization variable is determined, the target columns other than the current target column are combined into a single column for the calculation. Therefore, in order to preserve the correlation between the multi-class variable and the target column after discretization, the disclosure also provides the following embodiment. Specifically, the method further comprises:
determining a first correlation coefficient between the target variable and the reference column; and
determining a second correlation coefficient between the discretization variable corresponding to the target variable and the reference column.
The Pearson correlation coefficient between the target variable and the reference column can be used as the first correlation coefficient, and the Pearson correlation coefficient between the discretization variable and the reference column can be used as the second correlation coefficient. The calculation of the Pearson correlation coefficient is prior art and is not described here.
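As an illustration of the two coefficients, the sketch below computes the Pearson correlation before and after discretization. It assumes that the reference column is encoded as a 0/1 indicator of membership in the reference category and that discretization maps each value to the index of its discrete range; both encodings, the helper names, and the sample data are assumptions made for this example.

```python
import bisect
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def discretize(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Map each value to the index of its discrete range (hypothetical binning)."""
    return [bisect.bisect_right(edges, v) for v in values]


# Hypothetical data: raw values of the target variable and a 0/1 indicator
# marking the samples that belong to the reference column.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
reference = [0, 0, 0, 1, 1, 1]

first_corr = pearson(values, reference)                      # before discretization
second_corr = pearson(discretize(values, [3.5]), reference)  # after discretization
```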
An example implementation of evaluating the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable is as follows, comprising:
evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient.
In this embodiment, the first correlation coefficient characterizes the correlation between the target variable and the reference column before discretization, and the second correlation coefficient characterizes the correlation between the resulting discretization variable and the reference column after the target variable is discretized. In other words, the two coefficients represent the correlation with the reference column before and after the discretization processing, respectively. The discretization variable can therefore be evaluated by combining its information content with the first and second correlation coefficients, which helps ensure the accuracy of the evaluation result.
Optionally, in one embodiment, an example implementation of evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient is as follows, comprising:
when the information content corresponding to the discretization variable is less than a preset second information content threshold and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
It should be noted that the second information content threshold and the first information content threshold may be the same or different, and the difference threshold can be set according to the actual usage scenario; the disclosure does not limit this. When the information content corresponding to the discretization variable is less than the second information content threshold, the data of the discretization variable obtained by the corresponding discretization processing conveys information consistent with the data of the target variable. When the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than the preset difference threshold, the correlation between the target variable and the reference column changes only slightly after discretization. When both conditions hold, the discretization processing corresponding to the discretization variable can be determined to be feasible.
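The two-condition check described above can be sketched as a single predicate. The function and parameter names and the numeric values in the usage lines are hypothetical; the thresholds are left as parameters because the disclosure does not fix them.

```python
def is_feasible_with_correlation(
    information_content: float,
    first_corr: float,
    second_corr: float,
    second_threshold: float,
    difference_threshold: float,
) -> bool:
    """Feasible only if both conditions of this embodiment hold:
    the information content is below the second information content threshold,
    and the correlation changes by less than the difference threshold."""
    small_information = information_content < second_threshold
    small_change = abs(second_corr - first_corr) < difference_threshold
    return small_information and small_change


# Hypothetical values: correlation drops from 0.88 to 0.85 after discretization.
feasible = is_feasible_with_correlation(0.30, 0.88, 0.85, 0.50, 0.05)
```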
In the above technical solution, the discretization variable is evaluated with reference not only to its information content but also to the change in correlation caused by discretizing the target variable, which helps ensure the accuracy of the evaluation result of the variable discretization.
Optionally, after the multiple discretization variables corresponding to the target variable have been evaluated, the target discretization processing can be determined in two steps. First, the discretization processing of every discretization variable for which the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is greater than or equal to the difference threshold is excluded; that is, only the discretization processing that changes the correlation of the target variable slightly is retained. Then, among the remaining discretization processing, the one corresponding to the discretization variable with the smallest information content is determined as the target discretization processing of the target variable, and the target variable is discretized accordingly, so as to ensure the accuracy of the discretization.
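The two-step selection above (exclude by correlation change, then take the smallest information content) can be sketched as follows. The `Candidate` record, its field names, and the sample values are hypothetical and serve only to illustrate the order of the two steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str                   # hypothetical label of a discretization scheme
    information_content: float  # information content of its discretization variable
    second_corr: float          # correlation with the reference column after discretization


def select_target(
    candidates: List[Candidate], first_corr: float, difference_threshold: float
) -> Optional[Candidate]:
    """Step 1: drop candidates whose correlation change is >= the threshold.
    Step 2: among the rest, return the one with the smallest information content."""
    remaining = [
        c for c in candidates
        if abs(c.second_corr - first_corr) < difference_threshold
    ]
    if not remaining:
        return None  # no candidate keeps the correlation close enough
    return min(remaining, key=lambda c: c.information_content)


candidates = [
    Candidate("A", 0.20, 0.60),  # smallest information content, large correlation change
    Candidate("B", 0.35, 0.86),
    Candidate("C", 0.30, 0.90),
]
chosen = select_target(candidates, first_corr=0.88, difference_threshold=0.05)
```

Candidate A has the smallest information content but is excluded in the first step, which shows why the order of the two steps matters.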
After the target variable is discretized, the discretized data is used for subsequent statistics or calculations. The above technical solution therefore determines the discretization processing of the target variable on the premise that the correlation of the target variable is only slightly affected, which helps ensure the accuracy of the discretization and provides accurate data support for subsequent data processing.
Optionally, evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient comprises:
determining the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as a relevance difference; and
determining the ratio of the information content corresponding to the current discretization variable to the sum of the information content corresponding to all discretization variables of the target variable as the information content ratio corresponding to the current discretization variable.
Illustratively, the information content ratio can be determined according to the following formula:
$$\mathrm{Rate} = \frac{IV_D}{\sum_{i=1}^{N} IV_i}$$
where $IV_D$ denotes the information content corresponding to the current discretization variable;
Rate denotes the information content ratio;
$IV_i$ denotes the information content corresponding to the i-th discretization variable of the target variable; and
N denotes the total number of discretization variables corresponding to the target variable.
determining the sum of the relevance difference and the information content ratio as the evaluation value of the current discretization variable, where the evaluation value jointly characterizes the correlation and the information content corresponding to the discretization variable; and
when the evaluation value is less than a preset evaluation threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
The relevance difference characterizes the change in correlation caused by discretizing the target variable into the discretization variable, and the information content ratio characterizes the share of the information content of the discretization variable in the total information content of all discretization variables of the target variable. In this embodiment, the discretization variable is therefore evaluated comprehensively through the relevance difference and the information content ratio, which improves the accuracy of the evaluation value and of the evaluation result, provides data support for determining the discretization processing of the target variable, and further improves the user experience.
Optionally, after the multiple discretization variables corresponding to the target variable have been evaluated, the target discretization processing of the target variable can be determined according to the evaluation values of the discretization variables. For example, the discretization processing corresponding to the discretization variable with the smallest evaluation value can be taken as the target discretization processing, and the target variable is discretized accordingly.
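The evaluation value and the selection by smallest evaluation value can be sketched as follows. The sketch follows the definitions in the text (relevance difference plus information content ratio); the function name and the sample values are hypothetical.

```python
from typing import List


def evaluation_values(
    information_contents: List[float],
    second_corrs: List[float],
    first_corr: float,
) -> List[float]:
    """Evaluation value of each discretization variable of one target variable.

    evaluation value = |second_corr - first_corr| + IV_D / sum(IV_i)
    """
    if len(information_contents) != len(second_corrs):
        raise ValueError("one correlation is needed per discretization variable")
    total = sum(information_contents)
    if total == 0:
        raise ValueError("the information content sum must be non-zero")
    values = []
    for iv, second_corr in zip(information_contents, second_corrs):
        relevance_difference = abs(second_corr - first_corr)
        rate = iv / total  # information content ratio
        values.append(relevance_difference + rate)
    return values


# Hypothetical figures for three discretization variables of one target variable.
values = evaluation_values([0.2, 0.3, 0.5], [0.80, 0.86, 0.88], first_corr=0.88)
best_index = min(range(len(values)), key=values.__getitem__)
```

Because both terms are added without weights, the two quantities contribute on their own scales; any rescaling would be a design choice outside what the text describes.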
After the target variable is discretized, the discretized data is used for subsequent statistics or calculations. The above technical solution therefore determines the discretization processing of the target variable on the premise that the correlation of the target variable is only slightly affected, which helps ensure the accuracy of the discretization and provides accurate data support for subsequent data processing.
Fig. 2 is a block diagram of an evaluation apparatus for variable discretization according to an embodiment of the present disclosure. The apparatus 20 includes:
a processing module 21, configured to discretize a target variable to obtain the discretization variables corresponding to the target variable, where the target variable includes at least three target columns, and the target columns are the reference categories corresponding to the target variable;
a first determining module 22, configured to determine, for each discretization variable, the evidence weight corresponding to each discrete range of the discretization variable according to the evidence weight corresponding to each target column within that discrete range;
a second determining module 23, configured to determine the information content corresponding to each discrete range according to the evidence weight corresponding to that discrete range and the reference column, and to determine the information content corresponding to the discretization variable according to the information content of each discrete range, where the reference column is one target column of the target variable; and
an evaluation module 24, configured to evaluate the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable.
Optionally, the first determining module 22 is configured to determine the evidence weight corresponding to each discrete range of the discretization variable, according to the evidence weight corresponding to each target column within each discrete range, by the following formula:
where WOE denotes the evidence weight of the target variable in the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
$Label_{sum}$ denotes the number of samples corresponding to all target columns of the target variable within the current discrete range;
$Label_i$ denotes the number of samples corresponding to the i-th target column of the target variable within the current discrete range;
$Label_{rest}$ denotes the number of samples corresponding to the target columns other than the i-th target column of the target variable within the current discrete range; and
the two remaining quantities in the formula denote, respectively, the number of samples corresponding to the i-th target column of the target variable over all discrete ranges, and the number of samples corresponding to the target columns other than the i-th target column of the target variable over all discrete ranges.
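The formula itself is not reproduced in this text, so the following sketch should be read as an assumption rather than as the formula of the disclosure. It uses the conventional one-versus-rest evidence weight for each target column, ln((Label_i / total of column i) / (Label_rest / total of the other columns)), and averages the NumLabel per-column values to obtain one evidence weight per discrete range. The averaging step, the function name, and the count layout are hypothetical.

```python
import math
from typing import List


def evidence_weights(counts: List[List[int]]) -> List[float]:
    """Evidence weight of each discrete range (assumed averaged one-vs-rest form).

    counts[b][i] is the number of samples of target column i in discrete range b.
    """
    num_label = len(counts[0])
    # Samples of each target column over all discrete ranges.
    sum_label = [sum(row[i] for row in counts) for i in range(num_label)]
    total = sum(sum_label)
    weights = []
    for row in counts:
        label_sum = sum(row)  # all target columns within this discrete range
        terms = []
        for i in range(num_label):
            label_i = row[i]
            label_rest = label_sum - label_i
            sum_label_rest = total - sum_label[i]
            if label_i == 0 or label_rest == 0:
                # The logarithm is undefined for empty cells; ranges would
                # need to be merged or the counts smoothed beforehand.
                raise ValueError("empty cell in the count table")
            ratio = (label_i / sum_label[i]) / (label_rest / sum_label_rest)
            terms.append(math.log(ratio))
        weights.append(sum(terms) / num_label)
    return weights


# Hypothetical counts: two discrete ranges, three target columns.
woe = evidence_weights([[10, 20, 30], [30, 20, 10]])
```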
Optionally, the second determining module 23 is configured to determine the information content corresponding to each discrete range, according to the evidence weight corresponding to that discrete range, by the following formula:
where IV denotes the information content of the target variable in the current discrete range;
$Label_T$ denotes the number of samples corresponding to the reference column T of the target variable within the current discrete range;
$SumLabel_T$ denotes the number of samples corresponding to the reference column T of the target variable over all discrete ranges;
$Label_{restT}$ denotes the number of samples corresponding to the target columns other than the reference column T of the target variable within the current discrete range; and
the remaining quantity in the formula denotes the number of samples corresponding to the target columns other than the reference column T of the target variable over all discrete ranges.
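Since the formula is not reproduced in this text, the sketch below assumes the conventional form of information content for a discrete range: the difference between the share of reference-column samples and the share of all other samples falling in that range, multiplied by the evidence weight of the range. Summing the per-range values to obtain the information content of the discretization variable is also an assumption, as are all names and sample numbers.

```python
from typing import List


def range_information_content(
    counts: List[List[int]], woe: List[float], ref: int
) -> List[float]:
    """Information content of each discrete range (assumed conventional form).

    counts[b][i] is the number of samples of target column i in discrete range b,
    woe[b] is the evidence weight of range b, and ref is the index of the
    reference column T.
    """
    sum_label_t = sum(row[ref] for row in counts)
    sum_label_rest_t = sum(sum(row) - row[ref] for row in counts)
    if sum_label_t == 0 or sum_label_rest_t == 0:
        raise ValueError("reference column and remaining columns must be non-empty")
    per_range = []
    for row, weight in zip(counts, woe):
        label_t = row[ref]
        label_rest_t = sum(row) - label_t
        share_difference = label_t / sum_label_t - label_rest_t / sum_label_rest_t
        per_range.append(share_difference * weight)
    return per_range


def variable_information_content(per_range: List[float]) -> float:
    """Assumed aggregation: sum of the per-range information content."""
    return sum(per_range)


# Hypothetical counts and evidence weights for two discrete ranges.
per_range = range_information_content([[10, 20, 30], [30, 20, 10]], [-0.5, 0.5], ref=0)
iv_total = variable_information_content(per_range)
```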
Optionally, the evaluation module 24 includes:
a first evaluation submodule, configured to determine that the discretization processing corresponding to the discretization variable is feasible when the information content corresponding to the discretization variable is less than a preset first information content threshold.
Optionally, the apparatus 20 further includes:
a third determining module, configured to determine a first correlation coefficient between the target variable and the reference column; and
a fourth determining module, configured to determine a second correlation coefficient between the discretization variable corresponding to the target variable and the reference column.
The evaluation module 24 includes:
a second evaluation submodule, configured to evaluate the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient.
Optionally, the second evaluation submodule is configured to determine that the discretization processing corresponding to the discretization variable is feasible when the information content corresponding to the discretization variable is less than a preset second information content threshold and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold.
Optionally, the second evaluation submodule includes:
a first determining submodule, configured to determine the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as a relevance difference;
a second determining submodule, configured to determine the ratio of the information content corresponding to the current discretization variable to the sum of the information content corresponding to all discretization variables of the target variable as the information content ratio corresponding to the current discretization variable;
a third determining submodule, configured to determine the sum of the relevance difference and the information content ratio as the evaluation value of the current discretization variable; and
a fourth determining submodule, configured to determine that the discretization processing corresponding to the discretization variable is feasible when the evaluation value is less than a preset evaluation threshold.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not repeated here.
Fig. 3 is a block diagram of an electronic device 700 according to an exemplary embodiment. As shown in Fig. 3, the electronic device 700 may include a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 controls the overall operation of the electronic device 700 to complete all or part of the steps of the above evaluation method for variable discretization. The memory 702 stores various types of data to support operation of the electronic device 700. Such data may include, for example, instructions of any application or method operated on the electronic device 700 and application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 702 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 702 or sent through the communication component 705. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse, or buttons. The buttons may be virtual or physical. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. Wireless communication includes, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, or a combination of one or more of them; accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above evaluation method for variable discretization.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided. When the program instructions are executed by a processor, the steps of the above evaluation method for variable discretization are implemented. For example, the computer-readable storage medium may be the above memory 702 including program instructions, and the program instructions can be executed by the processor 701 of the electronic device 700 to complete the above evaluation method for variable discretization.
Fig. 4 is a block diagram of an electronic device 1900 according to an exemplary embodiment. For example, the electronic device 1900 may be provided as a server. Referring to Fig. 4, the electronic device 1900 includes one or more processors 1922 and a memory 1932 for storing computer programs executable by the processor 1922. A computer program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processor 1922 can be configured to execute the computer program so as to perform the above evaluation method for variable discretization.
In addition, the electronic device 1900 may also include a power supply component 1926 and a communication component 1950. The power supply component 1926 can be configured to perform power management of the electronic device 1900, and the communication component 1950 can be configured to implement communication of the electronic device 1900, for example wired or wireless communication. The electronic device 1900 may further include an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, or Linux™.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided. When the program instructions are executed by a processor, the steps of the above evaluation method for variable discretization are implemented. For example, the computer-readable storage medium may be the above memory 1932 including program instructions, and the program instructions can be executed by the processor 1922 of the electronic device 1900 to complete the above evaluation method for variable discretization.
The preferred embodiments of the disclosure have been described in detail above with reference to the accompanying drawings. However, the disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the disclosure, a variety of simple variations can be made to the technical solution of the disclosure, and these simple variations all fall within the protection scope of the disclosure.
It should further be noted that the specific technical features described in the above embodiments can be combined in any suitable manner provided that they do not contradict each other. To avoid unnecessary repetition, the disclosure does not further describe the various possible combinations.
In addition, the various embodiments of the disclosure can also be combined arbitrarily. As long as a combination does not depart from the idea of the disclosure, it should likewise be regarded as content disclosed by the disclosure.
Claims (10)
1. An evaluation method for variable discretization, characterized in that the method comprises:
discretizing a target variable to obtain discretization variables corresponding to the target variable, wherein the target variable includes at least three target columns, and the target columns are the reference categories corresponding to the target variable;
for each discretization variable, determining the evidence weight corresponding to each discrete range of the discretization variable according to the evidence weight corresponding to each target column within that discrete range;
determining the information content corresponding to each discrete range according to the evidence weight corresponding to that discrete range and a reference column, and determining the information content corresponding to the discretization variable according to the information content of each discrete range, wherein the reference column is one target column of the target variable; and
evaluating the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable.
2. The method according to claim 1, characterized in that the evidence weight corresponding to each discrete range of the discretization variable is determined, according to the evidence weight corresponding to each target column within each discrete range, by the following formula:
where WOE denotes the evidence weight of the target variable in the current discrete range;
NumLabel denotes the total number of target columns of the target variable;
$Label_{sum}$ denotes the number of samples corresponding to all target columns of the target variable within the current discrete range;
$Label_i$ denotes the number of samples corresponding to the i-th target column of the target variable within the current discrete range;
$Label_{rest}$ denotes the number of samples corresponding to the target columns other than the i-th target column of the target variable within the current discrete range; and
the two remaining quantities in the formula denote, respectively, the number of samples corresponding to the i-th target column of the target variable over all discrete ranges, and the number of samples corresponding to the target columns other than the i-th target column of the target variable over all discrete ranges.
3. The method according to claim 2, characterized in that the information content corresponding to each discrete range is determined, according to the evidence weight corresponding to that discrete range, by the following formula:
where IV denotes the information content of the target variable in the current discrete range;
$Label_T$ denotes the number of samples corresponding to the reference column T of the target variable within the current discrete range;
$SumLabel_T$ denotes the number of samples corresponding to the reference column T of the target variable over all discrete ranges;
$Label_{restT}$ denotes the number of samples corresponding to the target columns other than the reference column T of the target variable within the current discrete range; and
the remaining quantity in the formula denotes the number of samples corresponding to the target columns other than the reference column T of the target variable over all discrete ranges.
4. The method according to claim 1, characterized in that evaluating the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable comprises:
when the information content corresponding to the discretization variable is less than a preset first information content threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
5. The method according to claim 1, characterized in that the method further comprises:
determining a first correlation coefficient between the target variable and the reference column; and
determining a second correlation coefficient between the discretization variable corresponding to the target variable and the reference column;
wherein evaluating the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable comprises:
evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient.
6. The method according to claim 5, characterized in that evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient comprises:
when the information content corresponding to the discretization variable is less than a preset second information content threshold and the absolute value of the difference between the second correlation coefficient and the first correlation coefficient is less than a preset difference threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
7. The method according to claim 5, characterized in that evaluating the discretization variable according to the information content corresponding to the discretization variable, the first correlation coefficient, and the second correlation coefficient comprises:
determining the absolute value of the difference between the second correlation coefficient and the first correlation coefficient as a relevance difference;
determining the ratio of the information content corresponding to the current discretization variable to the sum of the information content corresponding to all discretization variables of the target variable as the information content ratio corresponding to the current discretization variable;
determining the sum of the relevance difference and the information content ratio as the evaluation value of the current discretization variable; and
when the evaluation value is less than a preset evaluation threshold, determining that the discretization processing corresponding to the discretization variable is feasible.
8. An evaluation apparatus for variable discretization, characterized in that the apparatus comprises:
a processing module, configured to discretize a target variable to obtain discretization variables corresponding to the target variable, wherein the target variable includes at least three target columns, and the target columns are the reference categories corresponding to the target variable;
a first determining module, configured to determine, for each discretization variable, the evidence weight corresponding to each discrete range of the discretization variable according to the evidence weight corresponding to each target column within that discrete range;
a second determining module, configured to determine the information content corresponding to each discrete range according to the evidence weight corresponding to that discrete range and a reference column, and to determine the information content corresponding to the discretization variable according to the information content of each discrete range, wherein the reference column is one target column of the target variable; and
an evaluation module, configured to evaluate the discretization processing corresponding to the discretization variable according to the information content corresponding to the discretization variable.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the steps of the method according to any one of claims 1-7 are implemented when the program is executed by a processor.
10. An electronic device, characterized by comprising:
a memory on which a computer program is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811574683.6A CN109815985A (en) | 2018-12-21 | 2018-12-21 | Evaluation method, device, storage medium and the electronic equipment of variable discretization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109815985A true CN109815985A (en) | 2019-05-28 |
Family
ID=66602195
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811574683.6A Pending CN109815985A (en) | 2018-12-21 | 2018-12-21 | Evaluation method, device, storage medium and the electronic equipment of variable discretization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109815985A (en) |
-
2018
- 2018-12-21 CN CN201811574683.6A patent/CN109815985A/en active Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717650A (en) * | 2019-09-06 | 2020-01-21 | 平安医疗健康管理股份有限公司 | Receipt data processing method and device, computer equipment and storage medium |
CN110958352A (en) * | 2019-11-28 | 2020-04-03 | Tcl移动通信科技(宁波)有限公司 | Network signal display method, device, storage medium and mobile terminal |
CN110958352B (en) * | 2019-11-28 | 2021-04-09 | Tcl移动通信科技(宁波)有限公司 | Network signal display method, device, storage medium and mobile terminal |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Tinker | Redshift-space distortions with the halo occupation distribution–II. Analytic model | |
CN104090912B (en) | Information-pushing method and device | |
CN108833458B (en) | Application recommendation method, device, medium and equipment | |
Dainotti et al. | Slope evolution of GRB correlations and cosmology | |
Wesson et al. | Understanding and reducing statistical uncertainties in nebular abundance determinations | |
CN110245080B (en) | Method and device for generating scene test case | |
CN104063286B (en) | The fluency method of testing and device of display content change | |
Ashworth et al. | Exploring the IMF of star clusters: a joint SLUG and LEGUS effort | |
CN106709318A (en) | Recognition method, device and calculation equipment for user equipment uniqueness | |
CN111311104A (en) | Configuration file recommendation method, device and system | |
WO2017000743A1 (en) | Method and device for software recommendation | |
CN109598414A (en) | Risk evaluation model training, methods of risk assessment, device and electronic equipment | |
US10402736B2 (en) | Evaluation system, evaluation method, and computer-readable storage medium | |
CN108932320A (en) | Article search method, apparatus and electronic equipment | |
CN106776925A (en) | A kind of Forecasting Methodology of mobile terminal user's sex, server and system | |
CN109753994A (en) | User's portrait method, apparatus, computer readable storage medium and electronic equipment | |
CN110580217B (en) | Software code health degree detection method, processing method, device and electronic equipment | |
CN109815985A (en) | Evaluation method, device, storage medium and the electronic equipment of variable discretization | |
CN105589853B (en) | A kind of classification catalogue determines method and device, automatic classification method and device | |
CN109657840A (en) | Decision tree generation method, device, computer readable storage medium and electronic equipment | |
CN109753993A (en) | User's portrait method, apparatus, computer readable storage medium and electronic equipment | |
CN110059569A (en) | Biopsy method and device, model evaluation method and apparatus | |
CN109739940A (en) | On-line analytical processing method, apparatus, storage medium and electronic equipment | |
Zali et al. | An initial theoretical usability evaluation model for assessing defence mobile e-based application system | |
KR20140079639A | Method for selecting similar users for collaborative filtering based on earth mover's distance | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190528 |