CN109346168A - A kind of method and device of determining data dependence - Google Patents
- Publication number
- CN109346168A, CN201811012940.7A
- Authority
- CN
- China
- Prior art keywords
- independent variable
- variable
- value
- spearman
- coefficient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Public Health (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Biomedical Technology (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Software Systems (AREA)
- Primary Health Care (AREA)
- Epidemiology (AREA)
- Algebra (AREA)
- General Health & Medical Sciences (AREA)
- Pathology (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Life Sciences & Earth Sciences (AREA)
Abstract
The embodiments of the present application disclose a method and device for determining data correlation. The method comprises: calculating, from the parameter values of an independent variable and the parameter values of a dependent variable, the Pearson correlation coefficient and the Spearman correlation coefficient between the two groups of data; and then determining, from the calculated Pearson and Spearman correlation coefficients, a new correlation parameter that characterizes the correlation between the independent variable and the dependent variable. The value of the correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation is characterized by this parameter, there is no longer any need to choose between the Pearson and Spearman correlation coefficients, and the correlation between the data can be determined even when it is not known what kind of relationship the analyzed data have.
Description
Technical field
This application relates to the field of computer technology, and in particular to a method and device for determining data correlation.
Background
To determine the correlation between two groups of data, a correlation coefficient between them can be calculated. In the prior art, either the Pearson correlation coefficient or the Spearman correlation coefficient between the two groups of data is calculated for this purpose. The Pearson correlation coefficient is suited to scenarios in which the two groups of data have a linear relationship, whereas the Spearman correlation coefficient is suited to scenarios in which they have a non-linear relationship. A person usually has to choose, based on experience, whether to represent the correlation with the Pearson or the Spearman correlation coefficient. However, when a correlation analysis is required and the kind of relationship in the analyzed data is not understood, an accurate choice between the two coefficients cannot be made.
Summary of the invention
In view of this, the embodiments of the present application provide a method and device for determining data correlation, to solve the technical problem in the prior art that, when a data correlation analysis is carried out, an accurate choice between the Pearson correlation coefficient and the Spearman correlation coefficient cannot be made.
To solve the above problem, the technical solutions provided by the embodiments of the present application are as follows:
A method for determining data correlation, the method comprising:
calculating, according to the parameter values of an independent variable and the parameter values of a dependent variable, the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable, the independent variable and the dependent variable having a corresponding relationship;
determining, according to the Pearson correlation coefficient and the Spearman correlation coefficient, a correlation parameter between the independent variable and the dependent variable, the correlation parameter being greater than or equal to a first value and less than or equal to a second value, wherein, if the Pearson correlation coefficient and the Spearman correlation coefficient are unequal, the first value is the smaller of the two coefficients and the second value is the larger of the two; and if the two coefficients are equal, the first value and the second value are both equal to the Pearson correlation coefficient (equivalently, the Spearman correlation coefficient).
In one possible implementation, determining the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient comprises:
multiplying the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
adding the Pearson correlation coefficient to the Spearman correlation coefficient to obtain a fourth value;
dividing the third value by the fourth value and multiplying the result by 2 to obtain a fifth value;
determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
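The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function name `correlation_parameter` and the zero-sum check are assumptions added here, since the text does not say what happens when the two coefficients sum to zero.

```python
def correlation_parameter(pearson: float, spearman: float) -> float:
    """Combine two correlation coefficients as 2 * (r * rho) / (r + rho)."""
    third_value = pearson * spearman    # product of the two coefficients
    fourth_value = pearson + spearman   # sum of the two coefficients
    if fourth_value == 0:
        # Assumption: the source does not define this case, so we refuse it.
        raise ValueError("r + rho is zero; the combination is undefined")
    fifth_value = 2 * third_value / fourth_value
    return fifth_value


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only for illustration.
    print(correlation_parameter(0.6, 0.9))
```

When both coefficients have the same sign, the result is their harmonic mean and therefore falls between them, which matches the bounds stated for the correlation parameter. For coefficients of opposite sign that property is not guaranteed, so this sketch is best read as covering the same-sign case.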
In one possible implementation, determining the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient comprises:
when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, determining that the correlation parameter between the independent variable and the dependent variable is the second value;
when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold, determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
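The threshold variant can be sketched as follows. The function name and the sample threshold used in the usage lines are hypothetical; the text does not give a concrete value for the first threshold.

```python
def correlation_parameter_with_threshold(
    pearson: float, spearman: float, first_threshold: float
) -> float:
    """Pick the larger coefficient when the two disagree strongly,
    otherwise fall back to 2 * (r * rho) / (r + rho)."""
    if abs(pearson - spearman) > first_threshold:
        # Second value: the larger of the two coefficients.
        return max(pearson, spearman)
    total = pearson + spearman
    if total == 0:
        # Assumption: the source does not define this case.
        raise ValueError("r + rho is zero; the combination is undefined")
    # Fifth value.
    return 2 * pearson * spearman / total


if __name__ == "__main__":
    # 0.2 is an illustrative threshold, not a value taken from the text.
    print(correlation_parameter_with_threshold(0.3, 0.9, 0.2))
    print(correlation_parameter_with_threshold(0.6, 0.7, 0.2))
```

A large gap between the two coefficients suggests that one of them fits the data poorly, so taking the larger one avoids letting the weaker coefficient pull the result down.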
In one possible implementation, the method further comprises:
determining each independent variable whose correlation parameter with the dependent variable is greater than a second threshold to be a feature independent variable.
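As a minimal sketch of this selection step, assume the correlation parameters have already been computed and are held in a dictionary keyed by variable name. The variable names, values, and threshold below are hypothetical.

```python
from typing import Dict, List


def select_feature_variables(
    correlation_params: Dict[str, float], second_threshold: float
) -> List[str]:
    """Keep the independent variables whose correlation parameter
    with the dependent variable exceeds the second threshold."""
    return [
        name
        for name, value in correlation_params.items()
        if value > second_threshold
    ]


if __name__ == "__main__":
    # Hypothetical correlation parameters for three transaction attributes.
    params = {
        "transaction_time": 0.72,
        "transaction_amount": 0.35,
        "transaction_location": 0.51,
    }
    print(select_feature_variables(params, second_threshold=0.5))
```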
In one possible implementation, the method further comprises:
establishing a linear equation, one side of which is the dependent variable and the other side of which is the sum of the feature data items, each feature data item being the product of one feature independent variable and the regression coefficient corresponding to that feature independent variable, the feature independent variables in the feature data items all being different, and the number of feature data items being the same as the number of feature independent variables;
substituting the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation, and solving to obtain the regression coefficient corresponding to each feature independent variable;
obtaining a first ranking result of the feature independent variables according to the ordering of the regression coefficients corresponding to the feature independent variables;
obtaining a second ranking result of the feature independent variables according to the ordering of the correlation parameters between the feature independent variables and the dependent variable;
when the first ranking result and the second ranking result of the feature independent variables are both in ascending order, deleting a target feature independent variable from the feature independent variables if its second ranking result is greater than its first ranking result, the target feature independent variable being any one of the feature independent variables;
when the first ranking result and the second ranking result of the feature independent variables are both in descending order, deleting the target feature independent variable from the feature independent variables if its second ranking result is less than its first ranking result.
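The rank comparison in the last four steps can be sketched as below. The sketch assumes the regression coefficients have already been solved from the standardized data and starts from there; the function names, the 1-based rank positions, and the sample numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def rank_positions(values: Sequence[float], ascending: bool = True) -> List[int]:
    """Return the 1-based position of each value in the sorted order."""
    order = sorted(
        range(len(values)), key=lambda k: values[k], reverse=not ascending
    )
    positions = [0] * len(values)
    for position, index in enumerate(order, start=1):
        positions[index] = position
    return positions


def prune_features(
    names: Sequence[str],
    regression_coeffs: Sequence[float],
    correlation_params: Sequence[float],
    ascending: bool = True,
) -> List[str]:
    """Drop features whose rank by correlation parameter is 'higher' than
    their rank by regression coefficient, in the chosen sort direction."""
    first = rank_positions(regression_coeffs, ascending)    # first ranking result
    second = rank_positions(correlation_params, ascending)  # second ranking result
    kept = []
    for name, r1, r2 in zip(names, first, second):
        delete = r2 > r1 if ascending else r2 < r1
        if not delete:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical coefficients and correlation parameters.
    print(prune_features(["x1", "x2", "x3"], [0.9, 0.4, 0.1], [0.8, 0.3, 0.5]))
```

In this example `x3` has the smallest regression coefficient but only the second-smallest correlation parameter, so its second rank exceeds its first and it is removed. The ascending and descending branches express the same condition and remove the same variables.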
A device for determining data correlation, the device comprising:
a first computing unit, configured to calculate, according to the parameter values of an independent variable and the parameter values of a dependent variable, the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable, the independent variable and the dependent variable having a corresponding relationship;
a first determination unit, configured to determine, according to the Pearson correlation coefficient and the Spearman correlation coefficient, a correlation parameter between the independent variable and the dependent variable, the correlation parameter being greater than or equal to a first value and less than or equal to a second value, wherein, if the Pearson correlation coefficient and the Spearman correlation coefficient are unequal, the first value is the smaller of the two coefficients and the second value is the larger of the two; and if the two coefficients are equal, the first value and the second value are both equal to the Pearson correlation coefficient (equivalently, the Spearman correlation coefficient).
In one possible implementation, the first determination unit comprises:
a first computation subunit, configured to multiply the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
a second computation subunit, configured to add the Pearson correlation coefficient to the Spearman correlation coefficient to obtain a fourth value;
a third computation subunit, configured to divide the third value by the fourth value and multiply the result by 2 to obtain a fifth value;
a first determination subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the fifth value.
In one possible implementation, the first determination unit comprises:
a second determination subunit, configured to determine, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, that the correlation parameter between the independent variable and the dependent variable is the second value;
a third determination subunit, configured to determine, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold, that the correlation parameter between the independent variable and the dependent variable is the fifth value.
In one possible implementation, the device further comprises:
a second determination unit, configured to determine each independent variable whose correlation parameter with the dependent variable is greater than a second threshold to be a feature independent variable.
In one possible implementation, the device further comprises:
an establishing unit, configured to establish a linear equation, one side of which is the dependent variable and the other side of which is the sum of the feature data items, each feature data item being the product of one feature independent variable and the regression coefficient corresponding to that feature independent variable, the feature independent variables in the feature data items all being different, and the number of feature data items being the same as the number of feature independent variables;
a second computing unit, configured to substitute the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation, and to solve for the regression coefficient corresponding to each feature independent variable;
a first sorting unit, configured to obtain a first ranking result of the feature independent variables according to the ordering of the regression coefficients corresponding to the feature independent variables;
a second sorting unit, configured to obtain a second ranking result of the feature independent variables according to the ordering of the correlation parameters between the feature independent variables and the dependent variable;
a first deletion unit, configured to delete, when the first ranking result and the second ranking result of the feature independent variables are both in ascending order, a target feature independent variable from the feature independent variables if its second ranking result is greater than its first ranking result, the target feature independent variable being any one of the feature independent variables;
a second deletion unit, configured to delete, when the first ranking result and the second ranking result of the feature independent variables are both in descending order, the target feature independent variable from the feature independent variables if its second ranking result is less than its first ranking result.
A computer-readable storage medium storing instructions which, when run on a terminal device, cause the terminal device to execute the above method for determining data correlation.
A computer program product which, when run on a terminal device, causes the terminal device to execute the above method for determining data correlation.
It can be seen that the embodiments of the present application have the following beneficial effects:
The embodiments of the present application calculate both the Pearson correlation coefficient and the Spearman correlation coefficient between the two groups of data of the independent variable and the dependent variable, and then determine, from the calculated coefficients, a new correlation parameter that characterizes the correlation between the independent variable and the dependent variable. The value of the correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation between the independent variable and the dependent variable is characterized by this parameter, there is no longer any need to choose between the Pearson and Spearman correlation coefficients, and the correlation between the data can be determined even when it is not known what kind of relationship the analyzed data have.
Brief description of the drawings
Fig. 1 is a flowchart of a method for determining data correlation provided by an embodiment of the present application;
Fig. 2(a) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are linearly related;
Fig. 2(b) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are non-linearly related;
Fig. 2(c) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are non-linearly related;
Fig. 2(d) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are linearly related;
Fig. 2(e) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are non-linearly related;
Fig. 3 is a flowchart of a method for removing strongly correlated feature independent variables provided by an embodiment of the present application;
Fig. 4 is a structural diagram of a device for determining data correlation provided by an embodiment of the present application.
Detailed description
To make the above objects, features, and advantages of the present application clearer and easier to understand, the embodiments of the present application are described in further detail below with reference to the accompanying drawings and specific implementations.
To facilitate understanding of the technical solution of the present application, its background is first explained below.
In studying traditional methods for analyzing the association between data, the inventor found that the traditional analysis methods include the Pearson method and the Spearman method. The Pearson method measures whether two data sets lie on a straight line, that is, it measures the degree of association between two groups of data that are linearly related; the larger the absolute value of the Pearson correlation coefficient, the stronger the correlation between the two. However, this method is better suited to analyzing data that are linearly related and performs poorly on data that are non-linearly related. The Spearman method is mainly used to analyze the degree of association between non-linearly related data, but it does not reflect well the degree of association between linearly related data. When the association within a large amount of collected data has to be analyzed, a user who has not received professional training, or who does not understand the collected data, cannot accurately select one of the two methods for the analysis.
Based on this, the embodiments of the present application provide a method for determining data correlation. The method calculates both the Pearson correlation coefficient and the Spearman correlation coefficient between the two groups of data of the independent variable and the dependent variable, and then determines a new correlation parameter from these two coefficients. The value of the correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation between the independent variable and the dependent variable is characterized by this parameter, there is no longer any need to choose between the Pearson and Spearman correlation coefficients, and the correlation between the data can be determined even when it is not known what kind of relationship the analyzed data have.
To facilitate understanding of the technical solution of the present application, a method for determining data correlation provided by an embodiment of the present application is described below with reference to the accompanying drawings.
Referring to Fig. 1, which is a flowchart of a method for determining data correlation provided by an embodiment of the present application, the method may include:
S101: According to the parameter values of the independent variable and the parameter values of the dependent variable, calculate the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable.
In this embodiment, to obtain the correlation between the independent variables and the dependent variable in a large collected data set, the Pearson correlation coefficient and the Spearman correlation coefficient between each independent variable and the dependent variable can be calculated according to the parameter values of each independent variable and the parameter values of the dependent variable.
The independent variable and the dependent variable have a corresponding relationship: one dependent variable may correspond to one independent variable, or one dependent variable may correspond to multiple independent variables. In the latter case, the Pearson correlation coefficient and the Spearman correlation coefficient between each independent variable and the dependent variable need to be calculated.
For example, to determine whether a patient suffers from disease A, multiple examinations need to be carried out on the patient, and the diagnosis is finally made from the data of the multiple examination items. Whether the patient suffers from disease A is regarded as the dependent variable and each examination item is regarded as an independent variable, and the Pearson and Spearman correlation coefficients are calculated between the examination data of each item and the parameter value indicating whether the patient suffers from disease A. In a specific implementation, the parameter value can be set to 1 when the patient suffers from disease A and to 0 when the patient does not, so that the two correlation coefficients between the independent variable and the dependent variable can be calculated.
As another example, to determine whether a certain transaction is fraudulent, a bank needs to assess multiple transaction attributes of the transaction, such as the transaction time, transaction amount, and transaction location, and judge comprehensively whether the transaction is fraudulent. Here, whether the transaction is fraudulent can be the dependent variable, and the transaction attributes such as transaction time, transaction amount, and transaction location can be the independent variables; the two correlation coefficients between each independent variable and the dependent variable are calculated. In a specific implementation, for ease of calculation, the dependent variable can be set to 1 when the transaction is determined to be fraudulent and to 0 when it is not, and the transaction location can be represented by its administrative division code, a six-digit Arabic numeral code that identifies, level by level, the names of China's provinces (autonomous regions, municipalities directly under the central government), prefectures (cities, autonomous prefectures, leagues), and counties (districts, county-level cities, banners). In this way the parameter values of both the independent variables and the dependent variable are numeric, so that the correlation coefficients can be calculated.
It should be noted that both correlation coefficients characterize the correlation between the independent variable and the dependent variable: the larger the absolute value of the correlation coefficient, the stronger the association between the independent variable and the dependent variable, that is, the greater the influence of the independent variable on the dependent variable. For example, if the correlation coefficient between the independent variable red blood cell distribution width (one of the examination items) and the dependent variable of whether the patient suffers from disease A is large, the red blood cell distribution width has a large influence on diagnosing whether the patient suffers from disease A. Likewise, if the correlation coefficient between the independent variable transaction time and the dependent variable fraud is large, the transaction time has a large influence on determining whether a transaction is fraudulent.
For ease of understanding, the calculation of the Pearson and Spearman correlation coefficients from the parameter values of the independent variables and the dependent variable is illustrated with the dependent variable being whether a transaction is fraudulent and the independent variables being the transaction time, transaction amount, and transaction location. As shown in Table 1, multiple transaction records can be obtained, each of which includes the dependent variable and multiple independent variables.
In each transaction record of Table 1, every independent variable and the dependent variable has a parameter value. When the correlation coefficient between an independent variable and the dependent variable is calculated, the column of parameter values of the independent variable and the column of parameter values of the dependent variable are used. The calculation of the Pearson and Spearman correlation coefficients is described below with reference to Table 1.
(1) Calculating the Pearson correlation coefficient
In a specific implementation, the Pearson correlation coefficient between an independent variable and the dependent variable can be calculated using formula (1):
$$ r = \frac{N\sum_{j=1}^{N} x_{ij}\,y_j - \sum_{j=1}^{N} x_{ij}\sum_{j=1}^{N} y_j}{\sqrt{N\sum_{j=1}^{N} x_{ij}^{2} - \Big(\sum_{j=1}^{N} x_{ij}\Big)^{2}}\;\sqrt{N\sum_{j=1}^{N} y_j^{2} - \Big(\sum_{j=1}^{N} y_j\Big)^{2}}} \tag{1} $$
where $r$ is the Pearson correlation coefficient between independent variable $x_i$ and dependent variable $y$; $N$ is the number of parameter values of $x_i$; $x_{ij}$ is the $j$-th parameter value of $x_i$; and $y_j$ is the $j$-th parameter value of $y$.
Taking Table 1 as an example, i = 1, 2, 3 and N = 3. To calculate the correlation coefficient r between independent variable x1 and dependent variable y, the three parameter values of x1 and the three parameter values of y are substituted into formula (1), which yields the Pearson correlation coefficient r of x1 and y. Similarly, substituting independent variables x2 and x3 into the formula yields their Pearson correlation coefficients r with the dependent variable.
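The substitution described above can be sketched in plain Python using the standard sum form of the Pearson correlation coefficient. The function name and the sample columns are hypothetical, and the sample values are not taken from Table 1.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two columns of equal length >= 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(
        n * sum_yy - sum_y ** 2
    )
    if denominator == 0:
        # A constant column has no defined correlation.
        raise ValueError("a column is constant; r is undefined")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical column of an independent variable and of the 0/1 dependent variable.
    x1 = [120.0, 80.0, 300.0]
    y = [0.0, 0.0, 1.0]
    print(pearson(x1, y))
```

The same call would be repeated once per independent variable column, each time against the same dependent variable column.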
(2) Calculating the Spearman correlation coefficient
In a specific implementation, the Spearman correlation coefficient between an independent variable and the dependent variable can be calculated using formula (2):
where $\rho$ is the Spearman correlation coefficient between independent variable $x_i$ and dependent variable $y$; $N$ is the number of parameter values of $x_i$; $x_{ij}$ is the $j$-th parameter value of $x_i$; $y_j$ is the $j$-th parameter value of $y$; $\bar{x}_i$ is the average of the parameter values of $x_i$; and $\bar{y}$ is the average of the parameter values of $y$.
Taking Table 1 as an example, i = 1, 2, 3 and N = 3. To calculate the Spearman correlation coefficient ρ between independent variable x1 and dependent variable y, first calculate the average of the three parameter values of x1 and the average of the three parameter values of y, and then substitute them into formula (2), which yields the Spearman correlation coefficient ρ of x1 and y. Similarly, substituting independent variables x2 and x3 into the formula yields their Spearman correlation coefficients ρ with the dependent variable.
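A common way to compute the Spearman correlation coefficient is to replace each value by its rank and then apply the mean-centered correlation formula to the ranks. The sketch below follows that convention, with tied values sharing their average rank. Both choices are assumptions, as are the function names; the text does not state how ranks or ties are handled.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation: mean-centered correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in rx) * sum((b - mean_y) ** 2 for b in ry)
    )
    if denominator == 0:
        raise ValueError("a column is constant; rho is undefined")
    return numerator / denominator


if __name__ == "__main__":
    # A monotonic but non-linear relationship (hypothetical values).
    print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

Because only the ordering of the values matters, a monotonic non-linear relationship such as the squared values in the usage lines still yields a coefficient of magnitude 1.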
Using the above two formulas, the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable can be determined; S102 is then executed on the basis of these two correlation coefficients.
S102: According to the Pearson correlation coefficient and the Spearman correlation coefficient, determine the correlation parameter between the independent variable and the dependent variable.
In this embodiment, the correlation parameter between the independent variable and the dependent variable is calculated from the Pearson correlation coefficient and the Spearman correlation coefficient. The correlation parameter is greater than or equal to a first value and less than or equal to a second value. If the Pearson and Spearman correlation coefficients are unequal, the first value is the smaller of the two and the second value is the larger; if they are equal, the first value and the second value are both equal to that common coefficient.
Namely in the embodiment of the present application, Pearson correlation coefficients and when unequal Spearman's correlation coefficient, independent variable
Relevant parameter between dependent variable is between Pearson correlation coefficients and Spearman's correlation coefficient, Pearson correlation coefficients
Relevant parameter and Pearson correlation coefficients or this Pierre when equal with Spearman's correlation coefficient, between independent variable and dependent variable
Graceful related coefficient is equal.
To calculate the correlation parameter between the independent variable and the dependent variable from the Pearson correlation coefficient and the Spearman correlation coefficient, this embodiment of the application provides a calculation method, which specifically includes: multiplying the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value; adding the Pearson correlation coefficient to the Spearman correlation coefficient to obtain a fourth value; dividing the third value by the fourth value and multiplying the result by 2 to obtain a fifth value; and determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
To make the above calculation method easier to understand, refer to formula (3):
$$\mathrm{Coff} = \frac{2 \cdot \rho \cdot r}{\rho + r} \qquad (3)$$
where r is the Pearson correlation coefficient, ρ is the Spearman correlation coefficient, ρ * r is the third value, ρ + r is the fourth value, and Coff is the fifth value, i.e., the correlation parameter between the independent variable and the dependent variable.
After the Pearson correlation coefficient and the Spearman correlation coefficient of the independent variable and the dependent variable are obtained in S101, the two correlation coefficients are substituted into formula (3) to calculate the correlation parameter between the independent variable and the dependent variable.
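The calculation described for formula (3) (product, sum, then twice the quotient) can be sketched as follows. The function name `coff` and the error handling for a zero sum are assumptions added for illustration; the source does not say what to do when ρ + r is zero.

```python
def coff(r, rho):
    """Correlation parameter of formula (3): 2 * rho * r / (rho + r).

    r   -- Pearson correlation coefficient
    rho -- Spearman correlation coefficient
    """
    third_value = rho * r        # product of the two coefficients
    fourth_value = rho + r       # sum of the two coefficients
    if fourth_value == 0:
        # Not covered by the source text; raising is an assumption.
        raise ValueError("rho + r is zero; formula (3) is undefined")
    return 2 * third_value / fourth_value  # fifth value
```

For example, `coff(0.851, 1)` rounds to 0.92 and `coff(-0.799, -1)` rounds to -0.888, matching the values quoted for Fig. 2(b) and Fig. 2(e).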
In this embodiment, because the calculated correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient, it can characterize the correlation between the independent variable and the dependent variable, so that the user can determine the correlation between data even without knowing what kind of relationship the analyzed data has.
The following explains, with reference to the drawings, how the correlation parameter characterizes the correlation between the independent variable and the dependent variable.
Referring to Fig. 2(a), the discrete points in the figure represent the collected data of an independent variable and a dependent variable. As can be seen from the figure, the independent variable and the dependent variable are linearly related. The calculated Pearson correlation coefficient is 1 and the Spearman correlation coefficient is 1; substituting these two correlation coefficients into formula (3) gives a correlation parameter Coff of 1. The larger the absolute value of the Pearson correlation coefficient, the stronger the association between the two data, and a Pearson correlation coefficient of 1 shows that the independent variable and the dependent variable are strongly linearly correlated. Since Coff is also 1, it likewise indicates a strong correlation between the independent variable and the dependent variable.
Referring to Fig. 2(b), the discrete points in the figure represent the collected data of an independent variable and a dependent variable, and the straight line represents the trend. The trend is also reflected in the correlation coefficient: a positive correlation coefficient shows that the dependent variable increases as the independent variable increases, and a negative correlation coefficient shows that the dependent variable decreases as the independent variable increases. The straight line in Fig. 2(b) trends upward, and the relationship between the independent variable and the dependent variable is non-linear. The calculated Pearson correlation coefficient is 0.851 and the Spearman correlation coefficient is 1; substituting these two correlation coefficients into formula (3) gives a correlation parameter Coff of 0.92. The larger the absolute value of the Spearman correlation coefficient, the stronger the association between the two data. The Spearman correlation coefficient of the independent variable and the dependent variable in Fig. 2(b) is 1, which shows a strong non-linear association between them. The calculated Coff of 0.92 is also large and close to 1, so it likewise characterizes a strong association between the independent variable and the dependent variable.
Referring to Fig. 2(c), it can be seen from the figure that the relationship between the independent variable and the dependent variable is non-linear. The calculated Pearson correlation coefficient is -0.093 and the Spearman correlation coefficient is -0.093; substituting these two correlation coefficients into formula (3) gives a correlation parameter Coff of -0.093, where the negative sign shows that the dependent variable decreases as the independent variable increases. The small absolute value of the calculated Spearman correlation coefficient shows that the association between the independent variable and the dependent variable is weak. The calculated correlation parameter is also small in absolute value, so it likewise characterizes a weak association between the independent variable and the dependent variable.
Referring to Fig. 2(d), it can be seen from the figure that the independent variable and the dependent variable are linearly related. The calculated Pearson correlation coefficient is -1 and the Spearman correlation coefficient is -1; substituting these two correlation coefficients into formula (3) gives a correlation parameter Coff of -1. The larger the absolute value of the Pearson correlation coefficient, the stronger the association between the two data, and a Pearson correlation coefficient of -1 shows that the independent variable and the dependent variable are strongly linearly correlated. Since Coff is also -1, it likewise indicates a strong correlation between the independent variable and the dependent variable.
Referring to Fig. 2(e), it can be seen from the figure that the relationship between the independent variable and the dependent variable is non-linear. The calculated Pearson correlation coefficient is -0.799 and the Spearman correlation coefficient is -1; substituting these two correlation coefficients into formula (3) gives a correlation parameter Coff of -0.888. The larger the absolute value of the Spearman correlation coefficient, the stronger the association between the two data. The Spearman correlation coefficient of the independent variable and the dependent variable in Fig. 2(e) is -1, which shows a strong non-linear association between them. The calculated Coff of -0.888 is also large in absolute value and close to -1, so it likewise characterizes a strong association between the independent variable and the dependent variable.
The above analysis shows that the correlation parameter calculated from the Pearson correlation coefficient and the Spearman correlation coefficient accommodates both linear and non-linear relationships and reflects the correlation between the independent variable and the dependent variable. When facing new data, the user therefore no longer needs to choose between the Pearson correlation coefficient and the Spearman correlation coefficient and can still determine the correlation between the data.
As described above, when a new set of data is obtained, formula (3) can be used to calculate the correlation parameter between the independent variable and the dependent variable in the new data. However, when the distribution of the new data is relatively scattered, the correlation parameter cannot be obtained directly from formula (3). The difference between the Pearson correlation coefficient and the Spearman correlation coefficient must first be evaluated, and the correlation parameter between the independent variable and the dependent variable in the new data is then determined according to the result. Specifically, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, the correlation parameter between the independent variable and the dependent variable is determined to be the second value; when the absolute value of the difference is less than or equal to the first threshold, the correlation parameter between the independent variable and the dependent variable is determined to be the fifth value.
In this embodiment, the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is calculated, and it is judged whether the difference is greater than the first preset threshold. When the difference is greater than the first preset threshold, the correlation parameter between the independent variable and the dependent variable is the larger of the Pearson correlation coefficient and the Spearman correlation coefficient; when the difference is not greater than the first preset threshold, the correlation parameter is the Coff value obtained from formula (3). The first preset threshold can usually be set to 0.5. In a specific implementation it can be set according to the actual application, and this embodiment does not limit the setting of the first preset threshold.
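The threshold rule above can be sketched as follows. The function name `correlation_parameter` is hypothetical, the default of 0.5 is the value the text suggests for the first preset threshold, and raising an error when ρ + r is zero is an assumption, since the source does not cover that case.

```python
def correlation_parameter(r, rho, first_threshold=0.5):
    """Choose the correlation parameter from Pearson r and Spearman rho.

    If |r - rho| exceeds the first threshold, return the larger of the two
    coefficients (the "second value"); otherwise return Coff of formula (3).
    """
    if abs(r - rho) > first_threshold:
        return max(r, rho)
    if rho + r == 0:
        # Not specified by the source; raising is an assumption.
        raise ValueError("rho + r is zero; formula (3) is undefined")
    return 2 * rho * r / (rho + r)
```

With the illustrative (not source-provided) inputs r = 0.2 and ρ = 0.9, the difference 0.7 exceeds 0.5, so the larger coefficient 0.9 is returned instead of the formula (3) value.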
As described above, this embodiment of the application calculates both the Pearson correlation coefficient and the Spearman correlation coefficient between the two sets of data of the independent variable and the dependent variable, and then determines from the calculated coefficients a new correlation parameter that characterizes the correlation between the independent variable and the dependent variable. The value of the correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation is characterized by this parameter, there is no need to choose between the Pearson correlation coefficient and the Spearman correlation coefficient, and the correlation between data can be determined even without knowing what kind of relationship the analyzed data has.
In practical applications, after the correlation parameters between the independent variables and the dependent variable in a data set have been calculated with the above method embodiment, feature independent variables that can characterize the dependent variable can be selected from the multiple independent variables. A feature independent variable is an independent variable whose change has a large influence on the dependent variable. On this basis, this embodiment of the application provides a method for selecting feature independent variables: specifically, an independent variable whose correlation parameter with the dependent variable is greater than a second threshold is determined to be a feature independent variable.
In this embodiment, it is first judged whether the correlation parameter between an independent variable and the dependent variable, calculated by the above method, is greater than the second preset threshold; if so, the corresponding independent variable is determined to be a feature independent variable. For example, if the correlation parameter of x1 and y is 0.85, that of x2 and y is 0.78, that of x3 and y is 0.56, and the second preset threshold is 0.7, then x1 and x2 are feature independent variables.
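The selection rule can be sketched as a simple filter. The function name `select_features` and the dictionary representation of the correlation parameters are assumptions made for illustration.

```python
def select_features(correlation_params, second_threshold):
    """Return the names of independent variables whose correlation
    parameter with the dependent variable exceeds the second threshold."""
    return [name for name, value in correlation_params.items()
            if value > second_threshold]


# Values taken from the example in the text.
params = {"x1": 0.85, "x2": 0.78, "x3": 0.56}
features = select_features(params, second_threshold=0.7)  # x1 and x2
```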
The second preset threshold can be set with reference to the correspondence between the Pearson correlation coefficient and the strength of association. When the Pearson correlation coefficient lies in [0.8, 1], the two data are very strongly correlated; in [0.6, 0.8], strongly correlated; in [0.4, 0.6], moderately correlated; in [0.2, 0.4], weakly correlated; and in [0, 0.2], very weakly correlated or uncorrelated. Because a selected feature independent variable needs to be strongly correlated with the dependent variable, the second preset threshold can be set to 0.6: when the correlation parameter between an independent variable and the dependent variable is greater than 0.6, that independent variable is determined to be a feature independent variable.
It should be noted that the second preset threshold can also be set in other ways, and this embodiment does not limit the setting of the second preset threshold.
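The correspondence between coefficient ranges and strength of association can be written as a small lookup. This is only a sketch: applying the ranges to the absolute value, and assigning each shared boundary (0.2, 0.4, 0.6, 0.8) to the stronger category, are assumptions, since the ranges in the text overlap at their endpoints. The function name `strength_label` is hypothetical.

```python
def strength_label(coefficient):
    """Map a correlation coefficient to the strength category in the text."""
    magnitude = abs(coefficient)  # assumption: the sign only gives direction
    if magnitude >= 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak or none"
```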
In addition, when feature independent variables are selected, they must not only be strongly associated with the dependent variable but also be only weakly associated with one another; that is, the feature independent variables must not be strongly associated with each other. Therefore, after the feature independent variables have been determined, it is also necessary to judge whether any of them are strongly associated with each other. When the selected feature independent variables are strongly associated with each other, the strong association between them needs to be removed.
On this basis, this embodiment of the application provides a method for judging whether feature independent variables are strongly associated with each other and for removing such strong association. The method is described below with reference to the drawings.
Referring to Fig. 3, which shows a method for removing strong association between feature independent variables provided by this embodiment of the application. As shown in Fig. 3, the method may include:
S301: Establish a linear equation.
In this example, a linear equation is established for the obtained feature independent variables and the dependent variable. One side of the equation is the dependent variable and the other side is the sum of the feature data items. Each feature data item is the product of one feature independent variable and the regression coefficient corresponding to that feature independent variable. The feature independent variables in the feature data items are all different, and the number of feature data items is the same as the number of feature independent variables.
In practical applications, each feature independent variable has its own regression coefficient. Each feature independent variable is multiplied by its regression coefficient and the products are added to form the other side of the linear equation. For example, suppose there are 7 independent variables x1, x2, x3, x4, x5, x6 and x7, and the feature independent variables selected by the above method are x1, x3, x4, x5 and x7. The established linear equation can then be expressed as y = a1*x1 + a3*x3 + a4*x4 + a5*x5 + a7*x7, where a1, a3, a4, a5 and a7 are the regression coefficients corresponding to the feature independent variables x1, x3, x4, x5 and x7 respectively.
S302: Substitute the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation, and solve it to obtain the regression coefficient corresponding to each feature independent variable.
In this example, to eliminate the influence of the different dimensions of the feature independent variables on subsequent calculation results, the parameter values of the feature independent variables and of the dependent variable can first be standardized. The standardized parameter values are then substituted into the above linear equation to calculate the regression coefficient corresponding to each feature independent variable.
In a specific implementation, the parameter values of the feature independent variables and of the dependent variable can be normalized with the 0-1 standardization method. 0-1 standardization, also known as deviation standardization, applies a linear transformation to the parameter values so that the results fall in the interval [0, 1]. The transfer function is:
$$x^{*} = \frac{x - \min}{\max - \min}$$
where x* is the standardized parameter value, x is a parameter value of a feature independent variable or of the dependent variable, max is the maximum of all parameter values of that feature independent variable or dependent variable, and min is the minimum of all parameter values of that feature independent variable or dependent variable.
For example, the transaction amount in Table 1 is a feature independent variable with three corresponding parameter values $x_{20}$, $x_{21}$ and $x_{22}$. The maximum and the minimum of these three parameter values are determined and substituted into the above transfer function, and each parameter value is standardized to obtain the standardized parameter values.
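The 0-1 standardization step can be sketched as follows. The function name `zero_one_standardize` and the sample amounts are hypothetical, and raising an error when all values are equal is an assumption, because the transfer function is undefined when max equals min.

```python
def zero_one_standardize(values):
    """Linearly map values to [0, 1] using (x - min) / (max - min)."""
    lowest = min(values)
    highest = max(values)
    if highest == lowest:
        # Not covered by the source; raising is an assumption.
        raise ValueError("all values are equal; max - min is zero")
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical transaction amounts, not taken from Table 1.
standardized = zero_one_standardize([100, 250, 400])
```

The same function is applied separately to each feature independent variable and to the dependent variable, so that every column is mapped to [0, 1] with its own maximum and minimum.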
It should be noted that other standardization methods can also be used for normalization, such as min-max standardization; this embodiment of the application does not limit the specific manner of normalization.
In addition, because the location in Table 1 is represented by an administrative division code consisting of six Arabic numerals, the administrative division code can be treated as the specific parameter value during standardization and then standardized with the above transfer function.
In a specific implementation, the multiple standardized parameter values of the feature independent variables and the multiple standardized parameter values of the dependent variable are substituted into the linear equation to form multiple linear equations, and these linear equations are then solved to obtain the regression coefficient corresponding to each feature independent variable.
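One way to solve the resulting system is ordinary least squares, sketched below in plain Python via the normal equations and Gauss-Jordan elimination. Using least squares is an assumption: the source only says that the multiple linear equations are solved. The function name `solve_regression_coefficients` and the sample data are hypothetical.

```python
def solve_regression_coefficients(rows, targets):
    """Least-squares coefficients for y = a1*x1 + ... + ak*xk (no intercept).

    rows    -- one list of standardized feature values per sample
    targets -- standardized dependent-variable value per sample
    """
    k = len(rows[0])
    # Normal equations (X^T X) a = X^T y, stored as an augmented matrix.
    aug = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           + [sum(row[i] * t for row, t in zip(rows, targets))]
           for i in range(k)]
    for col in range(k):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for idx in range(k):
            if idx != col:
                factor = aug[idx][col] / aug[col][col]
                aug[idx] = [a - factor * b for a, b in zip(aug[idx], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical samples generated from y = 2*x1 + 3*x2.
coefficients = solve_regression_coefficients([[1, 0], [0, 1], [1, 1]], [2, 3, 5])
```

When the number of samples equals the number of feature independent variables and the equations are independent, the least-squares solution coincides with the exact solution of the system.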
S303: Obtain a first ranking result of the feature independent variables according to the ordering of the regression coefficients corresponding to the feature independent variables.
In this example, the obtained regression coefficients are sorted, and the first ranking result of the feature independent variables is obtained from the sorted regression coefficients. In a specific implementation, the sorting can be in descending order or in ascending order.
For example, if a1, a3, a4, a5 and a7 are sorted in ascending order and the result is a1 < a3 < a5 < a7 < a4, the first ranking result of the feature independent variables is x1, x3, x5, x7, x4. Alternatively, if a1, a3, a4, a5 and a7 are sorted in descending order and the result is a4 > a7 > a5 > a3 > a1, the first ranking result of the feature independent variables is x4, x7, x5, x3, x1.
S304: Obtain a second ranking result of the feature independent variables according to the ordering of the correlation parameters between the feature independent variables and the dependent variable.
In this example, the correlation parameters between the feature independent variables and the dependent variable are sorted, and the second ranking result of the feature independent variables is obtained from the sorted correlation parameters. In a specific implementation, the sorting can be in descending order or in ascending order.
For example, suppose the correlation parameters between the feature independent variables x1, x3, x4, x5, x7 and the dependent variable are C1, C3, C4, C5 and C7 respectively. If C1, C3, C4, C5 and C7 are sorted in ascending order and the result is C1 < C3 < C4 < C7 < C5, the second ranking result of the feature independent variables is x1, x3, x4, x7, x5. Alternatively, if they are sorted in descending order and the result is C5 > C7 > C4 > C3 > C1, the second ranking result of the feature independent variables is x5, x7, x4, x3, x1.
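Both ranking steps (S303 and S304) reduce to sorting feature names by a score. The sketch below uses a hypothetical helper `rank_features`; the numeric scores are invented solely to reproduce the orderings a1 < a3 < a5 < a7 < a4 and C1 < C3 < C4 < C7 < C5 from the examples.

```python
def rank_features(scores, descending=False):
    """Return feature names ordered by their score.

    scores -- dict mapping feature name to regression coefficient (S303)
              or to correlation parameter (S304)
    """
    return sorted(scores, key=scores.get, reverse=descending)


# Hypothetical values chosen only to match the orderings in the text.
regression_coefficients = {"x1": 0.1, "x3": 0.2, "x4": 0.9, "x5": 0.4, "x7": 0.6}
correlation_parameters = {"x1": 0.61, "x3": 0.65, "x4": 0.70, "x5": 0.90, "x7": 0.80}

first_ranking = rank_features(regression_coefficients)   # x1, x3, x5, x7, x4
second_ranking = rank_features(correlation_parameters)   # x1, x3, x4, x7, x5
```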
S305: When the first ranking result and the second ranking result of the feature independent variables are both in ascending order, if the position of a target feature independent variable in the second ranking result is greater than its position in the first ranking result, delete the target feature independent variable from the feature independent variables, where the target feature independent variable is any one of the feature independent variables.
After the two ranking results of the feature independent variables have been obtained in S303 and S304, and when both are in ascending order, it is judged for each feature independent variable whether its position in the second ranking result is greater than its position in the first ranking result. If so, this shows that the feature independent variable is strongly associated with other feature independent variables, and the feature independent variable is deleted.
For example, when the first ranking result and the second ranking result of the feature independent variables are both in ascending order, feature independent variable x5 is fifth in the second ranking result and third in the first ranking result. Since fifth is greater than third, feature independent variable x5 is deleted. For feature independent variables x1, x3, x4 and x7, the position in the second ranking result is not greater than the position in the first ranking result, which shows that there is no strong association among these feature independent variables, so they are not deleted.
S306: When the first ranking result and the second ranking result of the feature independent variables are both in descending order, if the position of a target feature independent variable in the second ranking result is less than its position in the first ranking result, delete the target feature independent variable from the feature independent variables.
After the two ranking results of the feature independent variables have been obtained in S303 and S304, and when both are in descending order, it is judged for each feature independent variable whether its position in the second ranking result is less than its position in the first ranking result. If so, this shows that the feature independent variable is strongly associated with other feature independent variables, and the feature independent variable is deleted.
For example, when the first ranking result and the second ranking result of the feature independent variables are both in descending order, feature independent variable x5 is first in the second ranking result and third in the first ranking result. Since first is less than third, feature independent variable x5 is deleted. For feature independent variables x1, x3, x4 and x7, the position in the second ranking result is not less than the position in the first ranking result, which shows that there is no strong association among these feature independent variables, so they are not deleted.
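The comparison rules of S305 and S306 can be sketched together as follows. The function name `features_to_delete` and its `descending` flag are assumptions for illustration; the rankings passed in are the ones from the examples in the text.

```python
def features_to_delete(first_ranking, second_ranking, descending=False):
    """Return the features flagged for deletion by S305 or S306.

    Ascending rankings (S305): delete when the position in the second
    ranking is greater than the position in the first ranking.
    Descending rankings (S306): delete when it is less.
    """
    first_pos = {name: i for i, name in enumerate(first_ranking, start=1)}
    second_pos = {name: i for i, name in enumerate(second_ranking, start=1)}
    if descending:
        return [n for n in second_ranking if second_pos[n] < first_pos[n]]
    return [n for n in second_ranking if second_pos[n] > first_pos[n]]


# Ascending example from the text: only x5 (5th versus 3rd) is flagged.
ascending_result = features_to_delete(
    ["x1", "x3", "x5", "x7", "x4"], ["x1", "x3", "x4", "x7", "x5"])

# Descending example from the text: only x5 (1st versus 3rd) is flagged.
descending_result = features_to_delete(
    ["x4", "x7", "x5", "x3", "x1"], ["x5", "x7", "x4", "x3", "x1"],
    descending=True)
```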
It should be noted that, after a feature independent variable has been deleted from the multiple feature independent variables, the second ranking result and the first ranking result of the remaining feature independent variables can be obtained again, and the second ranking result and the first ranking result of each feature independent variable are compared again. The comparison ends when the second ranking result and the first ranking result of every feature independent variable are consistent, which yields feature independent variables that are not strongly associated with one another.
With the above method, it can be judged whether the obtained feature independent variables are strongly associated with one another. When they are, the strongly associated feature independent variables are removed, yielding feature independent variables that are not strongly associated with one another, and these feature independent variables are used to characterize the dependent variable.
Based on the above method embodiments, the present application also provides an apparatus for determining data correlation, which is described below with reference to the drawings.
Referring to Fig. 4, which is a structural diagram of an apparatus for determining data correlation provided by this embodiment of the application. As shown in Fig. 4, the apparatus may include:
A first computing unit 401, configured to calculate, according to the parameter values of an independent variable and the parameter values of a dependent variable, the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable, where the independent variable and the dependent variable have a correspondence;
A first determination unit 402, configured to determine the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient, where the correlation parameter between the independent variable and the dependent variable is greater than or equal to a first value and less than or equal to a second value; if the Pearson correlation coefficient and the Spearman correlation coefficient are unequal, the first value is the smaller of the Pearson correlation coefficient and the Spearman correlation coefficient, and the second value is the larger of the two; if the Pearson correlation coefficient and the Spearman correlation coefficient are equal, the first value and the second value are both the Pearson correlation coefficient, that is, the Spearman correlation coefficient.
In some possible implementations, the first determination unit includes:
A first computation subunit, configured to multiply the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
A second computation subunit, configured to add the Pearson correlation coefficient to the Spearman correlation coefficient to obtain a fourth value;
A third computation subunit, configured to divide the third value by the fourth value and multiply the result by 2 to obtain a fifth value;
A first determination subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the fifth value.
In some possible implementations, the first determination unit includes:
A second determination subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the second value when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold;
A third determination subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the fifth value when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold.
In some possible implementations, the apparatus further includes:
A second determination unit, configured to determine that an independent variable whose correlation parameter with the dependent variable is greater than a second threshold is a feature independent variable.
In some possible implementations, the apparatus further includes:
An establishing unit, configured to establish a linear equation, where one side of the linear equation is the dependent variable and the other side is the sum of the feature data items, each feature data item is the product of a feature independent variable and the regression coefficient corresponding to that feature independent variable, the feature independent variables in the feature data items are all different, and the number of feature data items is the same as the number of feature independent variables;
A second computing unit, configured to substitute the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation and solve it to obtain the regression coefficient corresponding to each feature independent variable;
A first sorting unit, configured to obtain a first ranking result of the feature independent variables according to the ordering of the regression coefficients corresponding to the feature independent variables;
A second sorting unit, configured to obtain a second ranking result of the feature independent variables according to the ordering of the correlation parameters between the feature independent variables and the dependent variable;
A first deletion unit, configured to, when the first ranking result and the second ranking result of the feature independent variables are both in ascending order, delete a target feature independent variable from the feature independent variables if the position of the target feature independent variable in the second ranking result is greater than its position in the first ranking result, where the target feature independent variable is any one of the feature independent variables;
A second deletion unit, configured to, when the first ranking result and the second ranking result of the feature independent variables are both in descending order, delete a target feature independent variable from the feature independent variables if the position of the target feature independent variable in the second ranking result is less than its position in the first ranking result.
It should be noted that, for the specific implementation of each module or unit in this embodiment, reference may be made to the implementation of the methods shown in Fig. 1 and Fig. 3, which is not repeated here.
In addition, this embodiment of the application also provides a computer-readable storage medium. The computer-readable storage medium stores instructions which, when run on a terminal device, cause the terminal device to execute the above method of determining data correlation.
This embodiment of the application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute the above method of determining data correlation.
As can be seen from the above embodiments, the embodiments of the present application calculate both the Pearson correlation coefficient and the Spearman correlation coefficient between the two sets of data of the independent variable and the dependent variable, and then determine, from the calculated Pearson and Spearman correlation coefficients, a new correlation parameter that characterizes the correlation between the independent variable and the dependent variable. The value of this correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation between the independent variable and the dependent variable is characterized by this correlation parameter, there is no longer any need to choose between the Pearson correlation coefficient and the Spearman correlation coefficient, and the correlation between the data can be determined even when it is not known what kind of relationship the analyzed data have.
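The difference between the two coefficients that motivates the combined parameter can be illustrated with a minimal sketch. It assumes NumPy is available; the function names `pearson`, `average_ranks` and `spearman` and the sample data are hypothetical and are not taken from the application. For a monotonic but nonlinear relationship, the Spearman coefficient is 1 while the Pearson coefficient is smaller.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman coefficient, computed as the Pearson coefficient of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y_cubic = [v ** 3 for v in x]
p = pearson(x, y_cubic)    # below 1, because the relationship is not linear
s = spearman(x, y_cubic)   # 1, because the relationship is monotonic
```

When the type of relationship is unknown in advance, a parameter bounded by `p` and `s` avoids having to pick one coefficient or the other.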
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the system or device disclosed in an embodiment corresponds to the method disclosed in the embodiments, it is described relatively briefly; for the relevant details, refer to the description of the method.
It should be understood that, in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of these items, including any combination of a single item or multiple items. For example, at least one of a, b or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b and c may be single or multiple.
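The seven cases listed for "at least one of a, b or c" are exactly the non-empty subsets of the three items. A minimal sketch that enumerates them (the helper name `at_least_one_of` is hypothetical and used only for illustration):

```python
from itertools import combinations

def at_least_one_of(items):
    """All non-empty combinations of the given items, smallest first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]

selections = at_least_one_of(["a", "b", "c"])
```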
It should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Unless further limited, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method of determining data correlation, characterized in that the method comprises:
calculating, according to parameter values of an independent variable and parameter values of a dependent variable, a Pearson correlation coefficient and a Spearman correlation coefficient between the independent variable and the dependent variable, the independent variable and the dependent variable having a correspondence;
determining, according to the Pearson correlation coefficient and the Spearman correlation coefficient, a correlation parameter between the independent variable and the dependent variable, the correlation parameter being greater than or equal to a first value and less than or equal to a second value, wherein, if the Pearson correlation coefficient and the Spearman correlation coefficient are not equal, the first value is the smaller of the Pearson correlation coefficient and the Spearman correlation coefficient and the second value is the larger of the two, and, if the Pearson correlation coefficient and the Spearman correlation coefficient are equal, the first value and the second value are the Pearson correlation coefficient or the Spearman correlation coefficient.
2. The method according to claim 1, characterized in that determining the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient comprises:
multiplying the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
adding the Pearson correlation coefficient to the Spearman correlation coefficient to obtain a fourth value;
dividing the third value by the fourth value and multiplying the result by 2 to obtain a fifth value;
determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
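As an illustration only, and not as part of the claim, the three arithmetic steps of claim 2 can be written as a single expression, which has the form of the harmonic mean of the two coefficients. The symbols r_p, r_s and v_3 to v_5 are introduced here for illustration. The bound shown in the second display assumes that both coefficients are positive; that assumption is not stated in the claim, and the expression is undefined when the two coefficients sum to zero.

```latex
% r_p: Pearson coefficient, r_s: Spearman coefficient (illustrative symbols)
\[
  v_3 = r_p \, r_s, \qquad
  v_4 = r_p + r_s, \qquad
  v_5 = \frac{2 v_3}{v_4} = \frac{2\, r_p r_s}{r_p + r_s}
\]
% Assuming 0 < r_p \le r_s (an assumption made here, not in the claim):
\[
  v_5 - r_p = \frac{r_p (r_s - r_p)}{r_p + r_s} \ge 0, \qquad
  r_s - v_5 = \frac{r_s (r_s - r_p)}{r_p + r_s} \ge 0
  \quad\Longrightarrow\quad r_p \le v_5 \le r_s .
\]
```

For example, with r_p = 0.6 and r_s = 0.8, v_5 = 0.96 / 1.4 ≈ 0.686, which lies between the two coefficients as claim 1 requires.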
3. The method according to claim 2, characterized in that determining the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient comprises:
when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, determining that the correlation parameter between the independent variable and the dependent variable is the second value;
when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold, determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
4. The method according to claim 1, characterized in that the method further comprises:
determining, as a feature independent variable, each independent variable whose correlation parameter with the dependent variable is greater than a second threshold.
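As an illustration only, the filtering step of claim 4 amounts to keeping the independent variables whose correlation parameter strictly exceeds the second threshold. The variable names, parameter values and threshold below are hypothetical.

```python
def select_feature_variables(correlation_parameters, second_threshold):
    """Keep the names whose correlation parameter is strictly above the threshold."""
    return [
        name
        for name, value in correlation_parameters.items()
        if value > second_threshold
    ]

# Hypothetical correlation parameters for three independent variables.
parameters = {"x1": 0.82, "x2": 0.35, "x3": 0.61}
features = select_feature_variables(parameters, second_threshold=0.6)
```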
5. The method according to claim 4, characterized in that the method further comprises:
establishing a linear equation, one side of which is the dependent variable and the other side of which is the sum of the feature data items, each feature data item being the product of one feature independent variable and the regression coefficient corresponding to that feature independent variable, the feature independent variables in the feature data items all being different, and the number of feature data items being the same as the number of feature independent variables;
substituting the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation, and solving to obtain the regression coefficient corresponding to each feature independent variable;
obtaining a first ranking result of the feature independent variables according to the ordering of the regression coefficients corresponding to the feature independent variables;
obtaining a second ranking result of the feature independent variables according to the ordering of the correlation parameters between each feature independent variable and the dependent variable;
when the first ranking result and the second ranking result of the feature independent variables are both sorted in ascending order, deleting a target feature independent variable from the feature independent variables if the second ranking result of the target feature independent variable is greater than its first ranking result, the target feature independent variable being any one of the feature independent variables;
when the first ranking result and the second ranking result of the feature independent variables are both sorted in descending order, deleting the target feature independent variable from the feature independent variables if the second ranking result of the target feature independent variable is less than its first ranking result.
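As an illustration only, and not as part of the claims, the ascending-order branch of claim 5 can be sketched with NumPy. Several choices here are assumptions: the claim says only that the linear equation is "solved", and this sketch uses a least-squares solve; z-score standardization is assumed; ties in ranking are broken by position; and the function names, data and correlation parameters are hypothetical.

```python
import numpy as np

def standardize(a):
    """Z-score standardization per column (assumed form of standardization)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)

def ascending_ranks(values):
    """Rank 1 is the smallest value; ties are broken by position (assumption)."""
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def prune_features(names, X, y, correlation_parameters):
    """Delete features whose correlation rank exceeds their regression rank."""
    Xs = standardize(X)
    ys = standardize(y)
    # Least-squares solution of ys = Xs @ coefs (assumed solving method).
    coefs, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    first = ascending_ranks(coefs)  # first ranking: regression coefficients
    second = ascending_ranks(np.asarray(correlation_parameters, dtype=float))
    return [n for n, r1, r2 in zip(names, first, second) if not r2 > r1]

# Hypothetical data: three mutually orthogonal features, y = 3*x1 + 2*x2 + x3.
names = ["x1", "x2", "x3"]
X = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
y = X @ np.array([3.0, 2.0, 1.0])
# Hypothetical correlation parameters, as if produced by the earlier steps.
kept = prune_features(names, X, y, correlation_parameters=[0.9, 0.5, 0.7])
```

In this example the regression ranks are x1 = 3, x2 = 2, x3 = 1 and the correlation ranks are x1 = 3, x2 = 1, x3 = 2, so only x3 is deleted. The descending-order branch of the claim mirrors this with the comparison reversed.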
6. A device for determining data correlation, characterized in that the device comprises:
a first calculation unit, configured to calculate, according to parameter values of an independent variable and parameter values of a dependent variable, a Pearson correlation coefficient and a Spearman correlation coefficient between the independent variable and the dependent variable, the independent variable and the dependent variable having a correspondence;
a first determination unit, configured to determine, according to the Pearson correlation coefficient and the Spearman correlation coefficient, a correlation parameter between the independent variable and the dependent variable, the correlation parameter being greater than or equal to a first value and less than or equal to a second value, wherein, if the Pearson correlation coefficient and the Spearman correlation coefficient are not equal, the first value is the smaller of the Pearson correlation coefficient and the Spearman correlation coefficient and the second value is the larger of the two, and, if the Pearson correlation coefficient and the Spearman correlation coefficient are equal, the first value and the second value are the Pearson correlation coefficient or the Spearman correlation coefficient.
7. The device according to claim 6, characterized in that the first determination unit comprises:
a first calculation subunit, configured to multiply the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
a second calculation subunit, configured to add the Pearson correlation coefficient to the Spearman correlation coefficient to obtain a fourth value;
a third calculation subunit, configured to divide the third value by the fourth value and multiply the result by 2 to obtain a fifth value;
a first determination subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the fifth value.
8. The device according to claim 7, characterized in that the first determination unit comprises:
a second determination subunit, configured to determine, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, that the correlation parameter between the independent variable and the dependent variable is the second value;
a third determination subunit, configured to determine, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold, that the correlation parameter between the independent variable and the dependent variable is the fifth value.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions that, when run on a terminal device, cause the terminal device to perform the method of determining data correlation according to any one of claims 1-5.
10. A computer program product, characterized in that the computer program product, when run on a terminal device, causes the terminal device to perform the method of determining data correlation according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811012940.7A CN109346168A (en) | 2018-08-31 | 2018-08-31 | A kind of method and device of determining data dependence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811012940.7A CN109346168A (en) | 2018-08-31 | 2018-08-31 | A kind of method and device of determining data dependence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109346168A true CN109346168A (en) | 2019-02-15 |
Family
ID=65291933
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811012940.7A Pending CN109346168A (en) | 2018-08-31 | 2018-08-31 | A kind of method and device of determining data dependence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109346168A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103458267A (en) * | 2013-09-04 | 2013-12-18 | 中国传媒大学 | Video picture quality subjective evaluation method and system |
CN107025596A (en) * | 2016-02-01 | 2017-08-08 | 腾讯科技(深圳)有限公司 | A kind of methods of risk assessment and system |
CN107656903A (en) * | 2017-08-23 | 2018-02-02 | 中国石油天然气股份有限公司 | The determination method and apparatus of the average index of data volume |
2018-08-31 CN CN201811012940.7A patent/CN109346168A/en active Pending
Non-Patent Citations (2)
Title |
---|
王成安: "流感疫情预测技术研究与主动监控系统的实现", 《湖南大学硕士学位论文》 * |
鲁力: "基于互联网数据的中国流感趋势预测研究", 《湖南大学硕士学位论文》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978033A (en) * | 2019-03-15 | 2019-07-05 | 第四范式(北京)技术有限公司 | The method and apparatus of the building of biconditional operation people's identification model and biconditional operation people identification |
CN109975661A (en) * | 2019-04-22 | 2019-07-05 | 西南交通大学 | A kind of electric transmission line fault detection method based on Spearman's correlation coefficient |
CN112116480A (en) * | 2019-06-20 | 2020-12-22 | 财付通支付科技有限公司 | Virtual resource determination method and device, computer equipment and storage medium |
CN110457370A (en) * | 2019-08-12 | 2019-11-15 | 渤海大学 | Outlier Detection system and method for cleaning in data mining based on artificial intelligence |
CN113269361A (en) * | 2021-05-20 | 2021-08-17 | 国网甘肃省电力有限公司酒泉供电公司 | Power consumption increase prediction method based on power consumer relevance analysis |
CN115146705A (en) * | 2022-05-27 | 2022-10-04 | 南京林业大学 | Method for recognizing forest lightning stroke fire by combining remote sensing and surface lightning stroke fire environment |
CN115146705B (en) * | 2022-05-27 | 2023-08-01 | 南京林业大学 | Method for identifying forest lightning fire by combining remote sensing with surface lightning fire environment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109346168A (en) | A kind of method and device of determining data dependence | |
TWI772673B (en) | Industry identification model determination method and device | |
Hartson et al. | Criteria for evaluating usability evaluation methods | |
Mathews et al. | Judgemental revision of sales forecasts: Effectiveness of forecast selection | |
CN108833458B (en) | Application recommendation method, device, medium and equipment | |
Courtney et al. | Shotgun correlations in software measures | |
EP1361526A1 (en) | Electronic data processing system and method of using an electronic processing system for automatically determining a risk indicator value | |
Chen et al. | Data envelopment analysis with missing data: A multiple linear regression analysis approach | |
CN108764705A (en) | A kind of data quality accessment platform and method | |
CN110032650A (en) | A kind of generation method, device and the electronic equipment of training sample data | |
CN111242318A (en) | Business model training method and device based on heterogeneous feature library | |
JP2011203976A (en) | Diagnostic support apparatus | |
CN110362481A (en) | Automatic test approach and terminal device | |
Xie et al. | Evaluating performance of super-efficiency models in ranking efficient decision-making units based on Monte Carlo simulations | |
CN103678709B (en) | Recommendation system attack detection method based on time series data | |
Chen et al. | Mutual fund performance evaluation–application of system BCC model | |
CN105488061B (en) | A kind of method and device of verify data validity | |
Zhang et al. | % CRTFASTGEEPWR: a SAS macro for power of the generalized estimating equations of multi-period cluster randomized trials with application to stepped wedge designs | |
CN108491189B (en) | Method for evaluating design class diagram based on difference comparison | |
CN108829750A (en) | A kind of quality of data determines system and method | |
CN106815290B (en) | Method and device for determining attribution of bank card based on graph mining | |
CN108985606A (en) | Enterprise's similarity system design method and system | |
CN111144910B (en) | Bidding 'series bid, companion bid' object recommendation method and device based on fuzzy entropy mean shadow album | |
CN106846136A (en) | A kind of data comparison method and equipment | |
CN106301880A (en) | One determines that cyberrelationship degree of stability, Internet service recommend method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||