CN109346168A - Method and device for determining data correlation - Google Patents

Method and device for determining data correlation

Info

Publication number
CN109346168A
CN109346168A CN201811012940.7A CN201811012940A CN109346168A CN 109346168 A CN109346168 A CN 109346168A CN 201811012940 A CN201811012940 A CN 201811012940A CN 109346168 A CN109346168 A CN 109346168A
Authority
CN
China
Prior art keywords
independent variable
variable
value
spearman
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811012940.7A
Other languages
Chinese (zh)
Inventor
孙浩
高睿
邹存璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201811012940.7A
Publication of CN109346168A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Algebra (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)

Abstract

An embodiment of the present application discloses a method and device for determining data correlation. The method comprises: calculating, from the parameter values of an independent variable and the parameter values of a dependent variable, the Pearson correlation coefficient and the Spearman correlation coefficient between the two sets of data; and then determining, from the two calculated coefficients, a new correlation parameter that characterizes the correlation between the independent variable and the dependent variable. The value of the correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation is characterized by this correlation parameter, it is no longer necessary to choose between the Pearson and Spearman correlation coefficients, and the correlation between the data can be determined even when it is not known what kind of association the analyzed data have.

Description

Method and device for determining data correlation
Technical field
The present application relates to the field of computer technology, and in particular to a method and device for determining data correlation.
Background
The correlation between two sets of data can be determined by calculating a correlation coefficient between them. In the prior art, either the Pearson correlation coefficient or the Spearman correlation coefficient between the two sets of data is calculated for this purpose. The Pearson correlation coefficient is suited to scenarios in which the two sets of data are linearly related, and the Spearman correlation coefficient is suited to scenarios in which they are nonlinearly related. A person usually has to decide from experience which of the two coefficients to use. When data correlation analysis is required but the kind of association in the analyzed data is not understood, an accurate choice between the Pearson correlation coefficient and the Spearman correlation coefficient cannot be made.
Summary of the invention
In view of this, the embodiments of the present application provide a method and device for determining data correlation, to solve the technical problem in the prior art that, when data correlation analysis is performed, an accurate choice cannot be made between the Pearson correlation coefficient and the Spearman correlation coefficient.
To solve the above problem, the technical solutions provided by the embodiments of the present application are as follows:
A method for determining data correlation, the method comprising:
Calculating, according to the parameter values of an independent variable and the parameter values of a dependent variable, the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable, the independent variable and the dependent variable having a correspondence;
Determining, according to the Pearson correlation coefficient and the Spearman correlation coefficient, a correlation parameter between the independent variable and the dependent variable, the correlation parameter being greater than or equal to a first value and less than or equal to a second value. If the Pearson correlation coefficient and the Spearman correlation coefficient are unequal, the first value is the smaller of the two and the second value is the larger of the two; if they are equal, the first value and the second value are both equal to that common coefficient value.
In one possible implementation, determining the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient comprises:
Multiplying the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
Adding the Pearson correlation coefficient and the Spearman correlation coefficient to obtain a fourth value;
Dividing the third value by the fourth value and multiplying the result by 2 to obtain a fifth value;
Determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
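The four steps above can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the function name `correlation_parameter` and the guard against a zero sum are assumptions added here, since the text does not say what happens when the two coefficients cancel out.

```python
def correlation_parameter(pearson: float, spearman: float) -> float:
    """Combine two coefficients as (pearson * spearman) / (pearson + spearman) * 2."""
    third = pearson * spearman    # third value: product of the two coefficients
    fourth = pearson + spearman   # fourth value: sum of the two coefficients
    if fourth == 0:
        # Assumption: the source does not cover this case, so fail loudly.
        raise ValueError("Pearson and Spearman coefficients sum to zero")
    return third / fourth * 2     # fifth value


if __name__ == "__main__":
    print(correlation_parameter(0.8, 0.6))
```

The fifth value is the harmonic mean of the two coefficients, so it equals both of them when they are equal.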
In one possible implementation, determining the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient comprises:
When the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, determining that the correlation parameter between the independent variable and the dependent variable is the second value;
When the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold, determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
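A hedged sketch of this threshold rule follows. The name `correlation_parameter_with_threshold` is hypothetical, and the first threshold is left as a required argument because the text gives no value for it; the zero-sum guard is likewise an added assumption.

```python
def correlation_parameter_with_threshold(
    pearson: float, spearman: float, first_threshold: float
) -> float:
    """Pick the larger coefficient when the two disagree strongly,
    otherwise fall back to 2 * P * S / (P + S)."""
    if abs(pearson - spearman) > first_threshold:
        # Second value: the larger of the two coefficients.
        return max(pearson, spearman)
    total = pearson + spearman
    if total == 0:
        # Assumption: not covered by the source text.
        raise ValueError("Pearson and Spearman coefficients sum to zero")
    # Fifth value: product divided by sum, times 2.
    return pearson * spearman / total * 2
```

For instance, with a first threshold of 0.2 the pair (0.9, 0.3) yields 0.9, while the pair (0.8, 0.7) yields the combined fifth value.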
In one possible implementation, the method further comprises:
Determining each independent variable whose correlation parameter with the dependent variable is greater than a second threshold to be a feature independent variable.
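As an illustration of this selection step, the sketch below keeps the independent variables whose correlation parameter exceeds the second threshold. The function name, the dictionary layout, and the variable names used in the example call are all hypothetical.

```python
def select_feature_variables(correlation_params: dict, second_threshold: float) -> list:
    """Return the names of independent variables whose correlation parameter
    with the dependent variable is strictly greater than the second threshold."""
    return [
        name
        for name, value in correlation_params.items()
        if value > second_threshold
    ]


if __name__ == "__main__":
    # Hypothetical correlation parameters for three transaction attributes.
    params = {"amount": 0.7, "time": 0.2, "place": 0.5}
    print(select_feature_variables(params, 0.4))
```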
In one possible implementation, the method further comprises:
Establishing a linear equation, one side of which is the dependent variable and the other side of which is the sum of the feature data items, where each feature data item is the product of one feature independent variable and the regression coefficient corresponding to that feature independent variable, the feature independent variables in the feature data items are all different, and the number of feature data items equals the number of feature independent variables;
Substituting the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation, and solving to obtain the regression coefficient corresponding to each feature independent variable;
Obtaining a first ranking result of the feature independent variables according to the ordering of their corresponding regression coefficients;
Obtaining a second ranking result of the feature independent variables according to the ordering of the correlation parameters between each feature independent variable and the dependent variable;
When the first ranking result and the second ranking result are both sorted in ascending order, deleting a target feature independent variable from the feature independent variables if its second ranking result is greater than its first ranking result, the target feature independent variable being any one of the feature independent variables;
When the first ranking result and the second ranking result are both sorted in descending order, deleting the target feature independent variable from the feature independent variables if its second ranking result is less than its first ranking result.
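The ranking comparison in the last four steps can be sketched as follows, taking the regression coefficients as already solved. This is an illustration under stated assumptions: the names `rank_positions` and `prune_features` are hypothetical, ranks are 1-based positions in the sorted order, raw (not absolute) coefficient values are sorted, and ties keep their input order. The text does not specify any of these details.

```python
def rank_positions(scores: dict, descending: bool = False) -> dict:
    """Map each feature name to its 1-based position after sorting by score."""
    ordered = sorted(scores, key=scores.get, reverse=descending)
    return {name: position for position, name in enumerate(ordered, start=1)}


def prune_features(regression_coefs: dict, correlation_params: dict,
                   descending: bool = False) -> list:
    """Drop features whose correlation-parameter rank is further toward the
    'large' end than their regression-coefficient rank."""
    first = rank_positions(regression_coefs, descending)     # first ranking result
    second = rank_positions(correlation_params, descending)  # second ranking result
    if descending:
        dropped = {name for name in first if second[name] < first[name]}
    else:
        dropped = {name for name in first if second[name] > first[name]}
    return [name for name in regression_coefs if name not in dropped]
```

With regression coefficients `{"a": 0.5, "b": 0.2, "c": 0.9}` and correlation parameters `{"a": 0.3, "b": 0.8, "c": 0.6}`, feature `b` is removed in both sort directions, which is consistent with the two rules describing the same decision.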
A device for determining data correlation, the device comprising:
A first calculation unit, configured to calculate, according to the parameter values of an independent variable and the parameter values of a dependent variable, the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable, the independent variable and the dependent variable having a correspondence;
A first determination unit, configured to determine, according to the Pearson correlation coefficient and the Spearman correlation coefficient, a correlation parameter between the independent variable and the dependent variable, the correlation parameter being greater than or equal to a first value and less than or equal to a second value. If the Pearson correlation coefficient and the Spearman correlation coefficient are unequal, the first value is the smaller of the two and the second value is the larger of the two; if they are equal, the first value and the second value are both equal to that common coefficient value.
In one possible implementation, the first determination unit comprises:
A first calculation subunit, configured to multiply the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
A second calculation subunit, configured to add the Pearson correlation coefficient and the Spearman correlation coefficient to obtain a fourth value;
A third calculation subunit, configured to divide the third value by the fourth value and multiply the result by 2 to obtain a fifth value;
A first determination subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the fifth value.
In one possible implementation, the first determination unit comprises:
A second determination subunit, configured to determine, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, that the correlation parameter between the independent variable and the dependent variable is the second value;
A third determination subunit, configured to determine, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold, that the correlation parameter between the independent variable and the dependent variable is the fifth value.
In one possible implementation, the device further comprises:
A second determination unit, configured to determine each independent variable whose correlation parameter with the dependent variable is greater than a second threshold to be a feature independent variable.
In one possible implementation, the device further comprises:
An establishing unit, configured to establish a linear equation, one side of which is the dependent variable and the other side of which is the sum of the feature data items, where each feature data item is the product of one feature independent variable and the regression coefficient corresponding to that feature independent variable, the feature independent variables in the feature data items are all different, and the number of feature data items equals the number of feature independent variables;
A second calculation unit, configured to substitute the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation, and solve to obtain the regression coefficient corresponding to each feature independent variable;
A first sorting unit, configured to obtain a first ranking result of the feature independent variables according to the ordering of their corresponding regression coefficients;
A second sorting unit, configured to obtain a second ranking result of the feature independent variables according to the ordering of the correlation parameters between each feature independent variable and the dependent variable;
A first deletion unit, configured to, when the first ranking result and the second ranking result are both sorted in ascending order, delete a target feature independent variable from the feature independent variables if its second ranking result is greater than its first ranking result, the target feature independent variable being any one of the feature independent variables;
A second deletion unit, configured to, when the first ranking result and the second ranking result are both sorted in descending order, delete the target feature independent variable from the feature independent variables if its second ranking result is less than its first ranking result.
A computer-readable storage medium storing instructions which, when run on a terminal device, cause the terminal device to execute the above method for determining data correlation.
A computer program product which, when run on a terminal device, causes the terminal device to execute the above method for determining data correlation.
It can be seen that the embodiments of the present application have the following beneficial effects:
The embodiments of the present application calculate both the Pearson correlation coefficient and the Spearman correlation coefficient between the two sets of data of the independent variable and the dependent variable, and then determine from the two calculated coefficients a new correlation parameter that characterizes the correlation between the independent variable and the dependent variable. The value of the correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation is characterized by this correlation parameter, it is no longer necessary to choose between the Pearson and Spearman correlation coefficients, and the correlation between the data can be determined even when it is not known what kind of association the analyzed data have.
Brief description of the drawings
Fig. 1 is a flowchart of a method for determining data correlation provided by an embodiment of the present application;
Fig. 2(a) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are linearly related;
Fig. 2(b) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are nonlinearly related;
Fig. 2(c) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are nonlinearly related;
Fig. 2(d) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are linearly related;
Fig. 2(e) is an example diagram, provided by an embodiment of the present application, in which the independent variable and the dependent variable are nonlinearly related;
Fig. 3 is a flowchart of a method for removing strongly associated feature independent variables provided by an embodiment of the present application;
Fig. 4 is a structural diagram of a device for determining data correlation provided by an embodiment of the present application.
Detailed description
To make the above objects, features and advantages of the present application clearer and easier to understand, the embodiments of the present application are described in further detail below with reference to the accompanying drawings and specific implementations.
To facilitate understanding of the technical solution of the present application, the background of the present application is first explained below.
In studying traditional methods of analyzing the association between data, the inventors found that the traditional analysis methods include the Pearson method and the Spearman method. The Pearson method measures whether two data sets lie on a single line, that is, it measures the degree of association between two sets of linearly related data; the larger the absolute value of the Pearson correlation coefficient, the stronger the correlation between the two. However, this method is better suited to analyzing linearly related data and performs poorly on nonlinearly related data. The Spearman method is mainly used to analyze the degree of association between nonlinearly related data, but it does not reflect the degree of association between linearly related data well. When correlation analysis must be performed on a large amount of collected data, a user who has no professional training or who is unfamiliar with the collected data cannot accurately choose one of these two methods to analyze the data.
Based on this, the embodiments of the present application provide a method for determining data correlation. The method calculates both the Pearson correlation coefficient and the Spearman correlation coefficient between the two sets of data of the independent variable and the dependent variable, and then determines a new correlation parameter from the two coefficients. The value of the correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation between the independent variable and the dependent variable is characterized by this correlation parameter, it is no longer necessary to choose between the Pearson and Spearman correlation coefficients, and the correlation between the data can be determined even when it is not known what kind of association the analyzed data have.
To facilitate understanding of the technical solution of the present application, the method for determining data correlation provided by the embodiments of the present application is described below with reference to the accompanying drawings.
Referring to Fig. 1, which is a flowchart of a method for determining data correlation provided by an embodiment of the present application. As shown in Fig. 1, the method may include:
S101: Calculate the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable according to the parameter values of the independent variable and the parameter values of the dependent variable.
In this embodiment, to obtain the correlation between the independent variables and the dependent variable in a large collected data set, the Pearson correlation coefficient and the Spearman correlation coefficient between each independent variable and the dependent variable can be calculated from the parameter values of each independent variable and the parameter values of the dependent variable.
The independent variable and the dependent variable have a correspondence. One dependent variable may correspond to one independent variable or to multiple independent variables; in the latter case, the Pearson correlation coefficient and the Spearman correlation coefficient must be calculated between each independent variable and the dependent variable.
For example, to determine whether a patient has disease A, multiple examinations are performed on the patient, and the diagnosis is finally made from the data of the multiple examination items. Here, whether the patient has disease A is taken as the dependent variable and each examination item is taken as an independent variable, and the Pearson correlation coefficient and the Spearman correlation coefficient are calculated between the examination data of each item and the parameter value indicating whether the patient has disease A. In a specific implementation, this parameter value can be set to 1 when the patient has disease A and to 0 when the patient does not, so that the two correlation coefficients between the independent variable and the dependent variable can be calculated.
As another example, to determine whether a transaction is fraudulent, a bank needs to assess multiple transaction attributes such as the transaction time, transaction amount and transaction location, and judge comprehensively whether the transaction is fraudulent. Here, whether the transaction is fraudulent can be the dependent variable, and the transaction attributes such as transaction time, transaction amount and transaction location can be the independent variables; the two correlation coefficients are calculated between each independent variable and the dependent variable. In a specific implementation, for ease of calculation, the dependent variable can be set to 1 when the transaction is determined to be fraudulent and to 0 when it is not, and the transaction location can be represented by its administrative division code, a six-digit Arabic numeral code that hierarchically represents China's provinces (autonomous regions, municipalities directly under the central government), prefectures (cities, prefectures, leagues) and counties (districts, cities, banners). In this way the parameter values of both the independent variables and the dependent variable are numeric data from which the correlation coefficients can be calculated.
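The numeric encoding described in this example might look like the sketch below. Everything named here is illustrative: the function `encode_transaction`, the record keys, and the small lookup table are assumptions, with only the two six-digit codes shown (Beijing 110000, Shanghai 310000) taken from the national administrative division coding.

```python
# Hypothetical lookup from place name to six-digit administrative division code.
ADMIN_DIVISION_CODES = {
    "Beijing": 110000,
    "Shanghai": 310000,
}


def encode_transaction(is_fraud: bool, place: str, amount: float) -> dict:
    """Turn one transaction record into purely numeric parameter values."""
    return {
        "y": 1 if is_fraud else 0,             # dependent variable: 1 = fraud, 0 = not fraud
        "place": ADMIN_DIVISION_CODES[place],  # location as its administrative code
        "amount": float(amount),               # amount is already numeric
    }


if __name__ == "__main__":
    print(encode_transaction(True, "Beijing", 120.5))
```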
It should be noted that both correlation coefficients characterize the correlation between the independent variable and the dependent variable. The larger the absolute value of a correlation coefficient, the stronger the association between the independent variable and the dependent variable, that is, the greater the influence of the independent variable on the dependent variable. For example, if the correlation coefficient between the examination-item independent variable "red cell distribution width" and the dependent variable "whether the patient has disease A" is large, red cell distribution width has a large influence on diagnosing whether the patient has disease A. Likewise, if the correlation coefficient between the independent variable "transaction time" and the dependent variable "fraud" is large, transaction time has a large influence on determining whether a transaction is fraudulent.
For ease of understanding, the calculation of the Pearson correlation coefficient and the Spearman correlation coefficient from the parameter values of the independent variable and the dependent variable is illustrated below, taking whether a transaction is fraudulent as the dependent variable and the transaction time, transaction amount and transaction location as the independent variables. As shown in Table 1, multiple transaction records can be obtained, each containing the dependent variable and multiple independent variables.
In each transaction record in Table 1, every independent variable and the dependent variable has a parameter value. When the correlation coefficient between an independent variable and the dependent variable is calculated, the column of parameter values for that independent variable and the column of parameter values for the dependent variable are used. The calculation of the Pearson correlation coefficient and the Spearman correlation coefficient is described below with reference to Table 1.
(1) Calculating the Pearson correlation coefficient
In a specific implementation, the Pearson correlation coefficient between an independent variable and the dependent variable can be calculated using formula (1):
Where r is the Pearson correlation coefficient between independent variable x_i and dependent variable y; N is the number of parameter values of independent variable x_i; x_ij is the j-th parameter value of independent variable x_i; and y_j is the j-th parameter value of dependent variable y.
Taking Table 1 as an example, i = 1, 2, 3 and N = 3. To calculate the correlation coefficient r between independent variable x_1 and dependent variable y, the three parameter values of x_1 and the three parameter values of y are substituted into formula (1), which gives the Pearson correlation coefficient r of x_1 and y. Similarly, substituting independent variables x_2 and x_3 into the formula gives their Pearson correlation coefficients r with the dependent variable.
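The expression for formula (1) is not reproduced in this text. One standard form of the Pearson correlation coefficient that uses only the symbols defined above (r, N, x_ij, y_j) is given here as an assumption about what the formula states, not as a transcription of it:

```latex
r = \frac{N \sum_{j=1}^{N} x_{ij}\, y_j \;-\; \sum_{j=1}^{N} x_{ij} \sum_{j=1}^{N} y_j}
         {\sqrt{N \sum_{j=1}^{N} x_{ij}^{2} - \Bigl(\sum_{j=1}^{N} x_{ij}\Bigr)^{2}}\;
          \sqrt{N \sum_{j=1}^{N} y_j^{2} - \Bigl(\sum_{j=1}^{N} y_j\Bigr)^{2}}}
```

This is algebraically the same as the mean-centered form of the coefficient; it is undefined when either column of parameter values is constant.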
(2) Calculating the Spearman correlation coefficient
In a specific implementation, the Spearman correlation coefficient between an independent variable and the dependent variable can be calculated using formula (2):
Where ρ is the Spearman correlation coefficient between independent variable x_i and dependent variable y; N is the number of parameter values of independent variable x_i; x_ij is the j-th parameter value of independent variable x_i; y_j is the j-th parameter value of dependent variable y; x̄_i is the mean of the parameter values of x_i; and ȳ is the mean of the parameter values of y.
Taking Table 1 as an example, i = 1, 2, 3 and N = 3. To calculate the correlation coefficient ρ between independent variable x_1 and dependent variable y, first calculate the mean of the three parameter values of x_1 and the mean of the three parameter values of y, and then substitute them into formula (2), which gives the Spearman correlation coefficient ρ of x_1 and y. Similarly, substituting independent variables x_2 and x_3 into the formula gives their Spearman correlation coefficients ρ with the dependent variable.
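The expression for formula (2) is likewise not reproduced in this text. Under the standard definition, the Spearman coefficient is the Pearson coefficient computed on ranks. Assuming that x_ij and y_j below denote the ranks of the parameter values within their columns, and that the two means are taken over those ranks, a form consistent with the symbols defined above is:

```latex
\rho = \frac{\sum_{j=1}^{N} \bigl(x_{ij} - \bar{x}_i\bigr)\bigl(y_j - \bar{y}\bigr)}
            {\sqrt{\sum_{j=1}^{N} \bigl(x_{ij} - \bar{x}_i\bigr)^{2}}\;
             \sqrt{\sum_{j=1}^{N} \bigl(y_j - \bar{y}\bigr)^{2}}}
```

Whether the application's formula (2) applies the rank transformation explicitly cannot be confirmed from this text; the ranking step is the assumption here.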
With the above two formulas, the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable can be determined; S102 is then executed using these two correlation coefficients.
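As a rough sketch of step S101, both coefficients can be computed with the standard library alone. The helper names `pearson`, `average_ranks` and `spearman` are assumptions, as is the choice to give tied values their average rank; the text does not say how ties are handled.

```python
import math


def pearson(xs, ys):
    """Pearson coefficient via the sums-of-products form."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two columns of equal length >= 2")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    var_x = n * sum(x * x for x in xs) - sx * sx
    var_y = n * sum(y * y for y in ys) - sy * sy
    if var_x <= 0 or var_y <= 0:
        raise ValueError("a constant column has no defined correlation")
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions (assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman coefficient as the Pearson coefficient of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

On a monotonic but curved relationship such as `xs = [1, 2, 3, 4]`, `ys = [1, 4, 9, 16]`, `spearman` returns 1 while `pearson` returns a value slightly below 1, which is the kind of disagreement the correlation parameter in S102 is meant to reconcile.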
S102: Determine the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient.
In this embodiment, the Pearson correlation coefficient and the Spearman correlation coefficient are used to calculate the correlation parameter between the independent variable and the dependent variable. The correlation parameter is greater than or equal to a first value and less than or equal to a second value. If the Pearson correlation coefficient and the Spearman correlation coefficient are unequal, the first value is the smaller of the two coefficients and the second value is the larger of the two. If the two coefficients are equal, the first value and the second value are both the Pearson correlation coefficient (equivalently, the Spearman correlation coefficient).
That is, in the embodiments of the present application, when the Pearson correlation coefficient and the Spearman correlation coefficient are unequal, the correlation parameter between the independent variable and the dependent variable lies between the two coefficients; when they are equal, the correlation parameter equals the Pearson correlation coefficient, which is also the Spearman correlation coefficient.
For calculating the correlation parameter between the independent variable and the dependent variable from the Pearson correlation coefficient and the Spearman correlation coefficient, the embodiments of the present application provide a calculation method that specifically includes: multiplying the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value; adding the Pearson correlation coefficient and the Spearman correlation coefficient to obtain a fourth value; dividing the third value by the fourth value and multiplying the result by 2 to obtain a fifth value; and determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
For ease of understanding, the above calculation method can be expressed as formula (3):
$$\mathrm{Coff} = \frac{2\,\rho\,r}{\rho + r} \tag{3}$$
where r is the Pearson correlation coefficient, ρ is the Spearman correlation coefficient, ρ * r is the third value, ρ + r is the fourth value, and Coff is the fifth value, that is, the correlation parameter between the independent variable and the dependent variable.
After the Pearson correlation coefficient and the Spearman correlation coefficient of the independent variable and the dependent variable are obtained in S101, the two coefficients are substituted into formula (3) to calculate the correlation parameter between the independent variable and the dependent variable.
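The three-step calculation of the third, fourth and fifth values can be sketched as follows. The function name `combined_coefficient` is hypothetical, and the check for a zero denominator is an added assumption, since the text does not say how the case ρ + r = 0 is handled. The input pairs are the coefficient values quoted in the description for Fig. 2(b) and Fig. 2(e).

```python
def combined_coefficient(r, rho):
    """Correlation parameter Coff = 2 * rho * r / (rho + r)."""
    third = rho * r    # third value: product of the two coefficients
    fourth = rho + r   # fourth value: sum of the two coefficients
    if fourth == 0:
        # Assumption: the source does not define Coff when the sum is zero.
        raise ValueError("Coff is undefined when rho + r is zero")
    return third / fourth * 2  # fifth value

coff_b = combined_coefficient(0.851, 1.0)    # coefficients quoted for Fig. 2(b)
coff_e = combined_coefficient(-0.799, -1.0)  # coefficients quoted for Fig. 2(e)
print(round(coff_b, 2), round(coff_e, 3))
```

When both coefficients have the same sign, this expression is their harmonic mean, which is why the result falls between the two input values.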
In this embodiment, because the calculated correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient, it can characterize the correlation between the independent variable and the dependent variable, so that a user can determine the correlation between data even without knowing which kind of relationship the analyzed data has.
How the correlation parameter characterizes the correlation between the independent variable and the dependent variable is explained below with reference to the drawings.
Referring to Fig. 2(a), the discrete points in the figure represent the collected data of a certain independent variable and the dependent variable. As can be seen from the figure, the relationship between the independent variable and the dependent variable is linear. The calculated Pearson correlation coefficient is 1 and the Spearman correlation coefficient is 1; substituting the two coefficients into formula (3) gives a correlation parameter Coff of 1. The larger the absolute value of the Pearson correlation coefficient, the stronger the association between two sets of data, so a Pearson correlation coefficient of 1 indicates a strong linear correlation between the independent variable and the dependent variable. Since the Coff value is also 1, it likewise indicates a strong correlation between the independent variable and the dependent variable.
Referring to Fig. 2(b), the discrete points in the figure represent the collected data of a certain independent variable and the dependent variable, and the line indicates the trend of change. This trend is also reflected in the sign of the correlation coefficient: a positive coefficient shows that the dependent variable increases as the independent variable increases, and a negative coefficient shows that the dependent variable decreases as the independent variable increases. The line in Fig. 2(b) shows a rising trend, and the relationship between the independent variable and the dependent variable is non-linear. The calculated Pearson correlation coefficient is 0.851 and the Spearman correlation coefficient is 1; substituting the two coefficients into formula (3) gives a correlation parameter Coff of 0.92. The larger the absolute value of the Spearman correlation coefficient, the stronger the association between two sets of data, so the Spearman correlation coefficient of 1 in Fig. 2(b) indicates a strong non-linear association between the independent variable and the dependent variable. Since the calculated Coff value of 0.92 is also large and close to 1, it likewise characterizes a strong association between the independent variable and the dependent variable.
Referring to Fig. 2(c), as can be seen from the figure, the relationship between the independent variable and the dependent variable is non-linear. The calculated Pearson correlation coefficient is -0.093 and the Spearman correlation coefficient is -0.093; substituting the two coefficients into formula (3) gives a correlation parameter Coff of -0.093, where the negative sign shows that the dependent variable decreases as the independent variable increases. The small absolute value of the calculated Spearman correlation coefficient indicates a weak association between the independent variable and the dependent variable. Since the calculated correlation parameter is also small in absolute value, it likewise characterizes a weak association between the independent variable and the dependent variable.
Referring to Fig. 2(d), as can be seen from the figure, the relationship between the independent variable and the dependent variable is linear. The calculated Pearson correlation coefficient is -1 and the Spearman correlation coefficient is -1; substituting the two coefficients into formula (3) gives a correlation parameter Coff of -1. The larger the absolute value of the Pearson correlation coefficient, the stronger the association between two sets of data, so a Pearson correlation coefficient of -1 indicates a strong linear correlation between the independent variable and the dependent variable. Since the Coff value is also -1, it likewise indicates a strong correlation between the independent variable and the dependent variable.
Referring to Fig. 2(e), as can be seen from the figure, the relationship between the independent variable and the dependent variable is non-linear. The calculated Pearson correlation coefficient is -0.799 and the Spearman correlation coefficient is -1; substituting the two coefficients into formula (3) gives a correlation parameter Coff of -0.888. The larger the absolute value of the Spearman correlation coefficient, the stronger the association between two sets of data, so the Spearman correlation coefficient of -1 in Fig. 2(e) indicates a strong non-linear association between the independent variable and the dependent variable. Since the calculated Coff value of -0.888 is also large in absolute value and close to -1, it likewise characterizes a strong association between the independent variable and the dependent variable.
The above analysis shows that the correlation parameter calculated from the Pearson correlation coefficient and the Spearman correlation coefficient accounts for both linear and non-linear relationships and reflects the correlation between the independent variable and the dependent variable. When facing new data, a user therefore no longer needs to choose between the Pearson correlation coefficient and the Spearman correlation coefficient in order to determine the correlation between data.
As described above, when a new set of data is obtained, formula (3) can be used to calculate the correlation parameter between the independent variable and the dependent variable in the new data. However, when the distribution of the new data is relatively scattered, formula (3) cannot be used directly to obtain the correlation parameter. The difference between the Pearson correlation coefficient and the Spearman correlation coefficient must first be examined, and the correlation parameter between the independent variable and the dependent variable in the new data is then determined according to the result. Specifically, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, the correlation parameter between the independent variable and the dependent variable is determined to be the second value; when the absolute value of the difference is less than or equal to the first threshold, the correlation parameter is determined to be the fifth value.
In this embodiment, the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is calculated, and it is judged whether the absolute value of the difference is greater than a first preset threshold. When it is greater than the first preset threshold, the correlation parameter between the independent variable and the dependent variable is the larger of the Pearson correlation coefficient and the Spearman correlation coefficient; when it is not greater than the first preset threshold, the correlation parameter is the Coff value obtained using formula (3). The first preset threshold can usually be set to 0.5. In a specific implementation it can be set according to the actual application, and this embodiment does not limit the setting of the first preset threshold.
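The threshold rule above can be sketched as a small selection function. The name `correlation_parameter` and the keyword `first_threshold` are hypothetical; the default of 0.5 is the value the description says is usually used, and the error raised for ρ + r = 0 is an added assumption because the text does not cover that case.

```python
def correlation_parameter(r, rho, first_threshold=0.5):
    """Pick the correlation parameter according to the first threshold rule."""
    if abs(r - rho) > first_threshold:
        # Coefficients disagree strongly: use the second value (the larger one).
        return max(r, rho)
    if rho + r == 0:
        # Assumption: the source does not define the result in this case.
        raise ValueError("Coff is undefined when rho + r is zero")
    # Coefficients are close: use the fifth value, Coff = 2 * rho * r / (rho + r).
    return 2 * rho * r / (rho + r)

scattered = correlation_parameter(0.2, 0.9)   # |0.2 - 0.9| = 0.7 > 0.5
close = correlation_parameter(0.851, 1.0)     # |0.851 - 1| = 0.149 <= 0.5
print(scattered, round(close, 2))
```

The first call returns the larger coefficient unchanged, while the second falls through to the formula (3) branch.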
As described above, the embodiments of the present application calculate both the Pearson correlation coefficient and the Spearman correlation coefficient between the two sets of data of the independent variable and the dependent variable, and then use the two calculated coefficients to determine a new correlation parameter that characterizes the correlation between the independent variable and the dependent variable. The value of this correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation is characterized by this parameter, there is no longer any need to choose between the Pearson correlation coefficient and the Spearman correlation coefficient, and the correlation between data can be determined even without knowing which kind of relationship the analyzed data has.
In practical applications, after the correlation parameters between the independent variables and the dependent variable in a data set have been calculated using the above method embodiment, feature independent variables that can characterize the dependent variable may further be selected from the multiple independent variables. A feature independent variable is an independent variable that has a relatively large influence on the dependent variable. On this basis, the embodiments of the present application provide a method for selecting feature independent variables: specifically, an independent variable whose correlation parameter with the dependent variable is greater than a second threshold is determined to be a feature independent variable.
In this embodiment, it is first judged whether the correlation parameter between an independent variable and the dependent variable, calculated by the above method, is greater than a second preset threshold; if so, the corresponding independent variable is determined to be a feature independent variable. For example, if the correlation parameter of x1 and y is 0.85, that of x2 and y is 0.78, that of x3 and y is 0.56, and the second preset threshold is 0.7, then x1 and x2 are feature independent variables.
The second preset threshold may be set with reference to the correspondence between the Pearson correlation coefficient and the strength of association. When the Pearson correlation coefficient lies in [0.8, 1], the two sets of data are very strongly correlated; in [0.6, 0.8], strongly correlated; in [0.4, 0.6], moderately correlated; in [0.2, 0.4], weakly correlated; and in [0, 0.2], very weakly correlated or uncorrelated. Because a selected feature independent variable must be strongly correlated with the dependent variable, the second preset threshold may be set to 0.6: when the correlation parameter between a certain independent variable and the dependent variable is greater than 0.6, that independent variable is determined to be a feature independent variable.
It should be noted that the second preset threshold may also be set in other ways; this embodiment does not limit the setting of the second preset threshold.
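The selection rule can be illustrated with the numbers from the example above (0.85, 0.78 and 0.56 against a threshold of 0.7). The function name `select_features` and the dictionary layout are hypothetical. The comparison uses the raw parameter value, as the text states; whether negative parameters should be compared by absolute value is not specified in the source.

```python
def select_features(params, second_threshold):
    """Return the names of variables whose correlation parameter exceeds the threshold."""
    return [name for name, value in params.items() if value > second_threshold]

# Correlation parameters of x1, x2 and x3 with y, taken from the example in the text.
params = {"x1": 0.85, "x2": 0.78, "x3": 0.56}
selected = select_features(params, 0.7)
print(selected)
```

With the alternative threshold of 0.6 mentioned in the text, the same three values would still select only x1 and x2, because 0.56 remains below the threshold.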
In addition, when selecting feature independent variables, the selected feature independent variables must not only be strongly associated with the dependent variable but must also be weakly associated with one another; that is, the feature independent variables must not be strongly associated with each other. Therefore, after the feature independent variables are determined, it is also necessary to judge whether any of them are strongly associated with each other. When selected feature independent variables are strongly associated with each other, the strong association between them needs to be removed.
On this basis, the embodiments of the present application provide a method for judging whether feature independent variables are strongly associated with each other and for removing such strong association. The method is described below with reference to the drawings.
Referring to Fig. 3, which is a flowchart of a method for removing strong association between feature independent variables provided by the embodiments of the present application, the method may include:
S301: Establish a linear equation.
In this example, a linear equation is established for the obtained feature independent variables and the dependent variable. One side of the linear equation is the dependent variable, and the other side is the sum of the feature data items. Each feature data item is the product of one feature independent variable and the regression coefficient corresponding to that feature independent variable. The feature independent variables in the feature data items are all different, and the number of feature data items equals the number of feature independent variables.
In practical applications, each feature independent variable has its own regression coefficient. Each feature independent variable is multiplied by its regression coefficient, and the products are added to form the other side of the linear equation. For example, suppose there are 7 independent variables x1, x2, x3, x4, x5, x6 and x7, and the feature independent variables selected by the above method are x1, x3, x4, x5 and x7. The linear equation can then be written as y = a1*x1 + a3*x3 + a4*x4 + a5*x5 + a7*x7, where a1, a3, a4, a5 and a7 are the regression coefficients corresponding to the feature independent variables x1, x3, x4, x5 and x7, respectively.
S302: Substitute the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation, and solve it to obtain the regression coefficient corresponding to each feature independent variable.
In this example, to eliminate the effect that the different dimensions (units) of the feature independent variables would have on subsequent calculations, the parameter values of the feature independent variables and of the dependent variable can first be standardized. The standardized parameter values are then substituted into the above linear equation to calculate the regression coefficient corresponding to each feature independent variable.
In a specific implementation, the 0-1 standardization method can be used to normalize the parameter values of the feature independent variables and of the dependent variable. 0-1 standardization, also called deviation standardization, applies a linear transformation to the parameter values so that the results fall in the interval [0, 1]. The transfer function is:
$$x^{*} = \frac{x - \min}{\max - \min}$$
where x* is the standardized parameter value, x is a parameter value of a certain feature independent variable or of the dependent variable, max is the maximum of all parameter values of that feature independent variable or dependent variable, and min is the minimum of all parameter values of that feature independent variable or dependent variable.
For example, suppose the transaction amount is a feature independent variable and, in Table 1, it has three parameter values x20, x21 and x22. The maximum and the minimum are determined from these three parameter values and substituted into the above transfer function, and each parameter value is standardized to obtain the standardized parameter values.
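The 0-1 (deviation) standardization can be sketched as follows. The function name `zero_one_normalize` and the three sample amounts are hypothetical placeholders rather than values from Table 1, and the error raised when all values are equal is an added assumption, since the transfer function is undefined when max equals min.

```python
def zero_one_normalize(values):
    """Map values linearly onto [0, 1] using x* = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: the source does not say how a constant column is handled.
        raise ValueError("cannot normalize when all values are equal")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical transaction amounts standing in for x20, x21 and x22.
amounts = [100.0, 250.0, 400.0]
normalized = zero_one_normalize(amounts)
print(normalized)
```

The smallest value always maps to 0 and the largest to 1, so columns with very different units become directly comparable.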
It should be noted that other standardization methods, such as min-max standardization, may also be used for the normalization; the embodiments of the present application do not limit the specific manner of normalization.
In addition, because the location in Table 1 is represented by an administrative division code consisting of six Arabic numerals, the administrative division code can be treated as a concrete parameter value during standardization and then standardized using the above transfer function.
In a specific implementation, the multiple standardized parameter values of the feature independent variables and the multiple standardized parameter values of the dependent variable are substituted into the linear equation to form multiple linear equations, and these equations are then solved to obtain the regression coefficient corresponding to each feature independent variable.
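One way to solve the resulting set of equations is ordinary least squares, sketched below in plain Python via the normal equations. The source does not name a solving method, so least squares is an assumption, as are the function names `solve_linear_system` and `regression_coefficients` and the sample data. The model has no intercept term, matching the form y = a1*x1 + ... given in the text.

```python
def solve_linear_system(a, b):
    """Solve the square system a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular; coefficients are not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z

def regression_coefficients(rows, ys):
    """Least-squares coefficients for y = a1*x1 + ... + ak*xk (no intercept)."""
    k = len(rows[0])
    # Normal equations: (X^T X) a = X^T y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical standardized samples: each row holds the values of two feature variables.
rows = [[0.0, 1.0], [0.5, 0.0], [1.0, 0.5]]
ys = [0.3, 0.35, 0.85]  # generated as 0.7 * first + 0.3 * second
coeffs = regression_coefficients(rows, ys)
print([round(c, 6) for c in coeffs])
```

Each sample contributes one equation, so with more samples than feature variables the system is overdetermined and least squares gives the best-fitting coefficients.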
S303: Obtain a first ranking result of the feature independent variables according to the ordering of the regression coefficients corresponding to the feature independent variables.
In this example, the obtained regression coefficients are sorted, and the first ranking result of the feature independent variables is obtained from the sorted order of the regression coefficients. In a specific implementation, the sorting may be in descending order or in ascending order.
For example, sorting a1, a3, a4, a5 and a7 in ascending order gives a1 < a3 < a5 < a7 < a4, so the first ranking result of the feature independent variables is x1, x3, x5, x7, x4. Alternatively, sorting a1, a3, a4, a5 and a7 in descending order gives a4 > a7 > a5 > a3 > a1, so the first ranking result of the feature independent variables is x4, x7, x5, x3, x1.
S304: Obtain a second ranking result of the feature independent variables according to the ordering of the correlation parameters between the feature independent variables and the dependent variable.
In this example, the correlation parameters between the feature independent variables and the dependent variable are sorted, and the second ranking result of the feature independent variables is obtained from the sorted order of the correlation parameters. In a specific implementation, the sorting may be in descending order or in ascending order.
For example, let the correlation parameters between the feature independent variables x1, x3, x4, x5, x7 and the dependent variable be C1, C3, C4, C5 and C7, respectively. Sorting C1, C3, C4, C5 and C7 in ascending order gives C1 < C3 < C4 < C7 < C5, so the second ranking result of the feature independent variables is x1, x3, x4, x7, x5. Alternatively, sorting them in descending order gives C5 > C7 > C4 > C3 > C1, so the second ranking result of the feature independent variables is x5, x7, x4, x3, x1.
S305: When the first ranking result and the second ranking result of the feature independent variables are both sorted in ascending order, if the position of a target feature independent variable in the second ranking result is greater than its position in the first ranking result, delete the target feature independent variable from the feature independent variables. The target feature independent variable is any one of the feature independent variables.
After the two ranking results of the feature independent variables are obtained in S303 and S304, and when both are sorted in ascending order, it is judged for each feature independent variable whether its position in the second ranking result is greater than its position in the first ranking result. If so, this indicates that the feature independent variable is strongly associated with other feature independent variables, and it is deleted.
For example, when both ranking results are sorted in ascending order, the feature independent variable x5 is ranked fifth in the second ranking result and third in the first ranking result. Since fifth is greater than third, x5 is deleted. For the feature independent variables x1, x3, x4 and x7, the position in the second ranking result is not greater than the position in the first ranking result, which indicates that there is no strong association among these feature independent variables, so they are not deleted.
S306: When the first ranking result and the second ranking result of the feature independent variables are both sorted in descending order, if the position of a target feature independent variable in the second ranking result is less than its position in the first ranking result, delete the target feature independent variable from the feature independent variables.
After the two ranking results of the feature independent variables are obtained in S303 and S304, and when both are sorted in descending order, it is judged for each feature independent variable whether its position in the second ranking result is less than its position in the first ranking result. If so, this indicates that the feature independent variable is strongly associated with other feature independent variables, and it is deleted.
For example, when both ranking results are sorted in descending order, the feature independent variable x5 is ranked first in the second ranking result and third in the first ranking result. Since first is less than third, x5 is deleted. For the feature independent variables x1, x3, x4 and x7, the position in the second ranking result is not less than the position in the first ranking result, which indicates that there is no strong association among these feature independent variables, so they are not deleted.
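The comparison in S305 and S306 can be sketched as a single pass over the two rankings. The function names `rank_positions` and `features_to_delete` are hypothetical, and the numeric scores are invented solely to reproduce the orderings used in the examples (a1 < a3 < a5 < a7 < a4 and C1 < C3 < C4 < C7 < C5); they are not values from the source.

```python
def rank_positions(scores, descending=False):
    """Map each feature name to its 1-based position when sorted by score."""
    ordered = sorted(scores, key=scores.get, reverse=descending)
    return {name: pos for pos, name in enumerate(ordered, start=1)}

def features_to_delete(regression_coeffs, correlation_params, descending=False):
    """Return features whose second-ranking position violates the S305/S306 rule."""
    first = rank_positions(regression_coeffs, descending)    # first ranking result
    second = rank_positions(correlation_params, descending)  # second ranking result
    if descending:
        # S306: delete when the second position is less than the first position.
        return [name for name in first if second[name] < first[name]]
    # S305: delete when the second position is greater than the first position.
    return [name for name in first if second[name] > first[name]]

# Invented scores that reproduce the orderings from the examples in the text.
a = {"x1": 0.1, "x3": 0.2, "x4": 0.9, "x5": 0.4, "x7": 0.6}
c = {"x1": 0.61, "x3": 0.65, "x4": 0.70, "x5": 0.90, "x7": 0.80}

deleted_ascending = features_to_delete(a, c, descending=False)
deleted_descending = features_to_delete(a, c, descending=True)
print(deleted_ascending, deleted_descending)
```

Both sort directions flag the same variable, which is consistent with the two worked examples: in each case x5 ranks higher by correlation parameter than by regression coefficient.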
It should be noted that, after a feature independent variable is deleted from the multiple feature independent variables, the second ranking result and the first ranking result of the remaining feature independent variables can be obtained again and compared again for each feature independent variable. The judgment ends when the second ranking result of every feature independent variable is consistent with its first ranking result, which yields feature independent variables that have no strong association with one another.
With the above method, it can be judged whether strong association exists among the obtained feature independent variables. When it exists, the strongly associated feature independent variables are removed, yielding feature independent variables that have no strong association with one another, and these feature independent variables are then used to characterize the dependent variable.
Based on the above method embodiments, the present application also provides an apparatus for determining data correlation, which is described below with reference to the drawings.
Referring to Fig. 4, which is a structural diagram of an apparatus for determining data correlation provided by the embodiments of the present application, the apparatus may include:
a first computing unit 401, configured to calculate the Pearson correlation coefficient and the Spearman correlation coefficient between the independent variable and the dependent variable according to the parameter values of the independent variable and the parameter values of the dependent variable, the independent variable and the dependent variable having a correspondence;
a first determination unit 402, configured to determine the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient, where the correlation parameter between the independent variable and the dependent variable is greater than or equal to a first value and less than or equal to a second value; if the Pearson correlation coefficient and the Spearman correlation coefficient are unequal, the first value is the smaller of the Pearson correlation coefficient and the Spearman correlation coefficient, and the second value is the larger of the two; if the Pearson correlation coefficient and the Spearman correlation coefficient are equal, the first value and the second value are both the Pearson correlation coefficient (equivalently, the Spearman correlation coefficient).
In some possible implementations, the first determination unit includes:
a first computation subunit, configured to multiply the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
a second computation subunit, configured to add the Pearson correlation coefficient and the Spearman correlation coefficient to obtain a fourth value;
a third computation subunit, configured to divide the third value by the fourth value and multiply the result by 2 to obtain a fifth value;
a first determination subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the fifth value.
In some possible implementations, the first determination unit includes:
a second determination subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the second value when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold;
a third determination subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the fifth value when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold.
In some possible implementations, the apparatus further includes:
a second determination unit, configured to determine, as a feature independent variable, an independent variable whose correlation parameter with the dependent variable is greater than a second threshold.
In some possible implementations, the apparatus further includes:
an establishing unit, configured to establish a linear equation, where one side of the linear equation is the dependent variable and the other side is the sum of the feature data items; each feature data item is the product of one feature independent variable and the regression coefficient corresponding to that feature independent variable; the feature independent variables in the feature data items are all different; and the number of feature data items equals the number of feature independent variables;
a second computing unit, configured to substitute the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation and solve it to obtain the regression coefficient corresponding to each feature independent variable;
a first ranking unit, configured to obtain a first ranking result of the feature independent variables according to the ordering of the regression coefficients corresponding to the feature independent variables;
a second ranking unit, configured to obtain a second ranking result of the feature independent variables according to the ordering of the correlation parameters between the feature independent variables and the dependent variable;
a first deletion unit, configured to, when the first ranking result and the second ranking result of the feature independent variables are both sorted in ascending order, delete a target feature independent variable from the feature independent variables if its position in the second ranking result is greater than its position in the first ranking result, the target feature independent variable being any one of the feature independent variables;
a second deletion unit, configured to, when the first ranking result and the second ranking result of the feature independent variables are both sorted in descending order, delete a target feature independent variable from the feature independent variables if its position in the second ranking result is less than its position in the first ranking result.
It should be noted that, for the specific implementation of each module or unit in this embodiment, reference may be made to the implementation of the methods shown in Fig. 1 and Fig. 3; details are not repeated here.
In addition, an embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium stores instructions that, when run on a terminal device, cause the terminal device to perform the above method of determining data correlation.
An embodiment of the present application further provides a computer program product that, when run on a terminal device, causes the terminal device to perform the above method of determining data correlation.
As can be seen from the above embodiments, the embodiments of the present application calculate both the Pearson correlation coefficient and the Spearman correlation coefficient between the two data sets of an independent variable and a dependent variable, and then determine, from the calculated Pearson and Spearman correlation coefficients, a new correlation parameter that characterizes the correlation between the independent variable and the dependent variable. The value of the correlation parameter lies between the Pearson correlation coefficient and the Spearman correlation coefficient. Because the correlation between the independent variable and the dependent variable is characterized by this correlation parameter, there is no longer any need to choose between the Pearson and Spearman correlation coefficients, and the correlation between the data can be determined even when it is not known which kind of association the analyzed data have.
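As a minimal sketch of the first step summarized above, the two coefficients and the bounds they place on the correlation parameter could be computed as follows. The helper names (`pearson`, `average_ranks`, `spearman`, `parameter_bounds`) and the sample data are hypothetical, and the Spearman coefficient is computed here as the Pearson coefficient of average ranks, which is one common definition rather than something the application specifies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks from small to large; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def parameter_bounds(x, y):
    """First and second values: the smaller and larger of the two coefficients."""
    p, s = pearson(x, y), spearman(x, y)
    return min(p, s), max(p, s)


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
p, s = pearson(x, y), spearman(x, y)
lower, upper = parameter_bounds(x, y)
```

For this monotonic but nonlinear sample the Spearman coefficient is essentially 1 while the Pearson coefficient is lower, so any correlation parameter chosen by the method would fall between the two.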
It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the system or device disclosed in an embodiment corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
It should be understood that, in this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: only A exists, only B exists, and both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The foregoing description of the disclosed embodiments enables a person skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to a person skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of determining data correlation, characterized in that the method comprises:
Calculating, according to parameter values of an independent variable and parameter values of a dependent variable, a Pearson correlation coefficient and a Spearman correlation coefficient between the independent variable and the dependent variable, the independent variable and the dependent variable having a corresponding relationship;
Determining, according to the Pearson correlation coefficient and the Spearman correlation coefficient, a correlation parameter between the independent variable and the dependent variable, the correlation parameter being greater than or equal to a first value and less than or equal to a second value, wherein, if the Pearson correlation coefficient and the Spearman correlation coefficient are not equal, the first value is the smaller of the Pearson correlation coefficient and the Spearman correlation coefficient and the second value is the larger of the Pearson correlation coefficient and the Spearman correlation coefficient, and, if the Pearson correlation coefficient and the Spearman correlation coefficient are equal, the first value and the second value are both the Pearson correlation coefficient or the Spearman correlation coefficient.
2. The method according to claim 1, characterized in that determining the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient comprises:
Multiplying the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
Adding the Pearson correlation coefficient to the Spearman correlation coefficient to obtain a fourth value;
Dividing the third value by the fourth value and then multiplying the result by 2 to obtain a fifth value;
Determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
3. The method according to claim 2, characterized in that determining the correlation parameter between the independent variable and the dependent variable according to the Pearson correlation coefficient and the Spearman correlation coefficient comprises:
When the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, determining that the correlation parameter between the independent variable and the dependent variable is the second value;
When the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold, determining that the correlation parameter between the independent variable and the dependent variable is the fifth value.
4. The method according to claim 1, characterized in that the method further comprises:
Determining, as a feature independent variable, each independent variable whose correlation parameter with the dependent variable is greater than a second threshold.
5. The method according to claim 4, characterized in that the method further comprises:
Establishing a linear equation, where one side of the linear equation is the dependent variable and the other side is the sum of the feature data items; each feature data item is the product of one feature independent variable and the regression coefficient corresponding to that feature independent variable; the feature independent variables in the feature data items are all different; and the number of feature data items is equal to the number of feature independent variables;
Substituting the standardized parameter values of the feature independent variables and the standardized parameter values of the dependent variable into the linear equation, and solving for the regression coefficient corresponding to each feature independent variable;
Obtaining a first ranking result of the feature independent variables by sorting the regression coefficients corresponding to the feature independent variables;
Obtaining a second ranking result of the feature independent variables by sorting the correlation parameters between the feature independent variables and the dependent variable;
When the first ranking result and the second ranking result of the feature independent variables are both sorted from small to large, deleting a target feature independent variable from the feature independent variables if the second ranking result of the target feature independent variable is greater than its first ranking result, the target feature independent variable being any one of the feature independent variables;
When the first ranking result and the second ranking result of the feature independent variables are both sorted from large to small, deleting the target feature independent variable from the feature independent variables if the second ranking result of the target feature independent variable is less than its first ranking result.
6. A device for determining data correlation, characterized in that the device comprises:
A first computing unit, configured to calculate, according to parameter values of an independent variable and parameter values of a dependent variable, a Pearson correlation coefficient and a Spearman correlation coefficient between the independent variable and the dependent variable, the independent variable and the dependent variable having a corresponding relationship;
A first determination unit, configured to determine, according to the Pearson correlation coefficient and the Spearman correlation coefficient, a correlation parameter between the independent variable and the dependent variable, the correlation parameter being greater than or equal to a first value and less than or equal to a second value, wherein, if the Pearson correlation coefficient and the Spearman correlation coefficient are not equal, the first value is the smaller of the Pearson correlation coefficient and the Spearman correlation coefficient and the second value is the larger of the Pearson correlation coefficient and the Spearman correlation coefficient, and, if the Pearson correlation coefficient and the Spearman correlation coefficient are equal, the first value and the second value are both the Pearson correlation coefficient or the Spearman correlation coefficient.
7. The device according to claim 6, characterized in that the first determination unit comprises:
A first computing subunit, configured to multiply the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
A second computing subunit, configured to add the Pearson correlation coefficient to the Spearman correlation coefficient to obtain a fourth value;
A third computing subunit, configured to divide the third value by the fourth value and then multiply the result by 2 to obtain a fifth value;
A first determining subunit, configured to determine that the correlation parameter between the independent variable and the dependent variable is the fifth value.
8. The device according to claim 7, characterized in that the first determination unit comprises:
A second determining subunit, configured to determine, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, that the correlation parameter between the independent variable and the dependent variable is the second value;
A third determining subunit, configured to determine, when the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is less than or equal to the first threshold, that the correlation parameter between the independent variable and the dependent variable is the fifth value.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions that, when run on a terminal device, cause the terminal device to perform the method of determining data correlation according to any one of claims 1 to 5.
10. A computer program product, characterized in that, when the computer program product is run on a terminal device, the terminal device is caused to perform the method of determining data correlation according to any one of claims 1 to 5.
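The combination rule recited in claims 2 and 3 and the selection step of claim 4 can be sketched as follows. This is an illustrative sketch only: the function names and both threshold values (0.2 and 0.5) are hypothetical, since the claims do not fix them. Note also that the fifth value is guaranteed to lie between the two coefficients only when they have the same sign, and it is undefined when they sum to zero, a case the sketch rejects explicitly.

```python
def correlation_parameter(pearson, spearman, first_threshold=0.2):
    """Combine the two coefficients following claims 2 and 3.

    first_threshold is a hypothetical value chosen for illustration.
    """
    second_value = max(pearson, spearman)
    if abs(pearson - spearman) > first_threshold:
        # Coefficients disagree strongly: use the larger one (the second value).
        return second_value
    fourth_value = pearson + spearman
    if fourth_value == 0:
        raise ValueError("fifth value is undefined when the coefficients sum to zero")
    third_value = pearson * spearman
    # Fifth value: third value divided by fourth value, then multiplied by 2.
    return third_value / fourth_value * 2


def select_features(parameters, second_threshold=0.5):
    """Claim 4: keep independent variables whose parameter exceeds the second threshold."""
    return [name for name, value in parameters.items() if value > second_threshold]


# Hypothetical (Pearson, Spearman) pairs for three independent variables.
params = {
    "x1": correlation_parameter(0.6, 0.5),   # close coefficients: fifth value
    "x2": correlation_parameter(0.9, 0.3),   # far apart: larger coefficient
    "x3": correlation_parameter(0.2, 0.25),  # weakly correlated
}
features = select_features(params)
```

Here `x1` receives a value between 0.5 and 0.6, `x2` receives 0.9, and `x3` falls below the hypothetical second threshold and is not selected as a feature independent variable.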
CN201811012940.7A 2018-08-31 2018-08-31 A kind of method and device of determining data dependence Pending CN109346168A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811012940.7A CN109346168A (en) 2018-08-31 2018-08-31 A kind of method and device of determining data dependence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811012940.7A CN109346168A (en) 2018-08-31 2018-08-31 A kind of method and device of determining data dependence

Publications (1)

Publication Number Publication Date
CN109346168A true CN109346168A (en) 2019-02-15

Family

ID=65291933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811012940.7A Pending CN109346168A (en) 2018-08-31 2018-08-31 A kind of method and device of determining data dependence

Country Status (1)

Country Link
CN (1) CN109346168A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103458267A (en) * 2013-09-04 2013-12-18 中国传媒大学 Video picture quality subjective evaluation method and system
CN107025596A (en) * 2016-02-01 2017-08-08 腾讯科技(深圳)有限公司 A kind of methods of risk assessment and system
CN107656903A (en) * 2017-08-23 2018-02-02 中国石油天然气股份有限公司 The determination method and apparatus of the average index of data volume

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王成安: "流感疫情预测技术研究与主动监控系统的实现", 《湖南大学硕士学位论文》 *
鲁力: "基于互联网数据的中国流感趋势预测研究", 《湖南大学硕士学位论文》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978033A (en) * 2019-03-15 2019-07-05 第四范式(北京)技术有限公司 The method and apparatus of the building of biconditional operation people's identification model and biconditional operation people identification
CN109975661A (en) * 2019-04-22 2019-07-05 西南交通大学 A kind of electric transmission line fault detection method based on Spearman's correlation coefficient
CN112116480A (en) * 2019-06-20 2020-12-22 财付通支付科技有限公司 Virtual resource determination method and device, computer equipment and storage medium
CN110457370A (en) * 2019-08-12 2019-11-15 渤海大学 Outlier Detection system and method for cleaning in data mining based on artificial intelligence
CN113269361A (en) * 2021-05-20 2021-08-17 国网甘肃省电力有限公司酒泉供电公司 Power consumption increase prediction method based on power consumer relevance analysis
CN115146705A (en) * 2022-05-27 2022-10-04 南京林业大学 Method for recognizing forest lightning stroke fire by combining remote sensing and surface lightning stroke fire environment
CN115146705B (en) * 2022-05-27 2023-08-01 南京林业大学 Method for identifying forest lightning fire by combining remote sensing with surface lightning fire environment

Similar Documents

Publication Publication Date Title
CN109346168A (en) A kind of method and device of determining data dependence
TWI772673B (en) Industry identification model determination method and device
Mathews et al. Judgemental revision of sales forecasts: Effectiveness of forecast selection
CN108833458B (en) Application recommendation method, device, medium and equipment
Courtney et al. Shotgun correlations in software measures
Chen et al. Data envelopment analysis with missing data: A multiple linear regression analysis approach
CN108764705A (en) A kind of data quality accessment platform and method
CN110362481A (en) Automatic test approach and terminal device
Xu et al. Evaluating OR/MS journals via PageRank
CN110032650A (en) A kind of generation method, device and the electronic equipment of training sample data
CN111242318A (en) Business model training method and device based on heterogeneous feature library
JP2011203976A (en) Diagnostic support apparatus
Xie et al. Evaluating performance of super-efficiency models in ranking efficient decision-making units based on Monte Carlo simulations
CN103678709B (en) Recommendation system attack detection method based on time series data
Chen et al. Mutual fund performance evaluation–application of system BCC model
CN105488061B (en) A kind of method and device of verify data validity
Zhang et al. % CRTFASTGEEPWR: a SAS macro for power of the generalized estimating equations of multi-period cluster randomized trials with application to stepped wedge designs
JPH11175602A (en) Credit risk measuring device
CN108491189B (en) Method for evaluating design class diagram based on difference comparison
CN108829750A (en) A kind of quality of data determines system and method
CN106815290B (en) Method and device for determining attribution of bank card based on graph mining
CN108985606A Enterprise's similarity system design method and system
CN111144910B Bidding 'series bid, companion bid' object recommendation method and device based on fuzzy entropy mean shadow album
CN106846136A (en) A kind of data comparison method and equipment
CN106301880A (en) One determines that cyberrelationship degree of stability, Internet service recommend method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination