CN107315711A - Adaptive exogenous variable identification method

Publication number: CN107315711A
Authority: CN (China)
Prior art keywords: variable, matrix, exogenous, adaptive, exogenous variable
Legal status: Pending (an assumption, not a legal conclusion)
Application number: CN201710373056.5A
Other languages: Chinese (zh)
Inventors: 郝志峰, 何敏藩
Current assignee: Foshan University (the listed assignee may be inaccurate)
Original assignee: Foshan University
Priority date: 2017-05-24 (an assumption, not a legal conclusion)
Filing date: 2017-05-24
Publication date: 2017-11-03
Application filed by: Foshan University

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models

Abstract

The invention discloses an adaptive exogenous variable identification method comprising the following steps: setting a data set in which each variable contains m sample data, setting a matrix, and setting an array; performing a least-squares regression between each variable of the data set and each of the remaining variables to obtain residuals; calculating the mutual information between the variable and each of the residuals; storing the mutual information as elements of the matrix; finding the maximum of each row of the matrix and storing it in the array; finding the minimum value in the array; and finding the variable that is independent of all the remaining residuals and taking it as the exogenous variable. By applying the minimax idea together with the characteristics of exogenous variables, the invention makes the independence-judgment parameter an adaptive parameter value. This avoids the low recognition rate of traditional algorithms, which are sensitive to differences in independence values, as well as the failure to identify that occurs when a fixed independence parameter is sensitive to the data set, and it improves the identification of exogenous variables.

Description

Adaptive exogenous variable identification method
Technical Field
The present invention relates to the technical field of data mining, and more specifically to an adaptive exogenous variable identification method.
Background
Causal discovery algorithms are widely used in the field of artificial intelligence. A causal discovery algorithm identifies the exogenous variables and, starting from them, derives the causal mechanism of the system. It follows from this definition that the exogenous variable is the trigger of a causal discovery algorithm: identifying it correctly provides effective intervention measures during artificial intelligence control and lets an artificial intelligence system better understand the causal mechanisms between things.
The prior art offers two main methods for identifying exogenous variables, both based on the linear non-Gaussian principle. The first introduces an additional assumption, for example that the non-Gaussianity of every disturbance variable is weaker than that of the exogenous variable, and then identifies the exogenous variable using negentropy as a non-Gaussianity index; this is the EggFinder algorithm. The second compares, pair by pair, the independence between each variable and the residuals of all the remaining variables; this is the DirectLiNGAM algorithm or the LR algorithm.
The drawback of the first method is that it requires additional assumptions and relies on a non-Gaussianity index to identify the exogenous variable, so it identifies accurately only certain kinds of data and lacks generality. The drawback of the second method is that it takes as the exogenous variable the variable corresponding to the minimum of the summed values, and this sum is an unreliable measure.
In summary, both of the above methods suffer from insufficient adaptability and reliability when used to identify exogenous variables.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a method that identifies exogenous variables flexibly and efficiently.
The solution adopted by the present invention to solve this technical problem is as follows:
An adaptive exogenous variable identification method, comprising the following steps:
Step A. Set a data set X = [X_1, X_2, …, X_n], in which each variable X_i (i = 1, …, n) contains m sample data; set a matrix M whose elements are all INF; set an array Max;
Step B. Perform a least-squares regression between X_i and each of the remaining variables X_j, where j ≠ i, to obtain the residuals;
Step C. Calculate the mutual information between X_i and each of the residuals, where j ≠ i;
Step D. Store the mutual information from step C as the element in row i, column j of matrix M;
Step E. Find the maximum of row i of matrix M and store it in array Max as Max(i);
Step F. Find the minimum value in array Max and denote it λ;
Step G. According to the value of λ, find the variable that is independent of all the remaining residuals and take it as the exogenous variable.
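The selection rule of steps E to G can be written compactly. The following is a sketch under two assumptions that the text does not state explicitly: M is an n × n matrix indexed by (i, j), and the diagonal entries, which are never overwritten and therefore remain INF, are excluded from the row maximum. The symbol i* is introduced here only for illustration.

```latex
\mathrm{Max}(i) = \max_{j \neq i} M_{ij}, \qquad
\lambda = \min_{1 \le i \le n} \mathrm{Max}(i), \qquad
i^{*} = \arg\min_{1 \le i \le n} \mathrm{Max}(i)
```

Under this reading, X_{i*} is the variable whose largest dependence on any residual equals λ, and it is the variable reported as exogenous.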
As a further improvement of the above technical solution, in step B the residual is calculated by the following formula: [formula not recoverable from the source text]
As a further improvement of the above technical solution, in step C the mutual information between X_i and each of the residuals is calculated by a kernel method using the following formula: [formula not recoverable from the source text], where p, q = 1, 2, 3, …, m. When m > 1000, the kernel width is [value not recoverable from the source text] and K = 2 × 10⁻³; when m ≤ 1000, the kernel width σ = 1 and K = 2 × 10⁻².
The beneficial effects of the invention are as follows. The exogenous variable identification method of the invention collects the maximum of each row of matrix M into array Max and then finds the minimum value in Max. By applying the basic idea of the minimax strategy together with the characteristics of the exogenous variable itself, the independence-judgment parameter that is introduced becomes an adaptive parameter value. This avoids the low recognition rate that traditional algorithms suffer because they are sensitive to differences in independence values, and it also avoids the failure to identify that occurs when a fixed independence parameter is sensitive to the data set. The identification of exogenous variables is thereby effectively improved.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawing required for describing the embodiments is briefly introduced below. The drawing described shows only some embodiments of the invention, not all of them; those skilled in the art can obtain other designs and drawings from it without creative effort.
Fig. 1 is a flowchart of the exogenous variable identification method of the present invention.
Detailed Description
The concept, specific structure, and technical effects of the invention are described clearly and completely below with reference to the embodiments and the drawing, so that the purpose, features, and effects of the invention can be fully understood. The embodiments described are only some of the embodiments of the invention, not all of them; other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention. The technical features of the invention may be combined with one another provided they do not conflict.
Referring to Fig. 1, to remedy the insufficient adaptability and reliability with which prior-art artificial intelligence systems identify exogenous variables, the present invention provides an adaptive exogenous variable identification method comprising the following steps:
Step A. Set a data set X = [X_1, X_2, …, X_n], in which each variable X_i (i = 1, …, n) contains m sample data; set a matrix M whose elements are all INF; set an array Max;
Step B. Perform a least-squares regression between X_i and each of the remaining variables X_j, where j ≠ i, to obtain the residuals;
Step C. Calculate the mutual information between X_i and each of the residuals, where j ≠ i;
Step D. Store the mutual information from step C as the element in row i, column j of matrix M;
Step E. Find the maximum of row i of matrix M and store it in array Max as Max(i);
Step F. Find the minimum value in array Max and denote it λ;
Step G. According to the value of λ, find the variable that is independent of all the remaining residuals and take it as the exogenous variable.
Specifically, the exogenous variable identification method of the invention collects the maximum of each row of matrix M into array Max and then finds the minimum value in Max. By applying the basic idea of the minimax strategy together with the characteristics of the exogenous variable itself, the independence-judgment parameter that is introduced becomes an adaptive parameter value. This avoids the low recognition rate that traditional algorithms suffer because they are sensitive to differences in independence values, and it also avoids the failure to identify that occurs when a fixed independence parameter is sensitive to the data set. The identification of exogenous variables is thereby effectively improved.
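The max-then-min selection described above can be illustrated on a small, hand-filled matrix. This is a minimal sketch, not the patented implementation: the function name `select_exogenous` and the example values are hypothetical, and masking out the diagonal (which stays at INF because only entries with j ≠ i are written) is an assumption.

```python
import numpy as np

def select_exogenous(M):
    """Return (index, lam) using the max-then-min rule of steps E to G."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # Step E: row maxima over j != i; the INF diagonal is masked out (assumption)
    row_max = np.where(off_diag, M, -np.inf).max(axis=1)
    # Step F: lambda is the smallest of the row maxima
    lam = row_max.min()
    # Step G: the row that attains lambda is reported as the exogenous variable
    return int(np.argmin(row_max)), float(lam)

# Hypothetical dependence scores for three variables
M = np.full((3, 3), np.inf)
M[0, 1], M[0, 2] = 0.02, 0.05
M[1, 0], M[1, 2] = 0.40, 0.10
M[2, 0], M[2, 1] = 0.30, 0.25

index, lam = select_exogenous(M)
print(index, lam)
```

Because λ is read off the data rather than fixed in advance, no independence threshold has to be tuned per data set, which is the adaptivity the description refers to.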
As a further preferred embodiment, in step B of the invention the residual is calculated by the following formula: [formula not recoverable from the source text], in which one term measures the overall error between X_i and X_j and var(X_i) is the variance of X_i. Step B relies on the m sample data of each variable to calculate the residuals; that is, the residual calculation takes the characteristics of the exogenous variable into account, so that the independence-judgment parameter λ introduced in the later steps is adaptive and applicable to any sample data, which improves the recognition rate of the exogenous variable.
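The residual calculation of step B can be sketched as follows. This is a minimal sketch under an assumption: because the description refers to a term measuring the overall error of X_i and X_j and to var(X_i), the residual is taken here to be the ordinary least-squares residual X_j − (cov(X_i, X_j) / var(X_i)) · X_i. The function name `residual` and the sample data are hypothetical.

```python
import numpy as np

def residual(x_i, x_j):
    """Residual of x_j after least-squares regression on x_i (assumed form)."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    cov_ij = np.cov(x_i, x_j, ddof=1)[0, 1]  # sample covariance of x_i and x_j
    var_i = np.var(x_i, ddof=1)              # sample variance of x_i
    return x_j - (cov_ij / var_i) * x_i

rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 500)
x2 = 0.8 * x1 + rng.uniform(-1.0, 1.0, 500)

r = residual(x1, x2)
# By construction the residual is uncorrelated with the regressor
print(abs(np.cov(x1, r)[0, 1]) < 1e-10)
```

Being uncorrelated with X_i does not make the residual independent of it, which is why step C measures independence with mutual information instead of correlation.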
As a further preferred embodiment, in step C of the invention the mutual information between X_i and each of the residuals is calculated by a kernel method using the following formula: [formula not recoverable from the source text], where p, q = 1, 2, 3, …, m and I is the identity matrix. When m > 1000, the kernel width is [value not recoverable from the source text] and K = 2 × 10⁻³; when m ≤ 1000, the kernel width σ = 1 and K = 2 × 10⁻². The mutual information can be regarded as the amount of information that one random variable contains about another random variable. The method stores every calculated mutual information value as an element of matrix M, then finds the maximum of each row of M and stores it in array Max, and then finds the minimum value in Max, denoted λ. In other words, it finds the largest threshold at which exactly one variable is independent of all the residuals; this λ is the independence parameter value. Finally, the variable that is independent of all the remaining residuals is found according to λ and taken as the exogenous variable. The prior art compares, pair by pair, the independence between a variable and the residuals of all the variables, and takes as the exogenous variable the variable corresponding to the minimum of the summed values. The identification method of the invention instead uses the minimax strategy and introduces an adaptive λ as the independence-judgment parameter. This avoids the low recognition rate that conventional methods suffer because they are sensitive to differences in independence, and it also avoids the failure to identify that occurs when a fixed independence parameter is sensitive to the data set, so the recognition rate of the exogenous variable is improved.
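A kernel-based mutual information estimate of the kind step C describes can be sketched as follows. The formula itself did not survive in the text, so this block assumes the kernel generalized variance estimator known from kernel independent component analysis, which is consistent with the quantities the text does mention (indices p, q over the m samples, an identity matrix I, a kernel width, and a regularization constant, taken here to be the K of the text). The value σ = 0.5 for m > 1000 is likewise an assumption; the function names are hypothetical.

```python
import numpy as np

def _centered_gram(x, sigma):
    """Gaussian Gram matrix of a 1-D sample, centered in feature space."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]                # pairwise differences x_p - x_q
    K = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    m = len(x)
    H = np.eye(m) - np.full((m, m), 1.0 / m)   # centering matrix
    return H @ K @ H

def kernel_mi(x, y, sigma=None, kappa=None):
    """Kernel generalized variance estimate of mutual information (assumed form)."""
    m = len(x)
    if kappa is None:
        kappa = 2e-3 if m > 1000 else 2e-2     # values quoted in the text
    if sigma is None:
        sigma = 0.5 if m > 1000 else 1.0       # 0.5 is an assumption, 1.0 is quoted
    K1 = _centered_gram(x, sigma)
    K2 = _centered_gram(y, sigma)
    reg = (m * kappa / 2.0) * np.eye(m)
    A, B = K1 + reg, K2 + reg
    K_kappa = np.block([[A @ A, K1 @ K2], [K2 @ K1, B @ B]])
    logdet_K = np.linalg.slogdet(K_kappa)[1]
    # log det of the block-diagonal matrix diag(A @ A, B @ B)
    logdet_D = 2.0 * (np.linalg.slogdet(A)[1] + np.linalg.slogdet(B)[1])
    return -0.5 * (logdet_K - logdet_D)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
mi_dependent = kernel_mi(x, x + 0.1 * rng.uniform(-1.0, 1.0, 200))
mi_independent = kernel_mi(x, rng.uniform(-1.0, 1.0, 200))
print(mi_dependent > mi_independent)
```

The estimate costs on the order of m³ operations per pair because of the determinants, so for large m a low-rank approximation of the Gram matrices would normally be used; that refinement is outside what the text describes.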
Referring to Fig. 1, the identification procedure in a specific embodiment of the exogenous variable identification method of the invention is as follows:
Step S01: Input the observed data set X = [X_1, X_2, …, X_n], in which each variable contains m sample data;
Step S02: Set a matrix M whose elements are all INF, and set an array Max;
Step S03: Set parameter i = 1;
Step S04: Set parameter j = 1;
Step S05: Judge whether j is not equal to i; if so, continue to the next step; if not, jump to step S07;
Step S06: Perform a least-squares regression between variable X_i and variable X_j of the data set to obtain the residual;
Step S07: Set j = j + 1 and judge whether j is less than or equal to n; if so, return to step S05; if not, continue to the next step;
Step S08: Set i = i + 1 and judge whether i is less than or equal to n; if so, return to step S04; if not, continue to the next step;
Step S09: Set parameter i = 1;
Step S10: Set parameter j = 1;
Step S11: Judge whether j is not equal to i; if so, continue to the next step; if not, jump to step S14;
Step S12: Calculate the mutual information between X_i and the residual;
Step S13: Store the mutual information as the element in row i, column j of matrix M;
Step S14: Set j = j + 1 and judge whether j is less than or equal to n; if so, return to step S11; if not, continue to the next step;
Step S15: Set i = i + 1 and judge whether i is less than or equal to n; if so, return to step S10; if not, continue to the next step;
Step S16: Find the maximum of each row i of matrix M and store it in array Max;
Step S17: Find the minimum value in array Max and denote it λ;
Step S18: According to the value of λ, find the variable that is independent of all the remaining residuals and take it as the exogenous variable.
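The flow of steps S01 to S18 can be sketched end to end as follows. This is a sketch under stated assumptions, not the patented implementation: the two loop nests of the flowchart are merged into one, the residual uses the ordinary least-squares form, the INF diagonal is excluded from the row maxima, and the dependence score `sq_corr` (absolute correlation of squared values) is a cheap stand-in for the kernel mutual information of step S12, chosen only to keep the example short. All names and the synthetic data are hypothetical.

```python
import numpy as np

def sq_corr(x, r):
    """Stand-in dependence score (NOT the kernel mutual information of the text)."""
    return abs(np.corrcoef(x ** 2, r ** 2)[0, 1])

def find_exogenous(X, dependence=sq_corr):
    """X has shape (m, n): m samples of n variables. Returns (index, lam, M)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                         # work with centered variables
    m, n = X.shape
    M = np.full((n, n), np.inf)                    # S02: all elements INF
    for i in range(n):                             # S03 to S15, merged loops
        for j in range(n):
            if j == i:
                continue                           # S05 / S11: skip j == i
            b = (X[:, i] @ X[:, j]) / (X[:, i] @ X[:, i])
            r = X[:, j] - b * X[:, i]              # S06: least-squares residual
            M[i, j] = dependence(X[:, i], r)       # S12 and S13
    masked = np.where(np.eye(n, dtype=bool), -np.inf, M)
    Max = masked.max(axis=1)                       # S16: row maxima, diagonal excluded
    lam = Max.min()                                # S17: lambda
    return int(np.argmin(Max)), float(lam), M      # S18

# Synthetic linear non-Gaussian chain: X1 -> X2 -> X3 with uniform noise
rng = np.random.default_rng(0)
e = rng.uniform(-1.0, 1.0, size=(2000, 3))
x1 = e[:, 0]
x2 = 0.8 * x1 + e[:, 1]
x3 = 0.5 * x2 + e[:, 2]

index, lam, M = find_exogenous(np.column_stack([x1, x2, x3]))
print(index, lam)
```

Passing a different callable as `dependence`, for example a kernel mutual information estimator, changes only the scores written into M; the max-then-min selection stays the same.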
The preferred embodiments of the invention have been described above, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.

Claims (3)

1. An adaptive exogenous variable identification method, characterized in that it comprises the following steps:
Step A. Set a data set X = [X_1, X_2, …, X_n], in which each variable X_i (i = 1, …, n) contains m sample data; set a matrix M whose elements are all INF; set an array Max;
Step B. Perform a least-squares regression between X_i and each of the remaining variables X_j, where j ≠ i, to obtain the residuals;
Step C. Calculate the mutual information between X_i and each of the residuals, where j ≠ i;
Step D. Store the mutual information from step C as the element in row i, column j of matrix M;
Step E. Find the maximum of row i of matrix M and store it in array Max as Max(i);
Step F. Find the minimum value in array Max and denote it λ;
Step G. According to the value of λ, find the variable that is independent of all the remaining residuals and take it as the exogenous variable.
2. The adaptive exogenous variable identification method according to claim 1, characterized in that in step B the residual is calculated by the following formula: [formula not recoverable from the source text]
3. The adaptive exogenous variable identification method according to claim 2, characterized in that in step C the mutual information between X_i and each of the residuals is calculated by a kernel method using the following formula: [formula not recoverable from the source text], where p, q = 1, 2, 3, …, m. When m > 1000, the kernel width is [value not recoverable from the source text] and K = 2 × 10⁻³; when m ≤ 1000, the kernel width σ = 1 and K = 2 × 10⁻².
Application CN201710373056.5A, priority date 2017-05-24, filing date 2017-05-24: Adaptive exogenous variable identification method. Status: Pending. Publication: CN107315711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710373056.5A (CN107315711A, en)  2017-05-24  2017-05-24  Adaptive exogenous variable identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710373056.5A (CN107315711A, en)  2017-05-24  2017-05-24  Adaptive exogenous variable identification method

Publications (1)

Publication Number Publication Date
CN107315711A (en)  2017-11-03

Family

ID=60181589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710373056.5A (Pending; CN107315711A, en)  Adaptive exogenous variable identification method  2017-05-24  2017-05-24

Country Status (1)

Country Link
CN (1) CN107315711A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2017-11-03