CN107315711A - An adaptive exogenous variable identification method - Google Patents
An adaptive exogenous variable identification method
- Publication number
- CN107315711A (application number CN201710373056.5A)
- Authority
- CN
- China
- Prior art keywords
- variable
- matrix
- exogenous
- adaptive
- exogenous variable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Databases & Information Systems (AREA)
- Character Discrimination (AREA)
Abstract
The invention discloses an adaptive exogenous variable identification method comprising the following steps: setting a data set in which each variable contains m samples, setting a matrix, and setting an array; performing a least squares regression of each variable in the data set against all remaining variables to obtain residuals; calculating the mutual information between the variable and all the residuals; writing the mutual information into the elements of the matrix; calculating the maximum of each row of the matrix and storing it in the array; finding the minimum in the array; and finding the variable that is independent of all remaining residuals as the exogenous variable. The invention uses the minimax idea together with the characteristics of exogenous variables, so that the independence judgment parameter it introduces is adaptive. This avoids the low identification rate that traditional algorithms suffer because they are sensitive to differences in independence values, and also avoids the failure to identify that occurs when different data sets are sensitive to a fixed independence parameter, thereby improving the identification of exogenous variables.
Description
Technical field
The present invention relates to the technical field of data mining, and more specifically to an adaptive exogenous variable identification method.
Background
Causal discovery algorithms are widely used in the field of artificial intelligence. A causal discovery algorithm identifies exogenous variables and, from them, derives the mechanism by which the variables act on one another. It follows from this definition that the exogenous variable is the trigger of the causal discovery algorithm: correctly identifying it provides effective intervention measures in artificial intelligence control and helps an artificial intelligence system better understand the causal mechanisms between things.
The prior art has two main methods for identifying exogenous variables, both based on the linear non-Gaussian principle. The first introduces an assumption, for example that the non-Gaussianity of any disturbance variable is weaker than that of the exogenous variable, and then identifies the exogenous variable using negentropy, a measure of non-Gaussianity; this is the EggFinder algorithm. The second compares, pair by pair, the independence between a variable and the residuals of all remaining variables; this is the DirectLiNGAM algorithm or the LR algorithm.
The drawback of the first method is that it requires an additional restrictive assumption and relies on a non-Gaussianity measure, so it can accurately identify only certain kinds of data and lacks generality. The drawback of the second method is that it takes the variable with the minimum summed value as the exogenous variable, and a sum is an unreliable measure.
In summary, both methods lack adaptability and reliability when used to identify exogenous variables.
Summary of the invention
The technical problem to be solved by the invention is to provide a method that identifies exogenous variables flexibly and efficiently.
The invention solves this technical problem as follows:
An adaptive exogenous variable identification method, comprising the following steps:
Step A. Set a data set X = [X_1, X_2, ..., X_n], in which each variable X_i (i = 1, ..., n) contains m samples; set a matrix M, all of whose elements are INF; and set an array Max;
Step B. For each variable X_i in the data set, perform a least squares regression against every remaining variable X_j, where j ≠ i, to obtain the residuals;
Step C. Calculate the mutual information between X_i and each of the residuals, where j ≠ i;
Step D. Write the mutual information from step C into the element in row i, column j of matrix M;
Step E. Calculate the maximum of row i of matrix M and store it in the array Max, denoted Max(i);
Step F. Find the minimum value in the array Max, denoted λ;
Step G. According to the value of λ, find the variable that is independent of all remaining residuals and take it as the exogenous variable.
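Steps E to G can be summarised in a single expression. The notation below is introduced here for illustration only: M_{ij} denotes the entry in row i, column j of matrix M, and i^{*} denotes the index of the selected variable. The expression assumes that the row maximum is taken over j ≠ i, because the diagonal entries are never overwritten and would otherwise remain INF.

```latex
\lambda = \min_{1 \le i \le n} \; \max_{j \ne i} M_{ij},
\qquad
i^{*} = \arg\min_{1 \le i \le n} \; \max_{j \ne i} M_{ij}
```

Under this reading, λ is the worst-case dependence of the best candidate, and X_{i^{*}} is the variable reported as exogenous.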
As a further improvement of the above technical solution, in step B the residual is calculated according to the following formula: [formula not reproduced]
As a further improvement of the above technical solution, in step C the mutual information between X_i and all the residuals is calculated using a kernel method, according to the following formula: [formula not reproduced]
where [term definitions not reproduced] and p, q = 1, 2, 3, ..., m. When m > 1000, the kernel width σ [value not reproduced] and K = 2 × 10^-3; when m ≤ 1000, the kernel width σ = 1 and K = 2 × 10^-2.
The beneficial effects of the invention are as follows. The exogenous variable identification method of the invention forms the array Max from the maximum of each row of matrix M and then finds the minimum value in Max. By applying the basic idea of the minimax strategy together with the characteristics of the exogenous variable itself, the independence judgment parameter that is introduced becomes an adaptive value. This avoids the low identification rate that traditional algorithms suffer because they are sensitive to differences in independence values, and also avoids the failure to identify that occurs when different data sets are sensitive to a fixed independence parameter, effectively improving the identification of exogenous variables.
Brief description of the drawings
To explain the technical solutions in the embodiments of the invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described show only some embodiments of the invention, not all of them; those skilled in the art can obtain other designs and drawings from these drawings without creative effort.
Fig. 1 is a flow chart of the exogenous variable identification method of the invention.
Detailed description of the embodiments
The concept, specific structure, and technical effects of the invention are described clearly and completely below with reference to the embodiments and the drawings, so that the purpose, features, and effects of the invention can be fully understood. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them; other embodiments obtained by those skilled in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention. The technical features of the invention may be combined with one another provided they do not conflict.
Referring to Fig. 1, to overcome the lack of adaptability and reliability in the prior-art methods by which artificial intelligence systems identify exogenous variables, the invention provides an adaptive exogenous variable identification method comprising the following steps:
Step A. Set a data set X = [X_1, X_2, ..., X_n], in which each variable X_i (i = 1, ..., n) contains m samples; set a matrix M, all of whose elements are INF; and set an array Max;
Step B. For each variable X_i in the data set, perform a least squares regression against every remaining variable X_j, where j ≠ i, to obtain the residuals;
Step C. Calculate the mutual information between X_i and each of the residuals, where j ≠ i;
Step D. Write the mutual information from step C into the element in row i, column j of matrix M;
Step E. Calculate the maximum of row i of matrix M and store it in the array Max, denoted Max(i);
Step F. Find the minimum value in the array Max, denoted λ;
Step G. According to the value of λ, find the variable that is independent of all remaining residuals and take it as the exogenous variable.
Specifically, the exogenous variable identification method of the invention forms the array Max from the maximum of each row of matrix M and then finds the minimum value in Max. By applying the basic idea of the minimax strategy together with the characteristics of the exogenous variable itself, the independence judgment parameter that is introduced becomes an adaptive value. This avoids the low identification rate that traditional algorithms suffer because they are sensitive to differences in independence values, and also avoids the failure to identify that occurs when different data sets are sensitive to a fixed independence parameter, effectively improving the identification of exogenous variables.
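The difference between the summation criterion attributed to the prior art and the row-maximum criterion used here can be illustrated with a small numerical sketch. The matrix values below are hypothetical and chosen only so that the two criteria disagree; the diagonal is left at infinity, as in step A, and is excluded from both criteria (an assumption, since the text does not say how the INF diagonal is treated).

```python
import numpy as np

# Hypothetical mutual-information matrix for three variables.
# Row i holds the dependence between X_i and the residuals of the other variables.
M = np.array([
    [np.inf, 0.05, 0.06],
    [0.01, np.inf, 0.09],
    [0.20, 0.30, np.inf],
])

# Mask the diagonal, which is never overwritten in steps B to D.
off_diagonal = np.where(np.eye(3, dtype=bool), np.nan, M)

# Summation criterion: smallest row sum.
row_sum = np.nansum(off_diagonal, axis=1)
pick_by_sum = int(np.argmin(row_sum))

# Minimax criterion (steps E to G): smallest row maximum.
row_max = np.nanmax(off_diagonal, axis=1)
lam = float(row_max.min())
pick_by_minimax = int(np.argmin(row_max))

print(pick_by_sum, pick_by_minimax, lam)
```

In this made-up case the row sums favour the second variable while the row maxima favour the first. The sketch shows only that the criteria can select different variables; it does not by itself show which selection is correct.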
As a further preferred embodiment, in step B of the invention the residual is calculated according to the following formula: [formula not reproduced], where the covariance term measures how X_i and X_j vary together and var(X_i) is the variance of X_i. Step B relies on the m samples of each variable to calculate the residual; that is, the residual calculation draws on the characteristics of the exogenous variable, so that the independence judgment parameter λ introduced in the later steps is adaptive and applies to any sample data, which improves the identification rate of exogenous variables.
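A minimal sketch of the residual calculation follows. It assumes the ordinary least squares form suggested by the covariance and var(X_i) terms described above, that is, X_j minus (cov(X_i, X_j) / var(X_i)) times X_i; the function name `ols_residual` and the direction of the regression are assumptions made for illustration.

```python
import numpy as np

def ols_residual(x_j: np.ndarray, x_i: np.ndarray) -> np.ndarray:
    """Residual of x_j after a least squares regression on x_i (assumed form)."""
    x_i_centered = x_i - x_i.mean()
    x_j_centered = x_j - x_j.mean()
    # Slope = cov(x_i, x_j) / var(x_i); the 1/m factors cancel.
    slope = np.dot(x_i_centered, x_j_centered) / np.dot(x_i_centered, x_i_centered)
    return x_j - slope * x_i
```

By construction the residual is uncorrelated with x_i. Whether it is also statistically independent of x_i is what the mutual information in step C is meant to measure.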
As a further preferred embodiment, in step C of the invention the mutual information between X_i and all the residuals is calculated using a kernel method, according to the following formula: [formula not reproduced]
where [term definitions not reproduced]; p, q = 1, 2, 3, ..., m; and I is the identity matrix. When m > 1000, the kernel width σ [value not reproduced] and K = 2 × 10^-3; when m ≤ 1000, the kernel width σ = 1 and K = 2 × 10^-2.
The mutual information can be regarded as the amount of information that one random variable contains about another. The method writes every computed mutual information value into matrix M, then finds the maximum of each row of M and stores it in the array Max, and then finds the minimum value in Max, denoted λ. In other words, λ is the largest threshold for which exactly one variable is independent of all the residuals, and it serves as the independence parameter. Finally, the variable that is independent of all remaining residuals according to λ is taken as the exogenous variable.
The prior art compares the independence between each variable and the residuals of all variables pair by pair and takes the variable with the minimum summed value as the exogenous variable. The identification method of the invention instead uses the minimax idea and introduces an adaptive λ as the independence judgment parameter. This avoids the low identification rate that conventional methods suffer because they are sensitive to differences in independence values, and also avoids the failure to identify that occurs when different data sets are sensitive to a fixed independence parameter, improving the identification rate of exogenous variables.
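Because the kernel formula itself is not legible in this text, the sketch below substitutes one well-known kernel-based estimator: the kernel generalized variance of Bach and Jordan, computed from centered Gaussian Gram matrices. This is an assumed stand-in, not necessarily the estimator intended by the invention. The function names are hypothetical; the defaults σ = 1 and K = 2 × 10^-2 (written `kappa`) are the values the text gives for m ≤ 1000.

```python
import numpy as np

def centered_gaussian_gram(x: np.ndarray, sigma: float) -> np.ndarray:
    """Centered Gram matrix with entries exp(-(x_p - x_q)^2 / (2 sigma^2))."""
    m = len(x)
    diff = x[:, None] - x[None, :]
    gram = np.exp(-diff ** 2 / (2.0 * sigma ** 2))
    centering = np.eye(m) - np.ones((m, m)) / m
    return centering @ gram @ centering

def kernel_mutual_information(x, r, sigma=1.0, kappa=2e-2):
    """Kernel generalized variance estimate of the dependence between x and r."""
    m = len(x)
    k1 = centered_gaussian_gram(np.asarray(x, dtype=float), sigma)
    k2 = centered_gaussian_gram(np.asarray(r, dtype=float), sigma)
    # Regularised Gram matrices; I is the m-by-m identity matrix.
    t1 = k1 + (m * kappa / 2.0) * np.eye(m)
    t2 = k2 + (m * kappa / 2.0) * np.eye(m)
    zeros = np.zeros((m, m))
    full = np.block([[t1 @ t1, k1 @ k2], [k2 @ k1, t2 @ t2]])
    diag = np.block([[t1 @ t1, zeros], [zeros, t2 @ t2]])
    # slogdet avoids overflow in the determinants of the 2m-by-2m matrices.
    _, logdet_full = np.linalg.slogdet(full)
    _, logdet_diag = np.linalg.slogdet(diag)
    return -0.5 * (logdet_full - logdet_diag)
```

The estimate is close to zero when x and r are independent and grows as they become more dependent. The fixed kernel width presumes inputs of roughly unit scale, and the cost is cubic in m, so the sketch suits small samples only.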
Referring to Fig. 1, the identification steps in a specific embodiment of the exogenous variable identification method of the invention are as follows:
Step S01: Input the observed data set X of n variables, each containing m samples;
Step S02: Set a matrix M whose elements are all INF, and set an array Max;
Step S03: Set parameter i = 1;
Step S04: Set parameter j = 1;
Step S05: Judge whether j is not equal to i; if so, continue; if not, jump to step S07;
Step S06: Perform a least squares regression of variable X_i against variable X_j in the data set to obtain the residual;
Step S07: Set j = j + 1 and judge whether j is less than or equal to n; if so, return to step S05; if not, continue;
Step S08: Set i = i + 1 and judge whether i is less than or equal to n; if so, return to step S04; if not, continue;
Step S09: Set parameter i = 1;
Step S10: Set parameter j = 1;
Step S11: Judge whether j is not equal to i; if so, continue; if not, jump to step S14;
Step S12: Calculate the mutual information between X_i and the corresponding residual;
Step S13: Write that mutual information into the element in row i, column j of matrix M;
Step S14: Set j = j + 1 and judge whether j is less than or equal to n; if so, return to step S11; if not, continue;
Step S15: Set i = i + 1 and judge whether i is less than or equal to n; if so, return to step S10; if not, continue;
Step S16: Calculate the maximum of each row i of matrix M and store it in the array Max;
Step S17: Find the minimum value in the array Max, denoted λ;
Step S18: According to the value of λ, find the variable that is independent of all remaining residuals and take it as the exogenous variable.
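The flow of steps S01 to S18 can be sketched end to end as follows. Several things here are assumptions made for illustration: the function names, the least squares residual form, the exclusion of the diagonal from the row maximum, and above all the mutual information estimator, which is a simple histogram estimate used as a lightweight stand-in for the kernel method named in the text.

```python
import numpy as np

def ols_residual(x_j, x_i):
    """Residual of x_j after a least squares regression on x_i (assumed form)."""
    xi_c = x_i - x_i.mean()
    xj_c = x_j - x_j.mean()
    return x_j - (np.dot(xi_c, xj_c) / np.dot(xi_c, xi_c)) * x_i

def histogram_mi(x, y, bins=10):
    """Histogram estimate of mutual information (stand-in, not the kernel method)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nonzero = p_xy > 0
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero])))

def find_exogenous(X, mi=histogram_mi):
    """X has shape (m samples, n variables) with n >= 2. Returns (index, lambda, M)."""
    _, n = X.shape
    M = np.full((n, n), np.inf)              # S02: all elements start at INF
    for i in range(n):                       # S03 to S15: fill the off-diagonal entries
        for j in range(n):
            if j != i:
                r = ols_residual(X[:, j], X[:, i])
                M[i, j] = mi(X[:, i], r)
    # S16: row maxima, ignoring the diagonal that was never overwritten.
    row_max = np.where(np.eye(n, dtype=bool), -np.inf, M).max(axis=1)
    lam = float(row_max.min())               # S17: lambda
    return int(np.argmin(row_max)), lam, M   # S18: variable attaining lambda
```

A usage example on synthetic linear non-Gaussian data, where the first column is exogenous by construction:

```python
rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, 5000)
x1 = 0.8 * x0 + rng.uniform(-1, 1, 5000)
x2 = 0.5 * x0 + 0.7 * x1 + rng.uniform(-1, 1, 5000)
index, lam, M = find_exogenous(np.column_stack([x0, x1, x2]))
```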
The preferred embodiments of the invention have been described above, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.
Claims (3)
1. An adaptive exogenous variable identification method, characterized by comprising the following steps:
Step A. Set a data set X = [X_1, X_2, ..., X_n], in which each variable X_i (i = 1, ..., n) contains m samples; set a matrix M, all of whose elements are INF; and set an array Max;
Step B. For each variable X_i in the data set, perform a least squares regression against every remaining variable X_j, where j ≠ i, to obtain the residuals;
Step C. Calculate the mutual information between X_i and each of the residuals, where j ≠ i;
Step D. Write the mutual information from step C into the element in row i, column j of matrix M;
Step E. Calculate the maximum of row i of matrix M and store it in the array Max, denoted Max(i);
Step F. Find the minimum value in the array Max, denoted λ;
Step G. According to the value of λ, find the variable that is independent of all remaining residuals and take it as the exogenous variable.
2. The adaptive exogenous variable identification method according to claim 1, characterized in that in step B the residual is calculated according to the following formula: [formula not reproduced].
3. The adaptive exogenous variable identification method according to claim 2, characterized in that in step C the mutual information between X_i and all the residuals is calculated using a kernel method, according to the following formula: [formula not reproduced], where [term definitions not reproduced] and p, q = 1, 2, 3, ..., m; when m > 1000, the kernel width σ [value not reproduced] and K = 2 × 10^-3; when m ≤ 1000, the kernel width σ = 1 and K = 2 × 10^-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710373056.5A CN107315711A (en) | 2017-05-24 | 2017-05-24 | An adaptive exogenous variable identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107315711A true CN107315711A (en) | 2017-11-03 |
Family
ID=60181589
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710373056.5A Pending CN107315711A (en) | An adaptive exogenous variable identification method | 2017-05-24 | 2017-05-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107315711A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460176A (en) * | 2018-01-02 | 2018-08-28 | 佛山科学技术学院 | A method of it improving satellite orbit perturbation power model and indicates precision |
CN109508558A (en) * | 2018-10-31 | 2019-03-22 | 阿里巴巴集团控股有限公司 | A kind of verification method and device of data validity |
CN109657482A (en) * | 2018-10-26 | 2019-04-19 | 阿里巴巴集团控股有限公司 | A kind of verification method and device of data validity |
-
2017
- 2017-05-24 CN CN201710373056.5A patent/CN107315711A/en active Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460176A (en) * | 2018-01-02 | 2018-08-28 | 佛山科学技术学院 | A method of it improving satellite orbit perturbation power model and indicates precision |
CN109657482A (en) * | 2018-10-26 | 2019-04-19 | 阿里巴巴集团控股有限公司 | A kind of verification method and device of data validity |
CN109657482B (en) * | 2018-10-26 | 2022-11-18 | 创新先进技术有限公司 | Data validity verification method, device and equipment |
CN109508558A (en) * | 2018-10-31 | 2019-03-22 | 阿里巴巴集团控股有限公司 | A kind of verification method and device of data validity |
CN109508558B (en) * | 2018-10-31 | 2022-11-18 | 创新先进技术有限公司 | Data validity verification method, device and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Pascoal et al. | Theoretical evaluation of feature selection methods based on mutual information | |
CN106780045B (en) | Policy information corrects method and apparatus | |
US7809170B2 (en) | Method and apparatus for choosing and evaluating sample size for biometric training process | |
CN107315711A (en) | An adaptive exogenous variable identification method | |
CN110704634A (en) | Method and device for checking and repairing knowledge graph link errors and storage medium | |
US10528844B2 (en) | Method and apparatus for distance measurement | |
US11062120B2 (en) | High speed reference point independent database filtering for fingerprint identification | |
CN112036295B (en) | Bill image processing method and device, storage medium and electronic equipment | |
CN110210625A (en) | Modeling method, device, computer equipment and storage medium based on transfer learning | |
CN116506223B (en) | Collaborative network protection method and system | |
CN110969200A (en) | Image target detection model training method and device based on consistency negative sample | |
CN110991538B (en) | Sample classification method and device, storage medium and computer equipment | |
CN111199186A (en) | Image quality scoring model training method, device, equipment and storage medium | |
US10895919B2 (en) | Gesture control method and apparatus for display screen | |
CN110046009B (en) | Recording method, recording device, server and readable storage medium | |
US11150993B2 (en) | Method, apparatus and computer program product for improving inline pattern detection | |
CN111523387A (en) | Method and device for detecting hand key points and computer device | |
CN108920601B (en) | Data matching method and device | |
CN111104339B (en) | Software interface element detection method, system, computer equipment and storage medium based on multi-granularity learning | |
CN114422450B (en) | Network traffic analysis method and device based on multi-source network traffic data | |
JPH03188586A (en) | Pattern recognition/inspection processing system | |
CN113344079B (en) | Image tag semi-automatic labeling method, system, terminal and medium | |
CN112446428B (en) | Image data processing method and device | |
CN110223290A (en) | Film appraisal procedure, device, computer equipment and storage medium | |
CN115280374A (en) | Labeling method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20171103 |