CN109886409B

CN109886409B - Quantitative causal relationship judging method of multidimensional time sequence

Info

Publication number: CN109886409B
Application number: CN201910115769.0A
Authority: CN
Inventors: 梁湘三
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2019-02-15
Filing date: 2019-02-15
Publication date: 2023-12-12
Anticipated expiration: 2039-02-15
Also published as: CN109886409A

Abstract

The invention relates to a quantitative causal relation judging method of a multidimensional time sequence, in particular to a flow packetCollecting m time sequences; preprocessing each time sequence to obtain a plurality of stable time sequences with the same time interval; calculating quantitative and directional causal relationship between different time sequences, wherein any two time sequences X _j And X _i The causal relationship between the two information flows can be based on Liang-Kleeman information flow T _j→i And T is _i→j To make a judgment, T _j→i Estimated by the following formula:and finally, integrating the causality between each two time sequences obtained through calculation to obtain a complete causality network diagram. The method can infer the causal relationship among a plurality of events through simple calculation, and the causal relationship is output in a quantitative mode, so that the problem of causal relationship inference is solved, and the method has wide and important application in a plurality of important fields such as finance, climate, environment, neuroscience, weather forecast, artificial intelligence, big data science and the like.

Description

Quantitative causal relationship judging method of multidimensional time sequence

Technical Field

The invention belongs to the technical field of big data technical processing, and particularly relates to a quantitative causal relationship judging method of a multidimensional time sequence.

Background

Causal analysis is a core problem of scientific research and is a direct goal of considerable scientific research. In one letter to j.s.switch, einstein has described: the development of western science is determined by two great achievements: the logic system is invented by Greek philosophy, and the causal relationship among events can be deduced through systematic experiments in the period of the rendition. Now suppose we have made experiments on two dynamic events to get two time series, then one of the most fundamental problems comes: we cannot exactly distinguish whether it is the cause or the effect exactly from the two sequences? Or, if there is a entanglement like "pre-chicken or pre-egg" between them, we can quantitatively characterize the circulating causality between them?

This is an ancient, well-known interdisciplinary problem, and scientists including the Nobel prize acquirer, greenworker (Clive Granger), have struggled for half a century. In fact, it is listed as one of the greatest challenges in today's emerging disciplines, big data science (O' Neil & Schutt,2013, p.274). Over the last decade, liang & Kleeman (2005), liang (2008), liang (2014), liang (2016), etc. have demonstrated through various efforts that this causal relationship can be strictly derived from the original physical laws, and does not have to appear in a semi-empirical form as in the traditional approach. Conventional causal analysis (such as Granger causal verification and transitive entropy analysis) in many cases fails to verify such an observed fact, known as the "zero cause criterion": if the evolution of one event does not depend on another event, the latter is not the cause of the former, but in this new architecture this fact appears as a proven theorem.

The causal analysis method can be generally described as follows for an m-dimensional random power system

Wherein x= (x ₁ ,x ₂ ,...,x _m ) Is a state vector, F is an m-dimensional vector field,is an m-dimensional white noise vector, b= (B) _ij ) _m×m Is an amplitude matrix of noise disturbance, then is composed of the component x _j To x _i Liang-Kleeman information stream of (Liang, 2016)

Wherein the method comprises the steps ofρ _j|i (x _j |x _i ) Is x _j For x _i Conditional probability density of>Theoretically, if T _j→i Not equal to 0, x _j Is x _i Cause of (1), and T _j→i Is the causal strength; and vice versa.

The above theory proves to be strictly true for any power system, in sharp contrast to the existing gladhand causal tests or empirical causal tests such as transfer entropy analysis. However, the formula (II) is very complex, which results in that it is precisely established in classical problems, but is hardly solved in practical problems, which greatly constrains its application value. To solve this problem, liang (2014) proposes an estimation method and gets a fairly compact formula for the two-dimensional problem. This formula has been validated against many existing semi-empirical causal analyses (e.g., granger causal tests, transitive entropy analysis) and has been successfully used in more and more realistic problems. For example, with only the stock price time series of two IBM and GE companies downloaded from Yahoo, it was found that there was a strong, nearly unidirectional causal relationship from IBM to GE over a period of time in the 70 s, and thus a dust seal of decades was explored about "seven dwarfs" and "giant" competing for the maxima market. In another example, stips et al (2016) use this formula to find a clear, nearly unidirectional causal relationship between carbon dioxide and global warming. For the last hundred years CO2 did contribute to global warming, but on the ancient climate scale of over a thousand years this causality could be reversed completely, as global warming resulted in an increase in CO2 concentration.

The estimation of Liang (2014) is very successful for two-dimensional Liang-Kleeman information flow, i.e. the (II) formula when m=2, but its success is limited to the two-dimensional case (i.e. the case of m=2), and is not sufficient for the multidimensional case.

Disclosure of Invention

The invention aims to provide a quantitative causal relation judging method of a multidimensional time sequence, which can quantitatively obtain causal relations among a plurality of events.

In order to solve the technical problems, the invention adopts the following technical scheme:

a quantitative causal relation judging method of a multidimensional time sequence is characterized in that: the magnitude and direction of the causal relation among the multidimensional time series can be quantitatively estimated, and the specific implementation scheme is as follows:

step 1, collecting m time sequences, wherein each sequence corresponds to an event, and the m time sequences are respectively X ₁ ,X ₂ ,…,X _m ；

Step 2, preprocessing each time sequence to obtain a plurality of time sequences with the same time interval;

step 3, when some time sequences in the step 2 are not stable, further processing is needed to make the sequences become approximately stable; when the time sequence in the step 2 is stable, performing a step 4;

step 4, calculating quantitative and directional causal relations among the time sequences; the specific calculation method is as follows:

given m stationary, equidistant time series X ₁ ,X ₂ ,…,X _m Let X be _j To X _i Information flow of (i, j=1, 2,..m, i+.j) is T _j→i X is then _j And X _i Causal relationships between can be defined by _j→i And T is _i→j To make a judgment, T _j→i Can be estimated by the following equation:

wherein,for information flow T _j→i Maximum likelihood estimate of (a); />For m time series X ₁ ,X ₂ ,…,X _m Is a covariance matrix of (a); detC is a determinant of C; delta _jk Is C _jk Algebraic remainder of (2); c (C) _k,di Is X _k And a derived sequence->Covariance between>The definition is as follows: />Δt is the time interval of the sequence;

theoretically whenNon-zero, then X _j Is X _i Cause of (1)/(2)>Is the size of the causal relationship and is about +.>Performing significance test;

and 5, integrating the causal relationship data between every two time sequences obtained by calculation in the step 4 to obtain a complete causal relationship network diagram.

Selecting a time point t from the time series of the data stabilization obtained in the step 3 _n At a time point t _n Time window [ t ] is taken for center _n -L/2,t _n +L/2]Executing step 4 and step 5 under the condition of a time window with the time length of L to obtain a time point t _n A causal relationship network graph at the location;

and (5) sequentially taking time points of different positions according to the time sequence in the time sequence, and repeatedly executing the step (4) and the step (5) to obtain a causal relationship network graph changing along with time.

The step 4 is performedA saliency check must be performed to determine whether it differs significantly from zero, or not, when the saliency check result differs significantly from zero, the causal relationship between two events is proved, and when the saliency check result does not differ significantly from zero, the causal relationship between two events is proved not to be significant, the specific flow of the saliency check is as follows:

x is to be _i And X _j Arranged to the first and second positions at this timeConversion to->The method in the derivation step 4In the process of the formula, we use a linear model:

wherein x= (X ₁ ,X ₂ ,...,X _m ) Is a state vector, here represented as m time series,is white noise, (f) ₁ ,b ₁ ,a ₁₁ ,a ₁₂ ,...,a _1m ) Is a parameter to be estimated, and it can be demonstrated that the maximum likelihood estimation of these parameters is:

wherein the method comprises the steps ofFor m time series X ₁ ,X ₂ ,…,X _m Is a covariance matrix of (a); detC is a determinant of C; delta _jk Is C _jk Algebraic remainder of (2); c (C) _k,di Is X _k And a derived sequence->Covariance between>The definition is as follows:Δt is the time interval of the sequence; />Is sequence X _k Arithmetic mean of (2); />Is a sequence ofArithmetic mean of (2); />N is the sequence length (total number of time steps), according to the central limit theorem, ++>Approximately obeys a normal distribution, where only +.>Let its variance beIt relates to Fisher information matrix I, and Fisher information matrix I is calculated as follows:

note the state vector x= (X) at time point n ₁ ,X ₂ ,...,X _m ) For X (N), n=1, 2,..n, N, assuming that X (N) = (X _1,n ,X _2,n ,...,X _m,n ) In the case of (2), the next state vector X (n+1) = (X) _1,n+1 ,X _2,n+1 ,...,X _m,n+1 ) If the probability density of the transition probability density is ρ, the Fisher information matrix multiplied by the total number of steps of time N is

The above formula can be demonstrated as:

a in the matrix _1j ,f ₁ ,b ₁ Using its estimateInstead of the above-mentioned, the method,

inverting the obtained matrix (NI), the fourth diagonal component of the obtained inverse matrixI.e. variance->Is used for the estimation of the (c),

the variance of (c) is: />

Given the confidence level alpha, looking up a standard normal distribution table to obtain a confidence interval coefficient z _α Further obtainStandard deviation of +.>In other words, for confidence level α, X ₂ To X ₁ The confidence interval of the true causality of (2) is:

when the confidence interval does not include zero, X ₂ Is X ₁ Cause of (1); otherwise, the causality is not obvious. For arbitrary sequence X _i And X is _j Significance test of causal relationship between the two, X is only needed _i And X is _j And (5) placing the materials at the first and second positions, and repeating the steps.

When the confidence level is set to 90%, the confidence interval coefficient z _α =1.65, where the confidence interval is:

when the confidence level is set to 95%, the confidence interval coefficient z=1.96, at which time the confidence interval is:

when the confidence level is set to 99%, the confidence interval coefficient z=2.56, at which time the confidence interval is:

when X is _i And X _j When the causal relationship between the data is not obvious, the causal relationship between the data cannot be simply judged, and under the condition that the data and the data are allowed, the data sample is enlarged, and the steps 1 to 4 are repeatedly executed.

The quantitative causal relation judging method of the multidimensional time sequence has the beneficial effects that: the method can infer the causal relationship among a plurality of events through simple calculation, and the causal relationship is output in a quantitative form, so that the problem of causal relationship inference is solved. In general, causal inference in scientific research and engineering application depends on a large number of experiments such as physics, chemistry, biology and the like or computer simulation, and huge cost is consumed, but the result is not correct, compared with the method, the calculation cost is almost negligible. The method has wide application in various aspects such as financial analysis, neuroscience research, climate change research, weather forecast, earthquake prediction, haze tracing and the like.

Drawings

FIG. 1 is a workflow diagram of a quantitative causal relationship determination method of the multi-dimensional time series of the present invention.

FIG. 2 is a schematic diagram of the causal relationship between companies over time inferred from stock prices using the method of the present invention.

FIG. 3 is a geographical map of data acquisition used in inferring the source of a contaminant using the method of the present invention.

FIG. 4 is a time series of the various contaminant concentrations across the site when the source of the contaminant is inferred using the method of the present invention.

FIG. 5 is a flow chart of the operation of the method of the present invention to infer the source of contaminants.

FIG. 6 is a spatial distribution diagram of contaminant concentration when tracing a haze source using the method of the present invention.

FIG. 7 is a workflow diagram of a haze source tracing process using the method of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and specific preferred embodiments.

In this embodiment, the number of m time sequences acquired in step 1 is greater than or equal to two, and the possible existence of a causal relationship between any two time sequences is: has no causality, one-way causality and two-way causality.

In this embodiment, the preprocessing method for each time sequence in step 2 includes, but is not limited to, supplementing the missing point data, mapping the irregular time distribution data to the regular time points, and so on, and finally obtaining a plurality of time sequences with accurate data mapping and the same time length.

In this embodiment, the preprocessing mode performed on the time series in step 3 when the data is not stable includes, but is not limited to, performing linear trend removal processing on the series, or transforming to another series, such as converting the stock price series into a yield series, etc. The pretreated time series can be judged by various classical stability criteria, and the time series can be approximately stable according to visual inspection in the embodiment to meet the requirements.

In this embodiment, the quantitative and directional causal relationship between each time series, X, is calculated in step 4 _j And X _i Causal relationships between can be defined by _j→i And T is _i→j To judge: if T _j→i Not equal to 0, then X _j Is X _i Because, if zero, it is not; if T _i→j Not equal to 0, then X _i Is X _j If zero, it is not. There are four possibilities here in total: (1) X is X _j Is X _i Cause of (1), X _i Also X _j Cause of (1); (2) X is X _j Is X _i Cause of (1), X _i Other than X _j Cause of (1); (3) X is X _j Other than X _i Cause of (1), X _i Is X _j Cause of (1); (4) X is X _j And X is _i There is no causal relationship between them. Again, the order i, j=1, 2,..m, is calculated, resulting in information flow and causal relationships between all time sequences.

In this embodiment, the data T is estimated for each of the information streams obtained in step 4 _j→i A significance test is performed to determine whether it is significantly different from zero, or not significantly different from zero, when the significance test result is significantly different from zero, the causal relationship between the two events is proved, and when the significance test result is not significantly different from zero, the causal relationship between the two events is not proved, the specific flow of the significance test is as follows:

x is to be _i And X _j Arranged to the first and second positions at this timeConversion to->In deriving the formula of step 4, we use a linear model:

The above formula can be demonstrated as:

a in the matrix _1j ,f ₁ ,b ₁ Using its estimateInstead of

Inverting the obtained matrix (NI), the fourth diagonal component of the obtained inverse matrixI.e. variance->Estimated value of ∈10->The variance of (c) is: />

Given the confidence level alpha, looking up a standard normal distribution table to obtain a confidence interval coefficient z _α Further calculateStandard deviation of (2)In other words, for confidence level α, X ₂ To X ₁ The confidence interval of the true causality of (2) is:

Further, when the confidence level is set to 90%, the confidence interval coefficient z _α =1.65, where the confidence interval is:

in this embodiment, when the result of the calculation in step 4 appears X _i And X _j When the causal relationship between the data is not obvious, the causal relationship between the data cannot be simply judged, and under the condition that the data and the data are allowed, the data sample is enlarged, and the steps 1 to 4 are repeatedly executed.

Further, expanding the data samples includes, but is not limited to, increasing the sequence length, increasing the data temporal resolution, or both.

In this embodiment, the step 5 may be applied to an optional time window with a length L, that is, a causal relationship network between multiple time sequences in the window may be obtained, and further a causal relationship network that varies with time may be obtained, which is implemented as follows:

selecting a time point t from the stationary time series obtained in the step 3 _n At a time point t _n Time window [ t ] is taken for center _n -L/2,t _n +L/2]Executing step 4 and step 5 under the condition of a time window with the time length of L to obtain a time point t _n A causal relationship network graph at the location;

and (5) sequentially taking time points of different positions according to the time sequence in the time sequence, and repeatedly executing the step (4) and the step (5) to obtain a causal relationship network graph changing along with time. The specific implementation steps are as follows:

determining the length of a time window for estimating the causal relationship as L and the starting time t ₀ =l/2+1, total time series length t _n Calculate [ t ] ₀ -L/2,t ₀ +L/2]Causal relationships between variables in the model and performing significance test, so as to obtain t ₀ Whether +L/2 is greater than or equal to t _n As a criterion, successively calculate at t ₀ ＝t ₀ And (5) causality relation data at the time of +1, and finally obtaining a causality network graph changing along with time.

The technical scheme is further verified by taking a specific example:

first embodiment:

inferring from the company stock prices the relationship between the two companies:

acquiring a time sequence of Amazon, google and Walmart stock prices, wherein the time is from the beginning of the 19 th 8 th year of Google2004 to the end of the 26 th 12 th year of 2014, removing trade days and holidays, and totally counting 2607 data points, and preprocessing price data to obtain log daily gain due to unstable price sequence, so as to form a time sequence, wherein the preprocessing mode comprises the following specific steps: if P (n) represents the closing price on the n-th day, r (n) = lnP (n+1) -lnP (n) is the log-day gain, and then the causal relationship between the two at each time point is obtained by taking 1000 trade days as the sliding time window length, the result as shown in fig. 2 can be obtained, and it can be seen from the graph that the information flow from Amazon to Google is insignificant at 90% probability, that is, amazon is never the factor of Google, but Google is the factor of Amazon before the autumn of 2010, and the information flow is significantly not zero. The reason for this is because Amazon has heretofore relied on Google, a search engine, which is a search engine on the principle of online shopping. But after the fall of 2010, amazon is promptly abandoned in Google in order not to lose the huge market of china, so Google is no longer the cause of Amazon since then.

Specific embodiment II:

presuming the source of the pollutant through the pollutant concentration of different regional positions:

in order to understand the cause of pollution in the open sea of Jiangsu, the method of the invention is used for carrying out causal analysis on pollutant index sequences at certain selected points, and the calculation process is shown in figure 5. In the first step, as shown in fig. 3, three points a (120.5 e,34.8 n), B (121.5 e,33.5 n) and C (122.5 e,31.8 n) are selected, which correspond to the harbour, the salt city and the Shanghai open sea respectively, pollutant concentration data of nearly twenty years are taken, the time span is the first year of 1998 to the end of 2016, the output interval of each step is 15 days, and the total of 456 time points are shown in fig. 4, and the time sequence of the obtained nutrient salt (NO 3, siO3 and PO 4) and chlorophyll is shown in fig. 4. The following are the information flows calculated with steps 1 to 5 and their 90% confidence intervals for the causal relationship of various nutrient salts with chlorophyll between the three A, B, C points:

nitrate (NO 3)

T _B→A ＝-0.038621±0.040181

T _A→B ＝0.204099±0.046894

T _C→A ＝-0.049355±0.039681

T _A→C ＝0.335230±0.048638

T _C→B ＝0.124917±0.051953

T _B→C ＝0.019727±0.054564

Silicate (SiO 3)

T _B→A ＝0.092092±0.030884

T _A→B ＝-0.043766±0.019159

T _C→A ＝0.026947±0.024741

T _A→C ＝-0.030499±0.017957

T _C→B ＝0.258933±0.042229

T _B→C ＝0.059791±0.049407

Phosphate (PO 4)

T _B→A ＝0.110934±0.055773

T _A→B ＝0.208629±0.073974

T _C→A ＝0.027468±0.053460

T _A→C ＝0.322688±0.070432

T _C→B ＝0.108395±0.066720

T _B→C ＝0.077843±0.066274

Chlorophyll-a (Chl-a)

T _B→A ＝-0.016813±0.044458

T _A→B ＝0.120196±0.049673

T _C→A ＝0.176085±0.051851

T _A→C ＝0.207379±0.053923

T _C→B ＝0.186485±0.054840

T _B→C ＝0.041304±0.051044

For these several indexes, it can be seen that the north and south sea areas are both the factors of the salt city and the open sea, so that the pollution of the salt city and the open sea is not only the result of the salt city and the sea area, but also the neighbor from the north and the south. Also, in the case of silicate, phosphate, the salt city open sea is indeed the cause of the open sea in the cloud harbor, the open sea in the Shanghai, because of the information flow T _B→A ，T _B→C Significantly different from zero.

However, it is quite unexpected that T for nitrate, chlorophyll _B→A， T _B→C And not significantly different from zero, that is, from this data sample, the salt city open sea does not appear to be the cause of the Shanghai and the Liyun harbor open sea because the information flow is not significantly zero. If pollution from nitrate and chlorophyll (such as algal bloom) is to be discussed, although pollution from the outside sea in salt city is serious, although pollution from the north of su has long been thought to pollute the mouth of the Yangtze river by the coastal flow of su in Jiangsu, it is not the cause of the outside sea in Shanghai and Liyun harbor.

Third embodiment:

haze tracing is carried out through PM2.5 concentration in the atmosphere of different regional positions:

as shown in fig. 6 and 7, by acquiring PM2.5 concentration data in the atmosphere within a certain time period of the multi-region PM2.5 concentration monitoring station, a corresponding time sequence is formed, and a haze cause and effect network diagram among multiple sites can be obtained through calculation, so that one or more main sources can be found.

It will be further appreciated from the above description of various embodiments and computing principles that the present method can infer causal relationships between events by simple computation, and that such causal relationships are output in quantitative form, thereby solving the problem of causal relationship inference. In general, causal inference in scientific research and engineering application depends on a large number of experiments such as physics, chemistry, biology and the like or computer simulation, and huge cost is consumed, but the result is not correct, compared with the method, the calculation cost is almost negligible. The method has wide application in a plurality of scientific fields such as finance, climate, environment, neuroscience and the like.

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims

1. A quantitative causal relation judging method of a multidimensional time sequence is characterized in that: the magnitude and direction of the causal relation among the multidimensional time series can be quantitatively estimated, and the specific implementation scheme is as follows:

given m stationary, equidistant time series X ₁ ,X ₂ ,…,X _m RecordingFrom X _j To X _i I, j=1, 2,..m, i+.j information stream is T _j→i X is then _j And X _i Causal relationships between can be defined by _j→i And T is _i→j To make a judgment, T _j→i Can be estimated by the following equation:

wherein,for information flow T _j→i Maximum likelihood estimate of (a); />For m time series X ₁ ,X ₂ ,…,X _m Is a covariance matrix of (a); detC is a determinant of C; and (V) _jk Is C _jk Algebraic remainder of (2); c (C) _k,di Is X _k And a derived sequence->Covariance between, derived sequence is defined as:

Δt is the time interval of the sequence;

when (when)Significantly different from zero, then X _j Is X _i Cause of (1)/(2)>The size of the causal relationship;

step 5, integrating the causal relationship data between every two time sequences obtained by calculation in the step 4 to obtain a complete causal relationship network diagram;

the method comprises the steps of obtaining PM2.5 concentration data in the atmosphere in a certain time period of a multi-region PM2.5 concentration monitoring station, forming a corresponding time sequence, obtaining a haze causal network diagram among multiple places through calculation, and further finding one or more main sources.

2. The quantitative causal relationship determination method of the multidimensional time series of claim 1, wherein: selecting a time point t from the time series of the data stabilization obtained in the step 3 _n At a time point t _n Time window [ t ] is taken for center _n -L/2,t _n +L/2]Executing step 4 and step 5 under the condition of a time window with the time length of L to obtain a time point t _n A causal relationship network graph at the location;

3. The quantitative causal relationship determination method of a multidimensional time series according to claim 1 or 2, wherein: the step 4 is performedA saliency test must be performed to determine if it is significantly different from zero, or not significantly different from zero, and when the saliency test result is significantly different from zero, it is proved that there is a causal relationship between the two events, and when the saliency test result is not significantly different from zero, it is proved that there is not significant causal relationship between the two events; the specific flow of the significance test is as follows:

x is to be _i And X _j Arranged to the first and second positions,conversion to->In deriving the formula of step 4, we use the following linear model:

wherein the method comprises the steps ofIs sequence X _k Arithmetic mean of (2); />Is the sequence->Arithmetic mean of (2);n is the total time step number of the sequence, according to the central limit theorem,approximately obeys a normal distribution, where only +.>Let its variance beIt relates to Fisher information matrix I, and Fisher information matrix I is calculated as follows:

The above formula can be demonstrated as:

a in the matrix _1j ,f ₁ ,b ₁ Using its estimateReplacement;

inverting the obtained matrix NI, the fourth diagonal component of the obtained inverse matrixI.e. variance->Is a function of the estimated value of (2);the variance of (c) is: />

Given the confidence level alpha, looking up a standard normal distribution table to obtain a confidence interval coefficient z _α Further obtainStandard deviation of (2)In other words, for confidence level α, X ₂ To X ₁ The confidence interval of the true causality of (2) is:

when the confidence interval does not include zero, X ₂ Is X ₁ Cause of (1); otherwise, the causality is not obvious; for arbitrary sequence X _i And X is _j Significance test of causal relationship between the two, X is only needed _i And X is _j And (5) placing the materials at the first and second positions, and repeating the steps.

4. A method of quantitative causal relationship determination of a multidimensional time series according to claim 3, wherein: when the confidence level is set to 90%, the confidence interval coefficient z _α =1.65, where the confidence interval is:

confidence interval coefficient z when confidence level is set to 95% _α =1.96, where the confidence interval is:

when the confidence level is set to 99%, the confidence interval coefficient z _α =2.56, where the confidence interval is:

5. a method of quantitative causal relationship determination of a multidimensional time series according to claim 3, wherein: when X is _i And X _j When the causal relationship between the data is not obvious, the causal relationship between the data cannot be simply judged, and under the condition that the data and the data are allowed, the data sample is enlarged, and the steps 1 to 4 are repeatedly executed.