CN109886409B - Quantitative causal relationship judging method of multidimensional time sequence - Google Patents

Quantitative causal relationship judging method of multidimensional time sequence Download PDF

Info

Publication number
CN109886409B
CN109886409B CN201910115769.0A CN201910115769A CN109886409B CN 109886409 B CN109886409 B CN 109886409B CN 201910115769 A CN201910115769 A CN 201910115769A CN 109886409 B CN109886409 B CN 109886409B
Authority
CN
China
Prior art keywords
time
causal relationship
sequence
causal
quantitative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910115769.0A
Other languages
Chinese (zh)
Other versions
CN109886409A (en
Inventor
梁湘三
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN201910115769.0A priority Critical patent/CN109886409B/en
Publication of CN109886409A publication Critical patent/CN109886409A/en
Application granted granted Critical
Publication of CN109886409B publication Critical patent/CN109886409B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a quantitative causal relation judging method of a multidimensional time sequence, in particular to a flow packetCollecting m time sequences; preprocessing each time sequence to obtain a plurality of stable time sequences with the same time interval; calculating quantitative and directional causal relationship between different time sequences, wherein any two time sequences X j And X i The causal relationship between the two information flows can be based on Liang-Kleeman information flow T j→i And T is i→j To make a judgment, T j→i Estimated by the following formula:and finally, integrating the causality between each two time sequences obtained through calculation to obtain a complete causality network diagram. The method can infer the causal relationship among a plurality of events through simple calculation, and the causal relationship is output in a quantitative mode, so that the problem of causal relationship inference is solved, and the method has wide and important application in a plurality of important fields such as finance, climate, environment, neuroscience, weather forecast, artificial intelligence, big data science and the like.

Description

Quantitative causal relationship judging method of multidimensional time sequence
Technical Field
The invention belongs to the technical field of big data technical processing, and particularly relates to a quantitative causal relationship judging method of a multidimensional time sequence.
Background
Causal analysis is a core problem of scientific research and is a direct goal of considerable scientific research. In one letter to j.s.switch, einstein has described: the development of western science is determined by two great achievements: the logic system is invented by Greek philosophy, and the causal relationship among events can be deduced through systematic experiments in the period of the rendition. Now suppose we have made experiments on two dynamic events to get two time series, then one of the most fundamental problems comes: we cannot exactly distinguish whether it is the cause or the effect exactly from the two sequences? Or, if there is a entanglement like "pre-chicken or pre-egg" between them, we can quantitatively characterize the circulating causality between them?
This is an ancient, well-known interdisciplinary problem, and scientists including the Nobel prize acquirer, greenworker (Clive Granger), have struggled for half a century. In fact, it is listed as one of the greatest challenges in today's emerging disciplines, big data science (O' Neil & Schutt,2013, p.274). Over the last decade, liang & Kleeman (2005), liang (2008), liang (2014), liang (2016), etc. have demonstrated through various efforts that this causal relationship can be strictly derived from the original physical laws, and does not have to appear in a semi-empirical form as in the traditional approach. Conventional causal analysis (such as Granger causal verification and transitive entropy analysis) in many cases fails to verify such an observed fact, known as the "zero cause criterion": if the evolution of one event does not depend on another event, the latter is not the cause of the former, but in this new architecture this fact appears as a proven theorem.
The causal analysis method can be generally described as follows for an m-dimensional random power system
Wherein x= (x 1 ,x 2 ,...,x m ) Is a state vector, F is an m-dimensional vector field,is an m-dimensional white noise vector, b= (B) ij ) m×m Is an amplitude matrix of noise disturbance, then is composed of the component x j To x i Liang-Kleeman information stream of (Liang, 2016)
Wherein the method comprises the steps ofρ j|i (x j |x i ) Is x j For x i Conditional probability density of>Theoretically, if T j→i Not equal to 0, x j Is x i Cause of (1), and T j→i Is the causal strength; and vice versa.
The above theory proves to be strictly true for any power system, in sharp contrast to the existing gladhand causal tests or empirical causal tests such as transfer entropy analysis. However, the formula (II) is very complex, which results in that it is precisely established in classical problems, but is hardly solved in practical problems, which greatly constrains its application value. To solve this problem, liang (2014) proposes an estimation method and gets a fairly compact formula for the two-dimensional problem. This formula has been validated against many existing semi-empirical causal analyses (e.g., granger causal tests, transitive entropy analysis) and has been successfully used in more and more realistic problems. For example, with only the stock price time series of two IBM and GE companies downloaded from Yahoo, it was found that there was a strong, nearly unidirectional causal relationship from IBM to GE over a period of time in the 70 s, and thus a dust seal of decades was explored about "seven dwarfs" and "giant" competing for the maxima market. In another example, stips et al (2016) use this formula to find a clear, nearly unidirectional causal relationship between carbon dioxide and global warming. For the last hundred years CO2 did contribute to global warming, but on the ancient climate scale of over a thousand years this causality could be reversed completely, as global warming resulted in an increase in CO2 concentration.
The estimation of Liang (2014) is very successful for two-dimensional Liang-Kleeman information flow, i.e. the (II) formula when m=2, but its success is limited to the two-dimensional case (i.e. the case of m=2), and is not sufficient for the multidimensional case.
Disclosure of Invention
The invention aims to provide a quantitative causal relation judging method of a multidimensional time sequence, which can quantitatively obtain causal relations among a plurality of events.
In order to solve the technical problems, the invention adopts the following technical scheme:
a quantitative causal relation judging method of a multidimensional time sequence is characterized in that: the magnitude and direction of the causal relation among the multidimensional time series can be quantitatively estimated, and the specific implementation scheme is as follows:
step 1, collecting m time sequences, wherein each sequence corresponds to an event, and the m time sequences are respectively X 1 ,X 2 ,…,X m
Step 2, preprocessing each time sequence to obtain a plurality of time sequences with the same time interval;
step 3, when some time sequences in the step 2 are not stable, further processing is needed to make the sequences become approximately stable; when the time sequence in the step 2 is stable, performing a step 4;
step 4, calculating quantitative and directional causal relations among the time sequences; the specific calculation method is as follows:
given m stationary, equidistant time series X 1 ,X 2 ,…,X m Let X be j To X i Information flow of (i, j=1, 2,..m, i+.j) is T j→i X is then j And X i Causal relationships between can be defined by j→i And T is i→j To make a judgment, T j→i Can be estimated by the following equation:
wherein,for information flow T j→i Maximum likelihood estimate of (a); />For m time series X 1 ,X 2 ,…,X m Is a covariance matrix of (a); detC is a determinant of C; delta jk Is C jk Algebraic remainder of (2); c (C) k,di Is X k And a derived sequence->Covariance between>The definition is as follows: />Δt is the time interval of the sequence;
theoretically whenNon-zero, then X j Is X i Cause of (1)/(2)>Is the size of the causal relationship and is about +.>Performing significance test;
and 5, integrating the causal relationship data between every two time sequences obtained by calculation in the step 4 to obtain a complete causal relationship network diagram.
Selecting a time point t from the time series of the data stabilization obtained in the step 3 n At a time point t n Time window [ t ] is taken for center n -L/2,t n +L/2]Executing step 4 and step 5 under the condition of a time window with the time length of L to obtain a time point t n A causal relationship network graph at the location;
and (5) sequentially taking time points of different positions according to the time sequence in the time sequence, and repeatedly executing the step (4) and the step (5) to obtain a causal relationship network graph changing along with time.
The step 4 is performedA saliency check must be performed to determine whether it differs significantly from zero, or not, when the saliency check result differs significantly from zero, the causal relationship between two events is proved, and when the saliency check result does not differ significantly from zero, the causal relationship between two events is proved not to be significant, the specific flow of the saliency check is as follows:
x is to be i And X j Arranged to the first and second positions at this timeConversion to->The method in the derivation step 4In the process of the formula, we use a linear model:
wherein x= (X 1 ,X 2 ,...,X m ) Is a state vector, here represented as m time series,is white noise, (f) 1 ,b 1 ,a 11 ,a 12 ,...,a 1m ) Is a parameter to be estimated, and it can be demonstrated that the maximum likelihood estimation of these parameters is:
wherein the method comprises the steps ofFor m time series X 1 ,X 2 ,…,X m Is a covariance matrix of (a); detC is a determinant of C; delta jk Is C jk Algebraic remainder of (2); c (C) k,di Is X k And a derived sequence->Covariance between>The definition is as follows:Δt is the time interval of the sequence; />Is sequence X k Arithmetic mean of (2); />Is a sequence ofArithmetic mean of (2); />N is the sequence length (total number of time steps), according to the central limit theorem, ++>Approximately obeys a normal distribution, where only +.>Let its variance beIt relates to Fisher information matrix I, and Fisher information matrix I is calculated as follows:
note the state vector x= (X) at time point n 1 ,X 2 ,...,X m ) For X (N), n=1, 2,..n, N, assuming that X (N) = (X 1,n ,X 2,n ,...,X m,n ) In the case of (2), the next state vector X (n+1) = (X) 1,n+1 ,X 2,n+1 ,...,X m,n+1 ) If the probability density of the transition probability density is ρ, the Fisher information matrix multiplied by the total number of steps of time N is
The above formula can be demonstrated as:
a in the matrix 1j ,f 1 ,b 1 Using its estimateInstead of the above-mentioned, the method,
inverting the obtained matrix (NI), the fourth diagonal component of the obtained inverse matrixI.e. variance->Is used for the estimation of the (c),
the variance of (c) is: />
Given the confidence level alpha, looking up a standard normal distribution table to obtain a confidence interval coefficient z α Further obtainStandard deviation of +.>In other words, for confidence level α, X 2 To X 1 The confidence interval of the true causality of (2) is:
when the confidence interval does not include zero, X 2 Is X 1 Cause of (1); otherwise, the causality is not obvious. For arbitrary sequence X i And X is j Significance test of causal relationship between the two, X is only needed i And X is j And (5) placing the materials at the first and second positions, and repeating the steps.
When the confidence level is set to 90%, the confidence interval coefficient z α =1.65, where the confidence interval is:
when the confidence level is set to 95%, the confidence interval coefficient z=1.96, at which time the confidence interval is:
when the confidence level is set to 99%, the confidence interval coefficient z=2.56, at which time the confidence interval is:
when X is i And X j When the causal relationship between the data is not obvious, the causal relationship between the data cannot be simply judged, and under the condition that the data and the data are allowed, the data sample is enlarged, and the steps 1 to 4 are repeatedly executed.
The quantitative causal relation judging method of the multidimensional time sequence has the beneficial effects that: the method can infer the causal relationship among a plurality of events through simple calculation, and the causal relationship is output in a quantitative form, so that the problem of causal relationship inference is solved. In general, causal inference in scientific research and engineering application depends on a large number of experiments such as physics, chemistry, biology and the like or computer simulation, and huge cost is consumed, but the result is not correct, compared with the method, the calculation cost is almost negligible. The method has wide application in various aspects such as financial analysis, neuroscience research, climate change research, weather forecast, earthquake prediction, haze tracing and the like.
Drawings
FIG. 1 is a workflow diagram of a quantitative causal relationship determination method of the multi-dimensional time series of the present invention.
FIG. 2 is a schematic diagram of the causal relationship between companies over time inferred from stock prices using the method of the present invention.
FIG. 3 is a geographical map of data acquisition used in inferring the source of a contaminant using the method of the present invention.
FIG. 4 is a time series of the various contaminant concentrations across the site when the source of the contaminant is inferred using the method of the present invention.
FIG. 5 is a flow chart of the operation of the method of the present invention to infer the source of contaminants.
FIG. 6 is a spatial distribution diagram of contaminant concentration when tracing a haze source using the method of the present invention.
FIG. 7 is a workflow diagram of a haze source tracing process using the method of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments.
A quantitative causal relation judging method of a multidimensional time sequence is characterized in that: the magnitude and direction of the causal relation among the multidimensional time series can be quantitatively estimated, and the specific implementation scheme is as follows:
step 1, collecting m time sequences, wherein each sequence corresponds to an event, and the m time sequences are respectively X 1 ,X 2 ,…,X m
Step 2, preprocessing each time sequence to obtain a plurality of time sequences with the same time interval;
step 3, when some time sequences in the step 2 are not stable, further processing is needed to make the sequences become approximately stable; when the time sequence in the step 2 is stable, performing a step 4;
step 4, calculating quantitative and directional causal relations among the time sequences; the specific calculation method is as follows:
given m stationary, equidistant time series X 1 ,X 2 ,…,X m Let X be j To X i Information flow of (i, j=1, 2,..m, i+.j) is T j→i X is then j And X i Causal relationships between can be defined by j→i And T is i→j To make a judgment, T j→i Can be estimated by the following equation:
wherein,for information flow T j→i Maximum likelihood estimate of (a); />For m time series X 1 ,X 2 ,…,X m Is a covariance matrix of (a); detC is a determinant of C; delta jk Is C jk Algebraic remainder of (2); c (C) k,di Is X k And a derived sequence->Covariance between>The definition is as follows: />Δt is the time interval of the sequence;
theoretically whenNon-zero, then X j Is X i Cause of (1)/(2)>Is the size of the causal relationship and is about +.>Performing significance test;
and 5, integrating the causal relationship data between every two time sequences obtained by calculation in the step 4 to obtain a complete causal relationship network diagram.
In this embodiment, the number of m time sequences acquired in step 1 is greater than or equal to two, and the possible existence of a causal relationship between any two time sequences is: has no causality, one-way causality and two-way causality.
In this embodiment, the preprocessing method for each time sequence in step 2 includes, but is not limited to, supplementing the missing point data, mapping the irregular time distribution data to the regular time points, and so on, and finally obtaining a plurality of time sequences with accurate data mapping and the same time length.
In this embodiment, the preprocessing mode performed on the time series in step 3 when the data is not stable includes, but is not limited to, performing linear trend removal processing on the series, or transforming to another series, such as converting the stock price series into a yield series, etc. The pretreated time series can be judged by various classical stability criteria, and the time series can be approximately stable according to visual inspection in the embodiment to meet the requirements.
In this embodiment, the quantitative and directional causal relationship between each time series, X, is calculated in step 4 j And X i Causal relationships between can be defined by j→i And T is i→j To judge: if T j→i Not equal to 0, then X j Is X i Because, if zero, it is not; if T i→j Not equal to 0, then X i Is X j If zero, it is not. There are four possibilities here in total: (1) X is X j Is X i Cause of (1), X i Also X j Cause of (1); (2) X is X j Is X i Cause of (1), X i Other than X j Cause of (1); (3) X is X j Other than X i Cause of (1), X i Is X j Cause of (1); (4) X is X j And X is i There is no causal relationship between them. Again, the order i, j=1, 2,..m, is calculated, resulting in information flow and causal relationships between all time sequences.
In this embodiment, the data T is estimated for each of the information streams obtained in step 4 j→i A significance test is performed to determine whether it is significantly different from zero, or not significantly different from zero, when the significance test result is significantly different from zero, the causal relationship between the two events is proved, and when the significance test result is not significantly different from zero, the causal relationship between the two events is not proved, the specific flow of the significance test is as follows:
x is to be i And X j Arranged to the first and second positions at this timeConversion to->In deriving the formula of step 4, we use a linear model:
wherein x= (X 1 ,X 2 ,...,X m ) Is a state vector, here represented as m time series,is white noise, (f) 1 ,b 1 ,a 11 ,a 12 ,...,a 1m ) Is a parameter to be estimated, and it can be demonstrated that the maximum likelihood estimation of these parameters is:
wherein the method comprises the steps ofFor m time series X 1 ,X 2 ,…,X m Is a covariance matrix of (a); detC is a determinant of C; delta jk Is C jk Algebraic remainder of (2); c (C) k,di Is X k And a derived sequence->Covariance between>The definition is as follows:Δt is the time interval of the sequence; />Is sequence X k Arithmetic mean of (2); />Is a sequence ofArithmetic mean of (2); />N is the sequence length (total number of time steps), according to the central limit theorem, ++>Approximately obeys a normal distribution, where only +.>Let its variance beIt relates to Fisher information matrix I, and Fisher information matrix I is calculated as follows:
note the state vector x= (X) at time point n 1 ,X 2 ,...,X m ) For X (N), n=1, 2,..n, N, assuming that X (N) = (X 1,n ,X 2,n ,...,X m,n ) In the case of (2), the next state vector X (n+1) = (X) 1,n+1 ,X 2,n+1 ,...,X m,n+1 ) If the probability density of the transition probability density is ρ, the Fisher information matrix multiplied by the total number of steps of time N is
The above formula can be demonstrated as:
a in the matrix 1j ,f 1 ,b 1 Using its estimateInstead of
Inverting the obtained matrix (NI), the fourth diagonal component of the obtained inverse matrixI.e. variance->Estimated value of ∈10->The variance of (c) is: />
Given the confidence level alpha, looking up a standard normal distribution table to obtain a confidence interval coefficient z α Further calculateStandard deviation of (2)In other words, for confidence level α, X 2 To X 1 The confidence interval of the true causality of (2) is:
when the confidence interval does not include zero, X 2 Is X 1 Cause of (1); otherwise, the causality is not obvious. For arbitrary sequence X i And X is j Significance test of causal relationship between the two, X is only needed i And X is j And (5) placing the materials at the first and second positions, and repeating the steps.
Further, when the confidence level is set to 90%, the confidence interval coefficient z α =1.65, where the confidence interval is:
when the confidence level is set to 95%, the confidence interval coefficient z=1.96, at which time the confidence interval is:
when the confidence level is set to 99%, the confidence interval coefficient z=2.56, at which time the confidence interval is:
in this embodiment, when the result of the calculation in step 4 appears X i And X j When the causal relationship between the data is not obvious, the causal relationship between the data cannot be simply judged, and under the condition that the data and the data are allowed, the data sample is enlarged, and the steps 1 to 4 are repeatedly executed.
Further, expanding the data samples includes, but is not limited to, increasing the sequence length, increasing the data temporal resolution, or both.
In this embodiment, the step 5 may be applied to an optional time window with a length L, that is, a causal relationship network between multiple time sequences in the window may be obtained, and further a causal relationship network that varies with time may be obtained, which is implemented as follows:
selecting a time point t from the stationary time series obtained in the step 3 n At a time point t n Time window [ t ] is taken for center n -L/2,t n +L/2]Executing step 4 and step 5 under the condition of a time window with the time length of L to obtain a time point t n A causal relationship network graph at the location;
and (5) sequentially taking time points of different positions according to the time sequence in the time sequence, and repeatedly executing the step (4) and the step (5) to obtain a causal relationship network graph changing along with time. The specific implementation steps are as follows:
determining the length of a time window for estimating the causal relationship as L and the starting time t 0 =l/2+1, total time series length t n Calculate [ t ] 0 -L/2,t 0 +L/2]Causal relationships between variables in the model and performing significance test, so as to obtain t 0 Whether +L/2 is greater than or equal to t n As a criterion, successively calculate at t 0 =t 0 And (5) causality relation data at the time of +1, and finally obtaining a causality network graph changing along with time.
The technical scheme is further verified by taking a specific example:
first embodiment:
inferring from the company stock prices the relationship between the two companies:
acquiring a time sequence of Amazon, google and Walmart stock prices, wherein the time is from the beginning of the 19 th 8 th year of Google2004 to the end of the 26 th 12 th year of 2014, removing trade days and holidays, and totally counting 2607 data points, and preprocessing price data to obtain log daily gain due to unstable price sequence, so as to form a time sequence, wherein the preprocessing mode comprises the following specific steps: if P (n) represents the closing price on the n-th day, r (n) = lnP (n+1) -lnP (n) is the log-day gain, and then the causal relationship between the two at each time point is obtained by taking 1000 trade days as the sliding time window length, the result as shown in fig. 2 can be obtained, and it can be seen from the graph that the information flow from Amazon to Google is insignificant at 90% probability, that is, amazon is never the factor of Google, but Google is the factor of Amazon before the autumn of 2010, and the information flow is significantly not zero. The reason for this is because Amazon has heretofore relied on Google, a search engine, which is a search engine on the principle of online shopping. But after the fall of 2010, amazon is promptly abandoned in Google in order not to lose the huge market of china, so Google is no longer the cause of Amazon since then.
Specific embodiment II:
presuming the source of the pollutant through the pollutant concentration of different regional positions:
in order to understand the cause of pollution in the open sea of Jiangsu, the method of the invention is used for carrying out causal analysis on pollutant index sequences at certain selected points, and the calculation process is shown in figure 5. In the first step, as shown in fig. 3, three points a (120.5 e,34.8 n), B (121.5 e,33.5 n) and C (122.5 e,31.8 n) are selected, which correspond to the harbour, the salt city and the Shanghai open sea respectively, pollutant concentration data of nearly twenty years are taken, the time span is the first year of 1998 to the end of 2016, the output interval of each step is 15 days, and the total of 456 time points are shown in fig. 4, and the time sequence of the obtained nutrient salt (NO 3, siO3 and PO 4) and chlorophyll is shown in fig. 4. The following are the information flows calculated with steps 1 to 5 and their 90% confidence intervals for the causal relationship of various nutrient salts with chlorophyll between the three A, B, C points:
nitrate (NO 3)
T B→A =-0.038621±0.040181
T A→B =0.204099±0.046894
T C→A =-0.049355±0.039681
T A→C =0.335230±0.048638
T C→B =0.124917±0.051953
T B→C =0.019727±0.054564
Silicate (SiO 3)
T B→A =0.092092±0.030884
T A→B =-0.043766±0.019159
T C→A =0.026947±0.024741
T A→C =-0.030499±0.017957
T C→B =0.258933±0.042229
T B→C =0.059791±0.049407
Phosphate (PO 4)
T B→A =0.110934±0.055773
T A→B =0.208629±0.073974
T C→A =0.027468±0.053460
T A→C =0.322688±0.070432
T C→B =0.108395±0.066720
T B→C =0.077843±0.066274
Chlorophyll-a (Chl-a)
T B→A =-0.016813±0.044458
T A→B =0.120196±0.049673
T C→A =0.176085±0.051851
T A→C =0.207379±0.053923
T C→B =0.186485±0.054840
T B→C =0.041304±0.051044
For these several indexes, it can be seen that the north and south sea areas are both the factors of the salt city and the open sea, so that the pollution of the salt city and the open sea is not only the result of the salt city and the sea area, but also the neighbor from the north and the south. Also, in the case of silicate, phosphate, the salt city open sea is indeed the cause of the open sea in the cloud harbor, the open sea in the Shanghai, because of the information flow T B→A ,T B→C Significantly different from zero.
However, it is quite unexpected that T for nitrate, chlorophyll B→A, T B→C And not significantly different from zero, that is, from this data sample, the salt city open sea does not appear to be the cause of the Shanghai and the Liyun harbor open sea because the information flow is not significantly zero. If pollution from nitrate and chlorophyll (such as algal bloom) is to be discussed, although pollution from the outside sea in salt city is serious, although pollution from the north of su has long been thought to pollute the mouth of the Yangtze river by the coastal flow of su in Jiangsu, it is not the cause of the outside sea in Shanghai and Liyun harbor.
Third embodiment:
haze tracing is carried out through PM2.5 concentration in the atmosphere of different regional positions:
as shown in fig. 6 and 7, by acquiring PM2.5 concentration data in the atmosphere within a certain time period of the multi-region PM2.5 concentration monitoring station, a corresponding time sequence is formed, and a haze cause and effect network diagram among multiple sites can be obtained through calculation, so that one or more main sources can be found.
It will be further appreciated from the above description of various embodiments and computing principles that the present method can infer causal relationships between events by simple computation, and that such causal relationships are output in quantitative form, thereby solving the problem of causal relationship inference. In general, causal inference in scientific research and engineering application depends on a large number of experiments such as physics, chemistry, biology and the like or computer simulation, and huge cost is consumed, but the result is not correct, compared with the method, the calculation cost is almost negligible. The method has wide application in a plurality of scientific fields such as finance, climate, environment, neuroscience and the like.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims (5)

1. A quantitative causal relation judging method of a multidimensional time sequence is characterized in that: the magnitude and direction of the causal relation among the multidimensional time series can be quantitatively estimated, and the specific implementation scheme is as follows:
step 1, collecting m time sequences, wherein each sequence corresponds to an event, and the m time sequences are respectively X 1 ,X 2 ,…,X m
Step 2, preprocessing each time sequence to obtain a plurality of time sequences with the same time interval;
step 3, when some time sequences in the step 2 are not stable, further processing is needed to make the sequences become approximately stable; when the time sequence in the step 2 is stable, performing a step 4;
step 4, calculating quantitative and directional causal relations among the time sequences; the specific calculation method is as follows:
given m stationary, equidistant time series X 1 ,X 2 ,…,X m RecordingFrom X j To X i I, j=1, 2,..m, i+.j information stream is T j→i X is then j And X i Causal relationships between can be defined by j→i And T is i→j To make a judgment, T j→i Can be estimated by the following equation:
wherein,for information flow T j→i Maximum likelihood estimate of (a); />For m time series X 1 ,X 2 ,…,X m Is a covariance matrix of (a); detC is a determinant of C; and (V) jk Is C jk Algebraic remainder of (2); c (C) k,di Is X k And a derived sequence->Covariance between, derived sequence is defined as:
Δt is the time interval of the sequence;
when (when)Significantly different from zero, then X j Is X i Cause of (1)/(2)>The size of the causal relationship;
step 5, integrating the causal relationship data between every two time sequences obtained by calculation in the step 4 to obtain a complete causal relationship network diagram;
the method comprises the steps of obtaining PM2.5 concentration data in the atmosphere in a certain time period of a multi-region PM2.5 concentration monitoring station, forming a corresponding time sequence, obtaining a haze causal network diagram among multiple places through calculation, and further finding one or more main sources.
2. The quantitative causal relationship determination method of the multidimensional time series of claim 1, wherein: selecting a time point t from the time series of the data stabilization obtained in the step 3 n At a time point t n Time window [ t ] is taken for center n -L/2,t n +L/2]Executing step 4 and step 5 under the condition of a time window with the time length of L to obtain a time point t n A causal relationship network graph at the location;
and (5) sequentially taking time points of different positions according to the time sequence in the time sequence, and repeatedly executing the step (4) and the step (5) to obtain a causal relationship network graph changing along with time.
3. The quantitative causal relationship determination method of a multidimensional time series according to claim 1 or 2, wherein: the step 4 is performedA saliency test must be performed to determine if it is significantly different from zero, or not significantly different from zero, and when the saliency test result is significantly different from zero, it is proved that there is a causal relationship between the two events, and when the saliency test result is not significantly different from zero, it is proved that there is not significant causal relationship between the two events; the specific flow of the significance test is as follows:
x is to be i And X j Arranged to the first and second positions,conversion to->In deriving the formula of step 4, we use the following linear model:
wherein x= (X 1 ,X 2 ,...,X m ) Is a state vector, here represented as m time series,is white noise, (f) 1 ,b 1 ,a 11 ,a 12 ,...,a 1m ) Is a parameter to be estimated, and it can be demonstrated that the maximum likelihood estimation of these parameters is:
wherein the method comprises the steps ofIs sequence X k Arithmetic mean of (2); />Is the sequence->Arithmetic mean of (2);n is the total time step number of the sequence, according to the central limit theorem,approximately obeys a normal distribution, where only +.>Let its variance beIt relates to Fisher information matrix I, and Fisher information matrix I is calculated as follows:
note the state vector x= (X) at time point n 1 ,X 2 ,...,X m ) For X (N), n=1, 2,..n, N, assuming that X (N) = (X 1,n ,X 2,n ,...,X m,n ) In the case of (2), the next state vector X (n+1) = (X) 1,n+1 ,X 2,n+1 ,...,X m,n+1 ) If the probability density of the transition probability density is ρ, the Fisher information matrix multiplied by the total number of steps of time N is
The above formula can be demonstrated as:
a in the matrix 1j ,f 1 ,b 1 Using its estimateReplacement;
inverting the obtained matrix NI, the fourth diagonal component of the obtained inverse matrixI.e. variance->Is a function of the estimated value of (2);the variance of (c) is: />
Given the confidence level alpha, looking up a standard normal distribution table to obtain a confidence interval coefficient z α Further obtainStandard deviation of (2)In other words, for confidence level α, X 2 To X 1 The confidence interval of the true causality of (2) is:
when the confidence interval does not include zero, X 2 Is X 1 Cause of (1); otherwise, the causality is not obvious; for arbitrary sequence X i And X is j Significance test of causal relationship between the two, X is only needed i And X is j And (5) placing the materials at the first and second positions, and repeating the steps.
4. A method of quantitative causal relationship determination of a multidimensional time series according to claim 3, wherein: when the confidence level is set to 90%, the confidence interval coefficient z α =1.65, where the confidence interval is:
confidence interval coefficient z when confidence level is set to 95% α =1.96, where the confidence interval is:
when the confidence level is set to 99%, the confidence interval coefficient z α =2.56, where the confidence interval is:
5. a method of quantitative causal relationship determination of a multidimensional time series according to claim 3, wherein: when X is i And X j When the causal relationship between the data is not obvious, the causal relationship between the data cannot be simply judged, and under the condition that the data and the data are allowed, the data sample is enlarged, and the steps 1 to 4 are repeatedly executed.
CN201910115769.0A 2019-02-15 2019-02-15 Quantitative causal relationship judging method of multidimensional time sequence Active CN109886409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910115769.0A CN109886409B (en) 2019-02-15 2019-02-15 Quantitative causal relationship judging method of multidimensional time sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910115769.0A CN109886409B (en) 2019-02-15 2019-02-15 Quantitative causal relationship judging method of multidimensional time sequence

Publications (2)

Publication Number Publication Date
CN109886409A CN109886409A (en) 2019-06-14
CN109886409B true CN109886409B (en) 2023-12-12

Family

ID=66928169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910115769.0A Active CN109886409B (en) 2019-02-15 2019-02-15 Quantitative causal relationship judging method of multidimensional time sequence

Country Status (1)

Country Link
CN (1) CN109886409B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026852B (en) * 2019-11-28 2023-06-30 广东工业大学 Financial event-oriented hybrid causal relationship discovery method
CN112884151B (en) * 2021-01-27 2022-07-05 武汉理工大学 Method and system for controlling environment of glass melting furnace based on causal reasoning
CN113627663B (en) * 2021-08-04 2023-11-10 浙江大学 Dynamic causal analysis method based on geographic time sequence in city
CN116187443B (en) * 2023-02-10 2024-05-24 中国科学院自动化研究所 Causal strength detection method and detection device based on multidimensional symbol dynamics

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171142A (en) * 2017-12-26 2018-06-15 中南大学 A kind of causal method of key variables in determining complex industrial process

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171142A (en) * 2017-12-26 2018-06-15 中南大学 A kind of causal method of key variables in determining complex industrial process

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Unraveling the cause-effect relation between time series;X. San Liang;《https://journals.aps.org/pre/pdf/10.1103/PhysRevE.90.052150》;第1-11页 *

Also Published As

Publication number Publication date
CN109886409A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN109886409B (en) Quantitative causal relationship judging method of multidimensional time sequence
Loy-Benitez et al. Soft sensor validation for monitoring and resilient control of sequential subway indoor air quality through memory-gated recurrent neural networks-based autoencoders
CN107529651A (en) A kind of urban transportation passenger flow forecasting and equipment based on deep learning
CN111367959B (en) Zero-time-lag nonlinear expansion Granger causal analysis method
CN114445634A (en) Sea wave height prediction method and system based on deep learning model
CN114049545B (en) Typhoon intensity determining method, system, equipment and medium based on point cloud voxels
Xu et al. Evaluation of spatiotemporal dynamics of simulated land use/cover in China using a probabilistic cellular automata-Markov model
CN103616732A (en) Quality control method and quality monitoring device of upper-air wind data
CN106295690A (en) Time series data clustering method based on Non-negative Matrix Factorization and system
CN117273152A (en) Pollutant tracing method by constructing quantitative causal graph
CN116467933A (en) Storm surge water increasing prediction method and system based on deep learning
Che et al. ED-DRAP: Encoder–decoder deep residual attention prediction network for radar echoes
CN116151459A (en) Power grid flood prevention risk probability prediction method and system based on improved Transformer
CN110348469A (en) A kind of user's method for measuring similarity based on DeepWalk internet startup disk model
Zhu et al. Multi-resolution spatio-temporal prediction with application to wind power generation
CN110019167A (en) Long-term new forms of energy resource data base construction method and system in one kind
CN114493035A (en) Enterprise default probability prediction method and device
Diodato et al. Long-term winter temperatures in central Mediterranean: forecast skill of an ensemble statistical model
Geng et al. Study on index model of tropical cyclone intensity change based on projection pursuit and evolution strategy
Yang et al. Data resolution improvement for ocean of things based on improved FCM
Koniaev-Gurchenko et al. Automated evaluation of smart city data from cloud-system
Xiao et al. Prediction of Monthly Rainfall in Plateau Area Based on Convolutional Neural Network
CN109697630A (en) A kind of businessman's volume of the flow of passengers multiplicity and prediction technique based on sparse regression
Zhang et al. Intelligent flood disaster forecasting based on improved neural network algorithm
Zhao et al. Prediction method of dust pollutant diffusion range in building demolition based on Euclidean distance transformation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant