Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to take account of the relationship between different index data and improve the reliability and rationality of a plurality of predicted index data, the specification provides an index data generation method and an index data generation device in an abnormal environment. The execution subject of the index data generation method in an abnormal environment provided in the present specification may be a server or a terminal.
It should be noted that, although the index data generation scheme in an abnormal environment provided in the present specification is described with the bank stress test as an application scenario, it should be understood that the scheme may also be applied to other scenarios.
It should be noted that, in the embodiments of the present specification, the normal environment and the abnormal environment are relative and may be defined by the trend of the index value. For example, if the overnight Shibor interest rate fluctuates between 1% and 2% over a long period without large fluctuation, the overnight Shibor interest rate during that period is in the normal environment; on this basis, if the overnight Shibor interest rate rises to 5% on a certain day, the overnight Shibor interest rate on that day is in the abnormal environment (or extreme environment).
A method for generating index data in an abnormal environment according to an embodiment of the present specification will be described in detail with reference to fig. 1 to 8.
As shown in fig. 1, in an embodiment, the method for generating index data in an abnormal environment provided by the present specification may include the following steps:
Step 102, preprocessing the historical data of each of a plurality of indexes to obtain target data of the plurality of indexes, wherein the target data are normalized data of the change values between the historical data at different times.
In this specification, an index is understood to be a criterion that can measure the level of a characteristic of a thing. For example, the Shanghai Interbank Offered Rate (Shibor) is a good measure of money market liquidity.
It is also understood that indexes characterizing different things may differ, and that more than one index may characterize the same thing. For example, unlike the index that measures money market liquidity, the indexes that measure bank credit risk may include both the normal loan migration rate and the bad loan migration rate. As also described in the background of the present specification, there may be more than one Shibor index that measures money market liquidity: Shibor comprises 8 term structures, namely overnight (O/N), one week (1W), two weeks (2W), 1 month (1M), 3 months (3M), 6 months (6M), 9 months (9M), and one year (1Y).
In this specification, for convenience of description, the technical solutions are mainly described by taking Shibor with 8 different term structures as an example of the plurality of indexes in step 102.
In one example, step 102 may specifically include: for the historical data of each of the plurality of indexes, calculating, with a preset time interval as the step, the change value (which may also be referred to as the difference value) of the data at a later time relative to the data at an earlier time, to obtain change value data of the plurality of indexes; and standardizing the change value data of each index to obtain the target data of the plurality of indexes. The preset time interval may be set according to actual needs. For example, if historical data is used to predict the Shibor interest rate on the first day in the future for a bank stress test, the preset time interval may be set to one day; if historical data is used to predict the Shibor interest rate on the second day in the future for a bank stress test, the preset time interval may be set to two days, and so on.
For example, assume that the historical data of one of the plurality of indexes is {a_1, a_2, a_3, a_4, …, a_i, …, a_n}, where a_1 to a_n are historical data collected in units of days and arranged in chronological order, n is the number of collected historical data, and i = 1, 2, …, n. Then, when the preset time interval is 1 day, the change value data of the index is {a_2 - a_1, a_3 - a_2, a_4 - a_3, …, a_i - a_(i-1), …, a_n - a_(n-1)}.
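As a non-limiting illustration, the change value calculation described above can be sketched as follows. The function name `change_values` and the sample rates are hypothetical and are not taken from the present specification.

```python
def change_values(history, step=1):
    """Return history[i] - history[i - step] for every i >= step."""
    if step < 1:
        raise ValueError("step must be a positive number of samples")
    # Pair each earlier value with the value `step` samples later.
    return [later - earlier for earlier, later in zip(history, history[step:])]


# Hypothetical daily overnight rates, in percent.
rates = [1.50, 1.75, 1.25, 2.00]
daily_changes = change_values(rates, step=1)    # preset time interval of one day
two_day_changes = change_values(rates, step=2)  # preset time interval of two days
```

A series of n historical values yields n - step change values, which is why the change value data in the example above has one element fewer than the historical data.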
The standardization processing may include normalization; in particular, the Z-score (zscore) method may be used. It should be understood that normalization can be performed in various ways, which are not enumerated here.
The target data of an index obtained by preprocessing the historical data in the above manner can be understood as normalized data of the change values between the historical data of that index at different times.
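A minimal sketch of Z-score normalization of change value data is given below for illustration only. The use of the population standard deviation and the name `zscore` are assumptions; the specification does not state which variant of the standard deviation is used.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize values to zero mean and unit standard deviation.

    Returns the normalized values together with the mean and standard
    deviation, which are needed later to undo the normalization.
    """
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return [(v - mu) / sigma for v in values], mu, sigma


# Hypothetical change value data of one index.
changes = [0.25, -0.5, 0.75, -0.5]
target, mu, sigma = zscore(changes)
```

Keeping `mu` and `sigma` alongside the target data makes the inverse operation of the preprocessing in step 108 straightforward.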
Step 104, performing principal component analysis on the target data of the plurality of indexes to obtain data of a plurality of principal components.
Principal Component Analysis (PCA) is a mathematical transformation method that converts a given set of correlated variables into another set of uncorrelated variables by linear transformation. The new variables are arranged in order of decreasing eigenvalue, and the several variables obtained by the transformation are the several principal components.
In step 104, besides obtaining data of a plurality of principal components, the eigenvalue and transformation matrix of each principal component can be saved and recorded, which facilitates determination of the first principal component and the second principal component in subsequent steps and facilitates the inverse operation of principal component analysis in subsequent steps.
In the embodiments of the present specification, principal component analysis treats the plurality of indexes as a set of correlated variables and converts them into another set of uncorrelated variables by linear transformation. The specific transformation method may adopt the prior art and is not described in detail here. For example, if the indexes are Shibor with 8 term structures, their target data can be as shown in Table 1, and the data of the principal components obtained by the mathematical transformation can be as shown in Table 2. It should be understood that, in practical applications, the target data in Table 1 and the data of the principal components in Table 2 are specific values.
TABLE 1
Serial number | Overnight | One week | Two weeks | 1 month | 3 months | 6 months | 9 months | 1 year
1 | a1 | b1 | c1 | d1 | e1 | f1 | e1 | g1
2 | a2 | b2 | c2 | d2 | e2 | f2 | e2 | g2
··· | ··· | ··· | ··· | ··· | ··· | ··· | ··· | ···
n | an | bn | cn | dn | en | fn | en | gn
TABLE 2
Serial number | First principal component | Second principal component | ··· | M-th principal component
1 | h1 | j1 | ··· | k1
2 | h2 | j2 | ··· | k2
··· | ··· | ··· | ··· | ···
n | hn | jn | ··· | kn
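For illustration only, the transformation from target data of the kind shown in Table 1 to principal component data of the kind shown in Table 2 can be sketched with an eigen-decomposition of the covariance matrix. This is one common way to perform principal component analysis and is an assumption here, since the specification leaves the specific transformation method to the prior art. The function name `pca` and the synthetic data are hypothetical.

```python
import numpy as np


def pca(target):
    """PCA of an (n_samples, n_indexes) array of zero-mean target data.

    Returns the principal component data, the eigenvalues in decreasing
    order, and the transformation matrix (one column per component).
    """
    target = np.asarray(target, dtype=float)
    cov = np.cov(target, rowvar=False)              # covariance between indexes
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]           # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    transform = eigenvectors[:, order]
    scores = target @ transform                     # data of the principal components
    return scores, eigenvalues, transform


# Synthetic, strongly correlated target data for three hypothetical indexes.
rng = np.random.default_rng(0)
base = rng.standard_normal((500, 1))
target = np.hstack([base + 0.1 * rng.standard_normal((500, 1)) for _ in range(3)])
target = (target - target.mean(axis=0)) / target.std(axis=0)
scores, eigenvalues, transform = pca(target)
```

Because the columns of `transform` are orthonormal, the target data can be recovered as `scores @ transform.T`, which is the inverse operation relied on in later steps.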
Step 106, determining the data of at least one principal component in the abnormal environment based on the distribution of the data of the at least one principal component and a preset confidence level, wherein the preset confidence level is the confidence that the data of the plurality of principal components belong to data in the normal environment.
In one example, determining the data of the at least one principal component in the abnormal environment based on the distribution of the data of the at least one principal component and the preset confidence level may include: determining a confidence interval under the preset confidence level based on the eigenvalue of the at least one principal component, the degree of freedom of the data of the at least one principal component, and the preset confidence level; and determining the data of the at least one principal component in the abnormal environment based on the data of the at least one principal component that falls outside the confidence interval.
On this basis, the preset confidence level may also be understood as the confidence that the data of the plurality of principal components fall within the above confidence interval. Confidence levels are usually expressed as percentages.
In a first implementation of the foregoing example, assume that the at least one principal component includes a first principal component and a second principal component, the first principal component being the principal component with the largest eigenvalue among the plurality of principal components, and the second principal component being the principal component with the second-largest eigenvalue among the plurality of principal components. In this case, determining the confidence interval under the preset confidence level based on the eigenvalue of the at least one principal component, the degree of freedom of the data of the at least one principal component, and the preset confidence level may specifically include the following sub-steps:
Sub-step 1062, drawing a two-dimensional scatter plot of the data of the first principal component and the data of the second principal component, wherein the two-dimensional scatter plot takes the first principal component and the second principal component as the coordinate axes of a Cartesian coordinate system.
As shown in fig. 2, a two-dimensional scatter plot of the data of the first principal component and the data of the second principal component may be plotted with the first principal component being an abscissa of a cartesian coordinate system and the second principal component being an ordinate of the cartesian coordinate system. In fig. 2, the coordinate value of one point may be represented by (the value of the first principal component, the value of the second principal component).
Sub-step 1064, determining the major axis of a confidence ellipse based on the eigenvalue of the first principal component, the degree of freedom of the data of the first principal component, and the preset confidence level, and determining the minor axis of the confidence ellipse based on the eigenvalue of the second principal component, the degree of freedom of the data of the second principal component, and the preset confidence level, wherein the confidence ellipse is used for characterizing the confidence interval.
For example, the major axis and the minor axis of the confidence ellipse can be calculated by the following two formulas:
major axis = sqrt(s * A)
minor axis = sqrt(s * B)
where "sqrt" denotes the square root operation, A is the eigenvalue of the first principal component, and B is the eigenvalue of the second principal component; s = t1 * t2 * finv(p, t1, t3) / t3, where "finv" denotes the inverse function of the F distribution; p is the preset confidence level; t1 is the numerator degree of freedom of finv, and t1 = 2 here because the confidence interval is determined for two principal components; t2 is the number of data of the first principal component (equivalently, of the second principal component), so that t2 = n, as can be seen from Table 2 above; and t3 is the denominator degree of freedom of finv, with t3 = t2 - 1 in this example.
More specifically, assuming that n = 2223 and p = 95%, then t2 = 2223 and t3 = 2222, and correspondingly:
s = 2 * 2223 * finv(0.95, 2, 2222) / 2222
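The calculation above can be sketched as follows, for illustration only. The sketch relies on the fact that, when the numerator degree of freedom is 2, the inverse of the F distribution has the closed form finv(p, 2, t3) = (t3 / 2) * ((1 - p)^(-2 / t3) - 1), so no statistics library is needed; a general implementation would use a library routine for the inverse F distribution instead. The function names and the eigenvalues 6.0 and 1.5 are hypothetical.

```python
import math


def finv_df2(p, t3):
    """Inverse CDF of the F distribution with numerator df 2 and denominator df t3."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # (t3 / 2) * ((1 - p) ** (-2 / t3) - 1), written with expm1/log1p for accuracy.
    return (t3 / 2.0) * math.expm1(-(2.0 / t3) * math.log1p(-p))


def ellipse_axes(eig_first, eig_second, n, p):
    """Return (sqrt(s * A), sqrt(s * B)) for n data points at confidence level p."""
    t1, t2, t3 = 2, n, n - 1
    s = t1 * t2 * finv_df2(p, t3) / t3
    return math.sqrt(s * eig_first), math.sqrt(s * eig_second)


# n and p follow the example above; the two eigenvalues are hypothetical.
major, minor = ellipse_axes(eig_first=6.0, eig_second=1.5, n=2223, p=0.95)
```

Because both axes share the same factor s, the ratio of the major axis to the minor axis depends only on the two eigenvalues, while the preset confidence level scales the whole ellipse.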
Sub-step 1066, drawing the confidence ellipse centered at the origin in the two-dimensional scatter plot based on the major axis and the minor axis.
Specifically, as shown in fig. 2, ellipses of different sizes centered at the origin may be drawn for different preset confidence levels. In fig. 2, ellipses 10 to 50 are the confidence ellipses for preset confidence levels of 95%, 99%, 99.9%, 99.99%, and 99.999%, respectively. In the example shown in fig. 2, the major axis of the confidence ellipse is parallel to the coordinate axis corresponding to the first principal component, and the minor axis of the confidence ellipse is parallel to the coordinate axis corresponding to the second principal component.
On this basis, the step of determining the data of the at least one principal component in the abnormal environment based on the data of the at least one principal component falling outside the confidence interval may include: determining the data of the first principal component and the data of the second principal component in the abnormal environment based on the points located outside the confidence ellipse in the two-dimensional scatter plot.
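As a non-limiting sketch, whether a point of the scatter plot lies outside an origin-centered confidence ellipse whose axes are parallel to the coordinate axes can be checked as follows. The sketch assumes that the values sqrt(s * A) and sqrt(s * B) are used as the half-lengths of the ellipse along the first and second principal component axes; the function name and the numeric values are hypothetical.

```python
def outside_ellipse(point, major, minor):
    """True if (pc1, pc2) lies outside the origin-centered, axis-aligned ellipse.

    Assumption: `major` and `minor` are half-lengths along the first and
    second principal component axes, respectively.
    """
    pc1, pc2 = point
    return (pc1 / major) ** 2 + (pc2 / minor) ** 2 > 1.0


# Hypothetical points and axes.
points = [(3.0, 0.0), (1.0, 0.5), (0.0, 1.5)]
flags = [outside_ellipse(pt, major=2.0, minor=1.0) for pt in points]
```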
In a more specific embodiment, a point may be selected, from the points in any quadrant of the two-dimensional scatter plot that are outside the confidence ellipse, as the data point in the abnormal environment; and the data of the first principal component and the data of the second principal component in the abnormal environment are determined based on the coordinate values of the data point in the Cartesian coordinate system. Of course, in practical applications, more data points in the abnormal environment may be selected according to practical requirements, so as to determine more data of the first principal component and data of the second principal component.
In another more specific embodiment, the centroid of the points in any quadrant of the two-dimensional scatter plot that are outside the confidence ellipse can be taken as the data point in the abnormal environment; and the data of the first principal component and the data of the second principal component in the abnormal environment are determined based on the coordinate values of the data point in the Cartesian coordinate system. Thus, one data point in the abnormal environment can be determined in each of the four quadrants.
For example, in fig. 2, when the confidence ellipse is ellipse 10, one or more of the following may be taken as the data point in the abnormal environment: centroid 11 of the points outside ellipse 10 in the first quadrant, centroid 12 of the points outside ellipse 10 in the second quadrant, centroid 13 of the points outside ellipse 10 in the third quadrant, and centroid 14 of the points outside ellipse 10 in the fourth quadrant. When the confidence ellipse is ellipse 20, one or more of the following may be taken as the data point in the abnormal environment: centroid 21 of the points outside ellipse 20 in the first quadrant, centroid 22 of the points outside ellipse 20 in the second quadrant, centroid 23 of the points outside ellipse 20 in the third quadrant, and centroid 24 of the points outside ellipse 20 in the fourth quadrant. By analogy, when the confidence ellipse is ellipse 30, one or more of centroid 31, centroid 32, centroid 33, and centroid 34 outside ellipse 30 may be taken as the data point in the abnormal environment; when the confidence ellipse is ellipse 40, one or more of centroid 41, centroid 42, centroid 43, and centroid 44 outside ellipse 40 may be taken as the data point in the abnormal environment; and when the confidence ellipse is ellipse 50, one or more of centroid 51, centroid 52, centroid 53, and centroid 54 outside ellipse 50 may be taken as the data point in the abnormal environment.
It is understood that, since the centroid reflects the general trend and the average level of a plurality of data, taking the centroid of the points outside the ellipse in a quadrant as the data point in the abnormal environment yields more reliable and reasonable data of the first principal component and the second principal component in the abnormal environment.
Since the coordinate value of one point can be expressed by (the value of the first principal component, the value of the second principal component) in fig. 2, in the two embodiments for specifying the data point in the abnormal environment, the abscissa value of the data point can be specified as the data of the first principal component in the abnormal environment, and the ordinate value of the data point can be specified as the data of the second principal component in the abnormal environment.
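The centroid-based selection described above can be sketched as follows, for illustration only. The function names, the handling of points lying exactly on a coordinate axis (they are skipped), and the sample points are assumptions; the axes are again treated as half-lengths of the ellipse.

```python
def quadrant(point):
    """Quadrant number 1 to 4 of a point, or None if it lies on an axis."""
    x, y = point
    if x > 0 and y > 0:
        return 1
    if x < 0 and y > 0:
        return 2
    if x < 0 and y < 0:
        return 3
    if x > 0 and y < 0:
        return 4
    return None


def outlier_centroids(points, major, minor):
    """Centroid, per quadrant, of the points lying outside the confidence ellipse."""
    groups = {1: [], 2: [], 3: [], 4: []}
    for x, y in points:
        q = quadrant((x, y))
        if q is not None and (x / major) ** 2 + (y / minor) ** 2 > 1.0:
            groups[q].append((x, y))
    # Quadrants without any outlying point are omitted from the result.
    return {
        q: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for q, pts in groups.items()
        if pts
    }


# Hypothetical (first principal component, second principal component) points.
points = [(3.0, 1.0), (5.0, 3.0), (0.5, 0.5), (-4.0, 2.0), (-3.0, -3.0), (4.0, -2.0)]
centroids = outlier_centroids(points, major=2.0, minor=1.0)
```

Each centroid is a pair whose first element is the data of the first principal component in the abnormal environment and whose second element is the data of the second principal component in the abnormal environment.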
Of course, in addition to using the scatter distribution diagram of the two principal components (the first principal component and the second principal component) to draw the confidence ellipse to determine the data of the two principal components in the abnormal environment, in a second embodiment, the determining the data of the at least one principal component in the abnormal environment based on the distribution and the preset confidence of the data of the at least one principal component may include: determining data of a first principal component under an abnormal environment based on the distribution of the data of the first principal component and preset confidence level; wherein the first principal component is a principal component having a largest eigenvalue among the plurality of principal components. I.e. the confidence interval is determined based on one principal component.
Specifically, a confidence interval under the preset confidence level may be determined based on the eigenvalue of the first principal component, the preset confidence level, and the degree of freedom of the data of the first principal component; and the data of the first principal component in the abnormal environment are determined based on the data of the first principal component and the confidence interval.
At this time, since only the first principal component is selected to determine the confidence interval, the determined confidence interval may be a line segment on the one-dimensional axis, and the line segment is centered on the origin of the one-dimensional axis. This makes it possible to determine the data of the first principal component in an abnormal environment based on the points falling outside the line segment.
Alternatively, in a third embodiment, determining the data of the at least one principal component in the abnormal environment based on the distribution of the data of the at least one principal component and the preset confidence level may include: determining the data of the first principal component, the data of the second principal component, and the data of the third principal component in the abnormal environment based on the distribution of the data of the first principal component, the data of the second principal component, and the data of the third principal component, and on the preset confidence level; wherein the first principal component is the principal component with the largest eigenvalue among the plurality of principal components, the second principal component is the principal component with the second-largest eigenvalue among the plurality of principal components, and the third principal component is the principal component with the third-largest eigenvalue among the plurality of principal components. That is, the confidence interval is determined on the basis of three principal components.
Specifically, a three-dimensional scatter plot of the data of the first principal component, the data of the second principal component, and the data of the third principal component may be drawn with the first principal component, the second principal component, and the third principal component as the coordinate axes of a Cartesian coordinate system. A confidence ellipsoid under the preset confidence level is then drawn in the three-dimensional scatter plot with the origin as the center, wherein the lengths of the three half axes of the confidence ellipsoid are determined in a manner similar to that used for the major axis and the minor axis of the confidence ellipse, which is not repeated here. The data of the first principal component, the data of the second principal component, and the data of the third principal component in the abnormal environment are then determined based on the points outside the confidence ellipsoid in the three-dimensional scatter plot.
It is not difficult to see that, in step 106, the greater the number of principal components contained in the "at least one principal component", the more reasonable the confidence interval determined in that step, and hence the more reliable the predicted index data in the abnormal environment. Of course, the greater the number of principal components contained in the "at least one principal component", the more complicated the process of determining the confidence interval. In practical applications, a balance can be struck between these two aspects to determine a suitable number of principal components to include in the "at least one principal component".
Step 108, sequentially performing the inverse operation of the principal component analysis and the inverse operation of the preprocessing on the data of the at least one principal component in the abnormal environment to generate the prediction data of the plurality of indexes in the abnormal environment.
It can be understood that the data of the principal components are not index data, and the principal component data need to be transformed back to obtain the index data. For example, the target data of the indexes are obtained by applying the inverse of the matrix transformation used in the principal component analysis, and the target data are then denormalized and restored from change values to obtain the prediction data of the indexes.
In the index data generation method in an abnormal environment provided in the embodiments of the present specification, the data of the plurality of principal components obtained by performing principal component analysis on the target data of the plurality of indexes can reflect the relationship between the historical data of the plurality of indexes. Therefore, when the index data in the abnormal environment are predicted by means of principal component analysis, the relationship between different indexes is taken into account, which improves the reliability and rationality of the predicted data of the plurality of indexes; when the index data are used for a stress test, the stress test result obtained is more accurate.
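For illustration only, the chain of inverse operations can be sketched as follows. The sketch assumes an orthonormal transformation matrix whose columns are the principal directions, Z-score normalization with a stored mean and standard deviation per index, and restoration of the change value by adding it to the most recent historical value of each index; the specification does not fix these details. All names and numbers are hypothetical.

```python
import numpy as np


def predict_indexes(pc_values, transform, mu, sigma, last_observed):
    """Map principal component values back to predicted index values."""
    pc_values = np.asarray(pc_values, dtype=float)
    # Inverse of scores = target @ transform for an orthonormal transform.
    target = np.asarray(transform, dtype=float) @ pc_values
    # Undo the Z-score normalization to recover the change values.
    change = target * np.asarray(sigma, dtype=float) + np.asarray(mu, dtype=float)
    # Assumption: the change is applied to the latest historical value.
    return np.asarray(last_observed, dtype=float) + change


# Hypothetical two-index example with a 45-degree rotation as the transform.
c = 1.0 / np.sqrt(2.0)
transform = np.array([[c, -c], [c, c]])
predicted = predict_indexes(
    pc_values=[np.sqrt(2.0), 0.0],
    transform=transform,
    mu=[0.0, 0.1],
    sigma=[0.5, 0.2],
    last_observed=[2.0, 3.0],
)
```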
In addition, compared with the traditional method, the index data generation method under the abnormal environment provided by the embodiment of the specification has a theoretical basis and strong interpretability, so that the predicted data is more reliable and more reasonable.
Studies have shown that, in the above embodiment, if the plurality of indexes are the Shibor interest rates of 8 different term structures, the first principal component, which has the largest eigenvalue, can reflect the level of the term structure, and the second principal component, which has the second-largest eigenvalue, can reflect the change in slope of the term structure. Therefore, in one example, the simulated prediction of the data of the plurality of principal components in the abnormal environment in step 106 can be simplified to the simulated prediction of the data of the first principal component and the second principal component in the abnormal environment.
More specifically, the first principal component may reflect the rise and fall of the Shibor interest rates of the 8 term structures: if the first principal component is positive, a rise in the Shibor interest rates of the 8 term structures is predicted; otherwise, a fall is predicted. The second principal component may reflect the slope between the Shibor interest rates of different term structures: if the second principal component is positive, the difference between the Shibor interest rates of different term structures is increasing; otherwise, it is decreasing. When the effects of the two principal components are combined, abnormal situations such as interest rate inversion, a phenomenon in which the short-term Shibor interest rate is higher than the long-term Shibor interest rate, can be predicted.
Fig. 3 to 6 show the relationship between the changes of the Shibor interest rates of the 8 term structures and the values of the first and second principal components. In fig. 3 to 6, reference numerals 31 to 38 respectively indicate the Shibor interest rates of the 8 term structures of overnight, one week, two weeks, 1 month, 3 months, 6 months, 9 months, and 1 year, and fig. 3 to 6 respectively show the changes of the Shibor interest rates corresponding to the first to fourth quadrants in fig. 2.
Specifically, as shown in fig. 3 and 6, when the value of the first principal component is greater than 0, the Shibor interest rates of the 8 term structures all show an upward trend; as shown in fig. 4 and 5, when the value of the first principal component is less than 0, the Shibor interest rates of the 8 term structures all show a downward trend; as shown in fig. 3 and 4, when the value of the second principal component is greater than 0, the difference between the Shibor interest rates of different term structures increases; as shown in fig. 5 and 6, when the value of the second principal component is less than 0, the difference between the Shibor interest rates of different term structures decreases. As can be seen from fig. 6, when the value of the first principal component is greater than 0 and the value of the second principal component is less than 0, the Shibor interest rates of the 8 term structures rise as a whole while the difference between the Shibor interest rates of different term structures decreases, and the extreme case of interest rate inversion occurs. As can be seen from fig. 4, when the value of the first principal component is less than 0 and the value of the second principal component is greater than 0, the Shibor interest rates of the 8 term structures fall as a whole while the difference between the Shibor interest rates of different term structures increases, and the extreme case of interest rate inversion also occurs. This is consistent with historical extreme cases of abrupt changes in money market liquidity; therefore, the index data predicted by the method provided by the present specification are more reasonable and reliable.
Optionally, in another embodiment, on the basis of the example shown in fig. 2, as shown in fig. 7, before the step 108, the index data generation method in an abnormal environment provided in the embodiment of the present specification may further include:
Step 110, determining the data of the remaining principal components corresponding to the points in the quadrant where the data point is located, wherein the remaining principal components are the principal components other than the first principal component and the second principal component among the plurality of principal components.
For example, assuming that the plurality of principal components determined in step 104 include a third principal component and a fourth principal component in addition to the first principal component and the second principal component, data of the third principal component and the fourth principal component corresponding to points in the four quadrants shown in fig. 2 needs to be determined in step 110.
Step 112, determining a normal distribution graph of the data of the remaining principal components based on the standard deviation of the data of the remaining principal components.
In this step, the data of the third principal component and the data of the fourth principal component corresponding to the points in the four quadrants determined in step 110 are processed separately for each quadrant: the standard deviation of the data of the third principal component and the standard deviation of the data of the fourth principal component are calculated; a normal distribution graph of the data of the third principal component is then drawn based on the standard deviation of the data of the third principal component, and a normal distribution graph of the data of the fourth principal component is drawn based on the standard deviation of the data of the fourth principal component. In this way, the normal distribution graphs of the data of the third principal component and of the data of the fourth principal component are obtained for the different quadrants.
Step 114, determining the data of the remaining principal components in the abnormal environment based on the normal distribution graph.
The normal distribution graph can reflect the distribution of random samples. As shown in fig. 8, the normal distribution curve is a completely symmetrical bell-shaped curve that is high in the middle and gradually descends toward both ends. Generally speaking, the farther a sample is from the center of the bell-shaped curve, the lower its probability of occurrence, and such samples usually correspond to abnormal environments. Therefore, data far from the center of the bell-shaped curve in the normal distribution graph of the remaining principal components can be determined as data in the abnormal environment; for example, data at the position indicated by reference numeral 81 or 82 in fig. 8 can be determined as data in the abnormal environment.
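One possible way to pick data far from the center of the bell-shaped curve is sketched below, for illustration only. The specification does not state how the positions indicated by reference numerals 81 and 82 are chosen; the sketch assumes a zero-mean normal distribution with the standard deviation of the data and a hypothetical tail probability of 1% on each side.

```python
from statistics import NormalDist, pstdev


def tail_values(component_data, tail_probability=0.01):
    """Lower and upper tail values of a zero-mean normal fitted to the data.

    Assumptions: the curve is centered at zero, and `tail_probability`
    is an illustrative choice rather than a value from the specification.
    """
    sigma = pstdev(component_data)
    dist = NormalDist(mu=0.0, sigma=sigma)
    lower = dist.inv_cdf(tail_probability)
    upper = dist.inv_cdf(1.0 - tail_probability)
    return lower, upper


# Hypothetical data of the third principal component for the points of one quadrant.
third_component = [-1.0, 1.0, -1.0, 1.0]
lower, upper = tail_values(third_component)
```

Either tail value may then serve as the data of that principal component in the abnormal environment, in the same way as the positions on either side of the curve in fig. 8.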
As shown in fig. 7, step 108 may in this case include: sequentially performing the inverse operation of the principal component analysis and the inverse operation of the preprocessing on the data of the first principal component, the data of the second principal component, and the data of the remaining principal components in the abnormal environment to generate the prediction data of the plurality of indexes in the abnormal environment.
Specifically, after the data of the first principal component, the data of the second principal component, and the data of the remaining principal components in the abnormal environment are merged, the inverse operation of the principal component analysis and the inverse operation of the preprocessing are sequentially performed to generate the prediction data of the plurality of indexes in the abnormal environment.
For example, assume that the data point in the abnormal environment determined in step 106 is centroid 11 in the first quadrant of the two-dimensional scatter plot shown in fig. 2, and that the coordinates of centroid 11 are (1, 2); and assume that, in step 114, the values of the remaining principal components in the abnormal environment are determined to be (0.1, 0.2, 0.3, 0.4, 0.5, 0.6). These can be merged into one set of data (1, 2, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6), on which the inverse operation of the principal component analysis and the inverse operation of the preprocessing are then performed to generate the prediction data of the plurality of indexes in the abnormal environment. Similar approaches may be used for the centroids in the other quadrants shown in fig. 2, and are not repeated here.
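The merging step of this example can be sketched as follows, for illustration only. The transformation matrix saved in step 104 is not given in the specification, so a randomly generated orthonormal matrix is used here purely as a hypothetical stand-in, and only the inverse operation of the principal component analysis is shown.

```python
import numpy as np

centroid = (1.0, 2.0)                       # first and second principal components
remaining = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)  # third to eighth principal components
merged = np.array(centroid + remaining)     # (1, 2, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6)

# Hypothetical orthonormal matrix standing in for the saved transformation matrix.
rng = np.random.default_rng(0)
transform, _ = np.linalg.qr(rng.standard_normal((8, 8)))

# Inverse operation of the principal component analysis: back to target data.
target = transform @ merged
```

The resulting target data would then be denormalized and restored from change values, as in the inverse operation of the preprocessing, to obtain the prediction data of the 8 indexes.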
It is to be understood that, in the method for determining index data in an abnormal environment provided by the embodiment shown in fig. 7, in addition to the data of the first principal component and the data of the second principal component in the abnormal environment, the data of the remaining principal components are merged together, and the data of a plurality of indexes are predicted in a reverse direction, so that the reliability and the rationality of the predicted data of the plurality of indexes can be further improved.
The above description has provided a method for generating index data in an abnormal environment, and the electronic device provided in the present description is described below.
Fig. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification. Referring to fig. 9, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both an internal memory and a non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the index data generation device in an abnormal environment at the logic level. The processor is used for executing the program stored in the memory, and is specifically used for executing the following operations:
preprocessing historical data of a plurality of indexes respectively to obtain target data of the indexes, wherein the target data are standardized data of change values among the historical data at different moments;
performing principal component analysis on the target data of the indexes to obtain data of a plurality of principal components;
determining data of at least one principal component in an abnormal environment based on distribution of the data of the at least one principal component and a preset confidence level, wherein the preset confidence level is a confidence level of the data of the plurality of principal components belonging to the data in a normal environment;
and sequentially performing the inverse operation of the principal component analysis and the inverse operation of the preprocessing on the data of the at least one principal component in the abnormal environment to generate the prediction data of the plurality of indexes in the abnormal environment.
The index data generation method in an abnormal environment disclosed in the embodiment shown in fig. 1 of this specification can be applied to a processor, or can be implemented by the processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the various methods, steps and logic blocks disclosed in one or more embodiments of the present specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with one or more embodiments of the present specification may be embodied directly in hardware, in a software module executed by a hardware decoding processor, or in a combination of hardware and software modules executed by a hardware decoding processor. The software module may be located in a storage medium well known in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or registers. The storage medium is located in the memory, and the processor reads information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may further execute the index data generation method in the abnormal environment shown in fig. 1, which is not described herein again.
Of course, besides the software implementation, the electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or logic devices.
Embodiments of the present specification also propose a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 1, and in particular to perform the following:
preprocessing historical data of a plurality of indexes respectively to obtain target data of the indexes, wherein the target data are standardized data of change values among the historical data at different moments;
performing principal component analysis on the target data of the indexes to obtain data of a plurality of principal components;
determining data of at least one principal component in an abnormal environment based on distribution of the data of the at least one principal component and a preset confidence level, wherein the preset confidence level is a confidence level of the data of the plurality of principal components belonging to the data in a normal environment;
and sequentially performing the inverse operation of the principal component analysis and the inverse operation of the preprocessing on the data of the at least one principal component in the abnormal environment to generate the prediction data of the plurality of indexes in the abnormal environment.
Next, an index data generating device in an abnormal environment provided in the present specification will be described.
Fig. 10 is a schematic configuration diagram of the index data generating device 1000 in an abnormal environment provided in the present specification. Referring to fig. 10, in a software implementation, the index data generating apparatus 1000 in an abnormal environment may include: a preprocessing module 1001, a principal component analysis module 1002, a first determining module 1003 and a generating module 1004.
The preprocessing module 1001 is configured to respectively preprocess historical data of multiple indexes to obtain target data of the multiple indexes, where the target data is normalized data of a variation value between historical data at different times.
Optionally, the preprocessing module 1001 may be specifically configured to: calculate, for the historical data of the multiple indexes and with a preset time interval as the step length, the change values of data generated later relative to data generated earlier, to obtain change value data of the multiple indexes; and standardize the change value data of the multiple indexes respectively to obtain the target data of the multiple indexes.
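The preprocessing performed by module 1001 can be sketched as follows for a single index. This is a minimal illustration that assumes the change value is the plain difference between two observations one step apart and that standardization means subtracting the mean and dividing by the standard deviation (a z-score); the function names and the sample series are hypothetical.

```python
from statistics import mean, pstdev

def change_values(history, step=1):
    """Change of each later observation relative to the one `step` positions earlier."""
    return [later - earlier for earlier, later in zip(history, history[step:])]

def standardize(values):
    """Z-score the change values; also return the mean and standard deviation,
    which are needed later for the inverse of the preprocessing."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values], mu, sd

# Hypothetical history of one index (for instance, an overnight rate in percent).
history = [1.5, 1.6, 1.4, 1.9, 1.7]
changes = change_values(history, step=1)
target, mu, sd = standardize(changes)
```

The returned `mu` and `sd` are kept because the generating module needs them to undo the standardization after the inverse principal component analysis.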
A principal component analysis module 1002, configured to perform principal component analysis on the target data of the multiple indexes to obtain data of multiple principal components.
A first determining module 1003, configured to determine, based on a distribution of data of at least one principal component and a preset confidence level, the data of the at least one principal component in an abnormal environment, where the preset confidence level is a confidence level that the data of the plurality of principal components belong to the data in a normal environment.
Optionally, the first determining module 1003 may be specifically configured to determine a confidence interval under the preset confidence level based on the feature value of the at least one principal component, the degree of freedom of the data of the at least one principal component, and the preset confidence level; determining data of the at least one principal component in an anomalous environment based on the data of the at least one principal component falling outside the confidence interval.
Optionally, in an embodiment, the at least one principal component includes a first principal component and a second principal component, where the first principal component is the principal component with the largest eigenvalue among the plurality of principal components, and the second principal component is the principal component with the second-largest eigenvalue among the plurality of principal components. The first determining module 1003 may then be configured to: draw a two-dimensional scatter distribution diagram of the data of the first principal component and the data of the second principal component, wherein the two-dimensional scatter distribution diagram takes the first principal component and the second principal component as the coordinate axes of a Cartesian coordinate system; determine a long axis of a confidence ellipse based on the eigenvalue of the first principal component, the degree of freedom of the data of the first principal component and the preset confidence level, and determine a short axis of the confidence ellipse based on the eigenvalue of the second principal component, the degree of freedom of the data of the second principal component and the preset confidence level, wherein the confidence ellipse is used for representing the confidence interval; and draw the confidence ellipse in the two-dimensional scatter distribution diagram, centered on the origin, based on the long axis and the short axis.
The first determining module 1003 may further be configured to: determine the data of the first principal component and the data of the second principal component in the abnormal environment based on the points located outside the confidence ellipse in the two-dimensional scatter distribution diagram.
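The test for points outside the confidence ellipse can be sketched as follows. The specification derives the axes from the eigenvalue, the degree of freedom of the data and the preset confidence level, but the exact formula is not restated here; the sketch therefore substitutes a large-sample chi-square approximation with two degrees of freedom, in which each semi-axis is the square root of the eigenvalue times the chi-square quantile. This substitution, the function names, the eigenvalues and the points are all assumptions made for illustration.

```python
import math

def ellipse_semi_axes(eig1, eig2, confidence=0.95):
    """Semi-axes of an origin-centred confidence ellipse (chi-square approximation).
    For 2 degrees of freedom the chi-square quantile has the closed form -2*ln(1 - p)."""
    q = -2.0 * math.log(1.0 - confidence)
    return math.sqrt(eig1 * q), math.sqrt(eig2 * q)

def outside_ellipse(point, a, b):
    """True if the point lies outside the ellipse x^2/a^2 + y^2/b^2 = 1."""
    x, y = point
    return (x / a) ** 2 + (y / b) ** 2 > 1.0

# Hypothetical eigenvalues of the first and second principal components.
a, b = ellipse_semi_axes(eig1=4.0, eig2=1.0, confidence=0.95)

# Hypothetical (first component, second component) scores.
points = [(0.5, 0.2), (5.5, 0.0), (-1.0, 2.6), (3.0, -1.0)]
abnormal = [p for p in points if outside_ellipse(p, a, b)]
```

Because the first principal component has the larger eigenvalue, `a` is the longer semi-axis and lies along the first coordinate axis, matching the long-axis and short-axis roles described above.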
More specifically, the first determining module 1003 may be configured to: selecting one point from points outside the confidence ellipse in any quadrant of the two-dimensional scatter distribution map as a data point in the abnormal environment; and determining the data of the first principal component and the data of the second principal component under the abnormal environment based on the coordinate values of the data points in the Cartesian coordinate system.
Alternatively, the first determining module 1003 may be configured to: take the centroid of the points located outside the confidence ellipse in any quadrant of the two-dimensional scatter distribution diagram as a data point in the abnormal environment;
and determine the data of the first principal component and the data of the second principal component in the abnormal environment based on the coordinate values of the data point in the Cartesian coordinate system.
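The centroid alternative can be sketched as follows, taking as input the points already found to lie outside the confidence ellipse. The function names and the sample points are hypothetical, and assigning points that lie exactly on an axis to a quadrant is a convention chosen here, not something the specification prescribes.

```python
from collections import defaultdict

def quadrant(point):
    """Quadrant number (1 to 4) of a point; axis points are assigned by convention."""
    x, y = point
    if x >= 0 and y >= 0:
        return 1
    if x < 0 and y >= 0:
        return 2
    if x < 0 and y < 0:
        return 3
    return 4

def quadrant_centroids(points):
    """Centroid of the outside-ellipse points in each quadrant that contains any."""
    groups = defaultdict(list)
    for p in points:
        groups[quadrant(p)].append(p)
    return {q: (sum(x for x, _ in pts) / len(pts),
                sum(y for _, y in pts) / len(pts))
            for q, pts in groups.items()}

# Hypothetical points lying outside the confidence ellipse.
outside = [(5.0, 1.0), (6.0, 3.0), (-5.5, -2.0), (-4.5, -1.0)]
centroids = quadrant_centroids(outside)
```

The two coordinates of each centroid are then read as the data of the first principal component and the data of the second principal component in the abnormal environment.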
A generating module 1004, configured to perform an inverse operation of the principal component analysis and an inverse operation of the preprocessing on the data of the at least one principal component in the abnormal environment in sequence, and generate predicted data of the multiple indexes in the abnormal environment.
In the index data generating apparatus 1000 under an abnormal environment according to this embodiment, the data of the plurality of principal components obtained by principal component analysis of the target data of the plurality of indices can reflect the relationship between the history data of the plurality of indices, and therefore, when the index data under an abnormal environment is predicted by the principal component analysis, the relationship between different indices can be considered, the reliability and the rationality of the predicted data of the plurality of indices can be improved, and when a pressure test is performed by using the index data, the obtained pressure test result can be more accurate.
Fig. 11 is a schematic structural diagram of the index data generating apparatus 1000 in an abnormal environment according to another embodiment of the present specification. As shown in fig. 11, in addition to the preprocessing module 1001, the principal component analysis module 1002, the first determining module 1003, and the generating module 1004, the index data generating apparatus 1000 in an abnormal environment may further include: a second determining module 1005, a third determining module 1006, and a fourth determining module 1007.
A second determining module 1005, configured to determine, before the generating module 1004 is triggered, data of remaining principal components corresponding to a point in a quadrant where the data point is located, where the remaining principal components are principal components of the plurality of principal components except for the first principal component and the second principal component.
A third determining module 1006, configured to determine a normal distribution graph of the data of the remaining principal components based on a standard deviation of the data of the remaining principal components.
A fourth determining module 1007, configured to determine data of the remaining principal components in an abnormal environment based on the normal distribution map, and trigger the generating module 1004.
In addition, the generating module 1004 is specifically configured to: sequentially perform the inverse operation of the principal component analysis and the inverse operation of the preprocessing on the data of the first principal component, the data of the second principal component and the data of the remaining principal components in the abnormal environment, to generate the prediction data of the plurality of indexes in the abnormal environment.
The index data generating apparatus 1000 in an abnormal environment according to the embodiment shown in fig. 11 merges the data of the remaining principal components with the data of the first principal component and the data of the second principal component in the abnormal environment, and then predicts the data of the plurality of indexes in the reverse direction; therefore, the reliability and the rationality of the predicted data of the plurality of indexes can be further improved.
It should be noted that the index data generation apparatus 1000 in an abnormal environment can implement the method in the embodiment of the method in fig. 1, and specifically, reference may be made to the index data generation method in an abnormal environment in the embodiment shown in fig. 1, which is not described again.
In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly, and for the relevant points reference may be made to the corresponding description of the method embodiment.