CN113792749A - Time series data abnormity detection method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113792749A
Authority
CN
China
Prior art keywords
neighbor
time point
time
index data
data corresponding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011282307.7A
Other languages
Chinese (zh)
Inventor
李婷
张钧波
郑宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong City Beijing Digital Technology Co Ltd
Original Assignee
Jingdong City Beijing Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong City Beijing Digital Technology Co Ltd filed Critical Jingdong City Beijing Digital Technology Co Ltd
Priority to CN202011282307.7A priority Critical patent/CN113792749A/en
Publication of CN113792749A publication Critical patent/CN113792749A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a time series data abnormity detection method, a time series data abnormity detection device, time series data abnormity detection equipment and a storage medium, and relates to the technical field of data processing. The method comprises the following steps: acquiring time series data, wherein the time series data are sequences of index data corresponding to each time point in continuous time points; acquiring a neighbor relevance feature according to the index data corresponding to each time point, wherein the neighbor relevance feature is used for expressing the relevance between the index data corresponding to each neighbor time point in a plurality of neighbor time points of each time point and the index data corresponding to each time point; performing dimensionality reduction processing on a plurality of index data corresponding to a plurality of neighboring time points based on neighboring relevance characteristics to obtain associated neighboring data corresponding to each time point; and dividing the plurality of associated adjacent data corresponding to the continuous time points to determine the time point of the index data abnormity from the continuous time points. The method improves the accuracy of the abnormal detection of various index data with different neighbor relevance.

Description

Time series data abnormity detection method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of data processing, and in particular relates to a method, a device, equipment and a readable storage medium for detecting time series data abnormity.
Background
In practice, statistics yield time series indicators with complex associations. Each time point corresponds to one indicator value or one group of index data, and the sample data at different time points are not necessarily connected. Economic indicators of a region, such as Gross Domestic Product (GDP), are influenced by many factors, and the indicators of historical years have a certain influence on the indicators of the current year. When decisions are made about the development of a region, indicators that reflect its development in historical years, such as GDP, population, and number of employed persons, are analyzed: anomaly detection is performed on the index data, the historical time points at which anomalies occurred are identified, and the causes of the anomalies are analyzed, so as to provide accurate decision support.
In general, many types of indicators are subject to anomaly detection, and the indicators differ greatly from one another. For example, economic indicators include the macroscopic total amount of social consumption products, the components of the residential consumption structure, fixed asset investment in each industry, the export amount of each type of commodity, and the like. The categories number in the hundreds of thousands, and different types of indicators differ considerably. In the related art, a single unified model is usually applied when performing anomaly detection on different time series indicators, and the accuracy of anomaly detection on multiple types of indicators with a unified model is low.
As described above, how to improve the accuracy of anomaly detection for various types of time series indicators is an urgent problem to be solved.
The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a time series data abnormity detection method, a time series data abnormity detection device, time series data abnormity detection equipment and a readable storage medium, which improve the abnormity detection accuracy of multiple types of time series indexes at least to a certain extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided a time-series data abnormality detection method including: acquiring time series data, wherein the time series data are sequences of index data corresponding to each time point in continuous time points; acquiring a neighbor relevance feature according to the index data corresponding to each time point, wherein the neighbor relevance feature is used for representing relevance between the index data corresponding to each neighbor time point in a plurality of neighbor time points of each time point and the index data corresponding to each time point; performing dimensionality reduction processing on a plurality of index data corresponding to the plurality of neighbor time points based on the neighbor relevance feature to obtain associated neighbor data corresponding to each time point; and dividing the plurality of associated neighbor data corresponding to the continuous time points to determine a time point of index data abnormality from the continuous time points.
According to an embodiment of the present disclosure, the time-series data includes index data of a plurality of similar regions corresponding to the respective time points; the acquiring time-series data includes: acquiring index data of a first preset area corresponding to each time point; acquiring index data of a second preset area corresponding to each time point; when the similarity between the index data of the first preset area and the index data of the second preset area is larger than a preset threshold value, obtaining the plurality of similar areas, wherein the plurality of similar areas comprise the first preset area and the second preset area.
According to an embodiment of the present disclosure, the neighbor association feature includes a first dimension and a second dimension, the first dimension is a neighbor time point sequence, and the second dimension is an information gain sequence corresponding to the neighbor time point sequence; the obtaining of the neighbor correlation characteristics according to the index data corresponding to each time point includes: acquiring index data corresponding to each neighboring time point of each time point; and calculating the information gain sequence corresponding to the adjacent time point sequence based on the index data corresponding to each time point and the index data corresponding to each adjacent time point of each time point.
According to an embodiment of the present disclosure, the performing, based on the neighbor relevance feature, a dimension reduction process on the plurality of index data corresponding to the plurality of neighbor time points includes: determining associated neighbor time points for the respective time points from the plurality of neighbor time points based on the neighbor relevance features; determining the dimensionality of the reduced index data according to the associated adjacent time points of each time point; and reducing the dimension of the plurality of index data corresponding to the plurality of adjacent time points to the dimension of the reduced index data by a principal component analysis method.
According to an embodiment of the present disclosure, the determining the dimensionality of the reduced metric data according to the associated neighboring time points of the respective time points includes: and determining the dimensionality of the reduced index data according to the associated adjacent time points of each time point based on the adjacent relevance characteristics.
According to an embodiment of the present disclosure, the performing, based on the neighbor relevance feature, a dimension reduction process on the plurality of index data corresponding to the plurality of neighbor time points includes: determining associated neighbor time points for the respective time points from the plurality of neighbor time points based on the neighbor relevance features; the obtaining the associated neighbor data corresponding to each time point includes: and obtaining the index data corresponding to the associated neighbor time points of each time point as the associated neighbor data after dimension reduction.
According to an embodiment of the present disclosure, the dividing the plurality of associated neighbor data corresponding to the continuous time points to determine a time point of index data abnormality from the continuous time points includes: obtaining an isolation tree according to the plurality of associated neighbor data corresponding to the continuous time points; obtaining, based on the isolation tree, an anomaly score for the associated neighbor data corresponding to each of the continuous time points; and taking each time point whose associated neighbor data has an anomaly score greater than a preset threshold as a time point of index data abnormality.
According to still another aspect of the present disclosure, there is provided a time-series data abnormality detection apparatus including: a data acquisition module for acquiring time series data, the time series data being sequences of index data corresponding to each time point in continuous time points; a relevance feature extraction module for obtaining a neighbor relevance feature according to the index data corresponding to each time point, wherein the neighbor relevance feature is used for representing relevance between the index data corresponding to each neighbor time point in a plurality of neighbor time points of each time point and the index data corresponding to each time point; an index dimension reduction module for performing dimension reduction processing on a plurality of index data corresponding to the plurality of neighbor time points based on the neighbor relevance feature to obtain associated neighbor data corresponding to each time point; and an anomaly detection module for dividing the plurality of associated neighbor data corresponding to the continuous time points so as to determine the time point of index data abnormality from the continuous time points.
According to an embodiment of the present disclosure, the time-series data includes index data of a plurality of similar regions corresponding to the respective time points; the data acquisition module is further used for acquiring index data of the first predetermined area corresponding to each time point; acquiring index data of a second preset area corresponding to each time point; the data acquisition module further comprises a similar region aggregation module, configured to obtain the multiple similar regions when a similarity between the index data of the first predetermined region and the index data of the second predetermined region is greater than a preset threshold, where the multiple similar regions include the first predetermined region and the second predetermined region.
According to an embodiment of the present disclosure, the neighbor association feature includes a first dimension and a second dimension, the first dimension is a neighbor time point sequence, and the second dimension is an information gain sequence corresponding to the neighbor time point sequence; the relevance feature extraction module is further to: acquiring index data corresponding to each neighboring time point of each time point; and calculating the information gain sequence corresponding to the adjacent time point sequence based on the index data corresponding to each time point and the index data corresponding to each adjacent time point of each time point.
According to an embodiment of the present disclosure, the index dimension reduction module is further configured to: determining associated neighbor time points for the respective time points from the plurality of neighbor time points based on the neighbor relevance features; determining the dimensionality of the reduced index data according to the associated adjacent time points of each time point; and reducing the dimension of the plurality of index data corresponding to the plurality of adjacent time points to the dimension of the reduced index data by a principal component analysis method.
According to an embodiment of the present disclosure, the index dimension reduction module is further configured to determine a dimension of the index data after dimension reduction according to the associated neighboring time point of each time point based on the neighboring relevance feature.
According to an embodiment of the disclosure, the metric dimension reduction module is further configured to determine associated neighbor time points for the respective time points from the plurality of neighbor time points based on the neighbor relevance feature; and obtaining the index data corresponding to the associated neighbor time points of each time point as the associated neighbor data after dimension reduction.
According to an embodiment of the present disclosure, the anomaly detection module is further configured to: obtain an isolation tree according to the plurality of associated neighbor data corresponding to the continuous time points; obtain, based on the isolation tree, an anomaly score for the associated neighbor data corresponding to each of the continuous time points; and take each time point whose associated neighbor data has an anomaly score greater than a preset threshold as a time point of index data abnormality.
According to yet another aspect of the present disclosure, there is provided an apparatus comprising: a memory, a processor and executable instructions stored in the memory and executable in the processor, the processor implementing any of the methods described above when executing the executable instructions.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement any of the methods described above.
According to the time series data anomaly detection method provided by the embodiment of the disclosure, the neighbor relevance characteristics are obtained according to the index data corresponding to each time point, the dimension reduction processing is carried out on the multiple index data corresponding to the multiple neighbor time points on the basis of the neighbor relevance characteristics, the relevant neighbor data corresponding to each time point is obtained, the multiple relevant neighbor data corresponding to the continuous time points are divided to determine the time point of the anomaly of the index data from the continuous time points, and therefore the neighbor time point index data with strong index relevance with each time point can be screened as the anomaly detection object, and the anomaly detection accuracy of the multiple index data with different neighbor relevance is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 shows a schematic diagram of a system architecture in an embodiment of the disclosure.
Fig. 2 shows a flowchart of a time series data anomaly detection method in an embodiment of the present disclosure.
FIG. 3A is a flow diagram illustrating a method for region aggregation to obtain anomaly detection data, according to an example embodiment.
FIG. 3B illustrates a similarity heat map of the financial sector gross product in accordance with an exemplary embodiment.
Fig. 3C shows the enterprise income tax index curve of Meishan City from 2000 to 2018.
Fig. 3D shows the enterprise income tax index curve of dike state from 2000 to 2018.
FIG. 3E shows the enterprise income tax index curve of Tunning City from 2000 to 2018.
Fig. 3F shows the enterprise income tax index curve of Mianyang from 2000 to 2018.
FIG. 3G illustrates a similarity cluster diagram of the enterprise income tax index for the four cities.
FIG. 4A is a flow diagram illustrating a feature dimension reduction method for anomaly detection in accordance with an exemplary embodiment.
FIG. 4B illustrates a diagram of left and right subtree partitioning, according to an embodiment.
Fig. 4C illustrates an importance histogram of the preceding 5 years for the real estate industry, according to one embodiment.
FIG. 4D illustrates an importance histogram of the preceding 5 years for the construction industry, according to one embodiment.
FIG. 4E shows a scatter plot, after dimension reduction, of the neighbor time point data of the real estate total output value according to FIG. 4C.
FIG. 5 is a flow diagram illustrating another feature dimension reduction method in accordance with an exemplary embodiment.
Fig. 6A is a flow chart illustrating a method of anomaly determination according to an exemplary embodiment.
Fig. 6B shows a schematic diagram of a sample cutting process, according to an embodiment.
FIG. 6C illustrates another sample cutting process schematic according to an embodiment.
FIG. 6D illustrates a schematic diagram of an isolation tree, according to one embodiment.
FIG. 6E illustrates a curve of the number of full-time teachers in regular institutions of higher education in Sichuan Province over time and the corresponding anomaly score curve, according to one embodiment.
Fig. 7 is a block diagram illustrating a time-series data abnormality detecting apparatus according to an exemplary embodiment.
Fig. 8 is a block diagram illustrating another time-series data abnormality detecting apparatus according to an exemplary embodiment.
Fig. 9 shows a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, apparatus, steps, etc. In other instances, well-known structures, methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present disclosure, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. The symbol "/" generally indicates that the former and latter associated objects are in an "or" relationship.
In the present disclosure, unless otherwise expressly specified or limited, the terms "connected" and the like are to be construed broadly, e.g., as meaning electrically connected or in communication with each other; may be directly connected or indirectly connected through an intermediate. The specific meaning of the above terms in the present disclosure can be understood by those of ordinary skill in the art as appropriate.
As described above, for some types of indicators the sample data at different time points are not necessarily connected, yet the indicators of past years have a certain influence on the indicators of the current year, so the global development trend needs to be considered when analyzing historical indicators for anomaly detection. For different types of indicators, the influence of historical years on the current-year indicator may differ, so anomaly detection with a unified model has low accuracy. Therefore, the present disclosure provides a time series data anomaly detection method that obtains a neighbor relevance feature according to the index data corresponding to each time point, performs dimension reduction processing on a plurality of index data corresponding to a plurality of neighbor time points based on the neighbor relevance feature to obtain associated neighbor data corresponding to each time point, and divides the associated neighbor data corresponding to the continuous time points to determine the time point of an index data anomaly from the continuous time points. In this way, the index data of neighbor time points that are strongly associated with the index at each time point can be selected as the object of anomaly detection, which improves the accuracy of anomaly detection on multiple kinds of index data with different neighbor relevance.
Fig. 1 illustrates an exemplary system architecture 10 to which the time-series data anomaly detection method or the time-series data anomaly detection apparatus of the present disclosure can be applied.
As shown in fig. 1, system architecture 10 may include a terminal device 102, a network 104, a server 106, and a database 108. The terminal device 102 may be any of a variety of electronic devices that have a display screen and support input and output, including but not limited to smart phones, tablets, laptop computers, desktop computers, wearable devices, virtual reality devices, smart home devices, and the like. Network 104 is the medium used to provide communication links between terminal device 102 and server 106. Network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables. The server 106 may be a single server or a cluster of servers that provides various services. The database 108 may be large-scale database software installed on a server or small-scale database software installed on a computer, and is used for storing and managing data.
A user may use terminal device 102 to interact with server 106 and database 108 via network 104 to receive or transmit data and the like. For example, the user imports a list of metric data at the terminal device 102, uploads the metric data to the server 106 via the network 104 for anomaly analysis, or uploads the metric data to the database 108 via the network 104 for storage. For another example, the user obtains the same-class index data of a plurality of regions from the database 108 through the network 104, and performs processing on the terminal device 102 to obtain similar regions.
Data may also be received from database 108 or sent to database 108, etc. at server 106 via network 104. For example, the server 106 may be a background processing server for obtaining the index data to be subjected to the anomaly detection from the database 108 through the network 104. For another example, the server 106 may be configured to acquire similar index data of multiple regions from the database 108 through the network 104, perform regional aggregation, and transmit the aggregated index data to the database 108 through the network 104 for storage.
It should be understood that the number of terminal devices, networks, servers, and databases in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, servers, and databases, as desired for implementation.
Fig. 2 is a flowchart illustrating a time-series data anomaly detection method according to an exemplary embodiment. The method shown in fig. 2 may be applied to, for example, a server side of the system, and may also be applied to a terminal device of the system.
Referring to fig. 2, a method 20 provided by an embodiment of the present disclosure may include the following steps.
In step S202, time series data is acquired, the time series data being a sequence of index data corresponding to each of the continuous time points. The continuous time points may be consecutive years, in which case the sequence of index data is the index data for each of those years, for example the number of full-time teachers in regular institutions of higher education in Sichuan Province in each of the 50 years from 1970 to 2019. The continuous time points may also be consecutive months, quarters, half-years, etc., for example a sequence of index data for the real estate fixed asset investment amount in each quarter from 2000 to 2019.
In some embodiments, when index data is collected at yearly time points and only recent index data exists, the amount of time series data available for a single region is small. For example, for regional economic indicators such as GDP and number of graduates, some indicators may have been recorded only since 2000, so a single indicator has very few historical values: a region has only 20 historical reference values, and the data relevant to a given year may be only the most recent three years. As a result, few samples are available for modeling the time series features and dividing out abnormal points, which reduces the accuracy of anomaly detection. Data from regions with similar index data may therefore be aggregated when acquiring the time series data; for specific embodiments, refer to fig. 3A to 3G, which are not described in detail here.
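The aggregation of similar regions can be illustrated with a minimal sketch. The similarity measure is not specified in this passage, so Pearson correlation is assumed here; the function names `pearson` and `similar_regions`, the threshold of 0.9, and the region values are all hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (assumed similarity measure)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / sqrt(var_a * var_b)

def similar_regions(series_by_region, reference, threshold=0.9):
    """Return the reference region plus every region whose similarity exceeds the threshold."""
    ref = series_by_region[reference]
    group = [reference]
    for region, series in series_by_region.items():
        if region != reference and pearson(ref, series) > threshold:
            group.append(region)
    return group

# Hypothetical index values for three regions over the same six time points.
data = {
    "region_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "region_b": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # moves together with region_a
    "region_c": [5.0, 1.0, 4.0, 2.0, 6.0, 1.5],   # unrelated pattern
}
group = similar_regions(data, "region_a")
```

The index data of the regions in `group` could then be pooled so that more samples are available for the later steps.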
In step S204, a neighbor relevance feature is obtained from the index data corresponding to each time point, the neighbor relevance feature indicating the relevance between the index data corresponding to each of a plurality of neighbor time points of each time point and the index data corresponding to that time point. An indicator time series anomaly is generally understood as a point whose value differs greatly from recent indicator values. Therefore, the index data of the neighbor time points of each time point can first be obtained in order to extract the neighbor relevance feature; for example, the data of the 1 to 5 years preceding each time point can be used.
In some embodiments, for example, the relevance (importance) between the index at a neighbor time point and the index at the current time point may be calculated using the Gini coefficient, and the neighbor relevance feature is then obtained from this relevance.
In some embodiments, for example, the relevance (importance) between the index at a neighbor time point and the index at the current time point may be measured by the information gain, and the neighbor relevance feature is then obtained from this relevance; for specific embodiments, refer to fig. 4A to 4D, which are not described in detail here.
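As a minimal sketch of collecting the index data of neighbor time points, each time point can be paired with the values of the time points that precede it. The helper name `neighbor_matrix` and the sample values are hypothetical.

```python
def neighbor_matrix(series, n_neighbors=5):
    """Pair each time point with the values of its preceding n_neighbors time points.

    rows[i] holds the lagged values ordered from lag 1 (most recent) to
    lag n_neighbors, and targets[i] is the value at the time point itself.
    The first n_neighbors points are skipped because they lack a full history.
    """
    rows, targets = [], []
    for t in range(n_neighbors, len(series)):
        rows.append([series[t - lag] for lag in range(1, n_neighbors + 1)])
        targets.append(series[t])
    return rows, targets

values = [10, 12, 13, 15, 18, 21, 25, 24]  # hypothetical yearly index values
rows, targets = neighbor_matrix(values, n_neighbors=3)
```

With three neighbors, the first usable time point is the fourth one, whose row contains the three values before it.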
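One way to picture an information gain sequence over the neighbor time points is sketched below. The details of the embodiment are in the figures, which are not reproduced here, so the choices in this sketch are assumptions: the target is discretized into equal-width bins, and each lag is scored by the entropy reduction of a single median split. All function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def discretize(values, n_bins=3):
    """Map continuous values to equal-width bin indices 0..n_bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def information_gain(feature, target_bins):
    """Entropy reduction of the target after a median split of the feature."""
    threshold = sorted(feature)[len(feature) // 2]
    left = [y for x, y in zip(feature, target_bins) if x < threshold]
    right = [y for x, y in zip(feature, target_bins) if x >= threshold]
    if not left or not right:
        return 0.0
    n = len(target_bins)
    conditional = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(target_bins) - conditional

def gain_sequence(series, n_neighbors=3, n_bins=3):
    """Return [(lag, information gain)] for lags 1..n_neighbors."""
    targets = discretize(series[n_neighbors:], n_bins)
    gains = []
    for lag in range(1, n_neighbors + 1):
        feature = series[n_neighbors - lag:len(series) - lag]  # values lag steps earlier
        gains.append((lag, information_gain(feature, targets)))
    return gains

series = list(range(12))  # hypothetical steadily rising index
gains = gain_sequence(series)
```

Each pair in `gains` combines a neighbor time point (the lag) with its gain, and lags with larger gains would be treated as more strongly associated.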
In step S206, dimension reduction processing is performed on the plurality of index data corresponding to the plurality of neighbor time points based on the neighbor relevance feature, to obtain associated neighbor data corresponding to each time point. After the relevance feature is extracted, the neighbor time points that strongly influence the index at each time point, that is, the neighbor time points with strong relevance (or greater importance), can be identified, and dimension reduction can be applied to the index data of those neighbor time points. For example, the N-dimensional index data of the N neighbor time points corresponding to each time point (N being a positive integer greater than 2) can be reduced to 2 dimensions, giving 2-dimensional associated neighbor data for each time point.
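The reduction of N-dimensional neighbor data to 2 dimensions can be sketched with principal component analysis, which the disclosure names as one option. The sketch below uses only the standard library and finds the leading components by power iteration with deflation, which is an implementation choice made here for self-containment; a numerical library would normally be used instead. Function names and data are hypothetical.

```python
from math import sqrt

def covariance_matrix(centered):
    """Sample covariance matrix of rows that are already mean-centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvector(matrix, iterations=500):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    d = len(matrix)
    start = [float(i + 1) for i in range(d)]
    norm = sqrt(sum(x * x for x in start))
    vec = [x / norm for x in start]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # nothing left to extract
            break
        vec = [x / norm for x in nxt]
    return vec

def pca_reduce(rows, n_components=2):
    """Project rows onto their first n_components principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = covariance_matrix(centered)
    components = []
    for _ in range(n_components):
        vec = top_eigenvector(cov)
        eigval = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
        components.append(vec)
        # Deflate: remove the component just found so the next one can emerge.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * row[j] for j in range(d)) for c in components]
            for row in centered]

# Hypothetical index data of 3 neighbor time points for 6 time points.
rows = [
    [1.0, 2.0, 3.1],
    [2.0, 4.1, 5.9],
    [3.0, 5.9, 9.2],
    [4.0, 8.2, 11.8],
    [5.0, 9.8, 15.1],
    [6.0, 12.1, 18.0],
]
reduced = pca_reduce(rows)
```

Each entry of `reduced` is a 2-dimensional point that stands in for the original 3-dimensional neighbor data of one time point.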
In step S208, a plurality of associated neighboring data corresponding to the continuous time points are divided to determine a time point at which the index data is abnormal from among the continuous time points. After obtaining the associated neighboring data points corresponding to each time point, the data points can be divided for anomaly detection.
In some embodiments, for example, the associated neighbor data points may be divided by using an isolation forest (referred to herein as an isolated forest) as the anomaly detection model; for specific embodiments, refer to FIGS. 6A to 6D, which are not described in detail here.
In other embodiments, for example, a probability distribution model may be constructed based on statistical methods, the probability that the 2-dimensional features of each data point conform to the model is calculated, and objects with low probability are regarded as outliers; the RobustScaler method in feature engineering is one example. As another example, anomaly detection may be performed based on clustering: if, after clustering, the sample size of some clusters is much smaller than that of the other clusters and the feature means of the data in those clusters differ greatly from those of the other clusters, the sample points in those clusters may be regarded as outliers. Examples include the BIRCH clustering algorithm and the DBSCAN density clustering algorithm.
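The statistical alternative mentioned above can be illustrated with a minimal sketch. It assumes that robust scaling means centering each value on the median and dividing by the interquartile range, and that a value is flagged when its scaled magnitude exceeds a cutoff. The function names, the cutoff of 3.0 and the sample values are hypothetical and are not taken from the disclosure.

```python
import statistics

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; scaling is undefined")
    return [(v - median) / iqr for v in values]

def robust_outliers(values, cutoff=3.0):
    """Indices whose robustly scaled magnitude exceeds the (assumed) cutoff."""
    return [i for i, z in enumerate(robust_scale(values)) if abs(z) > cutoff]

# Hypothetical index values; the last one is far from the rest.
sample = [10, 11, 9, 10, 12, 10, 11, 50]
print(robust_outliers(sample))
```

Because the median and the interquartile range are barely affected by a single extreme value, the extreme value itself stands out clearly after scaling.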
According to the time series data anomaly detection method provided by the embodiments of the present disclosure, the neighbor relevance feature is obtained from the index data corresponding to each time point; dimension reduction is performed, based on the neighbor relevance feature, on the plurality of index data corresponding to the plurality of neighboring time points to obtain the associated neighbor data corresponding to each time point; and the plurality of associated neighbor data corresponding to the continuous time points are divided to determine, from the continuous time points, the time point at which the index data is abnormal. In this way, the index data of the neighboring time points that are strongly related to the index of each time point can be selected as the object of anomaly detection, which improves the accuracy of anomaly detection for multiple kinds of index data with different neighbor relevance.
FIG. 3A is a flow diagram illustrating a method for region aggregation to obtain anomaly detection data, according to an example embodiment. FIG. 3A may be taken as a processing procedure in one embodiment of step S202 shown in FIG. 2. When index data are collected at yearly time points, the historical index data of the region to be checked for anomalies may be scarce, and the data of similar regions can be aggregated by the embodiments of FIGS. 3A to 3G to expand the sample size. The method shown in FIG. 3A may be applied to, for example, a server side of the system, and may also be applied to a terminal device of the system.
Referring to fig. 3A, a method 30 provided by an embodiment of the present disclosure may include the following steps.
In step S302, index data of a first predetermined region corresponding to each time point is acquired.
In step S304, index data of a second predetermined region corresponding to each time point are acquired. The purpose of the region aggregation is to assist the target region, which has only a small amount of data, in anomaly detection by means of the history of similar regions. The first predetermined region and the second predetermined region may be two provinces, cities, counties or the like with similar conditions. For example, when anomaly detection is performed on the economic index data of Sichuan province, the total production value of the financial industry of Sichuan province comprises only yearly data for roughly the last 20 years, so the same index data of Chengdu city in Sichuan province, or of Chongqing city, which has similar economic conditions, can be used to assist in decision making.
In step S306, when the similarity between the index data of the first predetermined area and the index data of the second predetermined area is greater than a preset threshold, a plurality of similar areas are obtained, and the plurality of similar areas include the first predetermined area and the second predetermined area.
In some embodiments, for example, cosine similarity may be used to measure the similarity of the historical indexes of different regions. Let the first predetermined region be denoted i, with its index sequence represented as the vector $x_i$, and let the second predetermined region be denoted j, with its index sequence represented as the vector $x_j$, where the indexes of the two regions correspond to the same time points, that is, the two index sequences have the same vector length. The similarity $S_{i,j}$ between the first predetermined region and the second predetermined region can then be calculated by the following formula:

$$S_{i,j} = \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert} \tag{1}$$

where $x_i \cdot x_j$ represents the inner product of the index vectors of region i and region j, $\lVert x_i \rVert$ represents the two-norm of the index vector $x_i$ of region i, and $\lVert x_j \rVert$ represents the two-norm of the index vector $x_j$ of region j. Taking the two-norm of region i as an example, the calculation formula is:

$$\lVert x_i \rVert = \sqrt{\sum_{k} x_{i,k}^{2}} \tag{2}$$

where k indexes the elements of the vector $x_i$ and is a positive integer.
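The similarity calculation described above can be sketched in a few lines. The function name and the sample sequences are hypothetical; only the formula itself (the inner product divided by the product of the two-norms) comes from the text.

```python
import math

def cosine_similarity(x_i, x_j):
    """Cosine similarity of two index sequences covering the same time points."""
    if len(x_i) != len(x_j):
        raise ValueError("index sequences must have the same length")
    inner = sum(a * b for a, b in zip(x_i, x_j))
    norm_i = math.sqrt(sum(a * a for a in x_i))
    norm_j = math.sqrt(sum(b * b for b in x_j))
    if norm_i == 0.0 or norm_j == 0.0:
        raise ValueError("similarity is undefined for an all-zero sequence")
    return inner / (norm_i * norm_j)

# Hypothetical yearly index values for two regions (made-up numbers).
region_a = [1.0, 2.0, 3.0, 4.0]
region_b = [2.0, 4.0, 6.0, 8.0]
print(cosine_similarity(region_a, region_b))
```

Two sequences that differ only by a positive scale factor point in the same direction, so their similarity is 1 regardless of the absolute size of the regions.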
After the cosine similarity between the indexes of the regions is calculated, a heat map of region similarity can be obtained, so that the similarity of the indexes between regions can be displayed intuitively. For example, FIG. 3B illustrates a similarity heat map of the regional total production value of the financial industry. As shown in FIG. 3B, the cosine similarity scores of the financial industry regional total production value index are calculated between each pair among Leshan city, neighuan city, Nanchong city, Sichuan province, Yibin city, Deyang city, Chengdu city, Luzhou city, Mianyang city and seudong city, and the scores are displayed as a heat map. It can be seen that Chengdu city and Deyang city are most similar to Sichuan province in the total production value of the financial industry. Chengdu is the capital of Sichuan province, and its financial production index directly affects the index of Sichuan province.
In other embodiments, for example, the similarity matrix of the region indexes may also be used for clustering, and the similar regions of a predetermined region are obtained from the clustering result. As shown in FIGS. 3C to 3G, FIG. 3C shows the curve of the enterprise income tax revenue index of Meishan city from 2000 to 2018, FIG. 3D shows the corresponding curve for Dazhou city, FIG. 3E shows the corresponding curve for Suining city, and FIG. 3F shows the corresponding curve for Mianyang city. It can be seen from the curves that the fluctuation patterns of Mianyang and Suining are similar, and that those of Meishan and Dazhou are similar. FIG. 3G shows a similarity clustering chart of the enterprise income tax revenue index of the four cities; the clustering result shows that Meishan is closest to Dazhou and Mianyang is closest to Suining.
In step S308, index data of the plurality of similar regions corresponding to the respective time points are acquired. The purpose of the region aggregation is to assist the target region, which has only a small amount of data, in anomaly detection by means of the history of similar regions. For example, because the amount of data for Sichuan province is small, the data of Chengdu city in Sichuan province and of Chongqing city, which has similar economic conditions, can be used to assist in decision making.
According to the region aggregation method provided by the embodiments of the present disclosure, when each region has little historical data, similar regions are aggregated according to the curves of their historical indexes and are used to assist modeling, which effectively alleviates the problem of the small data volume of the target region and improves the accuracy of subsequent anomaly detection.
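The aggregation in steps S302 to S308 can be sketched as follows, assuming cosine similarity as the measure and a fixed preset threshold. The region names, the index values and the threshold of 0.99 are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def aggregate_similar_regions(target, series_by_region, threshold):
    """Return the target plus every region whose similarity to it exceeds threshold."""
    selected = [target]
    for name, series in series_by_region.items():
        if name != target and cosine(series_by_region[target], series) > threshold:
            selected.append(name)
    return selected

# Hypothetical index sequences over the same four time points.
data = {
    "region_a": [1.0, 2.0, 3.0, 4.0],
    "region_b": [1.1, 2.1, 2.9, 4.2],
    "region_c": [4.0, 1.0, 3.0, 0.5],
}
group = aggregate_similar_regions("region_a", data, threshold=0.99)
pooled = {name: data[name] for name in group}  # enlarged sample for modeling
print(group)
```

The pooled dictionary then plays the role of the time series data of the plurality of similar regions used in the later steps.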
FIG. 4A is a flow diagram illustrating a feature dimension reduction method for anomaly detection in accordance with an exemplary embodiment. Fig. 4A may be taken as a processing procedure of step S206 shown in fig. 2 in one embodiment. The method shown in fig. 4A may be applied to, for example, a server side of the system, and may also be applied to a terminal device of the system.
Referring to fig. 4A, a method 40 provided by an embodiment of the present disclosure may include the following steps.
In step S402, index data corresponding to each neighboring time point of each time point are acquired. An anomaly in an index time series is generally defined as a point that differs greatly from recent index values. For example, the 1 to 5 years preceding each year are taken as the neighboring time points, and it is determined whether the relevance between the data of the preceding 5 years and the data of the time point changes abnormally.
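Step S402 amounts to pairing each index value with the values that precede it. A minimal sketch follows, assuming yearly data stored in a list and a window of 5 preceding years; the function name and the sample series are hypothetical.

```python
def neighbor_rows(series, max_lag=5):
    """Pair each value with its max_lag preceding values (nearest first)."""
    rows = []
    for t in range(max_lag, len(series)):
        # neighbors[p] is the value p + 1 steps before time point t
        neighbors = [series[t - lag] for lag in range(1, max_lag + 1)]
        rows.append((series[t], neighbors))
    return rows

# Hypothetical yearly index values.
yearly = [10.0, 11.0, 12.5, 13.0, 15.0, 16.0, 18.5, 19.0]
for current, neighbors in neighbor_rows(yearly):
    print(current, neighbors)
```

The first max_lag time points are skipped because they do not have a complete set of preceding values.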
In step S404, an information gain sequence corresponding to the neighboring time point sequence is calculated based on the index data corresponding to each time point and the index data corresponding to each neighboring time point of each time point. The correlation between a neighboring time point and the corresponding time point data can be measured by an information gain.
In some embodiments, for example, the neighboring time points, such as the 1 to 5 years preceding each year, are represented as $c_p \in \{c_0, c_1, c_2, c_3, c_4\}$, $p \in \{0, 1, 2, 3, 4\}$; the information gain $I_p$ corresponding to each $c_p$ can then be calculated by formula (3):

[Formula (3): Figure BDA0002781173950000131]

In the formula, $y_k$ represents the k-th index value in the index sequence, and Q represents the index sequence data set. The formula uses the average of the index values at the neighboring time point $c_p$ over the respective time periods. The index sequence data are partitioned according to this average into a left subtree data set and a right subtree data set, $y_{k'}$ denotes data in the left subtree data set or the right subtree data set, and the average of the data in the left or right subtree data set is also used. FIG. 4B illustrates a schematic diagram of the division into left and right subtrees. As shown in FIG. 4B, after the partition, the data among $y_1$ to $y_6$ that fall in the left subtree data set are all smaller than the average used for the partition, and the data that fall in the right subtree data set are all larger than it.

According to the calculation method of the information gain of formula (3), the relevance between each neighboring time point and the corresponding time point, that is, the importance of each neighboring time point, can be obtained. FIG. 4C shows an importance histogram of the last 5 years for the real estate industry according to one embodiment. FIG. 4D shows an importance histogram of the last 5 years for the construction industry according to one embodiment. As shown in FIG. 4C, for the index of the total value of real estate production, the values of the last three years contribute most to the current value, and the feature importance decreases faster the longer the time interval. Different economic indicators have different important features. As shown in FIG. 4D, for the construction industry, the influence of the last 1 year and of the fifth preceding year is greatest, which indicates that the construction industry is periodic and that the periodic feature has a relatively large influence. If aggregated data are obtained by performing region aggregation according to the method shown in FIG. 3A, the total number k of index values in formula (3) should be the number of index values in one region multiplied by the number of aggregated regions.
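Formula (3) is available only as an image, so the following is a sketch under a stated assumption rather than the formula of the disclosure. It assumes that the information gain is the reduction in the sum of squared deviations obtained when the index values are split into a left and a right subtree at the mean of the neighboring time point values, as in a regression tree. All names and sample values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def information_gain(targets, neighbor_values):
    """Assumed gain: SSE of all targets minus SSE of the two subtrees.

    targets[k] is the index value at time point k, and neighbor_values[k]
    is the value at one fixed neighboring time point (for example c_0).
    """
    threshold = sum(neighbor_values) / len(neighbor_values)
    left = [y for y, c in zip(targets, neighbor_values) if c < threshold]
    right = [y for y, c in zip(targets, neighbor_values) if c >= threshold]
    return sse(targets) - sse(left) - sse(right)

# Hypothetical data: the first neighbor separates the targets, the second does not.
targets = [1.0, 1.0, 10.0, 10.0]
print(information_gain(targets, [0.0, 0.0, 5.0, 5.0]))
print(information_gain(targets, [0.0, 5.0, 0.0, 5.0]))
```

Under this assumption, a neighboring time point whose values split the index values into two homogeneous groups receives a large gain, while an uninformative neighbor receives a gain close to zero.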
In step S406, the associated neighbor time points of each time point are determined from the plurality of neighbor time points based on the neighbor relevance feature. The neighbor relevance feature comprises a first dimension and a second dimension, wherein the first dimension is the neighbor time point sequence and the second dimension is the information gain sequence corresponding to the neighbor time point sequence. Taking FIG. 4C as an example, the last three years $c_0$, $c_1$, $c_2$ are selected as the associated neighbor time points, and the corresponding information gain sequence $\{I_0, I_1, I_2\}$ is $\{0.7 \times 10^6, 1.18 \times 10^6, 1.35 \times 10^6\}$.
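The disclosure does not state the exact rule by which the associated neighbor time points are chosen from the information gain sequence. One possible rule, shown here purely as an assumption, is to keep the k neighboring time points with the largest gains. The function name and the gain values are hypothetical.

```python
def select_associated_neighbors(gains, k):
    """Indices p of the k largest information gains, returned in ascending p."""
    ranked = sorted(range(len(gains)), key=lambda p: gains[p], reverse=True)
    return sorted(ranked[:k])

# Hypothetical gains for c_0 .. c_4 (arbitrary units).
gains = [5.8, 0.9, 0.4, 0.7, 1.5]
print(select_associated_neighbors(gains, k=2))
```

With these made-up gains the rule keeps the nearest and the farthest neighbor, which is the kind of periodic pattern described for the construction industry.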
In step S408, the dimensionality of the reduced index data is determined, based on the neighbor relevance feature, according to the associated neighbor time points of each time point.
In some embodiments, for example, if the sequence length of the neighbor relevance feature is 3, that is, there are 3 associated neighbor time points, the index data after dimension reduction can be determined to be 3-dimensional; for a specific implementation, refer to FIG. 5, which is not described in detail here.
In some embodiments, for example, if the principal component analysis method is used for dimensionality reduction, the index data after dimensionality reduction may be determined to be 2-dimensional, and then dimensionality reduction operations are performed on the index data (e.g., 3-dimensional index data) associated with the neighboring time points.
In step S410, the plurality of index data corresponding to the plurality of neighboring time points are reduced, by principal component analysis, to the dimensionality determined for the reduced index data. For example, the index data of the associated neighboring time points may be converted into 2-dimensional data through a linear transformation. The two dimensions after the transformation carry no meaning related to the index itself; they are used to represent the relevance between the index data of the associated neighboring time points and the index data of the corresponding time point.
In some embodiments, for example, the dimension-reduced data may be visualized in a two-dimensional coordinate system to obtain a two-dimensional scatter plot. FIG. 4E shows a scatter plot of the dimension-reduced neighboring time point data of the total real estate value according to FIG. 4C. As shown in FIG. 4E, except for some time points of a few regions at which the total real estate value is abnormal, the data form an aggregated cluster structure, which indicates that the real estate patterns of most regions are similar. Points far from the majority of the data points in the scatter plot can be taken as outliers.
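The linear transformation in step S410 can be sketched with a small principal component analysis written in plain Python. The sketch finds the leading directions of the covariance matrix by power iteration and keeps each new direction orthogonal to the earlier ones. The function names, the fixed start vector and the sample rows are assumptions made for illustration, and a numerical library would normally be used instead.

```python
import math

def _orthonormalize(vec, basis):
    """Remove the components of vec along basis, then scale to unit length."""
    for b in basis:
        dot = sum(x * y for x, y in zip(vec, b))
        vec = [x - dot * y for x, y in zip(vec, b)]
    norm = math.sqrt(sum(x * x for x in vec))
    return None if norm < 1e-12 else [x / norm for x in vec]

def pca_reduce(rows, n_components=2, iterations=500):
    """Project rows onto their leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _ in range(n_components):
        # Hypothetical fixed start vector; a random start would also work.
        v = _orthonormalize([1.0 / (j + 1) for j in range(d)], components)
        if v is None:
            raise ValueError("start vector lies in the span of earlier components")
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            w = _orthonormalize(w, components)
            if w is None:  # no variance left in the remaining directions
                break
            v = w
        components.append(v)
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]

# Hypothetical 3-dimensional associated neighbor data (values at c_0, c_1, c_2).
rows = [(1.0, 1.2, 0.9), (2.0, 2.1, 1.8), (3.0, 2.9, 3.2), (4.0, 4.2, 3.9)]
for point in pca_reduce(rows):
    print(point)
```

The sign of each component is arbitrary, which does not matter for anomaly detection because only the relative positions of the projected points are used.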
According to the neighbor time point data dimension reduction method provided by the embodiment of the disclosure, relevance characteristics are measured through information gain, more important neighbor time points are screened, relevance among characteristics is comprehensively considered, relevant characteristic integration is carried out, noise characteristics are eliminated, key characteristics are reserved, and accuracy of anomaly detection is improved.
FIG. 5 is a flow diagram illustrating another feature dimension reduction method in accordance with an exemplary embodiment. Fig. 5 may be a processing procedure in another embodiment as step S206 shown in fig. 2. The method shown in fig. 5 may be applied to, for example, a server side of the system, and may also be applied to a terminal device of the system.
Step S502, the associated neighbor time points of each time point are determined from the plurality of neighbor time points based on the neighbor relevance feature. Taking FIG. 4D as an example, the last year $c_0$ and the fifth preceding year $c_4$ are selected as the associated neighbor time points, and the corresponding information gain sequence $\{I_0, I_4\}$ is $\{5.8 \times 10^6, 1.5 \times 10^6, 1.35 \times 10^6\}$.
Step S504, the index data corresponding to the associated neighbor time points of each time point is obtained as the associated neighbor data after the dimensionality reduction. Taking FIG. 4D as an example, the last year c can be directly selected0And the fifth year before c4The total production value index data of the construction industry area is used as 2-dimensional data after dimension reduction, and can also be visually displayed in a two-dimensional coordinate system to obtain a two-dimensional data scatter diagram so as to find out abnormal data points.
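The direct selection in steps S502 and S504 needs no transformation: the values at the associated neighbor time points are simply kept as coordinates. A minimal sketch follows, assuming each row holds the values at $c_0$ to $c_4$ for one time point; the function name and the numbers are hypothetical.

```python
def select_columns(neighbor_rows, selected):
    """Keep only the values at the selected neighbor time points of every row."""
    return [tuple(row[p] for p in selected) for row in neighbor_rows]

# Hypothetical rows: values at c_0 .. c_4 for two time points.
rows = [
    [4.0, 3.0, 2.0, 1.0, 0.5],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
points_2d = select_columns(rows, selected=(0, 4))  # keep c_0 and c_4
print(points_2d)
```

Each resulting pair can be plotted directly as one point of the two-dimensional scatter plot.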
FIG. 6A is a flow chart illustrating a method of anomaly determination according to an exemplary embodiment. FIG. 6A may be taken as a processing procedure of step S208 shown in FIG. 2 in one embodiment. The method shown in FIG. 6A may be applied to, for example, a server side of the system, and may also be applied to a terminal device of the system.
Referring to fig. 6A, a method 60 provided by embodiments of the present disclosure may include the following steps.
Step S602, an isolated tree is obtained according to the plurality of associated neighbor data corresponding to the continuous time points. The associated neighbor data after dimension reduction can be modeled by an isolated forest, in which each isolated tree is built as follows. First, n sample points are randomly selected from the associated neighbor data and placed in the root node of the isolated tree. Second, a dimension is randomly chosen (from the two dimensions), and a cut point is randomly generated within the data range of the current node, that is, between the minimum value and the maximum value of the chosen dimension among the data of the current node. Third, the cut point defines a hyperplane that divides the data space of the current node into 2 subspaces: points smaller than the cut point in the chosen dimension are placed in the left branch of the current node, and points larger than the cut point are placed in the right branch. Finally, the second and third steps are applied recursively to the left and right branch nodes to construct new nodes, until a node holds only one piece of data (and cannot be cut further) or the tree reaches the set height.
In some embodiments, for example, FIG. 6B shows a schematic diagram of a sample cutting process according to an embodiment, and FIG. 6C shows a schematic diagram of another sample cutting process according to an embodiment. As shown in FIGS. 6B and 6C, compared to point $z_i$, point $z_0$ can be isolated in fewer cuts. FIG. 6D is a schematic diagram of an isolated tree. As shown in FIG. 6D, x and y represent the values in the two dimensions, respectively, and the node (8.7, 9.2) in the tree may correspond to point $z_0$.
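The construction of one isolated tree described above can be sketched as a short recursive function. The dictionary layout of the nodes, the function names, the height limit and the sample points are assumptions made for illustration; points equal to the cut point are sent to the right branch.

```python
import random

def build_isolated_tree(points, height_limit, rng, depth=0):
    """Recursively split points at random cut points until they are isolated."""
    if len(points) <= 1 or depth >= height_limit:
        return {"size": len(points)}  # leaf node
    dim = rng.randrange(len(points[0]))  # randomly chosen dimension
    low = min(p[dim] for p in points)
    high = max(p[dim] for p in points)
    if low == high:  # simplification: stop if the chosen dimension cannot be cut
        return {"size": len(points)}
    cut = rng.uniform(low, high)  # cut point between the minimum and maximum
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {
        "dim": dim,
        "cut": cut,
        "left": build_isolated_tree(left, height_limit, rng, depth + 1),
        "right": build_isolated_tree(right, height_limit, rng, depth + 1),
    }

def path_length(tree, point):
    """Number of edges from the root to the leaf that the point falls into."""
    edges = 0
    while "cut" in tree:
        tree = tree["left"] if point[tree["dim"]] < tree["cut"] else tree["right"]
        edges += 1
    return edges

def leaf_sizes(tree):
    """Sizes of all leaves, used here only to check that no point is lost."""
    if "cut" not in tree:
        return [tree["size"]]
    return leaf_sizes(tree["left"]) + leaf_sizes(tree["right"])

# Hypothetical 2-dimensional data: a tight cluster and one distant point.
cluster = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (1.1, 1.3),
           (0.9, 0.7), (1.3, 1.2), (0.7, 0.9), (1.0, 0.8)]
outlier = (9.0, 9.0)
tree = build_isolated_tree(cluster + [outlier], height_limit=8, rng=random.Random(0))
print(path_length(tree, outlier), path_length(tree, cluster[0]))
```

A point far from the cluster is usually separated by one of the first cuts, so it tends to sit closer to the root than the points inside the cluster.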
Step S604, the abnormal value (anomaly score) of each associated neighbor datum corresponding to the continuous time points is obtained based on the isolated tree. After the isolated tree is constructed, prediction can be performed on each associated neighbor datum, that is, the leaf node into which the datum falls is determined. The degree of anomaly of each sample point can be measured by its average path length, where the path length is the number of edges traversed from the root node of the isolated tree to the leaf node. The anomaly score u(x, n) of a sample point (x, y) among the n sample points can be calculated by:

$$u(x, n) = 2^{-\frac{E(h(x))}{q(n)}} \tag{4}$$

$$q(n) = 2H(n-1) - \frac{2(n-1)}{n} \tag{5}$$

where h(x) is the path length from the root node of the isolated tree to the leaf node into which x falls, and E(h(x)) is the average of the path lengths of x over the plurality of isolated trees obtained by sampling the associated neighbor data multiple times. q(n) represents the mean path length when the number of sampled points is n, and is used to normalize the path length h(x) of sample x. H(n-1) is a harmonic number; once n is determined, it is a fixed value.
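The score calculation can be sketched as follows. The sketch assumes the normalization commonly used for isolation forests, in which the score is 2 raised to the negative ratio of the average path length E(h(x)) to q(n), with q(n) built from the harmonic number H(n-1). The function names are hypothetical, and the harmonic number is computed exactly rather than approximated.

```python
def harmonic(i):
    """Harmonic number H(i) = 1 + 1/2 + ... + 1/i."""
    return sum(1.0 / k for k in range(1, i + 1))

def q(n):
    """Assumed normalizer: mean path length for n sampled points."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    return 2.0 ** (-mean_path_length / q(n))

n = 256  # hypothetical number of sampled points per tree
print(anomaly_score(2.0, n))   # short average path, more anomalous
print(anomaly_score(12.0, n))  # long average path, less anomalous
```

Under this assumption, a sample whose average path length equals q(n) receives a score of exactly 0.5, which serves as a natural reference point when reading the scores.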
Step S606, the time point corresponding to the associated neighbor data whose abnormal value is greater than the preset threshold is obtained as the time point at which the index data is abnormal.
In some embodiments, for example, FIG. 6E shows a curve of the number of teachers specializing in higher schools in the state of Sichuan province over time, together with the corresponding anomaly score curve, according to an embodiment. As shown in FIG. 6E, the horizontal axis represents the year, the vertical axis of the upper graph represents the number of teachers, and the vertical axis of the lower graph represents the anomaly score. It can be seen that the time points at which the number of teachers suddenly decreases or increases are possible anomalies. The dashed line parallel to the horizontal axis represents the preset threshold, which can be calculated from a proportion of the anomaly scores. For example, the threshold in FIG. 6E is set at 3%, that is, the nodes whose anomaly scores rank in the top 3% are determined to be abnormal. Points A and B are sudden drops and point C is a sudden rise: after a preceding period of fluctuating rise or fall, the number of teachers suddenly decreases or increases, so these points can be regarded as identified anomalies.
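A proportion-based threshold such as the 3% described above can be sketched by ranking the anomaly scores and flagging the highest ones. The function name, the rounding rule and the choice to flag at least one point are assumptions made for illustration.

```python
def flag_top_fraction(scores, fraction=0.03):
    """Indices of the scores that rank in the top `fraction`, in ascending order."""
    # Assumption: round to the nearest count and always flag at least one point.
    count = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:count])

# Hypothetical anomaly scores for 100 time points.
scores = [i / 100 for i in range(100)]
print(flag_top_fraction(scores, fraction=0.03))
```

The score of the last flagged point can then be drawn as the horizontal threshold line in a plot of the anomaly scores.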
According to the anomaly detection method provided by the embodiment of the disclosure, the anomaly score is calculated for the associated neighbor data points based on the isolated tree, and the time points which are possibly abnormal are screened out according to the anomaly score, so that the method is more accurate compared with a method for judging the anomaly by directly setting a threshold value in an index dimension.
Fig. 7 is a block diagram illustrating a time-series data abnormality detecting apparatus according to an exemplary embodiment. The apparatus shown in fig. 7 may be applied to, for example, a server side of the system, and may also be applied to a terminal device of the system.
Referring to fig. 7, the apparatus 70 provided in the embodiment of the present disclosure may include a data obtaining module 702, a relevance feature extracting module 704, an index dimension reducing module 706, and an anomaly detecting module 708.
The data obtaining module 702 may be configured to obtain time-series data, where the time-series data is a sequence of index data corresponding to each time point in consecutive time points.
The relevance feature extracting module 704 may be configured to obtain a neighboring relevance feature according to the index data corresponding to each time point, where the neighboring relevance feature is used to represent a relevance between the index data corresponding to each neighboring time point in the multiple neighboring time points of each time point and the index data corresponding to each time point.
The index dimension reduction module 706 is configured to perform dimension reduction processing on the plurality of index data corresponding to the plurality of neighboring time points based on the neighboring relevance characteristics, and obtain relevant neighboring data corresponding to each time point.
The anomaly detection module 708 may be configured to divide the plurality of associated neighbor data corresponding to successive points in time to determine a point in time from the successive points in time at which the metric data is anomalous.
Fig. 8 is a block diagram illustrating another time-series data abnormality detecting apparatus according to an exemplary embodiment. The apparatus shown in fig. 8 may be applied to, for example, a server side of the system, and may also be applied to a terminal device of the system.
Referring to fig. 8, the apparatus 80 provided in the embodiment of the present disclosure may include a data obtaining module 802, a relevance feature extracting module 804, an index dimension reducing module 806, and an anomaly detecting module 808, where the data obtaining module 802 includes a similar region aggregating module 8022.
The data obtaining module 802 may be configured to obtain time-series data, where the time-series data is a sequence of index data corresponding to each time point in consecutive time points. The time-series data includes index data of a plurality of similar regions corresponding to the respective time points.
The data obtaining module 802 may also be configured to obtain index data of a first predetermined area corresponding to each time point; and acquiring index data of the second preset area corresponding to each time point.
The similar region aggregation module 8022 may further be configured to, when the similarity between the index data of the first predetermined region and the index data of the second predetermined region is greater than a preset threshold, obtain a plurality of similar regions, where the plurality of similar regions includes the first predetermined region and the second predetermined region.
The relevance feature extracting module 804 may be configured to obtain a neighboring relevance feature according to the index data corresponding to each time point, where the neighboring relevance feature is used to represent a relevance between the index data corresponding to each neighboring time point in the multiple neighboring time points of each time point and the index data corresponding to each time point. The neighbor association feature comprises a first dimension and a second dimension, wherein the first dimension is a neighbor time point sequence, and the second dimension is an information gain sequence corresponding to the neighbor time point sequence.
The relevance feature extraction module 804 may also be configured to obtain index data corresponding to each neighboring time point of each time point; and calculating an information gain sequence corresponding to the adjacent time point sequence based on the index data corresponding to each time point and the index data corresponding to each adjacent time point of each time point.
The index dimension reduction module 806 may be configured to perform dimension reduction processing on the plurality of index data corresponding to the plurality of neighboring time points based on the neighboring relevance feature, so as to obtain relevant neighboring data corresponding to each time point.
The metric dimension reduction module 806 is further operable to determine an associated neighbor time point for each time point from the plurality of neighbor time points based on the neighbor association feature; determining the dimensionality of the reduced index data according to the associated adjacent time points of each time point; and reducing the dimension of the plurality of index data corresponding to the plurality of adjacent time points to the dimension of the reduced index data by a principal component analysis method.
The metric dimension reduction module 806 can also be configured to determine a reduced-dimension metric data dimension from the associated neighbor time points for each time point based on the neighbor relevance feature.
The metric dimension reduction module 806 is further operable to determine an associated neighbor time point for each time point from the plurality of neighbor time points based on the neighbor association feature; and obtaining the index data corresponding to the associated neighbor time points of each time point as the associated neighbor data after the dimension reduction.
The anomaly detection module 808 can be configured to divide the plurality of associated neighbor data corresponding to the consecutive time points to determine a time point of anomaly of the index data from the consecutive time points.
The anomaly detection module 808 can also be configured to: obtaining an isolated tree according to a plurality of associated neighbor data corresponding to the continuous time points; respectively obtaining abnormal values of each associated neighbor data corresponding to the continuous time points based on the isolated tree; and obtaining a time point corresponding to the correlation neighbor data of which the abnormal value is greater than the preset threshold value as a time point of abnormal index data.
The specific implementation of each module in the apparatus provided in the embodiment of the present disclosure may refer to the content in the foregoing method, and is not described herein again.
Fig. 9 shows a schematic structural diagram of an electronic device in an embodiment of the present disclosure. It should be noted that the apparatus shown in fig. 9 is only an example of a computer system, and should not bring any limitation to the function and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 9, the apparatus 900 includes a Central Processing Unit (CPU)901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the apparatus 900 are also stored. The CPU901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage section 908 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present disclosure are executed when the computer program is executed by a Central Processing Unit (CPU) 901.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprises a data acquisition module, a relevance feature extraction module, an index dimension reduction module and an anomaly detection module. The names of these modules do not in some cases constitute a limitation on the modules themselves, and for example, the data acquisition module may also be described as a "module that acquires time-series data from a connected database server".
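As an illustration of how the four modules named above might be realized in software, the following minimal sketch wires them together as interchangeable callables. The class name `AnomalyPipeline`, the field names, and the type signatures are hypothetical and are not taken from the disclosure; they only show one possible decomposition.

```python
from dataclasses import dataclass
from typing import Callable, List

Series = List[float]          # index data, one value per time point
Rows = List[List[float]]      # associated neighbor data, one row per time point


@dataclass
class AnomalyPipeline:
    """Hypothetical software arrangement of the four described modules."""
    acquire: Callable[[], Series]                    # data acquisition module
    extract_relevance: Callable[[Series], List[float]]  # relevance feature extraction module
    reduce: Callable[[Series, List[float]], Rows]    # index dimension reduction module
    detect: Callable[[Rows], List[int]]              # anomaly detection module

    def run(self) -> List[int]:
        """Run the modules in order and return the abnormal time indices."""
        series = self.acquire()
        relevance = self.extract_relevance(series)
        reduced = self.reduce(series, relevance)
        return self.detect(reduced)
```

Because each module is passed in as a callable, a module can be replaced (for example by a hardware-backed implementation) without changing the others, which matches the statement that the module names do not limit the modules themselves.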
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to perform operations comprising: acquiring time series data, wherein the time series data is a sequence of index data corresponding to each time point in continuous time points; acquiring a neighbor relevance feature according to the index data corresponding to each time point, wherein the neighbor relevance feature represents the relevance between the index data corresponding to each neighbor time point in a plurality of neighbor time points of each time point and the index data corresponding to that time point; performing dimensionality reduction on the plurality of index data corresponding to the plurality of neighbor time points based on the neighbor relevance feature to obtain associated neighbor data corresponding to each time point; and dividing the plurality of associated neighbor data corresponding to the continuous time points to determine, from the continuous time points, the time point at which the index data is abnormal.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that the present disclosure is not limited to the precise arrangements, instrumentalities, or implementations described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
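The acquisition of a neighbor relevance feature and the selection of associated neighbor data can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the equal-width binning, the definition of the gain for lag k as H(x_t) - H(x_t | x_(t-k)), the restriction to preceding neighbors only, and every function name are assumptions introduced here. The dimension reduction shown simply keeps the index data at the most relevant neighbor time points.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map real values to equal-width bin labels (binning is an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # avoid zero width for constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(target, feature):
    """H(target) - H(target | feature) for two aligned label lists."""
    groups = {}
    for t, f in zip(target, feature):
        groups.setdefault(f, []).append(t)
    n = len(target)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def neighbor_relevance(series, max_lag, bins=4):
    """Information gain of each lag 1..max_lag with respect to the current value."""
    symbols = discretize(series, bins)
    target = symbols[max_lag:]
    gains = []
    for lag in range(1, max_lag + 1):
        feature = symbols[max_lag - lag: len(symbols) - lag]
        gains.append(information_gain(target, feature))
    return gains


def associated_neighbor_data(series, gains, top_k):
    """Keep only the top_k most relevant lags as the reduced representation."""
    max_lag = len(gains)
    ranked = sorted(range(1, max_lag + 1), key=lambda k: gains[k - 1], reverse=True)
    lags = sorted(ranked[:top_k])
    rows = [[series[t - k] for k in lags] for t in range(max_lag, len(series))]
    return lags, rows
```

For a series that repeats the pattern 0, 0, 1, 1, the values two and four steps back determine the current value while the values one and three steps back do not, so `neighbor_relevance` assigns the gain to lags 2 and 4 and `associated_neighbor_data` with `top_k=2` keeps exactly those two lags.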

Claims (10)

1. A time series data abnormality detection method is characterized by comprising the following steps:
acquiring time sequence data, wherein the time sequence data are sequences of index data corresponding to each time point in continuous time points;
acquiring a neighbor relevance feature according to the index data corresponding to each time point, wherein the neighbor relevance feature is used for representing relevance between the index data corresponding to each neighbor time point in a plurality of neighbor time points of each time point and the index data corresponding to each time point;
performing dimensionality reduction processing on a plurality of index data corresponding to the plurality of neighbor time points based on the neighbor relevance feature to obtain associated neighbor data corresponding to each time point;
dividing the plurality of associated neighbor data corresponding to the successive time points to determine a time point of index data abnormality from the successive time points.
2. The method according to claim 1, wherein the time-series data includes index data of a plurality of similar regions corresponding to the respective time points;
the acquiring time-series data includes:
acquiring index data of a first preset area corresponding to each time point;
acquiring index data of a second preset area corresponding to each time point;
when the similarity between the index data of the first preset area and the index data of the second preset area is larger than a preset threshold value, obtaining the plurality of similar areas, wherein the plurality of similar areas comprise the first preset area and the second preset area.
3. The method of claim 1, wherein the neighbor relevance feature comprises a first dimension and a second dimension, the first dimension being a sequence of neighbor time points, the second dimension being a sequence of information gains corresponding to the sequence of neighbor time points;
the acquiring a neighbor relevance feature according to the index data corresponding to each time point includes:
acquiring index data corresponding to each neighbor time point of each time point;
and calculating the sequence of information gains corresponding to the sequence of neighbor time points based on the index data corresponding to each time point and the index data corresponding to each neighbor time point of each time point.
4. The method of claim 1, wherein the performing dimensionality reduction processing on the plurality of index data corresponding to the plurality of neighbor time points based on the neighbor relevance feature comprises:
determining associated neighbor time points for the respective time points from the plurality of neighbor time points based on the neighbor relevance feature;
determining the dimensionality of the reduced index data according to the associated neighbor time points of each time point;
and reducing the dimension of the plurality of index data corresponding to the plurality of neighbor time points to the dimensionality of the reduced index data by a principal component analysis method.
5. The method of claim 4, wherein the determining the dimensionality of the reduced index data according to the associated neighbor time points of each time point comprises:
and determining the dimensionality of the reduced index data according to the associated neighbor time points of each time point based on the neighbor relevance feature.
6. The method of claim 1, wherein the performing dimensionality reduction processing on the plurality of index data corresponding to the plurality of neighbor time points based on the neighbor relevance feature comprises:
determining associated neighbor time points for the respective time points from the plurality of neighbor time points based on the neighbor relevance features;
the obtaining the associated neighbor data corresponding to each time point includes:
and obtaining the index data corresponding to the associated neighbor time points of each time point as the associated neighbor data after dimension reduction.
7. The method according to any one of claims 1 to 6, wherein said dividing the plurality of associated neighbor data corresponding to the successive points in time to determine a point in time of index data anomaly from the successive points in time comprises:
obtaining an isolation tree according to the plurality of associated neighbor data corresponding to the continuous time points;
respectively obtaining anomaly scores of the associated neighbor data corresponding to the continuous time points on the basis of the isolation tree;
and obtaining a time point corresponding to the associated neighbor data whose anomaly score is greater than a preset threshold value as a time point of abnormal index data.
8. A time-series data abnormality detection apparatus, characterized by comprising:
the data acquisition module is used for acquiring time series data, and the time series data are sequences of index data corresponding to each time point in continuous time points;
the relevance feature extraction module is used for obtaining a neighbor relevance feature according to the index data corresponding to each time point, wherein the neighbor relevance feature is used for representing relevance between the index data corresponding to each neighbor time point in a plurality of neighbor time points of each time point and the index data corresponding to each time point;
the index dimension reduction module is used for carrying out dimensionality reduction processing on a plurality of index data corresponding to the plurality of neighbor time points based on the neighbor relevance feature to obtain associated neighbor data corresponding to each time point;
and the anomaly detection module is used for dividing the plurality of associated neighbor data corresponding to the continuous time points so as to determine the time point of the anomaly of the index data from the continuous time points.
9. An apparatus, comprising: memory, processor and executable instructions stored in the memory and executable in the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the executable instructions.
10. A computer-readable storage medium having stored thereon computer-executable instructions, which when executed by a processor, implement the method of any one of claims 1-7.
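The dividing step recited in claim 7 can be illustrated with a simplified isolation-forest-style sketch: random axis-aligned splits partition the associated neighbor data, and points that are separated after few splits receive a high anomaly score. This is a hedged illustration only. The function names, the default parameters (`n_trees`, `sample_size`, `threshold`, `seed`), the dictionary-based tree layout, and the choice to stop splitting when the randomly chosen dimension is constant are all assumptions made here, not details from the claims.

```python
import math
import random


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively divide rows with random splits until isolated or depth-limited."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:                      # simplification: stop if this dimension is constant
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([r for r in rows if r[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["dim"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Score each row in (0, 1); values near 1 indicate easily isolated points."""
    if len(rows) < 2:
        return [0.0 for _ in rows]
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(size)
    return [
        2.0 ** (-(sum(path_length(r, t) for t in trees) / n_trees) / norm)
        for r in rows
    ]


def detect_anomalies(rows, threshold=0.6, n_trees=100, sample_size=64, seed=0):
    """Return indices of rows whose anomaly score exceeds the preset threshold."""
    scores = anomaly_scores(rows, n_trees, sample_size, seed)
    return [i for i, s in enumerate(scores) if s > threshold]
```

Each row passed to `detect_anomalies` would be the associated neighbor data of one time point, so the returned indices map directly back to time points. The threshold is data-dependent and would need tuning in practice.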
CN202011282307.7A 2020-11-16 2020-11-16 Time series data abnormity detection method, device, equipment and storage medium Pending CN113792749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282307.7A CN113792749A (en) 2020-11-16 2020-11-16 Time series data abnormity detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011282307.7A CN113792749A (en) 2020-11-16 2020-11-16 Time series data abnormity detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113792749A (en) 2021-12-14

Family

ID=79181171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282307.7A Pending CN113792749A (en) 2020-11-16 2020-11-16 Time series data abnormity detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113792749A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576823A (en) * 2023-11-29 2024-02-20 上海徽视科技集团有限公司 Queuing and calling system terminal
CN117576823B (en) * 2023-11-29 2024-05-14 上海徽视科技集团有限公司 Queuing and calling system terminal

Similar Documents

Publication Publication Date Title
CN112148987B (en) Message pushing method based on target object activity and related equipment
CN111612041B (en) Abnormal user identification method and device, storage medium and electronic equipment
CN111784528B (en) Abnormal community detection method and device, computer equipment and storage medium
CN111612039B (en) Abnormal user identification method and device, storage medium and electronic equipment
CN112036476A (en) Data feature selection method and device based on two-classification service and computer equipment
CN116881430B (en) Industrial chain identification method and device, electronic equipment and readable storage medium
CN110796159A (en) Power data classification method and system based on k-means algorithm
CN111369344A (en) Method and device for dynamically generating early warning rule
CN111353051A (en) K-means and Apriori-based algorithm maritime big data association analysis method
Gowtham Sethupathi et al. Efficient rainfall prediction and analysis using machine learning techniques
CN115114484A (en) Abnormal event detection method and device, computer equipment and storage medium
CN110751354B (en) Abnormal user detection method and device
CN114219664A (en) Product recommendation method and device, computer equipment and storage medium
CN113792749A (en) Time series data abnormity detection method, device, equipment and storage medium
CN113743453A (en) Population quantity prediction method based on random forest
CN116862658A (en) Credit evaluation method, apparatus, electronic device, medium and program product
CN115982654B (en) Node classification method and device based on self-supervision graph neural network
CN112860824B (en) Scale adaptability evaluation method for high-resolution DEM terrain feature extraction
CN115099875A (en) Data classification method based on decision tree model and related equipment
CN115545753A (en) Partner prediction method based on Bayesian algorithm and related equipment
CN114170000A (en) Credit card user risk category identification method, device, computer equipment and medium
CN111626887A (en) Social relationship evaluation method and device
CN118312657B (en) Knowledge base-based intelligent large model analysis recommendation system and method
CN118504775B (en) Urban planning method and system based on digital twinning
WO2023231184A1 (en) Feature screening method and apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination