CN116383273A - Time sequence dimension reduction representation method and system
- Publication number
- CN116383273A (application number CN202310352540.5A)
- Authority
- CN
- China
- Prior art keywords
- point
- adjacent
- line segment
- simplified
- points
- Legal status
- Pending
Classifications
- G06F16/2465: Query processing support for facilitating data mining operations in structured databases
- G06F16/2462: Approximate or statistical queries
- G06F16/2474: Sequence data queries, e.g. querying versioned data
- G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
- Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The application discloses a time series dimension reduction representation method and system. The method comprises the following steps: traversing a time series and identifying its top and bottom features; dividing the time series into a plurality of simplified line segments based on the top and bottom features, and recording the start points and end points of the simplified line segments; calculating characteristic values of the simplified line segments based on the start points and end points; and calculating the difference between the characteristic values of each simplified line segment and those of its adjacent segment and, if the difference is smaller than a preset threshold, merging the two segments, thereby completing the dimension reduction representation. The method can reduce the dimensionality of variable-length, variable-amplitude time series, even when the series contain substantial drift, distortion, fluctuation, abnormal points, stretching and compression, while effectively preserving the characteristics of the time series.
Description
Technical Field
The application relates to the technical field of databases, and in particular to a time series dimension reduction representation method and system.
Background
Time series data arise in almost all areas of human activity, including vital sign records from clinical medical equipment, real-time transaction data for stocks and futures, sales data in e-commerce retail markets, astronomical observations, and real-time weather temperatures.
In recent years, with the spread of emerging applications such as data centers and the Internet of Things, the scale of time series data has kept growing. Many real-time applications produce tens or even hundreds of millions of time series, with storage reaching the TB or even PB scale.
Time series similarity query is an important research direction in the field of time series mining. Given a time series, a similarity query finds, within a set of time series data, the target sequences most similar to it according to some similarity metric function. Similarity queries are a basic prerequisite for time series clustering, classification, anomaly detection and frequent pattern mining. They fall into two major categories: full-sequence matching and subsequence matching. In full-sequence matching, the searched time series have the same length as the target sequence; in subsequence matching, all subsequences similar to the target sequence are sought within a longer sequence.
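To make the two query types concrete, the following minimal Python sketch contrasts full-sequence matching with subsequence matching. The Euclidean distance used as the similarity metric and the function names `euclidean`, `full_sequence_match` and `subsequence_match` are illustrative assumptions. The application does not prescribe a particular metric.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity metric: Euclidean distance between equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def full_sequence_match(query: Sequence[float],
                        candidates: List[Sequence[float]],
                        top: int = 1) -> List[Tuple[int, float]]:
    """Full-sequence matching: rank only candidates with the same length as the query."""
    scored = [(i, euclidean(query, c)) for i, c in enumerate(candidates)
              if len(c) == len(query)]
    return sorted(scored, key=lambda t: t[1])[:top]


def subsequence_match(query: Sequence[float],
                      long_seq: Sequence[float],
                      max_dist: float) -> List[int]:
    """Subsequence matching: start offsets of all windows within max_dist of the query."""
    m = len(query)
    return [s for s in range(len(long_seq) - m + 1)
            if euclidean(query, long_seq[s:s + m]) <= max_dist]
```

For instance, `subsequence_match([1, 2], [0, 1, 2, 5, 1, 2], 0.0)` finds exact copies of the query at offsets 1 and 4. Computing such distances over raw, high-dimensional series is the cost that the dimension reduction representation aims to reduce.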
Because time series are high-dimensional, processing the raw data directly is very costly. A common practice is therefore to apply dimensionality reduction and transformation to time series data: the data are mapped into a transformed space, and a small set of the "strongest" transformed coefficients is retained as the features/representation. Because the new space has relatively low dimensionality, such methods are known as time series dimension reduction representation techniques.
Disclosure of Invention
The present application provides a time series dimension reduction representation method and system to address the following problem. When time series data contain substantial drift, distortion, fluctuation, outliers, stretching and compression, similar time series queries are difficult: existing methods often lose information or introduce a large amount of noise, and cannot adequately reduce the dimensionality of variable-length, variable-amplitude time series, which produces large errors in subsequent time series similarity measurement.
To achieve the above object, the present application provides the following solutions:
a time-series dimension-reduction representation method, comprising the steps of:
traversing a time series, identifying top and bottom features of the time series;
dividing the time sequence into a plurality of simplified line segments based on the top-bottom characteristics, and recording starting points and ending points of the plurality of simplified line segments;
calculating characteristic values of the plurality of simplified line segments based on the starting points and the ending points;
and calculating the difference between the characteristic values of each simplified line segment and those of its adjacent segment and, if the difference is smaller than a preset threshold value, merging the line segment with the adjacent segment, thereby completing the dimension reduction representation.
Preferably, the method for identifying the top-bottom feature comprises the following steps:
determining the number of inflection points of the time series and, if the number of inflection points is not less than 5, recording the corresponding curve as a first curve;
if, in the first curve, a first inflection point is higher than the first and second adjacent points on either side of it, the first adjacent point is higher than the third adjacent point on its other side, and the second adjacent point is higher than the fourth adjacent point on its other side, marking the first curve as a top feature and the first inflection point as a top point;
if, in the first curve, a second inflection point is lower than the fifth and sixth adjacent points on either side of it, the fifth adjacent point is lower than the seventh adjacent point on its other side, and the sixth adjacent point is lower than the eighth adjacent point on its other side, marking the first curve as a bottom feature and the second inflection point as a bottom point.
Preferably, the recording method of the start point and the end point comprises the following steps:
when the simplified line segment is a descending line segment, taking the top point as the starting point and recording its coordinates, taking the bottom point as the ending point and recording its coordinates, combining the top point coordinates and the bottom point coordinates into a data pair, and storing the data pair in a linked list;
when the simplified line segment is an ascending line segment, taking the bottom point as the starting point and recording its coordinates, taking the top point as the ending point and recording its coordinates, combining the bottom point coordinates and the top point coordinates into a data pair, and storing the data pair in a linked list.
Preferably, the characteristic values include: the slope $k$ of the simplified line segment, the standard deviation $\sigma^2$ of the simplified line segment, the mean $\mu$ of the simplified line segment, the height ym1 of the starting point and the height ym2 of the ending point.
Preferably, the difference is calculated and compared as follows:
wherein $k_m$, $\sigma^2_m$ and $\mu_m$ are the slope, standard deviation and mean of simplified line segment m; $k_n$, $\sigma^2_n$ and $\mu_n$ are the slope, standard deviation and mean of simplified line segment n; $\epsilon_k$ is the preset threshold for the slope; a preset threshold is likewise set for the standard deviation; and $\epsilon_\mu$ is the preset threshold for the mean.
The application also provides a time series dimension reduction representation system, comprising an identification module, a simplification module, a calculation module and a comparison module;
the identification module is used for traversing the time sequence and identifying the top and bottom characteristics of the time sequence;
the simplification module is used for dividing the time sequence into a plurality of simplified line segments based on the top-bottom characteristics and recording starting points and ending points of the plurality of simplified line segments;
the calculation module is used for calculating the characteristic values of the plurality of simplified line segments based on the starting points and the ending points;
and the comparison module is used for calculating the difference between the characteristic values of each simplified line segment and those of its adjacent segment and, if the difference is smaller than a preset threshold value, merging the line segment with the adjacent segment, thereby completing the dimension reduction representation.
Preferably, the workflow of the identification module includes:
determining the number of inflection points of the time series and, if the number of inflection points is not less than 5, recording the corresponding curve as a first curve;
if, in the first curve, a first inflection point is higher than the first and second adjacent points on either side of it, the first adjacent point is higher than the third adjacent point on its other side, and the second adjacent point is higher than the fourth adjacent point on its other side, marking the first curve as a top feature and the first inflection point as a top point;
if, in the first curve, a second inflection point is lower than the fifth and sixth adjacent points on either side of it, the fifth adjacent point is lower than the seventh adjacent point on its other side, and the sixth adjacent point is lower than the eighth adjacent point on its other side, marking the first curve as a bottom feature and the second inflection point as a bottom point.
Preferably, the method for recording the start point and the end point by the simplifying module includes:
when the simplified line segment is a descending line segment, taking the top point as the starting point and recording its coordinates, taking the bottom point as the ending point and recording its coordinates, combining the top point coordinates and the bottom point coordinates into a data pair, and storing the data pair in a linked list;
when the simplified line segment is an ascending line segment, taking the bottom point as the starting point and recording its coordinates, taking the top point as the ending point and recording its coordinates, combining the bottom point coordinates and the top point coordinates into a data pair, and storing the data pair in a linked list.
Preferably, the characteristic values include: the slope $k$ of the simplified line segment, the standard deviation $\sigma^2$ of the simplified line segment, the mean $\mu$ of the simplified line segment, the height ym1 of the starting point and the height ym2 of the ending point.
Preferably, the comparison module calculates and compares the difference as follows:
wherein $k_m$, $\sigma^2_m$ and $\mu_m$ are the slope, standard deviation and mean of simplified line segment m; $k_n$, $\sigma^2_n$ and $\mu_n$ are the slope, standard deviation and mean of simplified line segment n; $\epsilon_k$ is the preset threshold for the slope; a preset threshold is likewise set for the standard deviation; and $\epsilon_\mu$ is the preset threshold for the mean.
Compared with the prior art, the beneficial effects of this application are:
the dimension reduction method can reduce the dimensionality of variable-length, variable-amplitude time series, even when the series contain substantial drift, distortion, fluctuation, abnormal points, stretching and compression, while effectively preserving the characteristics of the time series.
Drawings
For a clearer description of the technical solutions of the present application, the drawings that are required to be used in the embodiments are briefly described below, it being evident that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the method according to the first embodiment of the present application;
FIG. 2 is a schematic diagram of the top feature according to the first embodiment of the present application;
FIG. 3 is a schematic diagram of the bottom feature according to the first embodiment of the present application;
FIG. 4 is a schematic diagram of a descending line segment according to the first embodiment of the present application;
FIG. 5 is a schematic diagram of an ascending line segment according to the first embodiment of the present application;
FIG. 6 is a schematic structural diagram of the system according to the second embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
Example 1
In a first embodiment, as shown in fig. 1, a time-series dimension-reduction representation method includes the following steps:
S1, traversing the time series and identifying the top and bottom features of the time series.
A time series is a sequence of the values of an indicator at different times, arranged in chronological order. Time series analysis is the theory and methodology of building mathematical models, through curve fitting and parameter estimation, from time series data obtained by observing a system; it is commonly used in finance, weather forecasting, market analysis and other fields.
The method for identifying the top and bottom features comprises the following steps: determining the number of inflection points of the time series and, if the number of inflection points is not less than 5, recording the corresponding curve as a first curve; if, in the first curve, a first inflection point is higher than the first and second adjacent points on either side of it, the first adjacent point is higher than the third adjacent point on its other side, and the second adjacent point is higher than the fourth adjacent point on its other side, marking the first curve as a top feature and the first inflection point as a top point; if, in the first curve, a second inflection point is lower than the fifth and sixth adjacent points on either side of it, the fifth adjacent point is lower than the seventh adjacent point on its other side, and the sixth adjacent point is lower than the eighth adjacent point on its other side, marking the first curve as a bottom feature and the second inflection point as a bottom point.
In this embodiment, the original time series is traversed from its head, and the time axis coordinates of all tops and bottoms are identified, which provides an initial solution to the variable-length, variable-amplitude problem.
As shown in FIG. 2, a top is defined as follows: the shortest top subsequence consists of five points, in which the middle point (the top point) is higher than its two neighbors, and each neighbor is higher than the point on its outer side. This basic structure is called a top, and its highest point is marked as the top point. As shown in FIG. 3, a bottom is defined as follows: the shortest bottom subsequence consists of five points, in which the middle point (the bottom point) is lower than its two neighbors, and each neighbor is lower than the point on its outer side. This basic structure is called a bottom, and its lowest point is marked as the bottom point. Five points satisfying either pattern form the basic top or bottom structure, whose middle points are called top points and bottom points respectively. All top points and bottom points are highlighted. Step S1 partially handles abnormal points in the time series, and combining multiple top and bottom points partially mitigates the interference caused by drift, distortion, fluctuation, stretching and compression of the time series.
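The five-point top and bottom definitions above can be sketched in Python as follows. This illustrates the definitions and is not the applicant's implementation. The function name `find_tops_and_bottoms` is hypothetical, strict inequalities are assumed, and each five-point window is tested independently.

```python
from typing import List, Tuple


def find_tops_and_bottoms(series: List[float]) -> List[Tuple[int, str]]:
    """Return (index, "top" | "bottom") for each five-point top/bottom pattern.

    Assumed top at index i:    s[i-2] < s[i-1] < s[i] > s[i+1] > s[i+2]
    Assumed bottom at index i: s[i-2] > s[i-1] > s[i] < s[i+1] < s[i+2]
    """
    marks: List[Tuple[int, str]] = []
    if len(series) < 5:  # the shortest top/bottom structure needs five points
        return marks
    for i in range(2, len(series) - 2):
        a, b, c, d, e = series[i - 2:i + 3]
        if a < b < c and c > d > e:
            marks.append((i, "top"))      # c is the top point
        elif a > b > c and c < d < e:
            marks.append((i, "bottom"))   # c is the bottom point
    return marks
```

For example, `find_tops_and_bottoms([0, 1, 3, 1, 0, -1, -3, -1, 0])` marks a top point at index 2 and a bottom point at index 6.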
S2, dividing the time sequence into a plurality of simplified line segments based on the top-bottom characteristics, and recording starting points and ending points of the plurality of simplified line segments.
The start point and the end point are recorded as follows: when the simplified line segment is a descending line segment, the top point is taken as the start point and its coordinates are recorded, the bottom point is taken as the end point and its coordinates are recorded, and the top point coordinates and bottom point coordinates are combined into a data pair and stored in a linked list; when the simplified line segment is an ascending line segment, the bottom point is taken as the start point and its coordinates are recorded, the top point is taken as the end point and its coordinates are recorded, and the bottom point coordinates and top point coordinates are combined into a data pair and stored in a linked list.
The start and end times of each simplified line segment are recorded, and the start and end coordinates are stored in a new linked list. In this embodiment, as shown in FIG. 4, a descending line segment runs from a top point to a bottom point; the leftmost top point is recorded as start1 and the rightmost bottom point as end1. Alternatively, as shown in FIG. 5, an ascending line segment runs from a bottom point to a top point; the leftmost bottom point is recorded as start1 and the rightmost top point as end1. The coordinates of the top and bottom points are recorded as data pairs and stored sequentially in the linked list: (start1, end1), (start2, end2), ..., (startn, endn).
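A minimal sketch of step S2, assuming the top and bottom points from S1 are given as (index, kind) pairs in time order. The helper `build_segments` is hypothetical, and a Python list stands in for the linked list. Runs of consecutive points of the same kind are collapsed by keeping the first one; the application does not specify this rule.

```python
from typing import List, Tuple


def build_segments(marks: List[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Pair consecutive top/bottom points into (start, end) index pairs.

    top -> bottom gives a descending segment (start is the top point);
    bottom -> top gives an ascending segment (start is the bottom point).
    """
    alternating: List[Tuple[int, str]] = []
    for idx, kind in marks:
        if alternating and alternating[-1][1] == kind:
            continue  # assumption: keep the first of two same-kind points
        alternating.append((idx, kind))
    # Each adjacent pair of alternating extrema is one simplified line segment.
    return [(alternating[j][0], alternating[j + 1][0])
            for j in range(len(alternating) - 1)]
```

With this pairing rule, the end point of one segment is the start point of the next, so the list reads (start1, end1), (start2, end2), ... with end1 equal to start2.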
S3, calculating the characteristic values of the plurality of simplified line segments based on the start points and the end points.
The characteristic values include: the slope $k$ of the simplified line segment, the standard deviation $\sigma^2$ of the simplified line segment, the mean $\mu$ of the simplified line segment, the height ym1 of the start point, and the height ym2 of the end point. The start point height ym1 is the value of the time series at the time point (start m), and the end point height ym2 is the value of the time series at the time point (end m).
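The point-based characteristic values can be computed as in the sketch below; the slope is fitted separately by least squares. The function name `segment_features` and the inclusive slice `series[start:end + 1]` are assumptions. The application writes the symbol σ² but calls it the standard deviation, so this sketch computes the population variance with `statistics.pvariance`. Swap in `statistics.pstdev` if the standard deviation itself is intended.

```python
from statistics import fmean, pvariance
from typing import Dict, List


def segment_features(series: List[float], start: int, end: int) -> Dict[str, float]:
    """Heights, mean and spread of one simplified line segment (indices inclusive)."""
    pts = series[start:end + 1]  # assumption: both endpoints belong to the segment
    return {
        "ym1": series[start],     # value at time point (start m)
        "ym2": series[end],       # value at time point (end m)
        "mu": fmean(pts),         # mean of the segment
        "sigma2": pvariance(pts), # population variance (see lead-in)
    }
```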
The calculation method of the characteristic value is as follows:
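The line segment with start point height ym1 and end point height ym2 is described by the line equation $y = kx + c$.
The line parameters to be solved are the slope $k$ and the intercept $c$.
Here $\theta = (k, c)^T$, $X$ is the matrix whose $i$-th row is $(x_i, 1)$ for each point $x_i$ of the segment, and $y$ is the vector of the corresponding time series values $y_i$. Substituting these into the objective function $J_1$ yields:
$$J_1(\theta) = (y - X\theta)^T (y - X\theta)$$
Taking the derivative of the objective function with respect to $\theta$ and setting it equal to zero yields:
$$\frac{\partial J_1}{\partial \theta} = -2X^T (y - X\theta) = 0$$
Solving gives:
$$\theta = (X^T X)^{-1} X^T y$$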
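The normal-equation solution $\theta = (X^T X)^{-1} X^T y$ can be evaluated in closed form for a two-parameter line, because $X^T X$ is only 2 by 2. The sketch below uses a hypothetical function `fit_line` in pure Python to avoid a NumPy dependency, and it inverts that matrix explicitly.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = k*x + c via theta = (X^T X)^{-1} X^T y,
    where each row of X is [x_i, 1]."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) points of equal count")
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[sxx, sx], [sx, n]] and X^T y = [sxy, sy]
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("x values must not all be equal")
    # Apply the explicit 2x2 inverse: (1/det) * [[n, -sx], [-sx, sxx]]
    k = (n * sxy - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return k, c
```

For example, the points (0, 1), (1, 3), (2, 5) lie on $y = 2x + 1$, and the fit returns $k = 2$, $c = 1$.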
S4, calculating the difference between the characteristic values of each simplified line segment and those of its adjacent segment and, if the difference is smaller than a preset threshold value, merging the line segment with the adjacent segment to complete the dimension reduction representation.
The difference is calculated and compared as follows:
wherein $k_m$, $\sigma^2_m$ and $\mu_m$ are the slope, standard deviation and mean of simplified line segment m; $k_n$, $\sigma^2_n$ and $\mu_n$ are the slope, standard deviation and mean of simplified line segment n; $\epsilon_k$ is the preset threshold for the slope; a preset threshold is likewise set for the standard deviation; and $\epsilon_\mu$ is the preset threshold for the mean.
In this embodiment, suppose the start and end points obtained in S2 are recorded as follows:
......(start665,end665),(start666,end666)......
after the calculation in S4, the two segments are merged to obtain:
......(start665,end666)......
The preset thresholds, such as $\epsilon_k$ and $\epsilon_\mu$, are specified by an expert or adjusted manually according to actual requirements.
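One possible reading of step S4 is sketched below. The comparison formula itself is not reproduced in this text, so the sketch assumes that two adjacent segments are merged when the absolute differences of their slopes, σ² values and means are all below their respective thresholds (`eps_k`, `eps_sigma`, `eps_mu`). It also assumes each new segment is compared with the most recently merged one. The function names, the use of time indices as x coordinates, and this greedy left-to-right order are assumptions.

```python
from statistics import fmean, pvariance
from typing import List, Tuple

Segment = Tuple[int, int]  # (start index, end index), both inclusive


def _features(series: List[float], seg: Segment) -> Tuple[float, float, float]:
    """Return (slope k, variance sigma^2, mean mu) of one segment."""
    s, e = seg
    if e <= s:
        raise ValueError("a segment must span at least two points")
    xs = list(range(s, e + 1))  # assumption: x coordinate = time index
    ys = series[s:e + 1]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx  # least-squares slope
    return k, pvariance(ys), my


def merge_segments(series: List[float], segments: List[Segment],
                   eps_k: float, eps_sigma: float, eps_mu: float) -> List[Segment]:
    """Greedily merge adjacent segments whose features differ by less than the thresholds."""
    if not segments:
        return []
    merged = [segments[0]]
    for seg in segments[1:]:
        k_m, var_m, mu_m = _features(series, merged[-1])
        k_n, var_n, mu_n = _features(series, seg)
        if (abs(k_m - k_n) < eps_k and abs(var_m - var_n) < eps_sigma
                and abs(mu_m - mu_n) < eps_mu):
            merged[-1] = (merged[-1][0], seg[1])  # e.g. (start665, end666)
        else:
            merged.append(seg)
    return merged
```

For example, with `series = [0, 1, 2, 3, 4, 5, 6]` and segments `[(0, 3), (3, 6)]`, both segments have slope 1 and equal variance, so they merge into `(0, 6)` only if `eps_mu` exceeds their mean gap of 3. This mirrors the (start665, end666) example above.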
Example two
In the second embodiment, as shown in FIG. 6, a time series dimension reduction representation system comprises an identification module, a simplification module, a calculation module and a comparison module;
the identification module is used for traversing the time series and identifying the top and bottom features of the time series. In this embodiment, the method for identifying the top and bottom features comprises: determining the number of inflection points of the time series and, if the number of inflection points is not less than 5, recording the corresponding curve as a first curve; if, in the first curve, a first inflection point is higher than the first and second adjacent points on either side of it, the first adjacent point is higher than the third adjacent point on its other side, and the second adjacent point is higher than the fourth adjacent point on its other side, marking the first curve as a top feature and the first inflection point as a top point; if, in the first curve, a second inflection point is lower than the fifth and sixth adjacent points on either side of it, the fifth adjacent point is lower than the seventh adjacent point on its other side, and the sixth adjacent point is lower than the eighth adjacent point on its other side, marking the first curve as a bottom feature and the second inflection point as a bottom point.
In this embodiment, the original time series is traversed from its head, and the time axis coordinates of all tops and bottoms are identified, which provides an initial solution to the variable-length, variable-amplitude problem. A top is defined as follows: the shortest top subsequence consists of five points, in which the middle point (the top point) is higher than its two neighbors, and each neighbor is higher than the point on its outer side. This basic structure is called a top, and its highest point is marked as the top point. A bottom is defined as follows: the shortest bottom subsequence consists of five points, in which the middle point (the bottom point) is lower than its two neighbors, and each neighbor is lower than the point on its outer side. This basic structure is called a bottom, and its lowest point is marked as the bottom point. Five points satisfying either pattern form the basic top or bottom structure, whose middle points are called top points and bottom points respectively. All top points and bottom points are highlighted. This identification step partially handles abnormal points in the time series, and combining multiple top and bottom points partially mitigates the interference caused by drift, distortion, fluctuation, stretching and compression of the time series.
The simplification module is used for dividing the time series into a plurality of simplified line segments based on the top and bottom features, and recording the start points and end points of the simplified line segments. In this embodiment, the start point and the end point are recorded as follows: when the simplified line segment is a descending line segment, the top point is taken as the start point and its coordinates are recorded, the bottom point is taken as the end point and its coordinates are recorded, and the top point coordinates and bottom point coordinates are combined into a data pair and stored in a linked list; when the simplified line segment is an ascending line segment, the bottom point is taken as the start point and its coordinates are recorded, the top point is taken as the end point and its coordinates are recorded, and the bottom point coordinates and top point coordinates are combined into a data pair and stored in a linked list.
The start and end times of each simplified line segment are recorded, and the start and end coordinates are stored in a new linked list. In this embodiment, a descending line segment runs from top points to bottom points: the leftmost vertex is recorded as start1 and the rightmost bottom point as end1. Likewise, an ascending line segment runs from bottom points to top points: the leftmost bottom point is recorded as start1 and the rightmost vertex as end1. The coordinates of the vertices and bottom points are recorded as data pairs and stored in the linked list in order: (start1, end1), (start2, end2), …, (startn, endn).
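Turning the detected extrema into the ordered list of (start, end) pairs can be sketched as follows. This is a hedged illustration under stated assumptions: a Python list stands in for the linked list, each segment runs from one extremum to the next extremum of the opposite type, and runs of same-type extrema are collapsed by keeping the leftmost one (a simple choice that the source does not specify). The name `build_segments` is hypothetical.

```python
from typing import List, Tuple


def build_segments(extrema: List[Tuple[int, str]]) -> List[Tuple[int, int, str]]:
    """Pair alternating extrema into (start, end, direction) segments.

    `extrema` is a list of (index, "top" | "bottom") sorted by index.
    """
    # Assumption: collapse runs of same-type extrema, keeping the leftmost,
    # so that tops and bottoms strictly alternate.
    alternating: List[Tuple[int, str]] = []
    for idx, kind in extrema:
        if not alternating or alternating[-1][1] != kind:
            alternating.append((idx, kind))

    segments: List[Tuple[int, int, str]] = []
    for (start, kind), (end, _) in zip(alternating, alternating[1:]):
        # Top -> bottom is a descending segment; bottom -> top is ascending.
        direction = "descending" if kind == "top" else "ascending"
        segments.append((start, end, direction))
    return segments
```

Adjacent segments share an endpoint, which matches the later merge example in which (start665, end665) and (start666, end666) collapse to (start665, end666).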
The calculation module calculates the characteristic values of the simplified line segments based on their starting and ending points. In this embodiment, the characteristic values include the slope k of the simplified line segment, the standard deviation σ² of the simplified line segment, the mean μ of the simplified line segment, the starting-point height ym1, and the ending-point height ym2. The starting-point height ym1 is the value of the time series at time point start m, and the ending-point height ym2 is the value of the time series at time point end m.
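The characteristic values are calculated as follows.
Each simplified line segment, running from the starting point (height ym1) to the ending point (height ym2), is fitted with the straight line $y = kx + c$.
The straight-line parameters to be solved are the slope $k$ and the intercept $c$.
Let $\theta = (k, c)^T$, let $X$ be the matrix whose $i$-th row is $(x_i, 1)$, where $x_i$ is the time coordinate of the $i$-th point of the segment, and let $y$ be the vector of the corresponding values $y_i$. Substituting these into the objective function $J_1$ gives:
$J_1(\theta) = (y - X\theta)^T (y - X\theta)$
Taking the derivative of the objective function with respect to $\theta$ and setting it equal to zero gives:
$\frac{\partial J_1}{\partial \theta} = -2X^T(y - X\theta) = 0$
Solving yields: $\theta = (X^T X)^{-1} X^T y$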
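The least-squares solution θ = (X^T X)^{-1} X^T y has a closed form for a straight line, because X^T X is only 2×2. The sketch below computes k and c that way and gathers the remaining characteristic values of one segment. Assumptions, not from the source: the x-coordinates are the integer time indices start..end, and since the source writes σ² but calls it the standard deviation, the sketch uses the population standard deviation (swap in `statistics.pvariance` if the variance is intended). The names `fit_line` and `segment_features` are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Slope k and intercept c from theta = (X^T X)^-1 X^T y, rows of X = (x_i, 1)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[sxx, sx], [sx, n]]; its determinant must be non-zero.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")
    # Explicit 2x2 inverse applied to X^T y = [sxy, sy].
    k = (n * sxy - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return k, c


def segment_features(values: List[float], start: int, end: int) -> Dict[str, float]:
    """Characteristic values of the segment covering indices start..end inclusive."""
    xs = list(range(start, end + 1))  # assumption: time axis = sample index
    ys = values[start:end + 1]
    k, c = fit_line(xs, ys)
    return {
        "k": k,
        "c": c,
        "mu": statistics.fmean(ys),
        "sigma": statistics.pstdev(ys),  # population standard deviation
        "ym1": values[start],            # starting-point height
        "ym2": values[end],              # ending-point height
    }
```

For instance, `fit_line([0, 1, 2], [1, 3, 5])` returns a slope of 2 and an intercept of 1.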
The comparison module calculates the difference between the characteristic values of each simplified line segment and those of its adjacent segment; if the difference is smaller than the preset threshold, the segment and its adjacent segment are merged, completing the dimension-reduction representation. In this embodiment, the difference is calculated and compared as follows:
where k_m, σ²_m, and μ_m are the slope, standard deviation, and mean of simplified line segment m; k_n, σ²_n, and μ_n are the slope, standard deviation, and mean of simplified line segment n; ε_k is the preset threshold for the slope, ε_μ is the preset threshold for the mean, and the standard deviation has its own preset threshold.
In this embodiment, suppose the starting and ending points obtained in S2 are recorded as follows:
......(start665,end665),(start666,end666)......
After the comparison in S4, the two segments are merged into:
......(start665,end666)......
The preset thresholds ε_k and ε_μ, together with the standard-deviation threshold, must be specified by an expert or adjusted manually according to actual requirements.
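The comparison and merge step can be sketched as one greedy left-to-right pass. Because the comparison formula itself is not legible in this text, the sketch assumes the "difference" is the absolute difference of each characteristic value and that a merge requires all three differences to fall below their thresholds. Recomputing the features of the merged span before comparing it with the next segment is also an assumption. The names `merge_similar`, `eps_k`, `eps_sigma`, and `eps_mu` are hypothetical; each segment must span at least two points.

```python
import statistics
from typing import List, Tuple


def _features(values: List[float], start: int, end: int) -> Tuple[float, float, float]:
    """(slope, standard deviation, mean) of values[start..end], least-squares slope."""
    xs = list(range(start, end + 1))
    ys = values[start:end + 1]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den, statistics.pstdev(ys), my


def merge_similar(values: List[float], segments: List[Tuple[int, int]],
                  eps_k: float, eps_sigma: float, eps_mu: float) -> List[Tuple[int, int]]:
    """Merge adjacent (start, end) segments whose features differ by less than the thresholds."""
    if not segments:
        return []
    merged: List[Tuple[int, int]] = []
    cur_start, cur_end = segments[0]
    for start, end in segments[1:]:
        k1, s1, m1 = _features(values, cur_start, cur_end)
        k2, s2, m2 = _features(values, start, end)
        if abs(k1 - k2) < eps_k and abs(s1 - s2) < eps_sigma and abs(m1 - m2) < eps_mu:
            # (start_m, end_m), (start_n, end_n) -> (start_m, end_n)
            cur_end = end
        else:
            merged.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    merged.append((cur_start, cur_end))
    return merged
```

With `values = list(range(7))`, the segments (0, 3) and (3, 6) have equal slope and spread but means 1.5 and 4.5, so they merge into (0, 6) when `eps_mu` is 5.0 and stay separate when it is 1.0, illustrating how the expert-chosen thresholds control the degree of compression.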
The foregoing embodiments are merely illustrative of the preferred embodiments of the present application and are not intended to limit the scope of the present application, and various modifications and improvements made by those skilled in the art to the technical solutions of the present application should fall within the protection scope defined by the claims of the present application.
Claims (10)
1. A time-series dimension-reduction representation method, characterized by comprising the steps of:
traversing a time series, identifying top and bottom features of the time series;
dividing the time sequence into a plurality of simplified line segments based on the top-bottom characteristics, and recording starting points and ending points of the plurality of simplified line segments;
calculating characteristic values of a plurality of simplified curves based on the starting points and the ending points;
and respectively calculating the difference between the characteristic values of the simplified line segment and the adjacent segment, and if the difference is smaller than a preset threshold value, merging the line segment and the adjacent segment to finish the dimension reduction representation.
2. A time series dimension reduction representation method according to claim 1, wherein said method of identifying said top and bottom features comprises:
determining the number of inflection points of the time series, and, if the number of inflection points is not less than 5, recording the time series as a first curve;
if a first inflection point is higher than a first adjacent point and a second adjacent point adjacent to the first inflection point in the first curve, the first adjacent point is higher than a third adjacent point adjacent to the first adjacent point on the other side, and the second adjacent point is higher than a fourth adjacent point adjacent to the second adjacent point on the other side, the first curve is marked as a top characteristic, and the first inflection point is marked as a top point;
if there is a second inflection point lower than a fifth adjacent point and a sixth adjacent point adjacent to the second inflection point in the first curve, and the fifth adjacent point is lower than a seventh adjacent point adjacent to the fifth adjacent point on the other side, and the sixth adjacent point is lower than an eighth adjacent point adjacent to the sixth adjacent point on the other side, the first curve is marked as a bottom feature, and the second inflection point is marked as a bottom point.
3. The method for time-series dimension-reduction representation according to claim 2, wherein the recording method of the start and end points comprises:
when the simplified line segment is a descending line segment, taking the vertex as a starting point, recording the vertex coordinates of the vertex, taking the bottom point as an end point, recording the bottom point coordinates of the bottom point, integrating the vertex coordinates and the bottom point coordinates into a data pair, and storing the data pair into a linked list;
when the simplified line segment is an ascending line segment, taking the bottom point as a starting point, recording the bottom point coordinate of the bottom point, taking the top point as an end point, recording the top point coordinate of the top point, integrating the top point coordinate and the bottom point coordinate into a data pair, and storing the data pair into a linked list.
4. The time-series dimension-reduction representation method according to claim 3, wherein the characteristic values comprise: the slope k of the simplified line segment, the standard deviation σ² of the simplified line segment, the mean μ of the simplified line segment, the starting-point height ym1, and the ending-point height ym2.
5. The method for time-series dimension-reduction representation according to claim 4, wherein said method for calculating and comparing said difference comprises:
where k_m, σ²_m, and μ_m are the slope, standard deviation, and mean of simplified line segment m; k_n, σ²_n, and μ_n are the slope, standard deviation, and mean of simplified line segment n; ε_k is the preset threshold for the slope, ε_μ is the preset threshold for the mean, and the standard deviation has its own preset threshold.
6. A time-series dimension-reduction representation system, comprising an identification module, a simplification module, a calculation module and a comparison module;
the identification module is used for traversing the time sequence and identifying the top and bottom characteristics of the time sequence;
the simplification module is used for dividing the time sequence into a plurality of simplified line segments based on the top-bottom characteristics and recording starting points and ending points of the plurality of simplified line segments;
the calculation module is used for calculating the characteristic values of a plurality of simplified curves based on the starting point and the ending point;
and the comparison module is used for respectively calculating the difference between the characteristic values of the simplified line segment and the adjacent segment, and if the difference is smaller than a preset threshold value, merging the line segment and the adjacent segment to finish the dimension reduction representation.
7. The time series reduced dimension representation system of claim 6, wherein the workflow of the recognition module comprises:
determining the number of inflection points of the time series, and, if the number of inflection points is not less than 5, recording the time series as a first curve;
if a first inflection point is higher than a first adjacent point and a second adjacent point adjacent to the first inflection point in the first curve, the first adjacent point is higher than a third adjacent point adjacent to the first adjacent point on the other side, and the second adjacent point is higher than a fourth adjacent point adjacent to the second adjacent point on the other side, the first curve is marked as a top characteristic, and the first inflection point is marked as a top point;
if there is a second inflection point lower than a fifth adjacent point and a sixth adjacent point adjacent to the second inflection point in the first curve, and the fifth adjacent point is lower than a seventh adjacent point adjacent to the fifth adjacent point on the other side, and the sixth adjacent point is lower than an eighth adjacent point adjacent to the sixth adjacent point on the other side, the first curve is marked as a bottom feature, and the second inflection point is marked as a bottom point.
8. The time series dimension reduction representation system of claim 7, wherein the method for recording the start and end points by the simplifying module comprises:
when the simplified line segment is a descending line segment, taking the vertex as a starting point, recording the vertex coordinates of the vertex, taking the bottom point as an end point, recording the bottom point coordinates of the bottom point, integrating the vertex coordinates and the bottom point coordinates into a data pair, and storing the data pair into a linked list;
when the simplified line segment is an ascending line segment, taking the bottom point as a starting point, recording the bottom point coordinate of the bottom point, taking the top point as an end point, recording the top point coordinate of the top point, integrating the top point coordinate and the bottom point coordinate into a data pair, and storing the data pair into a linked list.
9. The time-series dimension-reduction representation system according to claim 8, wherein the characteristic values comprise: the slope k of the simplified line segment, the standard deviation σ² of the simplified line segment, the mean μ of the simplified line segment, the starting-point height ym1, and the ending-point height ym2.
10. The time series dimension reduction representation system of claim 9, wherein the comparison module calculates and compares the gap by a method comprising:
where k_m, σ²_m, and μ_m are the slope, standard deviation, and mean of simplified line segment m; k_n, σ²_n, and μ_n are the slope, standard deviation, and mean of simplified line segment n; ε_k is the preset threshold for the slope, ε_μ is the preset threshold for the mean, and the standard deviation has its own preset threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310352540.5A CN116383273A (en) | 2023-04-04 | 2023-04-04 | Time sequence dimension reduction representation method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116383273A true CN116383273A (en) | 2023-07-04 |
Family
ID=86961132
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202310352540.5A | Pending | CN116383273A | 2023-04-04 | 2023-04-04 | Time sequence dimension reduction representation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116383273A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104077309A (en) * | 2013-03-28 | 2014-10-01 | 日电(中国)有限公司 | Method and device for carrying out dimension reduction processing on time-sequential sequence |
US20140297606A1 (en) * | 2013-03-28 | 2014-10-02 | Nec (China) Co., Ltd. | Method and device for processing a time sequence based on dimensionality reduction |
CN104820779A (en) * | 2015-04-28 | 2015-08-05 | 电子科技大学 | Extreme point and turning point based time sequence dimensionality reduction method |
CN106960059A (en) * | 2017-04-06 | 2017-07-18 | 山东大学 | A kind of Model of Time Series Streaming dimensionality reduction based on Piecewise Linear Representation is with simplifying method for expressing |
CN109241130A (en) * | 2018-07-27 | 2019-01-18 | 山东大学 | A kind of time series data dimensionality reduction and multi-resolution representation method based on weight |
Non-Patent Citations (1)
Title |
---|
XIAOKCEHUI: "Time Series Topic 3: Piecewise Linear Representation of Time Series" (时间序列专题之三 时间序列的分段线性表示), retrieved from the Internet: https://blog.csdn.net/xiaokcehui/article/details/120776478 * |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |