CN116383273A - Time sequence dimension reduction representation method and system - Google Patents

Time sequence dimension reduction representation method and system

Publication number
CN116383273A
Authority
CN
China
Prior art keywords
point
adjacent
line segment
simplified
points
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310352540.5A
Other languages
Chinese (zh)
Inventor
史晓贤
周同明
王秦
马振武
魏媛媛
赵春海
赵春阁
成怡
马坤
汪卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Yongxin New Material Technology Co ltd
Original Assignee
Jinan Yongxin New Material Technology Co ltd
Application filed by Jinan Yongxin New Material Technology Co ltd
Priority to CN202310352540.5A
Publication of CN116383273A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/2462 Approximate or statistical queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The application discloses a time series dimension reduction representation method and system. The method comprises the following steps: traversing a time series and identifying its top and bottom features; dividing the time series into a plurality of simplified line segments based on the top and bottom features, and recording the starting point and ending point of each simplified line segment; calculating characteristic values of the simplified line segments based on the starting points and ending points; and calculating the difference between the characteristic values of each simplified line segment and those of its adjacent segment and, if the difference is smaller than a preset threshold, merging the line segment with the adjacent segment to complete the dimension reduction representation. With this method, variable-length, variable-amplitude time series can be reduced in dimension even when they contain large amounts of drift, distortion, fluctuation, outliers, stretching, and compression, while the characteristics of the series are effectively preserved.

Description

Time sequence dimension reduction representation method and system
Technical Field
The application relates to the technical field of databases, in particular to a time sequence dimension reduction representation method and a system.
Background
Time series data arise in almost every human activity, for example from clinical vital-sign recording equipment, real-time trading data for stocks and futures, sales data in e-commerce retail, astronomical observations, and real-time weather temperatures.
In recent years, with the spread of emerging applications such as data centers and the Internet of Things, the scale of time series data has kept growing. Many real-time applications produce tens or even hundreds of millions of time series, with storage volumes reaching the terabyte or petabyte range.
Time series similarity query is an important research direction in the field of time series mining. It refers to finding, in a collection of time series data, the set of target series most similar to a given series according to some similarity measure. Similarity query is the basic preliminary step for time series clustering, classification, anomaly detection, and frequent pattern mining. Similarity queries fall into two major categories: whole-sequence matching and subsequence matching. In whole-sequence matching, the series searched have the same length as the target series. Subsequence matching refers to finding all subsequences of a longer series that are similar to the target series.
Because time series are high-dimensional, processing the raw data directly is very costly. It is therefore common practice to reduce or transform the data: the data are mapped into a transformed space, and a small set of the "strongest" transform coefficients is retained as the features or representation. Because the new space has relatively few dimensions, such methods are known as time series dimension reduction representation techniques.
Disclosure of Invention
The present application provides a time series dimension reduction representation method and system. It addresses the following problem: when time series data contain large amounts of drift, distortion, fluctuation, outliers, stretching, and compression, similar-series queries become difficult, information is often lost or a large amount of noise is introduced, and dimension reduction of variable-length, variable-amplitude time series cannot be completed well, so that subsequent time series similarity measurement produces large errors.
To achieve the above object, the present application provides the following solutions:
a time-series dimension-reduction representation method, comprising the steps of:
traversing a time series, identifying top and bottom features of the time series;
dividing the time sequence into a plurality of simplified line segments based on the top-bottom characteristics, and recording starting points and ending points of the plurality of simplified line segments;
calculating characteristic values of the plurality of simplified line segments based on the starting points and the ending points;
and calculating the difference between the characteristic values of each simplified line segment and its adjacent segment and, if the difference is smaller than a preset threshold value, merging the line segment with the adjacent segment to complete the dimension reduction representation.
Preferably, the method for identifying the top-bottom feature comprises the following steps:
counting the inflection points of the time series and, if the number of inflection points is not less than 5, recording the time series as a first curve;
if the first curve contains a first inflection point that is higher than the first adjacent point and the second adjacent point on either side of it, the first adjacent point is higher than the third adjacent point on its other side, and the second adjacent point is higher than the fourth adjacent point on its other side, the first curve is marked as a top feature and the first inflection point is marked as a vertex;
if the first curve contains a second inflection point that is lower than the fifth adjacent point and the sixth adjacent point on either side of it, the fifth adjacent point is lower than the seventh adjacent point on its other side, and the sixth adjacent point is lower than the eighth adjacent point on its other side, the first curve is marked as a bottom feature and the second inflection point is marked as a bottom point.
Preferably, the recording method of the start point and the end point comprises the following steps:
when the simplified line segment is a descending line segment, taking the vertex as a starting point, recording the vertex coordinates of the vertex, taking the bottom point as an end point, recording the bottom point coordinates of the bottom point, integrating the vertex coordinates and the bottom point coordinates into a data pair, and storing the data pair into a linked list;
when the simplified line segment is an ascending line segment, taking the bottom point as a starting point, recording the bottom point coordinate of the bottom point, taking the top point as an end point, recording the top point coordinate of the top point, integrating the top point coordinate and the bottom point coordinate into a data pair, and storing the data pair into a linked list.
Preferably, the characteristic values include: the slope $k$ of the simplified line segment, the standard deviation $\sigma^2$ of the simplified line segment, the mean $\mu$ of the simplified line segment, the height $y_{m1}$ of the starting point, and the height $y_{m2}$ of the ending point.
Preferably, the calculating and comparing method of the gap comprises:
Figure BDA0004162110370000031
wherein $k_m$ is the slope of simplified line segment $m$, $\sigma_m^2$ is the standard deviation of simplified line segment $m$, $\mu_m$ is the mean of simplified line segment $m$, $k_n$ is the slope of simplified line segment $n$, $\sigma_n^2$ is the standard deviation of simplified line segment $n$, $\mu_n$ is the mean of simplified line segment $n$, $\epsilon_k$ is the preset threshold for the slope, $\epsilon_{\sigma^2}$ (Figure BDA0004162110370000032) is the preset threshold for the standard deviation, and $\epsilon_\mu$ is the preset threshold for the mean.
The application also provides a time series dimension reduction representation system, which comprises an identification module, a simplification module, a calculation module, and a comparison module;
the identification module is used for traversing the time series and identifying the top and bottom features of the time series;
the simplification module is used for dividing the time series into a plurality of simplified line segments based on the top and bottom features and recording the starting points and ending points of the simplified line segments;
the calculation module is used for calculating the characteristic values of the simplified line segments based on the starting points and the ending points;
and the comparison module is used for calculating the difference between the characteristic values of each simplified line segment and its adjacent segment and, if the difference is smaller than a preset threshold value, merging the line segment with the adjacent segment to complete the dimension reduction representation.
Preferably, the workflow of the identification module includes:
counting the inflection points of the time series and, if the number of inflection points is not less than 5, recording the time series as a first curve;
if the first curve contains a first inflection point that is higher than the first adjacent point and the second adjacent point on either side of it, the first adjacent point is higher than the third adjacent point on its other side, and the second adjacent point is higher than the fourth adjacent point on its other side, the first curve is marked as a top feature and the first inflection point is marked as a vertex;
if the first curve contains a second inflection point that is lower than the fifth adjacent point and the sixth adjacent point on either side of it, the fifth adjacent point is lower than the seventh adjacent point on its other side, and the sixth adjacent point is lower than the eighth adjacent point on its other side, the first curve is marked as a bottom feature and the second inflection point is marked as a bottom point.
Preferably, the method for recording the start point and the end point by the simplifying module includes:
when the simplified line segment is a descending line segment, taking the vertex as a starting point, recording the vertex coordinates of the vertex, taking the bottom point as an end point, recording the bottom point coordinates of the bottom point, integrating the vertex coordinates and the bottom point coordinates into a data pair, and storing the data pair into a linked list;
when the simplified line segment is an ascending line segment, taking the bottom point as a starting point, recording the bottom point coordinate of the bottom point, taking the top point as an end point, recording the top point coordinate of the top point, integrating the top point coordinate and the bottom point coordinate into a data pair, and storing the data pair into a linked list.
Preferably, the characteristic values include: the slope $k$ of the simplified line segment, the standard deviation $\sigma^2$ of the simplified line segment, the mean $\mu$ of the simplified line segment, the height $y_{m1}$ of the starting point, and the height $y_{m2}$ of the ending point.
Preferably, the calculation and comparison method of the difference by the comparison module comprises the following steps:
Figure BDA0004162110370000051
wherein $k_m$ is the slope of simplified line segment $m$, $\sigma_m^2$ is the standard deviation of simplified line segment $m$, $\mu_m$ is the mean of simplified line segment $m$, $k_n$ is the slope of simplified line segment $n$, $\sigma_n^2$ is the standard deviation of simplified line segment $n$, $\mu_n$ is the mean of simplified line segment $n$, $\epsilon_k$ is the preset threshold for the slope, $\epsilon_{\sigma^2}$ (Figure BDA0004162110370000052) is the preset threshold for the standard deviation, and $\epsilon_\mu$ is the preset threshold for the mean.
Compared with the prior art, the beneficial effects of this application are:
With this dimension reduction method, variable-length, variable-amplitude time series can be reduced in dimension even when they contain large amounts of drift, distortion, fluctuation, outliers, stretching, and compression, while the characteristics of the series are effectively preserved.
Drawings
For a clearer description of the technical solutions of the present application, the drawings that are required to be used in the embodiments are briefly described below, it being evident that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method according to a first embodiment of the present application;
FIG. 2 is a schematic top feature diagram of a first embodiment of the present application;
FIG. 3 is a schematic view of the bottom feature of the first embodiment of the present application;
FIG. 4 is a schematic view of a descending segment according to the first embodiment of the present application;
FIG. 5 is a diagram illustrating a rising line segment according to a first embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the system structure of a second embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
Example 1
In the first embodiment, as shown in FIG. 1, a time series dimension reduction representation method includes the following steps:
S1, traversing the time series and identifying the top and bottom features of the time series.
A time series is a sequence formed by arranging the values of an indicator observed at different times in chronological order. Time series analysis is the theory and method of building a mathematical model through curve fitting and parameter estimation from time series data obtained by observing a system; it is commonly used in fields such as finance, weather forecasting, and market analysis.
The method for identifying the top and bottom features comprises the following steps: counting the inflection points of the time series and, if the number of inflection points is not less than 5, recording the time series as a first curve; if the first curve contains a first inflection point that is higher than the first adjacent point and the second adjacent point on either side of it, the first adjacent point is higher than the third adjacent point on its other side, and the second adjacent point is higher than the fourth adjacent point on its other side, the first curve is marked as a top feature and the first inflection point is marked as a vertex; if the first curve contains a second inflection point that is lower than the fifth adjacent point and the sixth adjacent point on either side of it, the fifth adjacent point is lower than the seventh adjacent point on its other side, and the sixth adjacent point is lower than the eighth adjacent point on its other side, the first curve is marked as a bottom feature and the second inflection point is marked as a bottom point.
In this embodiment, the original time series is traversed from its head, and the time-axis coordinates of all tops and bottoms are identified (this preliminarily addresses the variable-length, variable-amplitude problem).
As shown in FIG. 2, a top is defined as follows: the shortest top subsequence consists of five points, in which the middle point (the vertex) is higher than its two neighbors, and each neighbor is in turn higher than the outer point on its side. This basic structure is called a top, and its highest point is recorded as the vertex. As shown in FIG. 3, a bottom is defined as follows: the shortest bottom subsequence consists of five points, in which the middle point (the bottom point) is lower than its two neighbors, and each neighbor is in turn lower than the outer point on its side. This basic structure is called a bottom, and its lowest point is recorded as the bottom point. Five points satisfying these conditions are defined as the basic structure of a top or a bottom, and their extreme points are called vertices and bottom points respectively. All vertices and bottom points are marked. Step S1 can partially handle abnormal points in the time series. By combining multiple vertices and bottom points, the interference of drift, distortion, fluctuation, stretching, and compression in the time series can be partially overcome.
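The five-point rule above can be sketched in code. The following is a minimal illustration, not the patented implementation: it assumes the series is a plain list of numbers sampled at equal intervals, uses the list index as the time-axis coordinate, and treats "higher" and "lower" as strict inequalities. The function name `find_tops_and_bottoms` and the labels "top" and "bottom" are hypothetical.

```python
from typing import List, Sequence, Tuple

Mark = Tuple[int, str]  # (time index, "top" or "bottom"); naming is an assumption


def find_tops_and_bottoms(values: Sequence[float]) -> List[Mark]:
    """Scan the series once and return every five-point top or bottom."""
    marks: List[Mark] = []
    # A five-point window needs two samples on each side of the middle point.
    for i in range(2, len(values) - 2):
        a, b, c, d, e = values[i - 2 : i + 3]
        if a < b < c > d > e:
            # Middle point above its neighbors, neighbors above the outer points.
            marks.append((i, "top"))
        elif a > b > c < d < e:
            # Mirror image: middle point below its neighbors.
            marks.append((i, "bottom"))
    return marks


series = [1, 2, 5, 3, 2, 1, 0, 1, 3, 6, 4, 2]
print(find_tops_and_bottoms(series))  # [(2, 'top'), (6, 'bottom'), (9, 'top')]
```

Because the rule needs a strictly monotone run of two steps on each side, an isolated spike such as `[1, 1, 9, 1, 1]` is not reported, which is one way to read the statement that S1 partially handles abnormal points.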
S2, dividing the time sequence into a plurality of simplified line segments based on the top-bottom characteristics, and recording starting points and ending points of the plurality of simplified line segments.
The recording method of the start point and the end point comprises the following steps: when the simplified line segment is a descending line segment, taking a vertex as a starting point, recording the vertex coordinates of the vertex, taking a bottom point as an end point, recording the bottom point coordinates of the bottom point, integrating the vertex coordinates and the bottom point coordinates into a data pair, and storing the data pair into a linked list; when the simplified line segment is an ascending line segment, the bottom point is used as a starting point, the bottom point coordinates of the bottom point are recorded, the top point is used as an end point, the top point coordinates of the top point are recorded, the top point coordinates and the bottom point coordinates are integrated into a data pair, and the data pair is stored in a linked list.
The start and end times of each simplified line segment are recorded, and the start and end coordinates are stored in a new linked list. In this embodiment, as shown in FIG. 4, the points from a vertex down to a bottom point form a descending line segment; the vertex on the far left is recorded as start1 and the bottom point on the far right as end1. Alternatively, as shown in FIG. 5, the points from a bottom point up to a vertex form an ascending line segment; the bottom point on the far left is recorded as start1 and the vertex on the far right as end1. The coordinates of the vertex and the bottom point are recorded as a data pair, and the pairs are stored in the linked list in order: (start1, end1), (start2, end2), ......, (startn, endn).
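The segmentation in S2 can be sketched as pairing each marked point with the next marked point of the opposite kind. This is an illustrative sketch under assumptions the text does not state: marks are given as hypothetical `(index, kind)` tuples, a run of same-kind marks is collapsed to its most extreme point, and a Python list stands in for the linked list.

```python
from typing import List, Sequence, Tuple

Mark = Tuple[int, str]      # (time index, "top" or "bottom")
Segment = Tuple[int, int]   # (start index, end index)


def build_segments(values: Sequence[float], marks: Sequence[Mark]) -> List[Segment]:
    """Turn alternating vertices and bottom points into (start, end) pairs."""
    cleaned: List[Mark] = []
    for idx, kind in marks:
        if cleaned and cleaned[-1][1] == kind:
            # Two tops (or two bottoms) in a row: keep the more extreme one.
            prev_idx = cleaned[-1][0]
            if kind == "top":
                more_extreme = values[idx] > values[prev_idx]
            else:
                more_extreme = values[idx] < values[prev_idx]
            if more_extreme:
                cleaned[-1] = (idx, kind)
        else:
            cleaned.append((idx, kind))
    # top -> bottom gives a descending segment, bottom -> top an ascending one.
    return [(a[0], b[0]) for a, b in zip(cleaned, cleaned[1:])]


series = [1, 2, 5, 3, 2, 1, 0, 1, 3, 6, 4, 2]
marks = [(2, "top"), (6, "bottom"), (9, "top")]
print(build_segments(series, marks))  # [(2, 6), (6, 9)]
```

In this sketch neighboring segments share an endpoint, so the end of one pair is the start of the next.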
S3, calculating the characteristic values of the simplified line segments based on the starting points and ending points.
The characteristic values include: the slope $k$ of the simplified line segment, the standard deviation $\sigma^2$ of the simplified line segment, the mean $\mu$ of the simplified line segment, the height $y_{m1}$ of the starting point, and the height $y_{m2}$ of the ending point. The starting-point height $y_{m1}$ is the value of the time series at the time point start m, and the end-point height $y_{m2}$ is the value of the time series at the time point end m.
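As a rough illustration of S3, the sketch below computes the listed characteristic values for one segment. It makes several assumptions: the slope and intercept come from an ordinary least-squares fit over all samples in the segment (the closed form $\theta = (X^T X)^{-1} X^T y$ used in this embodiment, specialized to a straight line), the x coordinate is the sample index, and the quantity written $\sigma^2$ is computed as the population variance. The function name and dictionary keys are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def segment_features(values: Sequence[float], start: int, end: int) -> Dict[str, float]:
    """Characteristic values of the segment covering values[start..end]."""
    xs = list(range(start, end + 1))
    ys = list(values[start : end + 1])
    if len(ys) < 2:
        raise ValueError("a segment needs at least two samples")
    x_bar, y_bar = mean(xs), mean(ys)
    # Least-squares slope: covariance of (x, y) divided by variance of x.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = y_bar - k * x_bar
    return {
        "k": k,                    # slope
        "c": c,                    # intercept
        "sigma2": pvariance(ys),   # population variance (assumed meaning of sigma^2)
        "mu": y_bar,               # mean
        "y_start": ys[0],          # height of the starting point
        "y_end": ys[-1],           # height of the ending point
    }


print(segment_features([0, 2, 4, 6], 0, 3))
```

For the perfectly linear input above the fit recovers a slope of 2 and an intercept of 0.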
The calculation method of the characteristic value is as follows:
The line segment between the starting-point height $y_{m1}$ and the end-point height $y_{m2}$ is described by the equation $y = kx + c$.
The straight-line parameters to be solved are the slope $k$ and the intercept $c$.
So there are:
Figure BDA0004162110370000081
Written in matrix form:
Figure BDA0004162110370000082
wherein
Figure BDA0004162110370000083
Substituting this into the objective function $J_1$ yields:
Figure BDA0004162110370000084
Differentiating the objective function with respect to $\theta$ and setting the derivative equal to zero yields:
Figure BDA0004162110370000085
Solving gives: $\theta = (X^T X)^{-1} X^T y$
Namely:
Figure BDA0004162110370000086
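The formula images of this derivation are not available as text. Assuming they follow the standard ordinary least-squares derivation, which is consistent with the stated result $\theta = (X^T X)^{-1} X^T y$, the steps can be written out as below. The sample count $N$, the per-point notation $(x_i, y_i)$, and the exact form of $J_1$ are assumptions rather than a transcription of the figures.

```latex
\begin{align}
y_i &= k x_i + c, \qquad i = 1, \dots, N \\
y &= X\theta, \qquad
X = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix},\quad
\theta = \begin{bmatrix} k \\ c \end{bmatrix},\quad
y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} \\
J_1(\theta) &= (y - X\theta)^T (y - X\theta) \\
\frac{\partial J_1}{\partial \theta} &= -2 X^T (y - X\theta) = 0 \\
\theta &= (X^T X)^{-1} X^T y \\
k &= \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left(\sum_i x_i\right)^2}, \qquad
c = \frac{\sum_i y_i - k \sum_i x_i}{N}
\end{align}
```

The last line is the normal-equation solution expanded for the two-parameter case, so the slope and intercept can be computed from running sums without forming $X$ explicitly.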
S4, calculating the difference between the characteristic values of each simplified line segment and its adjacent segment, and if the difference is smaller than the preset threshold value, merging the line segment with the adjacent segment to complete the dimension reduction representation.
The calculation and comparison method of the gap comprises the following steps:
Figure BDA0004162110370000091
wherein $k_m$ is the slope of simplified line segment $m$, $\sigma_m^2$ is the standard deviation of simplified line segment $m$, $\mu_m$ is the mean of simplified line segment $m$, $k_n$ is the slope of simplified line segment $n$, $\sigma_n^2$ is the standard deviation of simplified line segment $n$, $\mu_n$ is the mean of simplified line segment $n$, $\epsilon_k$ is the preset threshold for the slope, $\epsilon_{\sigma^2}$ (Figure BDA0004162110370000092) is the preset threshold for the standard deviation, and $\epsilon_\mu$ is the preset threshold for the mean.
In this embodiment, if the start and end points obtained in S2 are recorded as:
......(start665,end665),(start666,end666)......
then after the calculation in S4 the two segments are merged to give:
......(start665,end666)......
The preset thresholds $\epsilon_k$, $\epsilon_{\sigma^2}$ (Figure BDA0004162110370000093), and $\epsilon_\mu$ need to be specified by an expert or adjusted manually according to actual requirements.
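The merge in S4 can be sketched as a single left-to-right pass over the segment list. The comparison formula itself is only available as an image, so the sketch assumes that the "difference" is the absolute difference of each characteristic value and that all three differences must fall below their thresholds. The `Seg` class, the function names, and the choice to keep the earlier segment's characteristic values after a merge are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Seg:
    start: int
    end: int
    k: float       # slope
    sigma2: float  # spread measure written as sigma^2 in the text
    mu: float      # mean


def similar(a: Seg, b: Seg, eps_k: float, eps_sigma2: float, eps_mu: float) -> bool:
    """Assumed criterion: every absolute difference is below its threshold."""
    return (
        abs(a.k - b.k) < eps_k
        and abs(a.sigma2 - b.sigma2) < eps_sigma2
        and abs(a.mu - b.mu) < eps_mu
    )


def merge_adjacent(segs: List[Seg], eps_k: float, eps_sigma2: float, eps_mu: float) -> List[Seg]:
    merged: List[Seg] = []
    for seg in segs:
        if merged and similar(merged[-1], seg, eps_k, eps_sigma2, eps_mu):
            prev = merged[-1]
            # (start_m, end_m), (start_n, end_n) -> (start_m, end_n).
            # Simplification: the merged segment keeps prev's characteristic values.
            merged[-1] = Seg(prev.start, seg.end, prev.k, prev.sigma2, prev.mu)
        else:
            merged.append(seg)
    return merged


segs = [Seg(0, 4, 1.0, 2.0, 5.0), Seg(4, 9, 1.1, 2.1, 5.2), Seg(9, 12, -3.0, 2.0, 5.0)]
result = merge_adjacent(segs, eps_k=0.5, eps_sigma2=0.5, eps_mu=0.5)
print([(s.start, s.end) for s in result])  # [(0, 9), (9, 12)]
```

Recomputing the characteristic values over the merged span, instead of reusing the earlier segment's values, would be a natural refinement; the text does not say which is intended.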
Example two
In the second embodiment, as shown in FIG. 6, a time series dimension reduction representation system includes an identification module, a simplification module, a calculation module, and a comparison module.
The identification module is used for traversing the time series and identifying the top and bottom features of the time series. In this embodiment, the method for identifying the top and bottom features includes: counting the inflection points of the time series and, if the number of inflection points is not less than 5, recording the time series as a first curve; if the first curve contains a first inflection point that is higher than the first adjacent point and the second adjacent point on either side of it, the first adjacent point is higher than the third adjacent point on its other side, and the second adjacent point is higher than the fourth adjacent point on its other side, the first curve is marked as a top feature and the first inflection point is marked as a vertex; if the first curve contains a second inflection point that is lower than the fifth adjacent point and the sixth adjacent point on either side of it, the fifth adjacent point is lower than the seventh adjacent point on its other side, and the sixth adjacent point is lower than the eighth adjacent point on its other side, the first curve is marked as a bottom feature and the second inflection point is marked as a bottom point.
In this embodiment, the original time series is traversed from its head, and the time-axis coordinates of all tops and bottoms are identified (this preliminarily addresses the variable-length, variable-amplitude problem). A top is defined as follows: the shortest top subsequence consists of five points, in which the middle point (the vertex) is higher than its two neighbors, and each neighbor is in turn higher than the outer point on its side. This basic structure is called a top, and its highest point is recorded as the vertex. A bottom is defined as follows: the shortest bottom subsequence consists of five points, in which the middle point (the bottom point) is lower than its two neighbors, and each neighbor is in turn lower than the outer point on its side. This basic structure is called a bottom, and its lowest point is recorded as the bottom point. Five points satisfying these conditions are defined as the basic structure of a top or a bottom, and their extreme points are called vertices and bottom points respectively. All vertices and bottom points are marked. This identification process (S1) can partially handle abnormal points in the time series. By combining multiple vertices and bottom points, the interference of drift, distortion, fluctuation, stretching, and compression in the time series can be partially overcome.
The simplification module is used for dividing the time sequence into a plurality of simplified line segments based on the top-bottom characteristics and recording starting points and ending points of the plurality of simplified line segments. In this embodiment, the recording method of the start point and the end point includes: when the simplified line segment is a descending line segment, taking a vertex as a starting point, recording the vertex coordinates of the vertex, taking a bottom point as an end point, recording the bottom point coordinates of the bottom point, integrating the vertex coordinates and the bottom point coordinates into a data pair, and storing the data pair into a linked list; when the simplified line segment is an ascending line segment, the bottom point is used as a starting point, the bottom point coordinates of the bottom point are recorded, the top point is used as an end point, the top point coordinates of the top point are recorded, the top point coordinates and the bottom point coordinates are integrated into a data pair, and the data pair is stored in a linked list.
The start and end times of each simplified line segment are recorded, and the start and end coordinates are stored in a new linked list. In this embodiment, the points from a vertex down to a bottom point form a descending line segment; the vertex on the far left is recorded as start1 and the bottom point on the far right as end1. Alternatively, the points from a bottom point up to a vertex form an ascending line segment; the bottom point on the far left is recorded as start1 and the vertex on the far right as end1. The coordinates of the vertex and the bottom point are recorded as a data pair, and the pairs are stored in the linked list in order: (start1, end1), (start2, end2), ......, (startn, endn).
The calculation module is used for calculating the characteristic values of the simplified line segments based on the starting points and ending points. In this embodiment, the characteristic values include: the slope $k$ of the simplified line segment, the standard deviation $\sigma^2$ of the simplified line segment, the mean $\mu$ of the simplified line segment, the height $y_{m1}$ of the starting point, and the height $y_{m2}$ of the ending point. The starting-point height $y_{m1}$ is the value of the time series at the time point start m, and the end-point height $y_{m2}$ is the value of the time series at the time point end m.
The characteristic values are calculated as follows:
For a simplified line segment with start-point height ym1 and end-point height ym2, the line segment equation is: y = kx + c
The line parameters to be solved are the slope k and the intercept c.
This gives:
Figure BDA0004162110370000111
Written in matrix form:
Figure BDA0004162110370000112
where
Figure BDA0004162110370000113
Substituting this into the objective function $J_1$ gives:
Figure BDA0004162110370000114
Taking the derivative of the objective function with respect to $\theta$ and setting it equal to zero yields:
Figure BDA0004162110370000115
Solving gives: $\theta = (X^{T}X)^{-1}X^{T}y$
That is:
Figure BDA0004162110370000116
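As an illustration of the least-squares solution $\theta = (X^{T}X)^{-1}X^{T}y$, the following Python sketch computes the characteristic values of one simplified line segment. It is a sketch under stated assumptions rather than the method of the application: the function name `segment_features` and the dictionary keys are hypothetical, the line is fitted to all samples of the segment with X built from the columns (x, 1), and $\sigma^2$ is taken as the population variance of the raw values in the segment.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def segment_features(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Return slope, intercept, variance, mean and end heights of one segment.

    The slope k and intercept c are the closed-form solution of the normal
    equations theta = (X^T X)^-1 X^T y for the two-parameter line y = kx + c.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two samples and equal-length inputs")

    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    denom = n * sxx - sx * sx  # determinant of X^T X
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    k = (n * sxy - sx * sy) / denom
    c = (sy - k * sx) / n

    return {
        "k": k,
        "c": c,
        "var": pvariance(ys),  # assumption: population variance of raw values
        "mean": mean(ys),
        "y_start": ys[0],      # height at the (start m) time point
        "y_end": ys[-1],       # height at the (end m) time point
    }


features = segment_features([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(features)
```

For the sample input, which lies exactly on the line y = 2x + 1, the fitted slope is 2 and the intercept is 1.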
The comparison module is used for calculating, for each simplified line segment, the difference between its characteristic values and those of the adjacent segment, and, if the difference is smaller than a preset threshold, merging the line segment with the adjacent segment to complete the dimension-reduction representation. In this embodiment, the difference is calculated and compared as follows:
Figure BDA0004162110370000121
where $k_m$ is the slope of simplified line segment m, $\sigma^2_m$ is the standard deviation of simplified line segment m, $\mu_m$ is the mean of simplified line segment m, $k_n$ is the slope of simplified line segment n, $\sigma^2_n$ is the standard deviation of simplified line segment n, $\mu_n$ is the mean of simplified line segment n, $\epsilon_k$ is the preset threshold for the slope, $\epsilon_{\sigma^2}$ is the preset threshold for the standard deviation, and $\epsilon_\mu$ is the preset threshold for the mean.
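The comparison formula survives in this text only as an image reference. One plausible reading, assuming that each characteristic value is compared by absolute difference against its own threshold and that all three conditions must hold for segments m and n to be merged, is sketched below; the exact form in the original figure may differ.

```latex
\left| k_m - k_n \right| < \epsilon_k, \qquad
\left| \sigma^2_m - \sigma^2_n \right| < \epsilon_{\sigma^2}, \qquad
\left| \mu_m - \mu_n \right| < \epsilon_\mu
```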
In this embodiment, suppose the start and end points obtained in S2 are recorded as follows:
......(start665,end665),(start666,end666)......
After the calculation in S4, the two segments are merged to obtain:
......(start665,end666)......
The preset thresholds $\epsilon_k$, $\epsilon_{\sigma^2}$ and $\epsilon_\mu$ need to be specified by an expert or changed manually according to actual requirements.
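The merging step illustrated by (start665, end665), (start666, end666) becoming (start665, end666) can be sketched in Python as follows. The names `similar` and `merge_adjacent`, the dictionary keys and the threshold values are hypothetical. The sketch assumes that two adjacent segments are merged only when the slope, standard-deviation and mean differences are all below their thresholds, and that each segment is compared with its immediate predecessor using the originally computed characteristic values, without recomputing them after a merge.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]      # (start time, end time)
Features = Dict[str, float]    # keys "k", "var", "mean" (hypothetical names)


def similar(a: Features, b: Features, eps: Features) -> bool:
    """True when every characteristic value differs by less than its threshold."""
    return all(abs(a[key] - b[key]) < eps[key] for key in ("k", "var", "mean"))


def merge_adjacent(
    segments: List[Segment], features: List[Features], eps: Features
) -> List[Segment]:
    """Merge each segment into its predecessor when their features are similar."""
    if not segments:
        return []
    merged = [segments[0]]
    for i in range(1, len(segments)):
        if similar(features[i - 1], features[i], eps):
            # Keep the start of the running segment, extend its end.
            merged[-1] = (merged[-1][0], segments[i][1])
        else:
            merged.append(segments[i])
    return merged


segments = [(0, 4), (4, 9), (9, 12)]
features = [
    {"k": 1.00, "var": 2.0, "mean": 5.0},
    {"k": 1.05, "var": 2.1, "mean": 5.2},
    {"k": -3.00, "var": 2.0, "mean": 5.0},
]
thresholds = {"k": 0.1, "var": 0.5, "mean": 0.5}  # illustrative values only

result = merge_adjacent(segments, features, thresholds)
print(result)
```

In this example the first two segments have similar characteristic values and collapse into one pair, while the third, whose slope differs sharply, is kept as a separate segment.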
The foregoing embodiments are merely illustrative of the preferred embodiments of the present application and are not intended to limit the scope of the present application, and various modifications and improvements made by those skilled in the art to the technical solutions of the present application should fall within the protection scope defined by the claims of the present application.

Claims (10)

1. A time-series dimension-reduction representation method, characterized by comprising the steps of:
traversing a time series, identifying top and bottom features of the time series;
dividing the time series into a plurality of simplified line segments based on the top and bottom features, and recording start points and end points of the plurality of simplified line segments;
calculating characteristic values of the plurality of simplified line segments based on the start points and the end points;
and respectively calculating the difference between the characteristic values of the simplified line segment and the adjacent segment, and if the difference is smaller than a preset threshold value, merging the line segment and the adjacent segment to finish the dimension reduction representation.
2. A time series dimension reduction representation method according to claim 1, wherein said method of identifying said top and bottom features comprises:
determining the number of inflection points of the time series, and, if the number of inflection points is not less than 5, recording the corresponding curve as a first curve;
if a first inflection point is higher than a first adjacent point and a second adjacent point adjacent to the first inflection point in the first curve, the first adjacent point is higher than a third adjacent point adjacent to the first adjacent point on the other side, and the second adjacent point is higher than a fourth adjacent point adjacent to the second adjacent point on the other side, the first curve is marked as a top characteristic, and the first inflection point is marked as a top point;
if there is a second inflection point lower than a fifth adjacent point and a sixth adjacent point adjacent to the second inflection point in the first curve, and the fifth adjacent point is lower than a seventh adjacent point adjacent to the fifth adjacent point on the other side, and the sixth adjacent point is lower than an eighth adjacent point adjacent to the sixth adjacent point on the other side, the first curve is marked as a bottom feature, and the second inflection point is marked as a bottom point.
3. The method for time-series dimension-reduction representation according to claim 2, wherein the recording method of the start and end points comprises:
when the simplified line segment is a descending line segment, taking the top point as the start point and recording the coordinates of the top point, taking the bottom point as the end point and recording the coordinates of the bottom point, combining the top-point coordinates and the bottom-point coordinates into a data pair, and storing the data pair in a linked list;
when the simplified line segment is an ascending line segment, taking the bottom point as the start point and recording the coordinates of the bottom point, taking the top point as the end point and recording the coordinates of the top point, combining the bottom-point coordinates and the top-point coordinates into a data pair, and storing the data pair in the linked list.
4. A time-series dimension-reduction representation method according to claim 3, wherein said characteristic values comprise: the slope k of the simplified line segment, the standard deviation $\sigma^2$ of the simplified line segment, the mean $\mu$ of the simplified line segment, the start-point height ym1 and the end-point height ym2.
5. The method for time-series dimension-reduction representation according to claim 4, wherein said method for calculating and comparing said difference comprises:
Figure FDA0004162110360000021
wherein $k_m$ is the slope of simplified line segment m, $\sigma^2_m$ is the standard deviation of simplified line segment m, $\mu_m$ is the mean of simplified line segment m, $k_n$ is the slope of simplified line segment n, $\sigma^2_n$ is the standard deviation of simplified line segment n, $\mu_n$ is the mean of simplified line segment n, $\epsilon_k$ is the preset threshold for the slope, $\epsilon_{\sigma^2}$ is the preset threshold for the standard deviation, and $\epsilon_\mu$ is the preset threshold for the mean.
6. A time-series dimension-reduction representation system, comprising: the device comprises an identification module, a simplification module, a calculation module and a comparison module;
the identification module is used for traversing the time sequence and identifying the top and bottom characteristics of the time sequence;
the simplification module is used for dividing the time series into a plurality of simplified line segments based on the top and bottom features, and recording start points and end points of the plurality of simplified line segments;
the calculation module is used for calculating characteristic values of the plurality of simplified line segments based on the start points and the end points;
and the comparison module is used for respectively calculating the difference between the characteristic values of the simplified line segment and the adjacent segment, and if the difference is smaller than a preset threshold value, merging the line segment and the adjacent segment to finish the dimension reduction representation.
7. The time series reduced dimension representation system of claim 6, wherein the workflow of the recognition module comprises:
determining the number of inflection points of the time series, and, if the number of inflection points is not less than 5, recording the corresponding curve as a first curve;
if a first inflection point is higher than a first adjacent point and a second adjacent point adjacent to the first inflection point in the first curve, the first adjacent point is higher than a third adjacent point adjacent to the first adjacent point on the other side, and the second adjacent point is higher than a fourth adjacent point adjacent to the second adjacent point on the other side, the first curve is marked as a top characteristic, and the first inflection point is marked as a top point;
if there is a second inflection point lower than a fifth adjacent point and a sixth adjacent point adjacent to the second inflection point in the first curve, and the fifth adjacent point is lower than a seventh adjacent point adjacent to the fifth adjacent point on the other side, and the sixth adjacent point is lower than an eighth adjacent point adjacent to the sixth adjacent point on the other side, the first curve is marked as a bottom feature, and the second inflection point is marked as a bottom point.
8. The time series dimension reduction representation system of claim 7, wherein the method for recording the start and end points by the simplifying module comprises:
when the simplified line segment is a descending line segment, taking the top point as the start point and recording the coordinates of the top point, taking the bottom point as the end point and recording the coordinates of the bottom point, combining the top-point coordinates and the bottom-point coordinates into a data pair, and storing the data pair in a linked list;
when the simplified line segment is an ascending line segment, taking the bottom point as the start point and recording the coordinates of the bottom point, taking the top point as the end point and recording the coordinates of the top point, combining the bottom-point coordinates and the top-point coordinates into a data pair, and storing the data pair in the linked list.
9. The time-series dimension-reduction representation system of claim 8, wherein the characteristic values comprise: the slope k of the simplified line segment, the standard deviation $\sigma^2$ of the simplified line segment, the mean $\mu$ of the simplified line segment, the start-point height ym1 and the end-point height ym2.
10. The time series dimension reduction representation system of claim 9, wherein the comparison module calculates and compares the gap by a method comprising:
Figure FDA0004162110360000041
wherein $k_m$ is the slope of simplified line segment m, $\sigma^2_m$ is the standard deviation of simplified line segment m, $\mu_m$ is the mean of simplified line segment m, $k_n$ is the slope of simplified line segment n, $\sigma^2_n$ is the standard deviation of simplified line segment n, $\mu_n$ is the mean of simplified line segment n, $\epsilon_k$ is the preset threshold for the slope, $\epsilon_{\sigma^2}$ is the preset threshold for the standard deviation, and $\epsilon_\mu$ is the preset threshold for the mean.
CN202310352540.5A 2023-04-04 2023-04-04 Time sequence dimension reduction representation method and system Pending CN116383273A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310352540.5A CN116383273A (en) 2023-04-04 2023-04-04 Time sequence dimension reduction representation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310352540.5A CN116383273A (en) 2023-04-04 2023-04-04 Time sequence dimension reduction representation method and system

Publications (1)

Publication Number Publication Date
CN116383273A (en) 2023-07-04

Family

ID=86961132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310352540.5A Pending CN116383273A (en) 2023-04-04 2023-04-04 Time sequence dimension reduction representation method and system

Country Status (1)

Country Link
CN (1) CN116383273A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077309A (en) * 2013-03-28 2014-10-01 日电(中国)有限公司 Method and device for carrying out dimension reduction processing on time-sequential sequence
US20140297606A1 (en) * 2013-03-28 2014-10-02 Nec (China) Co., Ltd. Method and device for processing a time sequence based on dimensionality reduction
CN104820779A (en) * 2015-04-28 2015-08-05 电子科技大学 Extreme point and turning point based time sequence dimensionality reduction method
CN106960059A (en) * 2017-04-06 2017-07-18 山东大学 A kind of Model of Time Series Streaming dimensionality reduction based on Piecewise Linear Representation is with simplifying method for expressing
CN109241130A (en) * 2018-07-27 2019-01-18 山东大学 A kind of time series data dimensionality reduction and multi-resolution representation method based on weight

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAOKCEHUI: "时间序列专题之三 时间序列的分段线性表示", Retrieved from the Internet <URL:《https://blog.csdn.net/xiaokcehui/article/details/120776478》> *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination