CN109933607B - Periodic time series data processing method - Google Patents


Info

Publication number
CN109933607B
Authority
CN
China
Prior art keywords
data
turning point
points
turning
trend
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910075079.7A
Other languages
Chinese (zh)
Other versions
CN109933607A (en
Inventor
文曙东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weinuo Times Beijing Technology Co ltd
Original Assignee
Weinuo Times Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Links

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of data processing. It discloses a periodic time series data processing method that addresses the inaccurate identification of trend turning points of data sequences in the prior art. The method comprises the following steps: a. grouping the data in a time period K by natural week and extracting the data at the corresponding position in each group to form 7 data sequences S(1), S(2), …, S(7); b. summing the data at corresponding positions in the 7 data sequences to obtain an 8th data sequence S(8); c. obtaining the trend turning points of the data sequences S(1), S(2), …, S(8); wherein K ≥ 7n and n is a positive integer. The method eliminates the influence of periodic variation of the data on turning point identification, improves the accuracy of identification, reflects the true fluctuation trend of the data, and provides a more scientific basis for decision making. It also simplifies data processing and improves its efficiency.

Description

Periodic time series data processing method
Technical Field
The application relates to the technical field of data processing, in particular to a method for processing time series data with periodic characteristics, and specifically to a method for identifying the trend turning points of periodic data sequences.
Background
Forecasting requires analyzing the shape of a time series: is it rising, falling, or stationary? This means selecting the points that stand out most to the human eye, namely the turning points of the time series, whose common characteristic is that the trends on their two sides differ markedly.
Time series data contain trend information, and trend turning points can be extracted from that information, which compresses the data and reduces the influence of noise. Analyzing time series data makes it possible to predict and judge how events will develop, providing a basis for decisions.
Conventional turning point identification methods do not consider periodic fluctuation within a week. In a typical application scenario, passenger travel data show a clear cycle of one natural week, that is, 7 days. Because of the fluctuation within each 7-day cycle, searching for turning points directly in the raw data cannot locate a turning point accurately to a particular day.
In practice, much passenger travel data is periodic. Railway passenger flow, for example, shows a clear pattern from Monday through Sunday: on the Beijing-Shanghai section, passenger flow on Fridays and Sundays is significantly higher than on other days. Over the whole year, July and August are the summer peak, and a clear upward trend appears in June. Such data sequences therefore vary periodically within each week and also rise or fall over the year. Because the data fluctuate on a weekly cycle, ordinary turning point identification methods have difficulty finding the dates on which the annual passenger trend turns. How to find, conveniently and accurately, the dates on which the data begin to rise or fall over a whole year, as a basis for decisions on passenger train operation plans, is the problem to be solved.
Disclosure of Invention
The main purpose of the application is to provide a periodic time series data processing method that solves the problem of inaccurate identification of trend turning points of data sequences in the prior art.
To achieve this purpose, according to one aspect of the application, a periodic time series data processing method is provided, comprising the following steps:
a. grouping the data in a time period K by natural week and extracting the data at the corresponding position in each group to form 7 data sequences S(1), S(2), …, S(7);
b. summing the data at corresponding positions in the 7 data sequences to obtain an 8th data sequence S(8);
c. obtaining the trend turning points of the data sequences S(1), S(2), …, S(8);
wherein K ≥ 7n and n is a positive integer.
Further, the method also comprises the step of:
d. taking the number of trend turning points of the data sequence S(m+1) (here m = 7, that is, S(8)) as the number of trend turning points of the time period K.
Further, step c specifically comprises:
obtaining the trend turning points of the data sequences S(1), S(2), …, S(8) by a mathematical method.
Further, the mathematical method is specifically one of the following:
the data sequence is arranged in order and adjacent data points are connected by straight lines; for each data point, the slopes of the lines connecting it to its left and right neighbours are compared, and when the slope difference is larger than a set threshold the point is listed as a turning point;
or
the data sequence is arranged in order and the first and last data points are connected by a straight line; the perpendicular distance from every intermediate data point to this line is calculated, and the point with the largest distance is selected as a turning point. The turning point then serves as a new end point: together with the original first and last points it splits the data into two sequences, and a new turning point is found in each by the same method. The procedure is repeated until the distances from all points to their lines reach a set value or until the number of turning points reaches a set value.
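The first of the two methods above (slope difference against a threshold) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name, the sample values, and the assumption that data points are equally spaced one unit apart (so a slope is simply the difference of neighbouring values) are all hypothetical.

```python
from typing import List, Sequence


def slope_turning_points(values: Sequence[float], threshold: float) -> List[int]:
    """Return 0-based indices whose left and right slopes differ by more than threshold.

    Assumes equally spaced points (spacing 1), so each slope equals the
    difference between neighbouring values.
    """
    turning = []
    for i in range(1, len(values) - 1):
        left_slope = values[i] - values[i - 1]
        right_slope = values[i + 1] - values[i]
        if abs(right_slope - left_slope) > threshold:
            turning.append(i)
    return turning


# Hypothetical series: steady rise, steady fall, then flat.
series = [10, 12, 14, 16, 13, 10, 7, 7, 7]
print(slope_turning_points(series, threshold=2.5))  # [3, 6]
```

The threshold plays the role of the "set threshold" in the text; how it should be chosen for real passenger data is not stated in the source.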
Further, K is measured in years, and n = 52.
Further, the method also comprises the step of:
e. taking the week of a turning point of S(8) as the week of a turning point of the annual trend; forming a set Date(10) of 10 consecutive dates from the 7 days of that week and the last 3 days of the preceding week; checking the turning point dates of S(1) to S(7) and extracting those that fall within Date(10) to form a new set; and selecting the smallest date in the new set, so that the turning point of the turning-point week is located to a specific day. This date becomes a trend turning point for the whole year.
Further, for the week of the last turning point, the turning point date is located to the last day.
The method eliminates the influence of periodic variation of the data on turning point identification, improves the accuracy of identification, reflects the true fluctuation trend of the data, and provides a more scientific basis for decision making. It also simplifies data processing and improves its efficiency.
The application is further described below with reference to the drawings and detailed description. Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The accompanying drawings form part of the application and provide a further understanding of it; together with the description they serve to explain the specific embodiments. In the drawings:
FIG. 1 is a schematic diagram of the passenger dispatch volume of a station;
FIG. 2 is a schematic diagram of the data fitting of the example.
Detailed Description
It should be noted that, where no conflict arises, the specific embodiments, examples, and their features in this disclosure may be combined with one another. The application is described in detail below with reference to the accompanying drawings.
So that those skilled in the art can better understand the application, the technical solutions of its embodiments and examples are described clearly and completely below with reference to the accompanying drawings. The described examples are only some, not all, of the examples of the application. All other embodiments and examples obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the application.
In the application, a trend turning point of a data sequence is a characteristic data point that reflects the trend of the data: the trends before and after it differ markedly.
The time series data processed by the method combine a short cycle within the 7 days of a week with a complex annual trend. Dimension reduction of the time series is therefore particularly important, that is, reducing the number of points as far as possible while keeping the approximate shape of the series.
Dimension reduction: the time series is periodic in units of natural weeks, and the weekly data are reduced in dimension to eliminate the interference of this periodicity with turning point analysis. To this end, the application introduces 8 time series.
First, the data are grouped by day of the week, Monday to Sunday, to form 7 time series. Note that a 365-day year is one day longer than 52 weeks; the extra day is not discussed here. The series are as follows:
a time series S(1) consisting of all Mondays, 52 values;
a time series S(2) consisting of all Tuesdays, 52 values;
a time series S(3) consisting of all Wednesdays, 52 values;
a time series S(4) consisting of all Thursdays, 52 values;
a time series S(5) consisting of all Fridays, 52 values;
a time series S(6) consisting of all Saturdays, 52 values;
a time series S(7) consisting of all Sundays, 52 values.
These 7 sequences remove the short weekly cycle and generally reflect the trend of the data over the year, but the data on particular dates may contain outliers, so the trend shapes of the 7 sequences are not consistent with one another.
Second, the data of each week are summed to form a weekly sequence, which serves as the 8th time series S(8). A year of data thus shrinks from 365 days to 52 weeks, the 7-day periodic fluctuation is eliminated, the positive and negative errors of the individual days within a week cancel one another, and the influence of outliers is reduced. The series formed by the 52 weekly values shows the trend of the annual data, and the turning-point weeks of the year can be selected from it.
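The decomposition into the seven per-day sequences and the weekly-sum sequence can be sketched as below. This is an illustrative sketch only: the function name and the synthetic input are hypothetical, and it assumes the first value of the daily list is the first day of week 1 (for the text's Monday-to-Sunday labelling, that day would be a Monday).

```python
from typing import Dict, List, Sequence


def decompose_by_week(daily: Sequence[float], n_weeks: int) -> Dict[int, List[float]]:
    """Split daily data into S(1)..S(7) and the weekly-sum sequence S(8).

    daily[0] is assumed to be the first day of week 1. Days beyond
    7 * n_weeks (for example day 365) are ignored, as in the text.
    """
    if len(daily) < 7 * n_weeks:
        raise ValueError("need at least 7 * n_weeks daily values")
    # S(m): the m-th day of every week, one value per week.
    seqs = {m: [daily[7 * w + (m - 1)] for w in range(n_weeks)] for m in range(1, 8)}
    # S(8): sum of the seven values at the same week position.
    seqs[8] = [sum(seqs[m][w] for m in range(1, 8)) for w in range(n_weeks)]
    return seqs


# Synthetic data: a repeating weekly pattern plus a slow upward drift.
daily = [float(d % 7 + d // 7) for d in range(365)]
s = decompose_by_week(daily, n_weeks=52)
print(len(s[1]), len(s[8]))  # 52 52
print(s[8][:3])              # [21.0, 28.0, 35.0]
```

In the synthetic data the weekly pattern disappears from S(8), leaving only the drift, which is the effect the text attributes to the weekly sum.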
Turning points are then analyzed and extracted for the 8 time series above; that is, each series of 52 values is segmented so that the data within each segment can be approximated by a straight line. The time series is represented as a chain of adjoining line segments: several straight segments joined end to end approximately replace the original series, and the segments need not be of equal length. Conventional methods include the maximum-distance method (vertical distance or orthogonal distance) and piecewise linearization of the time series based on extracting edge points by slope.
The trend turning points of S(8) give the weeks in which the annual trend turns, but an annual trend turning point must be located to a specific date. The whole process comprises the following steps:
1. For railway passenger demand data with weekly periodicity, the data of one year are grouped by day of the week, Monday to Sunday, and decomposed into seven subsequences S(1), S(2), …, S(7), each containing 52 values.
2. The data of the 7 days of each week are summed to form a sequence S(8) of 52 weekly values, which removes the short weekly cycle and also reduces the effect of unknown disturbances on the data of individual days. This is the eighth sequence.
3. Turning points are extracted from each of the eight newly generated time series. Because of the many disturbances in practice, the seven sequences of step 1 contain some outliers, so their turning points are disturbed and inaccurate. The eighth sequence S(8) of step 2 sums the data of each week, is therefore smoothed, and shows the annual trend; its turning points are the annual trend turning points, but they are located only to the week, not to a specific date.
4. The turning point of the eighth sequence S(8) shows which week the turning point falls in, but the date on which the annual trend actually turns may lie not in that week but in the last 3 days of the preceding week; that is, the trend may begin to change in the second half of the preceding week. The 7 days of the turning-point week and the last three days of the preceding week therefore form a set of 10 consecutive dates, Date(10). The turning points of the first seven sequences are examined, all those whose dates fall within Date(10) are selected, and the smallest of these dates is taken as the specific date of the turning point.
The turning point date falls within the turning-point week:
……                              ……             ……             ……             ……             ……             ……             ……
Week n                          Trend 1        Trend 1        Trend 1        Trend 1        Trend 1        Trend 1        Trend 1
Week n+1 (turning-point week)   Trend 1        Turning point  Turning point  Turning point  Turning point  Turning point  Turning point
Week n+2                        Turning point  Trend 2        Trend 2        Trend 2        Trend 2        Trend 2        Trend 2
Week n+3                        Trend 2        Trend 2        Trend 2        Trend 2        Trend 2        Trend 2        Trend 2
……                              ……             ……             ……             ……             ……             ……             ……
The turning point date falls within the last three days of the week preceding the turning-point week
5. If no turning point date is selected in step 4, the turning point is located at the earliest date of the turning-point week.
6. As a special case, the last week is the end-point week, and the last turning point is set to its last day.
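Steps 4 to 6 above can be sketched as a small function that maps a turning-point week of S(8) to a day number. This is a hedged sketch: the function name, the dictionary layout, and the sample input are hypothetical, and it assumes days are numbered from 1 with week w covering days 7(w-1)+1 to 7w, and that S(m) holds the m-th day of each week.

```python
from typing import Dict, List


def locate_turning_day(week: int, sub_turning_weeks: Dict[int, List[int]], last_week: int) -> int:
    """Map a turning-point week of S(8) to a day number (day 1 = first day of week 1).

    sub_turning_weeks[m] lists the turning-point weeks of S(m), m = 1..7.
    """
    if week == last_week:
        return 7 * week                      # step 6: end-point week, use its last day
    first_day = 7 * (week - 1) + 1
    # Date(10): last 3 days of the preceding week plus the 7 days of this week.
    window = range(max(1, first_day - 3), first_day + 7)
    candidates = [
        7 * (w - 1) + m
        for m, weeks in sub_turning_weeks.items()
        for w in weeks
        if 7 * (w - 1) + m in window
    ]
    # Step 4: earliest candidate; step 5: fall back to the first day of the week.
    return min(candidates) if candidates else first_day


# Hypothetical subsequence turning points: days 13, 14 and 16 fall in the window of week 3.
subs = {6: [2], 7: [2], 2: [3]}
print(locate_turning_day(3, subs, last_week=22))   # 13
print(locate_turning_day(5, {}, last_week=22))     # 29
print(locate_turning_day(22, {}, last_week=22))    # 154
```

Taking the minimum reflects the text's choice of the smallest date in Date(10); for week 1 the window is simply clipped at day 1.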
Example:
The data are the passenger dispatch volumes of a station for the first 22 weeks of 2015, as shown in FIG. 1. Dimension reduction yields seven subsequences S(1), S(2), …, S(7), one for each day from Monday to Sunday, and a weekly passenger flow sequence S(8) obtained by summing each run of 7 consecutive days. Turning point detection is performed on the eight sequences S(1), S(2), …, S(8).
The turning point detection method is as follows:
The first and last points are added to the turning point sequence and the two turning points are connected. Each point has coordinates (X_i, Y_i), where X_i is the week number and Y_i is the corresponding passenger flow. The line y = ax + b through the two turning points is obtained, and the point-to-line distance formula is applied to calculate the distance from each remaining point in the interval to this line. The point farthest from the line is added to the turning point set. Each pair of adjacent turning points is then connected, the distance from every point in each interval to its line is calculated, and the farthest point is again selected. This continues until 5 turning points (including the two end points) have been selected.
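The distance formula itself is not present in this text. Assuming it is the standard perpendicular (orthogonal) distance from the point (X_i, Y_i) to the line y = ax + b through two adjacent turning points, it would read as follows; the subscripts p and q for the two end points are notation introduced here for illustration.

```latex
d_i = \frac{\lvert a X_i - Y_i + b \rvert}{\sqrt{a^{2} + 1}},
\qquad
a = \frac{Y_q - Y_p}{X_q - X_p},
\qquad
b = Y_p - a X_p
```

If the vertical distance mentioned earlier in the description is intended instead, the denominator is dropped and the distance becomes |Y_i - (a X_i + b)|. For a single line both variants select the same farthest point, since they differ only by the constant factor sqrt(a^2 + 1); they can differ when points from intervals with different slopes are compared.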
The passenger flow data of the station for the year are analyzed, and turning points are detected for the weekly sequence and for each subsequence. The turning points (week numbers) of the 8 sequences are as follows:
Monday: (1, 11, 13, 16, 22)
Tuesday: (1, 2, 4, 11, 22)
Wednesday: (1, 2, 4, 11, 22)
Thursday: (1, 3, 5, 12, 22)
Friday: (1, 3, 5, 16, 22)
Saturday: (1, 3, 5, 12, 22)
Sunday: (1, 5, 10, 15, 22)
Weekly data: (1, 3, 5, 11, 22)
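Lists of this form, five week numbers including both end points, are what a farthest-point selection of the kind described in the detection method produces. The sketch below is one possible reading, not the applicant's code: the function name and sample data are hypothetical, the perpendicular distance is an assumption, and at each round it adds the single farthest point across all current intervals.

```python
import math
from typing import List, Sequence


def farthest_point_turning_points(y: Sequence[float], k: int) -> List[int]:
    """Greedily pick k turning points (1-based positions, end points included)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two data points")
    points = [1, n]
    while len(points) < min(k, n):
        best_dist, best_pos = -1.0, None
        for p, q in zip(points, points[1:]):        # each pair of adjacent turning points
            a = (y[q - 1] - y[p - 1]) / (q - p)      # slope of the connecting line
            b = y[p - 1] - a * p                     # intercept, so that y = a*x + b
            for x in range(p + 1, q):
                d = abs(a * x - y[x - 1] + b) / math.sqrt(a * a + 1)
                if d > best_dist:
                    best_dist, best_pos = d, x
        points.append(best_pos)
        points.sort()
    return points


# Hypothetical series with a single sharp peak at position 5.
y = [0, 1, 2, 3, 10, 3, 2, 1, 0]
print(farthest_point_turning_points(y, k=3))  # [1, 5, 9]
```

Because the perpendicular distance mixes the units of the two axes, the points selected on real data depend on how week numbers and passenger volumes are scaled; the source does not say whether any scaling was applied.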
The eight sequences are analyzed to find the turning points of each, and the first onset date within each turning-point week of the weekly sequence is found by intersecting the turning points, as shown in Table 3 below, where the turning points selected for the weekly data sequence are marked in green and the framed entries are the dates corresponding to the turning points of the subsequences. The specific steps are as follows:
Week 1 of the weekly data is a turning point, and the starting turning point is set to January 1. Week 3 of the weekly sequence is a turning point, and its first day is January 15; looking back three days to be safe, January 13 and January 14 are both turning points, so the turning point selected here is January 13.
Week 5 is a turning point, and its first day is January 29; the date two days earlier is also a turning point, so the turning point is moved forward to January 27. In the same way, the intersection of the turning points of the weekly sequence and the subsequences continues to be taken, and the turning point of the last week is located on its last day. The turning point dates finally selected are shown in Table 4.
Table 3
Table 4
Turning point    Date          Corresponding day
1                January 1     1
2                January 13    13
3                January 27    27
4                March 16      75
5                June 3        154
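The week start dates quoted in the example and the day numbers of the selected dates can be cross-checked with the standard library. The sketch assumes, as the example's dates imply, that weeks are counted in 7-day blocks starting on January 1, 2015; the helper names are hypothetical.

```python
from datetime import date, timedelta

YEAR = 2015  # year of the example data


def week_start(week: int) -> date:
    """First day of the given week, counting 7-day blocks from January 1."""
    return date(YEAR, 1, 1) + timedelta(days=7 * (week - 1))


def day_of_year(d: date) -> int:
    """Ordinal day within the year (January 1 is day 1)."""
    return d.timetuple().tm_yday


print(week_start(3), week_start(5))  # 2015-01-15 2015-01-29
selected = [date(YEAR, 1, 1), date(YEAR, 1, 13), date(YEAR, 1, 27),
            date(YEAR, 3, 16), date(YEAR, 6, 3)]
print([day_of_year(d) for d in selected])  # [1, 13, 27, 75, 154]
```

Day 154 is also the last day of week 22 (7 × 22 = 154), consistent with locating the final turning point on the last day of the last week.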
The resulting fitted graph is shown in FIG. 2, where the square points are the 154 days of passenger flow data and the dots are the selected turning points (including the starting point).

Claims (2)

1. A method of processing periodic time series data, comprising the following steps:
a. grouping passenger dispatch data in a time period K by natural week, extracting the data at the corresponding position in each group, and forming data sequences S(1), S(2), …, S(7) for Monday to Sunday;
b. summing the data at corresponding positions in the data sequences S(1), S(2), …, S(7) to obtain an 8th data sequence S(8) of passenger flow;
c. obtaining the trend turning points of the data sequences S(1), S(2), …, S(8) by a mathematical method;
the mathematical method specifically being:
the data sequence is arranged in order and adjacent data points are connected by straight lines; for each data point, the slopes of the lines connecting it to its left and right neighbours are compared, and when the slope difference is larger than a set threshold the point is listed as a turning point;
or
the data sequence is arranged in order and the first and last data points are connected by a straight line; the vertical and perpendicular distances from all intermediate data points to the line are calculated, and the point with the largest distance is selected as a turning point; the turning point then serves as a new end point, which together with the original first and last points forms two data sequences, and a new turning point is found in each by the same method; the procedure is repeated until the distances from all points to the straight line reach a set value or until the number of turning points reaches a set value;
d. taking the number of trend turning points of the data sequence S(8) as the number of trend turning points of the time period K;
wherein K ≥ 7n, n is a positive integer, K is measured in years, and n = 52;
e. taking the week of a turning point of S(8) as the week of a turning point of the annual trend; forming a set Date(10) of 10 consecutive dates from the 7 days of that week and the last 3 days of the preceding week; checking the turning point dates of the data sequences S(1) to S(7) and extracting those that fall within Date(10) to form a new set; and selecting the smallest date in the new set, so that the turning point of the turning-point week is located to a specific day, this date also becoming a trend turning point for the whole year.
2. The method of processing periodic time series data according to claim 1, wherein the turning point date of the week of the last turning point is located to the last day.
CN201910075079.7A 2019-01-25 2019-01-25 Periodic time series data processing method Active CN109933607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910075079.7A CN109933607B (en) 2019-01-25 2019-01-25 Periodic time series data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910075079.7A CN109933607B (en) 2019-01-25 2019-01-25 Periodic time series data processing method

Publications (2)

Publication Number Publication Date
CN109933607A (en) 2019-06-25
CN109933607B (en) 2023-10-03

Family

ID=66985239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910075079.7A Active CN109933607B (en) 2019-01-25 2019-01-25 Periodic time series data processing method

Country Status (1)

Country Link
CN (1) CN109933607B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125184A (en) * 2019-11-23 2020-05-08 同济大学 Bus passenger flow dynamic monitoring method based on time sequence structural variable point identification

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011081491A (en) * 2009-10-05 2011-04-21 Nec Biglobe Ltd Time series analysis device, time series analysis method and program
CN104268660A (en) * 2014-10-13 2015-01-07 国家电网公司 Trend recognition method for electric power system predication-like data
JP2016045917A (en) * 2014-08-27 2016-04-04 株式会社日立ソリューションズ西日本 Device for tendency extraction and evaluation of time series data
CN107764458A (en) * 2017-09-25 2018-03-06 中国航空工业集团公司西安飞机设计研究所 A kind of aircraft handing characteristics curve generation method
CN108804731A (en) * 2017-09-12 2018-11-13 中南大学 Based on the dual evaluation points time series trend feature extracting method of vital point

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
桑夏夏 ; 李旭伟 ; .一种金融时间序列区域分割方法的研究.四川大学学报(自然科学版).2018,(第06期),全文. *
王炜炜 ; 单杏花 ; .基于时间序列聚类方法的小长假铁路客流规律研究.铁路计算机应用.2015,(第04期),全文. *

Also Published As

Publication number Publication date
CN109933607A (en) 2019-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230823

Address after: Room 203, 2nd Floor, Building 12, East District, No. 31 Jiaoda East Road, Haidian District, Beijing, 100044

Applicant after: WEINUO TIMES (BEIJING) TECHNOLOGY CO.,LTD.

Address before: 1602-16, 16th floor, innovation building, Southwest Jiaotong University, No. 111, North Section 1, 2nd Ring Road, Jinniu District, Chengdu, Sichuan 610000

Applicant before: SICHUAN QUANCHENG TIANYOU TECHNOLOGY Co.,Ltd.

GR01 Patent grant