CN112214533B - Time sequence data aggregation method and device - Google Patents


Info

Publication number
CN112214533B
Authority
CN
China
Prior art keywords
data
time
time sequence
sequence data
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011128219.1A
Other languages
Chinese (zh)
Other versions
CN112214533A (en)
Inventor
向新桃
房新楠
樊翔
高文
汤瑾璟
陈立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Merchant Ship Design and Research Institute of CSSC No 604 Research Institute
Original Assignee
Shanghai Merchant Ship Design and Research Institute of CSSC No 604 Research Institute
Application filed by Shanghai Merchant Ship Design and Research Institute of CSSC No 604 Research Institute
Priority to CN202011128219.1A
Publication of CN112214533A
Application granted
Publication of CN112214533B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Traffic Control Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a time sequence data aggregation method and device, relating to the technical field of information processing. The method comprises the following steps: first, acquiring a time sequence data set and segmenting it according to a first time window to obtain a plurality of first time sequence data subsets; then segmenting each first time sequence data subset whose data type is a variable data segment according to a second time window to obtain a plurality of second time sequence data subsets; repeating the segmentation step on each second time sequence data subset whose data type is a variable data segment until a preset number of time sequence data remain in an Nth time window; and finally, aggregating the preset number of time sequence data retained in all the Nth time windows to obtain aggregated target behavior data. The invention adaptively adjusts the time window of a variable data segment and retains more time sequence data by reducing the time window, so that the aggregated target behavior data still reflect the behavior of the intelligent ship, and the aggregation efficiency and aggregation precision are improved.

Description

Time sequence data aggregation method and device
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for aggregating time-series data.
Background
In the field of industrial big data, and especially in the field of intelligent ships, time sequence data is a common data form. Time sequence data that remain stable over a period of time generally indicate that the engineering object is in a stable working state, and a single piece of data suffices to record that state. Time sequence data that change continuously over a period of time usually indicate that the engineering object is undergoing a state change, and a data sequence (a plurality of pieces of data) is needed to describe its behavior. Unlike Internet big data, one of the important applications of industrial big data is the comparison and analysis of the data change process and of the behavior of engineering objects. This gives rise to a special data aggregation requirement: compress only the stable data segments and retain the variable data segments.
To meet this data aggregation requirement, existing time sequence data aggregation methods mainly provide software toolkits from a big data perspective, and the time sequence data are grouped by means of such toolkits. During grouping, however, the size of the time window must be adjusted by manual intervention, which cannot meet the intelligent ship's requirements for real-time processing and real-time transmission of time sequence data. In addition, manual adjustment is time-consuming, incurs high labor costs, and yields low aggregation efficiency, which is inconsistent with the development trend of big data and intelligence.
In summary, the existing time sequence data aggregation method has the technical problems of manual intervention and low aggregation efficiency.
Disclosure of Invention
The invention aims to provide a time sequence data aggregation method and device, which are used for solving the technical problems of manual intervention and low aggregation efficiency existing in the conventional time sequence data aggregation method.
In a first aspect, the present invention provides a time sequence data aggregation method, including: acquiring a time sequence data set, and segmenting the time sequence data set according to a preset first time window to obtain a plurality of first time sequence data subsets, wherein the time sequence data set is a set of time sequence data recorded in time order for indexes of the intelligent ship; segmenting each first time sequence data subset whose data type is a variable data segment according to a second time window to obtain a plurality of second time sequence data subsets, wherein the second time window is smaller than the first time window; repeating the segmentation step on each second time sequence data subset whose data type is a variable data segment until a preset number of time sequence data remain in an Nth time window; and aggregating the preset number of time sequence data retained in all the Nth time windows to obtain aggregated target behavior data.
Further, before segmenting the first time sequence data subset whose data type is a variable data segment according to the second time window, the method further comprises: evaluating the first time sequence data subset to obtain an evaluation result; and determining the data type of the first time sequence data subset according to the evaluation result.
Further, the first time sequence data subset comprises a plurality of time sequence data; the evaluating the first time sequence data subset to obtain an evaluation result includes: calculating the mean value and standard deviation of the first time sequence data subset; calculating a difference value between each of the time series data in the first time series data subset and the mean value; counting the number of time sequence data with the difference value larger than a first preset threshold value; and determining the evaluation result based on the number of time series data and the standard deviation.
Further, the data types include a stable data segment and a variable data segment, and the determining the data type of the first time sequence data subset according to the evaluation result includes: if the evaluation result is that the number of the time sequence data is smaller than or equal to a preset number and the standard deviation is smaller than or equal to a second preset threshold value, determining that the data type of the first time sequence data subset is a stable data segment; and if the evaluation result is that the number of the time sequence data is larger than a preset number or the standard deviation is larger than a second preset threshold value, determining that the data type of the first time sequence data subset is a variable data segment.
Further, the time sequence data aggregation method further comprises: aggregating each first time sequence data subset whose data type is a stable data segment to obtain aggregated first target state data.
Further, before segmenting the first time sequence data subset whose data type is a variable data segment according to the second time window, the method further comprises: determining a preset segmentation formula; and determining the second time window based on the first time window and the preset segmentation formula.
Further, the time sequence data aggregation method further comprises: aggregating each second time sequence data subset whose data type is a stable data segment to obtain aggregated second target state data.
In a second aspect, the present invention provides a time sequence data aggregation device, including: an acquisition segmentation unit, configured to acquire a time sequence data set and segment the time sequence data set according to a preset first time window to obtain a plurality of first time sequence data subsets, wherein the time sequence data set is a set of time sequence data recorded in time order for indexes of the intelligent ship; a first segmentation unit, configured to segment each first time sequence data subset whose data type is a variable data segment according to a second time window to obtain a plurality of second time sequence data subsets, wherein the second time window is smaller than the first time window; a second segmentation unit, configured to repeat the segmentation step on each second time sequence data subset whose data type is a variable data segment until a preset number of time sequence data remain in an Nth time window; and a first aggregation unit, configured to aggregate the preset number of time sequence data retained in all the Nth time windows to obtain aggregated target behavior data.
In a third aspect, the present invention further provides an electronic device, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor implements the steps of the time sequence data aggregation method when executing the computer program.
In a fourth aspect, the present invention also provides a computer readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the time series data aggregation method.
The invention provides a time sequence data aggregation method and device. The method comprises the following steps: first, acquiring a time sequence data set, and segmenting the time sequence data set according to a preset first time window to obtain a plurality of first time sequence data subsets, wherein the time sequence data set is a set of time sequence data recorded in time order for indexes of the intelligent ship; then segmenting each first time sequence data subset whose data type is a variable data segment according to a second time window to obtain a plurality of second time sequence data subsets, wherein the second time window is smaller than the first time window; repeating the segmentation step on each second time sequence data subset whose data type is a variable data segment until a preset number of time sequence data remain in an Nth time window; and finally, aggregating the preset number of time sequence data retained in all the Nth time windows to obtain aggregated target behavior data.
According to the invention, the first time sequence data subset with the data type of the variable data segment in the first time window is continuously segmented according to the second time window smaller than the first time window, so that the time window of the variable data segment can be adjusted in a self-adaptive manner, more time sequence data can be reserved in a manner of reducing the time window, thus ensuring that the aggregated target behavior data can still reflect the behavior of the intelligent ship, and improving the aggregation efficiency and the aggregation precision.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for aggregating time-series data according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for aggregating time-series data according to an embodiment of the present invention;
FIG. 3 is a flowchart of step S102 in FIG. 2;
FIG. 4 is a flowchart of step S103 in FIG. 2;
FIG. 5 is a flowchart of another method for aggregating time-series data according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a time-series data aggregation device according to an embodiment of the present invention.
Reference numerals:
11 - acquisition segmentation unit; 12 - first segmentation unit; 13 - second segmentation unit; 14 - first aggregation unit.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The intelligentization of ships has become a trend in the global shipping industry. Through the overall design of an intelligent ship, which carries intelligent applications such as intelligent navigation, intelligent hulls, intelligent cabins, and intelligent energy efficiency, functions such as decision support, remote control, and unmanned autonomous operation can be realized. The normal operation of each intelligent application requires the corresponding operating data of the ship. A data management platform is developed by analyzing the functional requirements of the ship's intelligent applications in scenarios such as data acquisition, storage, distribution, and ship-to-shore transmission. The platform performs routine cleaning, distribution, storage, and ship-to-shore return of the data streams, which reduces the data processing burden on each intelligent application and lets the applications concentrate on their own business.
In the field of industrial big data, and in particular in the field of intelligent ships, time sequence data (the evolution of an index data item Y over time T) is a common data form. Data that remain stable over a period of time typically indicate that the engineering object is in a stable operating state, and a single piece of data suffices to record that state. Data that change continuously over a period of time typically indicate that the engineering object is undergoing a state change, and a data sequence is needed to describe its behavior. Unlike Internet big data, one of the important applications of industrial big data is the comparison and analysis of the data change process and of the behavior of engineering objects. This gives rise to a special data aggregation requirement: compress only the stable data segments and retain the variable data segments.
In the big data field, grouping and aggregation are conventional data processing methods. The data grouping is to divide the data into different groups according to the data analysis requirement; data aggregation is a conversion calculation for a group of data, such as data statistics (mean, mode, sample number, standard deviation, etc.), sample ratio meeting a certain condition, etc. Through the grouping and aggregation operation, the data volume can be reduced, and the system efficiency can be improved; meanwhile, grouping and aggregation operations can refine the implicit engineering significance of data, and are key steps of data analysis.
Existing time sequence data aggregation methods mainly provide software toolkits from a big data perspective and do not propose an in-depth algorithm suited to engineering requirements. Their main problem is that, when the grouping and aggregation tools are applied, the hyperparameters (such as the size of the time window and the selection and combination of aggregation functions) must be tuned manually. On the one hand, manual intervention cannot meet the intelligent ship's requirements for real-time processing and real-time transmission. On the other hand, manual tuning is time-consuming, incurs high labor costs, and is inconsistent with the development trend of big data and intelligence. On this basis, the invention aims to provide a time sequence data aggregation method and device that avoid manual intervention by adaptively adjusting the size of the time window, improve the aggregation efficiency and aggregation precision, meet the intelligent ship's requirements for real-time processing and real-time transmission, and conform to the development trend of big data and intelligence.
For the convenience of understanding the present embodiment, a detailed description will be first given of a time series data aggregation method disclosed in the embodiment of the present invention.
Example 1:
According to an embodiment of the present invention, an embodiment of a time sequence data aggregation method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
FIG. 1 is a flowchart of a time sequence data aggregation method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S101, a time sequence data set is obtained, and the time sequence data set is segmented according to a preset first time window, so that a plurality of first time sequence data subsets are obtained.
In an embodiment of the present application, the time sequence data set is a set of time sequence data recorded in time order for indexes of the intelligent ship, where the indexes include, but are not limited to, oil consumption, navigational speed, wind speed, and displacement. The size of the first time window may be predefined according to the index. Each first time sequence data subset obtained by segmentation corresponds to one first time window, and each first time window is contiguous with its adjacent first time windows. The first time windows may be of the same size or of different sizes; the embodiments of the present application do not specifically limit this. The following description takes the case in which all first time windows have the same size.
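Step S101 can be sketched as follows, purely for illustration. The `(timestamp, value)` sample layout, the function name `segment_by_window`, and the choice to skip windows that contain no samples are assumptions that do not come from the patent text.

```python
from typing import Dict, List, Tuple

# Assumed sample layout: (timestamp in seconds, index value such as speed).
Sample = Tuple[float, float]


def segment_by_window(samples: List[Sample], window: float) -> List[List[Sample]]:
    """Split time-ordered samples into contiguous windows of equal length."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not samples:
        return []
    start = samples[0][0]
    subsets: Dict[int, List[Sample]] = {}
    for t, y in samples:
        k = int((t - start) // window)  # index of the window containing t
        subsets.setdefault(k, []).append((t, y))
    # Windows without samples are skipped (an assumption of this sketch).
    return [subsets[k] for k in sorted(subsets)]


# Hypothetical speed readings, one per second for ten seconds.
speed = [(float(t), 12.0 + 0.1 * t) for t in range(10)]
first_subsets = segment_by_window(speed, window=4.0)
```

With a 4-second first time window, the ten samples fall into three first time sequence data subsets, the last of which is only partly filled.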
Step S104, the first time sequence data subset with the data type being the variable data segment is segmented according to the second time window, and a plurality of second time sequence data subsets are obtained.
In the embodiment of the invention, the data types include a stable data segment and a variable data segment. The second time window is smaller than the first time window, and its size may be adaptively adjusted according to the first time window. The principle of the adaptive adjustment in this embodiment is as follows: the first time window containing a first time sequence data subset whose data type is a variable data segment is reduced, that is, divided into a plurality of second time windows, and the first time sequence data subset is segmented correspondingly to obtain a plurality of second time sequence data subsets, the number of which equals the number of second time windows. In step S104, segmenting the first time sequence data subset whose data type is a variable data segment according to the second time window retains more variable data, so that the behavior of the intelligent ship is reflected accurately.
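As a minimal sketch of the split in step S104, the snippet below divides one first time window into `k` equal second time windows and distributes the samples accordingly, so that the number of subsets equals the number of second windows. The function name `split_window`, the equal-width split, and the assumption that all samples lie inside the first window are illustrative choices, not details stated in the patent.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # assumed layout: (timestamp, index value)


def split_window(samples: List[Sample], t_start: float,
                 first_window: float, k: int) -> Tuple[float, List[List[Sample]]]:
    """Divide [t_start, t_start + first_window) into k second windows."""
    second_window = first_window / k
    subsets: List[List[Sample]] = [[] for _ in range(k)]
    for t, y in samples:
        # Clamp so a sample on the right edge stays in the last window.
        idx = min(int((t - t_start) / second_window), k - 1)
        subsets[idx].append((t, y))
    return second_window, subsets


# Hypothetical variable segment: eight samples inside an 8-second first window.
variable_subset = [(float(t), 10.0 + 2.0 * t) for t in range(8)]
second_window, second_subsets = split_window(variable_subset, 0.0, 8.0, k=2)
```

Here the 8-second first window yields two 4-second second windows, each holding four samples.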
Step S105, the segmentation step is repeated on each second time sequence data subset whose data type is a variable data segment until a preset number of time sequence data remain in the Nth time window.
In the embodiment of the present invention, the data type of a second time sequence data subset determines how the time window containing it is adjusted. When the data type of the second time sequence data subset is a variable data segment, the time window containing it can be adjusted by splitting it into a plurality of third time windows. On the one hand, third time sequence data subsets whose data type is a stable data segment can then be separated out, which ensures the purity of the variable data in the second time sequence data subset. On the other hand, the time sequence data within a smaller time window represent the behavior of the intelligent ship accurately. It should be noted that the preset number may be one, two, or three; the embodiment of the present invention does not specifically limit it, and its value may be set by an expert in the field.
Step S106, aggregating the preset number of time sequence data retained in all the Nth time windows to obtain the aggregated target behavior data.
In the embodiment of the invention, the preset number of time sequence data retained in a single Nth time window is not sufficient for analyzing the behavior of the intelligent ship. The data retained in all consecutive Nth time windows are therefore aggregated to obtain the target behavior data, which reflect the behavior of the intelligent ship.
On the one hand, because the informatization of ships started late and the accumulated technology is insufficient, no efficient aggregation method for ship operation data has yet been established. On the other hand, the performance of marine computing and storage equipment is limited by maritime regulations, and intelligent ship data transmission relies on wireless and satellite communication, so the data transmission efficiency is low. A complete and professional data aggregation method is therefore urgently needed to compress data and reduce hardware requirements.
The time sequence data aggregation method provided by the embodiment of the invention comprises the following steps: first, acquiring a time sequence data set, and segmenting the time sequence data set according to a preset first time window to obtain a plurality of first time sequence data subsets, wherein the time sequence data set is a set of time sequence data recorded in time order for indexes of the intelligent ship; then segmenting each first time sequence data subset whose data type is a variable data segment according to a second time window to obtain a plurality of second time sequence data subsets, wherein the second time window is smaller than the first time window; repeating the segmentation step on each second time sequence data subset whose data type is a variable data segment until a preset number of time sequence data remain in an Nth time window; and finally, aggregating the preset number of time sequence data retained in all the Nth time windows to obtain aggregated target behavior data. In the embodiment of the invention, only the first time sequence data subsets whose data type is a variable data segment are segmented further, according to a second time window smaller than the first time window. The time window of a variable data segment is thus adjusted adaptively, and more time sequence data are retained by reducing the time window, so that the aggregated target behavior data still reflect the behavior of the intelligent ship, and the aggregation efficiency and aggregation precision are improved.
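The overall flow can be sketched as a short recursive routine, under several stated assumptions. The names `is_stable` and `aggregate`, all threshold values, the use of the population standard deviation, the representation of a compressed stable window as a single (mean time, mean value) pair, and the `min_window` guard against endless splitting are illustrative choices and are not taken from the patent.

```python
from statistics import fmean, pstdev
from typing import List, Tuple

Sample = Tuple[float, float]  # assumed layout: (timestamp, index value)


def is_stable(values: List[float], first_threshold: float,
              preset_count: int, second_threshold: float) -> bool:
    """Stable if few samples deviate from the mean and the spread is small."""
    mean = fmean(values)
    deviating = sum(1 for v in values if abs(v - mean) > first_threshold)
    return deviating <= preset_count and pstdev(values) <= second_threshold


def aggregate(samples: List[Sample], t_start: float, window: float, *,
              k: int = 2, keep: int = 1, first_threshold: float = 0.5,
              preset_count: int = 0, second_threshold: float = 0.2,
              min_window: float = 1e-6) -> List[Sample]:
    """Compress stable windows to one point; keep data of variable windows."""
    if len(samples) <= keep:
        return list(samples)  # Nth window reached: retain as behavior data
    values = [y for _, y in samples]
    if is_stable(values, first_threshold, preset_count, second_threshold):
        times = [t for t, _ in samples]
        return [(fmean(times), fmean(values))]  # stable segment: time average
    if window <= min_window:
        return list(samples)  # guard added for this sketch only
    sub = window / k  # next, smaller time window
    parts: List[List[Sample]] = [[] for _ in range(k)]
    for t, y in samples:
        parts[min(int((t - t_start) / sub), k - 1)].append((t, y))
    out: List[Sample] = []
    for i, part in enumerate(parts):
        out.extend(aggregate(part, t_start + i * sub, sub, k=k, keep=keep,
                             first_threshold=first_threshold,
                             preset_count=preset_count,
                             second_threshold=second_threshold,
                             min_window=min_window))
    return out


# Hypothetical index: eight constant readings followed by an eight-step ramp.
flat = [(float(t), 10.0) for t in range(8)]
ramp = [(float(t), 10.0 + 2.0 * (t - 7)) for t in range(8, 16)]
result = aggregate(flat + ramp, t_start=0.0, window=16.0)
```

In this example the constant stretch collapses to a single state point while every sample of the ramp is retained, which is the compress-stable, keep-variable behavior the method aims for.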
In an alternative embodiment, as shown in FIG. 2, before the first time sequence data subset whose data type is a variable data segment is segmented according to the second time window in step S104, the method further comprises:
Step S102, evaluating the first time sequence data subset to obtain an evaluation result;
Step S103, determining the data type of the first time sequence data subset according to the evaluation result.
In an embodiment of the present application, the evaluation criteria used to evaluate the first time sequence data subset may be user-defined. The specific steps of step S102 are described in steps S301 to S304 below, and the specific steps of step S103 are described in steps S401 to S402 below. Steps S102 to S103 are mainly used to determine the data type of the first time sequence data subset. A first time sequence data subset in the present application may contain only a stable data segment, only a variable data segment, or both. In general, a first time sequence data subset containing only a variable data segment and one containing both a stable data segment and a variable data segment are both determined by the evaluation to be variable data segments.
In an alternative embodiment, the first time sequence data subset includes a plurality of time sequence data. As shown in FIG. 3, step S102, evaluating the first time sequence data subset to obtain an evaluation result, includes the following steps:
Step S301, calculating the mean value and standard deviation of the first time sequence data subset;
Step S302, calculating the difference value between each time sequence data in the first time sequence data subset and the mean value;
Step S303, counting the number of time sequence data with the difference value larger than a first preset threshold value;
Step S304, determining an evaluation result based on the number of time series data and the standard deviation.
The evaluation result of the first time sequence data subset is obtained in this way, and time sequence data subsets in smaller time windows are subsequently evaluated in the same way, as described in steps S301 to S304. The embodiment of the invention uses the standard deviation of the time sequence data subset within a time window to judge whether the subset is a variable data segment, and a variable data segment can be further divided into smaller segments. Adaptive adjustment of the time window is realized by continually evaluating and segmenting, the purpose being to distinguish variable data segments from stable data segments adaptively.
The time sequence data set is a set of time sequence data recorded in time order for indexes of the intelligent ship, that is, one piece of data is collected at each time point, so once the size of a time window is determined, the pieces of data within that window are determined as well. The embodiment of the invention computes statistics of the time sequence data within a time window, for example the mean value and the standard deviation. Using the mean value to represent the overall state of the time sequence data in a time window rests on the following precondition: if the time sequence data within the window do not change much (that is, the standard deviation is relatively small), the mean value accurately represents the overall state within the window. Conversely, if the time sequence data within the window change considerably (that is, the standard deviation is relatively large), the data in that period have changed significantly, and replacing them with a single piece of data is inappropriate. For a variable data segment, therefore, the time sequence data in the previous time window can be divided into smaller segments using a smaller time window, until the time window contains only one piece of data.
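Steps S301 to S304 can be sketched as below. The names `Evaluation` and `evaluate` are hypothetical, and two details are assumptions because the text does not specify them: the "difference value" is taken as an absolute difference, and the population standard deviation is used.

```python
from statistics import fmean, pstdev
from typing import List, NamedTuple


class Evaluation(NamedTuple):
    mean: float
    std: float
    deviating_count: int  # samples whose difference exceeds the first threshold


def evaluate(values: List[float], first_threshold: float) -> Evaluation:
    """Evaluate one time sequence data subset (steps S301 to S304)."""
    mean = fmean(values)                      # S301: mean value
    std = pstdev(values)                      # S301: standard deviation (assumed population form)
    diffs = [abs(v - mean) for v in values]   # S302: difference from the mean (assumed absolute)
    count = sum(1 for d in diffs if d > first_threshold)  # S303: count large differences
    return Evaluation(mean, std, count)       # S304: evaluation result


# Hypothetical window with one jump at the end.
evaluation = evaluate([10.0, 10.0, 10.0, 14.0], first_threshold=2.0)
```

For this window the mean is 11.0, only the last sample differs from the mean by more than the threshold, and the standard deviation is the square root of 3.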
In an alternative embodiment, the data types include a stable data segment and a variable data segment, as shown in fig. 4, step S103, determining the data type of the first time-ordered data subset according to the evaluation result includes:
Step S401, if the evaluation result is that the number of time series data is smaller than or equal to the preset number and the standard deviation is smaller than or equal to the second preset threshold, determining that the data type of the first time series data subset is a stable data segment;
In step S402, if the evaluation result is that the number of time series data is greater than the preset number or the standard deviation is greater than the second preset threshold, the data type of the first time series data subset is determined to be a variable data segment.
In the embodiment of the present invention, there are two possible evaluation results. One is that the number of time series data is smaller than or equal to the preset number and the standard deviation is smaller than or equal to the second preset threshold; the other is that the number of time series data is larger than the preset number or the standard deviation is larger than the second preset threshold. Here, the number of time series data means the count of data in the subset whose difference from the mean is larger than the first preset threshold. Different evaluation results correspond to different data types of the first time series data subset. Similarly, different evaluation results also correspond to different data types of the Nth time series data subset.
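The decision rule of steps S401 and S402 can be sketched as a small function. This is a minimal illustration rather than the patented implementation: the names `Evaluation`, `classify`, `max_outliers` and `max_std` are hypothetical, with `max_outliers` standing for the preset number and `max_std` for the second preset threshold.

```python
from dataclasses import dataclass

STABLE = "stable data segment"
VARIABLE = "variable data segment"


@dataclass(frozen=True)
class Evaluation:
    """Evaluation result of one time series data subset (hypothetical container)."""
    outlier_count: int  # samples whose difference from the mean exceeds the first threshold
    std: float          # standard deviation of the subset


def classify(evaluation: Evaluation, max_outliers: int, max_std: float) -> str:
    """Return the data type of a subset according to steps S401 and S402."""
    # S401: both conditions must hold for the segment to count as stable.
    if evaluation.outlier_count <= max_outliers and evaluation.std <= max_std:
        return STABLE
    # S402: exceeding either limit makes the segment variable.
    return VARIABLE
```

Because S402 is the logical negation of S401, every subset receives exactly one of the two data types.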
In an alternative embodiment, as shown in fig. 2, the method for aggregating time series data further includes the steps of:
step S107, aggregating the first time sequence data subset of which the data type is the stable data segment to obtain aggregated first target state data.
In the embodiment of the invention, if the data within a time window were unchanged, the first target state data obtained by time averaging would be identical to any single piece of data in the window. In practice, however, the collected time series data contains noise, so taking a single piece of data cannot suppress the noise. Because the noise follows a normal distribution, the aggregation in the embodiment of the invention reduces the influence of the noise by time averaging.
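Why averaging reduces noise can be made explicit with a short derivation. It assumes, beyond what the text states, that the noise terms are independent, have zero mean and share a common variance; the symbols \mu (the true stable value), \varepsilon_i (the noise in sample i) and \sigma (the noise standard deviation) are introduced here for illustration only.

```latex
\begin{align}
y_i &= \mu + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)\ \text{independent}, \quad i = 1, \dots, n \\
\bar{y} &= \frac{1}{n} \sum_{i=1}^{n} y_i = \mu + \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \\
\operatorname{Var}(\bar{y}) &= \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(\varepsilon_i) = \frac{\sigma^2}{n},
\qquad \bar{y} \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)
\end{align}
```

Under these assumptions the standard deviation of the averaged value is \sigma / \sqrt{n}, so a window holding n samples carries less noise than any single sample in it.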
In an alternative embodiment, as shown in fig. 2, before the first time series data subset whose data type is a variable data segment is segmented according to the second time window in step S104, the method further comprises the steps of:
step S108, determining a preset segmentation formula;
Step S109, determining a second time window based on the first time window and a preset segmentation formula.
In the embodiment of the invention, the preset segmentation formula is w = w_0 / K, where, initially, w is the second time window, w_0 is the first time window, and K is a constant. When the segmentation step is repeated, w_0 is the second time window and w is the third time window, and so on. With this formula the size of the next time window is determined adaptively, which realizes adaptive adjustment of the time window.
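As a minimal sketch of how the segmentation formula produces successively smaller windows, the function below applies it repeatedly. The function name `window_schedule` and the stopping parameter `min_window` are assumptions made for illustration; `min_window` stands in for the point at which a window holds only one sample, for instance the sampling interval.

```python
def window_schedule(w0: float, k: float, min_window: float) -> list:
    """List the window sizes obtained by repeatedly dividing the window by k."""
    if k <= 1:
        raise ValueError("k must be greater than 1 so that each window is smaller than the last")
    if w0 <= 0 or min_window <= 0:
        raise ValueError("window sizes must be positive")
    windows = [w0]
    # Keep dividing while the next window is still at least min_window long.
    while windows[-1] / k >= min_window:
        windows.append(windows[-1] / k)
    return windows
```

For a first time window of 60 minutes, K = 2 and a hypothetical 5-minute lower bound, the schedule is 60, 30, 15 and 7.5 minutes.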
In an alternative embodiment, as shown in fig. 2, the method for aggregating time series data further includes the steps of: step S110, aggregating the second time sequence data subset with the data type of the stable data segment to obtain aggregated second target state data.
In the embodiment of the present invention, similar to the step S107 described above, the object of the embodiment of the present invention is to characterize the stable state of the intelligent ship using the second target state data.
Physically, an unchanged speed may be called a state, and a changing speed may be called a behavior. To make states and behaviors easier to understand, the present embodiment gives examples. Suppose the index is the navigational speed and the time window is 60 minutes. If the navigational speed is basically unchanged within a time window (the standard deviation is small), the ship is sailing steadily: the intelligent ship keeps moving forward at a certain speed, which physically may be called a constant-speed state. If the speed of the ship changes from 6 knots to 12 knots within a time window (the standard deviation is large), the intelligent ship is accelerating during that period: the crew is operating the ship to accelerate, which is a behavior of the crew, and the ship is accelerating, which is a behavior of the ship. As another example, suppose the index is the displacement and the time window is 30 minutes. If the displacement is basically unchanged within a time window, the intelligent ship is keeping its displacement constant, which is a state. If the displacement increases from 10,000 tons to 20,000 tons within a time window, the crew is increasing the load, which is a behavior of the crew, and the load of the ship changes significantly, which is a behavior of the ship.
For the states above, since the time series data remains substantially unchanged, the target state data can be represented by the mean. For the behaviors above, the time series data keeps changing, and it is not known whether it is changing quickly or slowly, nor whether the change is faster in the first half of the period or in the second half. A variable data segment is therefore not aggregated directly; instead, a smaller time window is used to check whether the time series data still changes significantly within a shorter period. In this way the embodiment of the invention uses smaller time windows to retain more of the changing data.
In summary, the time series data aggregation method provided by the embodiment of the invention is a method with adaptive adjustment of the time window, and has the following advantages: (1) it meets the special requirements that industrial big data places on data aggregation: stable data segments are aggregated and compressed, using a larger time window to aggregate the time series data of the segment into one piece of data that represents the stable state of the engineering object, which reduces both the data volume and the data noise; (2) variable data segments are aggregated and compressed using a smaller time window, so that the compressed data can still reflect the behavior of the engineering object. The embodiment of the invention can therefore ensure that the target state data and the target behavior data accurately reflect the state and the behavior of the engineering object (in this embodiment, an intelligent ship is taken as the example engineering object).
Example 2:
Fig. 5 is a flowchart of another method for aggregating time series data according to an embodiment of the present invention. As shown in fig. 5, in step 1, a time series data set [T, Y] is divided into segments according to a predefined continuous time window (i.e., the first time window described above), where T represents the time series and Y represents the sampled data corresponding to the time series (i.e., the above-mentioned index), which may be fuel consumption, navigational speed, wind speed, or the like.
Step 2, for the time series data subset of each segment, calculate the mean T_m of T; calculate the mean Y_m and the standard deviation Y_std of Y, and the difference Δy_i = |y_i - Y_m| between each sample y_i and Y_m, i = 1, 2, ..., n; count the number Y_N of samples whose difference is greater than the first preset threshold Δy_max; then determine how the time series data subset of the segment is aggregated according to Y_N and Y_std. If Y_N is not greater than the preset number Y_N_max and Y_std is not greater than the second preset threshold Y_std_max, the time series data of the segment is considered to remain stable within the time window and can be aggregated into one sample, denoted (T_m, Y_m). If Y_N is greater than Y_N_max or Y_std is greater than Y_std_max, the time series data of the segment is considered to have changed significantly within the time window. It should be noted that Δy_max, Y_N_max, Y_std_max and K are hyperparameters that are closely tied to ship engineering and should be determined by domain experts.
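The evaluation in step 2 can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the names `SegmentStats` and `evaluate_segment` are hypothetical, and the population standard deviation (`statistics.pstdev`) is used because the text does not say whether the population or the sample form is intended.

```python
from statistics import fmean, pstdev
from typing import NamedTuple, Sequence


class SegmentStats(NamedTuple):
    t_m: float    # mean of the time stamps in the segment
    y_m: float    # mean of the sampled values
    y_std: float  # standard deviation of the sampled values
    y_n: int      # number of samples deviating from y_m by more than dy_max
    stable: bool  # True if the segment may be aggregated into (t_m, y_m)


def evaluate_segment(t: Sequence[float], y: Sequence[float],
                     dy_max: float, n_max: int, std_max: float) -> SegmentStats:
    """Evaluate one segment with the statistics described in step 2."""
    t_m = fmean(t)
    y_m = fmean(y)
    y_std = pstdev(y)  # assumption: population standard deviation
    y_n = sum(1 for value in y if abs(value - y_m) > dy_max)
    stable = y_n <= n_max and y_std <= std_max
    return SegmentStats(t_m, y_m, y_std, y_n, stable)
```

A segment whose `stable` flag is true is replaced by the single sample `(t_m, y_m)`; otherwise it is passed on for further segmentation.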
In step 2 above, the time series data set has the following form: Y is a single column of data, and [T, Y] has multiple rows and two columns, the first column being the time T and the second column being the fuel consumption Y, so that each row is a record of the fuel consumption at a different time point. The time series data set is then segmented, for example with 100 rows per segment: if Y has 1000 rows in total, segmentation yields 10 time series data subsets, and step 2 and step 3 can be invoked for each subset to aggregate the time series data. In the embodiment of the invention, Y may also represent other indexes such as navigational speed.
Step 3, for time series data that changes considerably within the time window, the embodiment further segments it with a smaller time window (for example, 1/2 or 1/3 of the size of the original window), and the calculation process is repeated for each small segment obtained, until only one sample remains in the Nth time window.
In step 3 above, the size of the initial time window (i.e., the first time window), denoted w_0, may be determined based on engineering and navigation experience. If w_0 is too small, the data compression effect is poor; if w_0 is too large, the compressed data is not continuous enough. The value of w_0 should therefore be determined by experienced marine engineering designers or navigation domain specialists. For example, if the sampled data is the ship's speed, then since accelerating or decelerating a ship generally takes 10 to 60 minutes, 60 minutes is an appropriate value for w_0; if the sampled data is the rudder angle, which changes frequently, w_0 may be 10 seconds or 20 seconds.
Step 4, for the time series data which is kept stable in the time window, the embodiment aggregates all the time series data in the time window into one sample.
Step 5, still taking each time window as the unit of operation, after the aggregation operation on the data in one time window is finished, the compressed aggregation result is stored in the ship-based database and then sent to the shore-based database through the ship-to-shore communication module.
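Steps 1 to 4 can be combined into one recursive routine. The sketch below is a hedged illustration, not the patented implementation: the names `is_stable` and `adaptive_aggregate` are hypothetical, the population standard deviation is an assumption, and samples that share a time stamp are kept as they are because they cannot be separated by a smaller window. Step 5 (storage and transmission) is not modeled.

```python
from statistics import fmean, pstdev


def is_stable(ys, dy_max, n_max, std_max):
    """Stability test of step 2 (population standard deviation is an assumption)."""
    y_m = fmean(ys)
    y_n = sum(1 for y in ys if abs(y - y_m) > dy_max)
    return y_n <= n_max and pstdev(ys) <= std_max


def adaptive_aggregate(samples, window, k, dy_max, n_max, std_max):
    """Aggregate (t, y) samples, sorted by t, using a window that shrinks by 1/k."""
    if k <= 1:
        raise ValueError("k must be greater than 1 so that windows shrink")
    if not samples:
        return []
    start = samples[0][0]
    # Step 1: group the samples into consecutive windows of length `window`.
    segments = {}
    for t, y in samples:
        segments.setdefault(int((t - start) // window), []).append((t, y))
    result = []
    for index in sorted(segments):
        seg = segments[index]
        ts = [t for t, _ in seg]
        ys = [y for _, y in seg]
        if len(seg) == 1:
            result.append(seg[0])                  # a single sample is kept unchanged
        elif is_stable(ys, dy_max, n_max, std_max):
            result.append((fmean(ts), fmean(ys)))  # step 4: one sample (T_m, Y_m)
        elif ts[0] == ts[-1]:
            result.extend(seg)                     # identical time stamps cannot be split
        else:
            # Step 3: re-segment the variable segment with the smaller window.
            result.extend(adaptive_aggregate(seg, window / k, k, dy_max, n_max, std_max))
    return result


# Eight samples: four at a constant value, then a steady increase.
samples = list(zip(range(8), [10, 10, 10, 10, 6, 8, 10, 12]))
compressed = adaptive_aggregate(samples, window=4, k=2, dy_max=0.5, n_max=0, std_max=0.5)
```

In this example the four constant samples collapse into a single averaged sample, while the four changing samples are all retained, so the result holds five samples instead of eight.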
In this embodiment, the time-series data aggregation method combines the data analysis method and knowledge in the field of ship engineering, so that the special requirements of industrial big data on data aggregation can be met: only the stable data segments are aggregated and compressed, the variable data segments are reserved, the data quantity is reduced, the data noise is reduced, and the data can be ensured to accurately reflect the state and the behavior of the engineering object. The data aggregation method is applied to the software and hardware support platform described below in the embodiment.
The embodiment also provides a software and hardware support platform for implementing the data aggregation method, which comprises: data acquisition equipment, a cache, a processor, a data aggregation module, a ship-based database, a ship-to-shore communication module, a shore-based database, a terminal and a man-machine interaction module, wherein:
The data acquisition equipment and the cache are used, respectively, for acquiring and temporarily storing the data in a time window; the processor is configured with the formulas used in step 2 and performs the calculation; the data aggregation module performs the aggregation operation on the time series data in each time window, compressing stable data segments and compressing variable data segments with a smaller time window. The ship-based database and the shore-based database each store the aggregated data, and the ship-to-shore communication module synchronizes the data stored in the ship-based database back to the shore-based database. The terminal and the man-machine interaction module work together to configure the hyperparameters involved in the algorithm, namely w_0, Δy_max, Y_N_max, Y_std_max and K. In summary, the software and hardware support platform is used for realizing data acquisition, aggregation and storage.
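The hyperparameters configured through the terminal could be held in a single validated record. The sketch below is an assumption about how such a configuration might look; the class name `AggregationConfig`, its field names and the numeric thresholds in the example are all hypothetical, and only the 60-minute window and the 1/2 ratio are taken from the examples in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationConfig:
    """Hyperparameters of the aggregation algorithm (hypothetical container)."""
    w0: float       # initial time window, in the same unit as the time stamps
    dy_max: float   # first preset threshold on the difference from the mean
    n_max: int      # preset number of samples allowed to exceed dy_max
    std_max: float  # second preset threshold on the standard deviation
    k: float        # constant dividing the window at each re-segmentation

    def __post_init__(self):
        if self.w0 <= 0:
            raise ValueError("w0 must be positive")
        if self.k <= 1:
            raise ValueError("k must be greater than 1 so that each window is smaller than the last")
        if self.dy_max < 0 or self.std_max < 0 or self.n_max < 0:
            raise ValueError("thresholds must be non-negative")


# Example for navigational speed: 60-minute window halved at each step.
# The three thresholds are placeholder values, not values from the source.
speed_config = AggregationConfig(w0=60.0, dy_max=0.5, n_max=3, std_max=0.3, k=2.0)
```

Validating the values when the record is created keeps an invalid setting, such as a K that would not shrink the window, from reaching the aggregation module.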
Example 3:
The embodiment of the present invention provides a time-series data aggregation device, which is mainly used for executing the time-series data aggregation method provided by the above content of embodiment 1, and the time-series data aggregation device provided by the embodiment of the present invention is specifically described below.
Fig. 6 is a schematic structural diagram of a time series data aggregation device according to an embodiment of the present invention. As shown in fig. 6, the time series data aggregation device mainly includes: an acquisition and segmentation unit 11, a first segmentation unit 12, a second segmentation unit 13 and a first aggregation unit 14, wherein:
The acquisition and segmentation unit 11 is configured to acquire a time series data set and segment the time series data set according to a preset first time window to obtain a plurality of first time series data subsets, wherein the time series data set is a set of time series data recorded in chronological order for an index of the intelligent ship;
a first segmentation unit 12, configured to segment a first time-sequence data subset with a data type being a variable data segment according to a second time window, so as to obtain a plurality of second time-sequence data subsets; wherein the second time window is smaller than the first time window;
a second segmentation unit 13, configured to repeatedly perform the segmentation step on a second time-series data subset of the data type being the variable data segment until a preset number of time-series data is reserved in an nth time window;
the first aggregation unit 14 is configured to aggregate the preset number of time-series data reserved in all the nth time windows, so as to obtain target behavior data after aggregation.
The time series data aggregation device provided by the embodiment of the invention operates as follows. First, the acquisition and segmentation unit 11 acquires a time series data set and segments it according to a preset first time window to obtain a plurality of first time series data subsets, the time series data set being a set of time series data recorded in chronological order for an index of the intelligent ship. Then, the first segmentation unit 12 segments each first time series data subset whose data type is a variable data segment according to a second time window to obtain a plurality of second time series data subsets, the second time window being smaller than the first time window. The second segmentation unit 13 repeats the segmentation step on each second time series data subset whose data type is a variable data segment until a preset number of time series data remain in an Nth time window. Finally, the first aggregation unit 14 aggregates the preset number of time series data retained in all the Nth time windows to obtain the aggregated target behavior data. Because only the first time series data subsets whose data type is a variable data segment are segmented further, using a second time window smaller than the first, the time window for variable data segments is adjusted adaptively, and shrinking the window retains more time series data, so that the aggregated target behavior data still reflects the behavior of the intelligent ship and both aggregation efficiency and aggregation precision are improved.
Optionally, the apparatus further comprises an evaluation unit and a first determination unit, wherein:
the evaluation unit is used for evaluating the first time sequence data subset to obtain an evaluation result;
and the first determining unit is used for determining the data type of the first time sequence data subset according to the evaluation result.
Optionally, the first time sequence data subset comprises a plurality of time sequence data; the evaluation unit comprises a first calculation module, a second calculation module, a statistics module and a first determination module, wherein:
the first calculation module is used for calculating the mean value and standard deviation of the first time sequence data subset;
the second calculation module is used for calculating the difference value between each time sequence data in the first time sequence data subset and the average value;
The statistics module is used for counting the time sequence data number of which the difference value is larger than a first preset threshold value;
and the first determining module is used for determining an evaluation result based on the number of the time series data and the standard deviation.
Optionally, the data type includes a stable data segment and a variable data segment, and the first determining unit includes a second determining module and a third determining module, where:
the second determining module is used for determining that the data type of the first time sequence data subset is a stable data segment if the evaluation result is that the number of time sequence data is smaller than or equal to the preset number and the standard deviation is smaller than or equal to a second preset threshold value;
And the third determining module is used for determining that the data type of the first time sequence data subset is a variable data segment if the evaluation result is that the number of time sequence data is larger than the preset number or the standard deviation is larger than the second preset threshold value.
Optionally, the apparatus further comprises a second aggregation unit;
and the second aggregation unit is used for aggregating the first time sequence data subset of which the data type is the stable data segment to obtain aggregated first target state data.
Optionally, the apparatus further comprises a second determining unit and a third determining unit, wherein:
A second determining unit configured to determine a preset division formula;
And a third determining unit, configured to determine a second time window based on the first time window and a preset segmentation formula.
Optionally, the time series data aggregation device further comprises a third aggregation unit, wherein:
And the third aggregation unit is used for aggregating the second time sequence data subset of which the data type is the stable data segment to obtain aggregated second target state data.
In an alternative embodiment, the present embodiment further provides an electronic device comprising a memory and a processor, where the memory stores a computer program executable on the processor, and the processor, when executing the computer program, implements the steps of the method of the above method embodiment.
In an alternative embodiment, the instant embodiment further provides a computer readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of the method embodiment described above.
In addition, in the description of embodiments of the present invention, unless explicitly stated and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be, for example, fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
In the description of the present embodiment, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of description and simplification of description, and do not indicate or imply that the apparatus or element to be referred to must have a specific direction, be configured and operated in the specific direction, and thus should not be construed as limiting the present embodiment. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided in this embodiment, it should be understood that the disclosed method and apparatus may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present embodiment, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of protection of the present invention.

Claims (8)

1. A method of time series data aggregation, comprising:
Acquiring a time sequence data set, segmenting the time sequence data set according to a preset first time window to obtain a plurality of first time sequence data subsets, and evaluating the first time sequence data subsets to obtain an evaluation result: the time sequence data set is a set of time sequence data recorded by indexes of the intelligent ship in time sequence; the first time sequence data subset comprises a plurality of time sequence data; the evaluating the first time sequence data subset to obtain an evaluation result includes: calculating the mean value and standard deviation of the first time sequence data subset; calculating a difference value between each of the time series data in the first time series data subset and the mean value; counting the number of time sequence data with the difference value larger than a first preset threshold value; determining the evaluation result based on the number of time series data and the standard deviation;
Segmenting a first time sequence data subset with the data type being a variable data segment according to a second time window to obtain a plurality of second time sequence data subsets, wherein if the evaluation result is that the time sequence data number is larger than a preset number or the standard deviation is larger than a second preset threshold value, determining that the data type of the first time sequence data subset is the variable data segment; wherein the second time window is smaller than the first time window;
Repeatedly executing the segmentation step on a second time sequence data subset of which the data type is the variable data segment until a preset number of time sequence data are reserved in an Nth time window;
And aggregating the preset number of time sequence data reserved in all the Nth time windows to obtain the aggregated target behavior data.
2. The method of time series data aggregation of claim 1, wherein the data type comprises a stable data segment, and wherein determining the data type of the first subset of time series data based on the evaluation result comprises:
and if the evaluation result is that the number of the time sequence data is smaller than or equal to a preset number and the standard deviation is smaller than or equal to a second preset threshold value, determining that the data type of the first time sequence data subset is a stable data segment.
3. The method of time series data aggregation according to claim 2, further comprising:
And aggregating the first time sequence data subset of which the data type is the stable data segment to obtain aggregated first target state data.
4. The method of time series data aggregation of claim 1, wherein prior to said segmenting the first time sequence data subset with the data type being a variable data segment according to a second time window, the method further comprises:
Determining a preset segmentation formula;
and determining the second time window based on the first time window and the preset segmentation formula.
5. The method of time series data aggregation as claimed in claim 1, further comprising:
And aggregating the second time sequence data subset with the data type of the stable data segment to obtain aggregated second target state data.
6. A time series data aggregation apparatus, comprising:
The acquisition segmentation unit is used for acquiring a time sequence data set, segmenting the time sequence data set according to a preset first time window to obtain a plurality of first time sequence data subsets, and evaluating the first time sequence data subsets to obtain an evaluation result: the time sequence data set is a set of time sequence data recorded by indexes of the intelligent ship in time sequence, and the first time sequence data subset comprises a plurality of time sequence data; the evaluating the first time sequence data subset to obtain an evaluation result includes: calculating the mean value and standard deviation of the first time sequence data subset; calculating a difference value between each of the time series data in the first time series data subset and the mean value; counting the number of time sequence data with the difference value larger than a first preset threshold value; determining the evaluation result based on the number of time series data and the standard deviation;
The first segmentation unit is used for segmenting a first time sequence data subset with the data type being a variable data segment according to a second time window to obtain a plurality of second time sequence data subsets, wherein if the evaluation result is that the number of the time sequence data is larger than a preset number or the standard deviation is larger than a second preset threshold value, the data type of the first time sequence data subset is determined to be the variable data segment; wherein the second time window is smaller than the first time window;
the second segmentation unit is used for repeatedly executing segmentation steps on a second time sequence data subset of which the data type is the variable data segment until a preset number of time sequence data are reserved in an Nth time window;
the first aggregation unit is used for aggregating the preset number of time sequence data reserved in all the Nth time windows to obtain the aggregated target behavior data.
7. An electronic device comprising a memory, a processor, the memory having stored therein a computer program executable on the processor, wherein the processor, when executing the computer program, implements the method of any of claims 1 to 5.
8. A computer readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any one of claims 1 to 5.
CN202011128219.1A 2020-10-20 2020-10-20 Time sequence data aggregation method and device Active CN112214533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011128219.1A CN112214533B (en) 2020-10-20 2020-10-20 Time sequence data aggregation method and device

Publications (2)

Publication Number Publication Date
CN112214533A CN112214533A (en) 2021-01-12
CN112214533B true CN112214533B (en) 2024-06-14

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115935208B (en) * 2022-12-09 2024-02-02 国网湖北省电力有限公司信息通信公司 Online segmentation method, equipment and medium for multi-element time series operation data of data center equipment
CN117874315B (en) * 2024-03-13 2024-05-14 普益智慧云科技(成都)有限公司 User demand analysis display method, system, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664603A (en) * 2018-05-09 2018-10-16 北京奇艺世纪科技有限公司 A kind of method and device of abnormal polymerization value that repairing time series data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10216959B2 (en) * 2016-08-01 2019-02-26 Mitsubishi Electric Research Laboratories, Inc Method and systems using privacy-preserving analytics for aggregate data
CN111291824B (en) * 2020-02-24 2024-03-22 网易(杭州)网络有限公司 Time series processing method, device, electronic equipment and computer readable medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664603A (en) * 2018-05-09 2018-10-16 北京奇艺世纪科技有限公司 Method and device for repairing abnormal aggregated values of time series data

Non-Patent Citations (1)

Title
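Time series symbolization method based on sliding window and local features; Tan Hongqiang; Niu Qiang; Application Research of Computers (No. 03); full text *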

Also Published As

Publication number Publication date
CN112214533A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN112214533B (en) Time sequence data aggregation method and device
JP6759673B2 (en) Collision risk calculation program, collision risk calculation method and collision risk calculation device
CN112132346A (en) Ship navigation track prediction method based on ship type
CN109977523B (en) Online compression method and device for mass ship AIS trajectory data
Harvey et al. Simple, robust, and powerful tests of the breaking trend hypothesis
EP3262530B1 (en) Proactive emerging threat detection
CN111488985A (en) Deep neural network model compression training method, device, equipment and medium
CN110363094A (en) Ship abnormal behaviour recognition method, device and terminal device
EP3380903A1 (en) Marine vessel performance monitoring
DE112018007751T5 (en) Automated optimization of computer operating systems
US20210214057A1 (en) Evaluation method of ship propulsive performance in actual seas, evaluation program of ship propulsive performance in actual seas and evaluation system of ship propulsive performance in actual seas
CN116842474A (en) Ship motion extremely short-term forecasting method and system based on TFT model
CN115359386A (en) Safe fishing method, system and medium for oceanic fishery based on Internet of things
CN108876009A (en) Determination and monitoring method for coal mining accident prediction model, storage medium and electronic equipment
CN115578546A (en) Ship attitude prediction method, equipment, device and system
CN116204518A (en) Ship track analysis method based on TSH compression and DBSCAN clustering
DE102023103798A1 (en) AUTOMATIC FAULT PREDICTION IN DATA CENTERS
Park et al. Estimation model of energy efficiency operational indicator using public data based on big data technology
US7248741B2 (en) Video sequences correlation and static analysis and scene changing forecasting in motion estimation
CN114663964A (en) Ship remote driving behavior state monitoring and early warning method and system and storage medium
CN108595469A (en) Semantic-based agricultural machinery monitoring video image section band transmission system
CN113887678A (en) Ship track generation method and system based on massive image data
CN116703001B (en) Oil consumption prediction method and system of intelligent ship, intelligent ship and medium
CN116432082B (en) Ship fault feature analysis method, system and storage medium
CN112612282B (en) Inland river navigation control method and system based on ship host optimization and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant