CN115994137A

CN115994137A - Data management method based on application service system of Internet of things

Info

Publication number: CN115994137A
Application number: CN202310287966.7A
Authority: CN
Inventors: 秦少卿; 张梓韦
Original assignee: Wuxi Hongding Software Technology Co ltd
Current assignee: Wuxi Hongding Software Technology Co ltd
Priority date: 2023-03-23
Filing date: 2023-03-23
Publication date: 2023-04-21
Anticipated expiration: 2043-03-23
Also published as: CN115994137B

Abstract

The invention relates to the field of data processing, in particular to a data management method based on an application service system of the Internet of things, which comprises the following steps: obtaining a distance set according to the data sequence to be analyzed and the standard data sequence, and further obtaining a time period to be analyzed; obtaining a first temporary sequence and a second temporary sequence according to a time period to be analyzed; obtaining initial change trend difference according to the path length difference and trend change times of the first temporary sequence and the second temporary sequence, and obtaining final change trend difference by combining the time difference distance; determining a correction coefficient, and further obtaining a similarity degree; and obtaining a clearing coefficient according to the storage days, and further clearing the data sequence to be analyzed. The method and the system can enable the application service system of the Internet of things to store important abnormal data for a long time and clear redundant invalid data timely and accurately.

Description

Data management method based on application service system of Internet of things

Technical Field

The invention relates to the field of data processing, in particular to a data management method based on an application service system of the Internet of things.

Background

The Internet of Things connects different information-sensing devices to the internet through identification, sensing and network communication, forming a huge network that links people, machines and objects in a wide range of scenarios and enables intelligent identification, positioning, tracking, supervision and similar functions. As Internet of Things technology is applied ever more widely in green agriculture, industrial monitoring, environmental monitoring and other fields, the number of data acquisition devices of different types, such as sensors and cameras, is growing rapidly, and so is the volume of data stored in databases.

To ensure that the database always keeps a certain amount of available storage, the database automatically clears expired data according to a retention period in days. In many scenarios, however, the data collected each day is highly repetitive, for example workshop temperature or device operation data; the collected data changes significantly only when an abnormality occurs, and such abnormal data is often the important data that needs to be analyzed. If data is cleared according to the retention days alone, important abnormal data is likely to be cleared while large amounts of normal data that recur every day are kept. Accurately clearing redundant, invalid data is therefore one of the keys to data management in an Internet of Things application service system.

Disclosure of Invention

The invention provides a data management method based on an application service system of the Internet of things, which aims to solve the existing problems.

The data management method based on the application service system of the Internet of things adopts the following technical scheme:

the embodiment of the invention provides a data management method based on an application service system of the Internet of things, which comprises the following steps:

acquiring a data sequence to be analyzed and a standard data sequence; obtaining an optimal matching path of a data sequence to be analyzed and a standard data sequence, and obtaining a distance set according to the optimal matching path; obtaining a distance change curve according to the distance set, and obtaining a time period to be analyzed according to each data point on the distance change curve; respectively calling a sequence formed by corresponding data in a data sequence to be analyzed and a standard data sequence in a time period to be analyzed as a first temporary sequence and a second temporary sequence;

respectively obtaining vector sets and reference vectors of the first temporary sequence and the second temporary sequence according to each data point corresponding to the first temporary sequence and the second temporary sequence; obtaining a first path length and a second path length according to clustering results of each vector in the vector sets of the first temporary sequence and the second temporary sequence; obtaining the side of the first temporary sequence and the side of the second temporary sequence according to the reference vectors of the first temporary sequence and the second temporary sequence; obtaining a path length difference corresponding to the time period to be analyzed according to the first path length, the second path length, the side of the first temporary sequence and the side of the second temporary sequence; obtaining a time difference distance and trend change times of the first temporary sequence and the second temporary sequence according to clustering results of each vector and adjacent vectors in a vector set corresponding to the first temporary sequence and the second temporary sequence; obtaining initial change trend difference according to the path length difference and trend change times of the first temporary sequence and the second temporary sequence;

obtaining a final variation trend difference corresponding to the time period to be analyzed according to the time difference distance and the initial variation trend difference; obtaining correction coefficients according to the final variation trend difference, the time length, the average value of all data in the first temporary sequence and the average value of all data in the second temporary sequence of all the time periods to be analyzed; obtaining the similarity degree according to the correction coefficient and the distance set;

obtaining a clearance coefficient according to the similarity and the storage days; and clearing the data sequence to be analyzed according to the clearing coefficient.

Preferably, the step of obtaining the time period to be analyzed according to each data point on the distance change curve includes:

and acquiring all peak points of the distance change curve, taking an average value of all data in the distance set as a distance threshold value, and taking a peak region where the peak point with the ordinate larger than the distance threshold value is located as a time period to be analyzed.

Preferably, the step of obtaining the peak area includes: for any one peak point, two valley points closest to the peak point are taken as two end points of a peak region, and all data points between the two end points form the peak region.

Preferably, the method for obtaining the vector sets of the first temporary sequence and the second temporary sequence and the reference vector comprises the following steps:

each data point in the first temporary sequence is taken as a vector starting point, and adjacent data points of each data point are taken as vector end points to obtain vectors corresponding to each data point; vectors corresponding to all data points in the first temporary sequence form a vector set of the first temporary sequence; taking a vector formed by taking a first data point in the first temporary sequence as a vector starting point and a last data point as a vector end point as a reference vector of the first temporary sequence;

each data point in the second temporary sequence is taken as a vector starting point, and adjacent data points of each data point are taken as vector end points to obtain vectors corresponding to each data point; vectors corresponding to all data points in the second temporary sequence form a vector set of the second temporary sequence; the vector formed by taking the first data point in the second temporary sequence as a vector starting point and the last data point as a vector end point is taken as a reference vector of the second temporary sequence.

Preferably, the method for acquiring the first path length and the second path length includes:

taking the sum of all vectors contained in each clustering result in the first temporary sequence as an accumulated vector corresponding to each clustering result, and taking the included angle between the accumulated vectors corresponding to each clustering result as a first path length;

and taking the sum of all vectors contained in each clustering result in the second temporary sequence as an accumulation vector corresponding to each clustering result, and taking the included angle between the accumulation vectors corresponding to each clustering result as a second path length.

Preferably, the acquiring method of the belonging side of the first temporary sequence and the belonging side of the second temporary sequence is as follows:

taking a straight line where a reference vector of the first temporary sequence is located as a boundary line, counting the number of data points in an area above and an area below the boundary line of each data point in the first temporary sequence, and taking the area where the maximum number of data points is located as the side of the first temporary sequence;

and counting the number of data points in the area above and the area below the boundary line of each data point in the second temporary sequence by taking the straight line where the reference vector of the second temporary sequence is located as the boundary line, and taking the area where the maximum number of data points is located as the side of the second temporary sequence.

Preferably, the method for obtaining the path length difference comprises the following steps:

when the belonging side of the first temporary sequence is the same as the belonging side of the second temporary sequence, the path length difference is the absolute value of the difference between the first path length and the second path length; otherwise, the path length difference is the result of the addition between the first path length and the second path length.

Preferably, the obtaining expression of the initial variation trend difference is:

wherein F is the initial change trend difference; D represents the path length difference; and the two remaining quantities are the trend change times of the first temporary sequence and the second temporary sequence, respectively.

Preferably, the expression for obtaining the correction coefficient is:

wherein P is the correction coefficient; for the j-th time period to be analyzed, the expression uses the final change trend difference, the time length, and the average values of all data in the corresponding first temporary sequence and second temporary sequence; m is the number of time periods to be analyzed; and the exponential function takes the natural constant as its base.

Preferably, the obtaining expression of the similarity degree is:

wherein R is the similarity degree of the data sequence to be analyzed and the standard data sequence; P is the correction coefficient; the expression runs over the i-th distance value in the distance set, where n is the number of distance values contained in the distance set; and the exponential function takes the natural constant as its base.

Preferably, the clearance coefficient calculation formula is:

wherein T represents the clearing coefficient; R represents the similarity degree between the data sequence to be analyzed and the standard data sequence, where a smaller R means the data sequence is more important and needs to be stored longer; and K represents the number of days the data sequence to be analyzed has been retained.

Preferably, clearing the data sequence to be analyzed according to the clearing coefficient comprises:

clearing the data sequence to be analyzed when its clearing coefficient is larger than a clearing threshold.

The beneficial effects of the invention are as follows: the method jointly analyzes the standard data sequence and the data sequence to be analyzed. The DTW algorithm is used to obtain the optimal matching path between the two sequences, and the time periods in which they are dissimilar, namely the time periods to be analyzed, are obtained from that path. Within these time periods the DTW algorithm only performs warped matching according to Euclidean distance, which forcibly reduces the DTW distance between the two sequences and makes their computed similarity higher than their actual similarity. The method therefore starts from the characteristics that the two sequences have in common within each time period to be analyzed, analyzes their actual similarity there from the difference in change trend and the difference in data values, and uses the result as a correction coefficient for the DTW distance, so that the similarity degree between the standard data sequence and the data sequence to be analyzed is obtained accurately. Finally, the storage days are combined with the similarity degree to judge whether the data sequence to be analyzed needs to be cleared, so that the database can store important data for a long time.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of steps of a data management method based on an application service system of the internet of things.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of a specific implementation, structure, characteristics and effects of a data management method based on an application service system of the internet of things according to the invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the data management method based on the application service system of the internet of things provided by the invention with reference to the accompanying drawings.

Referring to fig. 1, a flowchart of a data management method based on an application service system of the internet of things according to an embodiment of the present invention is shown, where the method includes the following steps:

Step S001: obtain a standard data sequence and a data sequence to be analyzed.

In applications of Internet of Things technology, the data collected by data acquisition equipment repeats from day to day, for example temperature and humidity in an agricultural greenhouse or the operation data of factory equipment. Taking factory equipment operation data as an example: when there is no fault, the operation data of any two days changes in a similar way; when a fault occurs, the operation data of that day changes greatly, and such abnormal data is often the important data that needs to be analyzed in this embodiment.

If data is cleared according to the retention days alone, important abnormal data is likely to be cleared while large amounts of normal data that recur every day are kept. This embodiment therefore sets a clearing threshold based on the degree of abnormality of each day's collected data together with the number of retention days, so that important abnormal data is stored for a long time and redundant, invalid data is cleared promptly and accurately.

The data acquired from Internet of Things equipment is streaming data, that is, time-series data generated continuously. To judge whether the time-series data generated by the factory equipment on a certain date needs to be cleared, that date is taken as the target date. The time-series data generated on the target date is acquired first and is called the data sequence to be analyzed. Time-series data generated during one day of normal equipment operation is then selected manually and used as the standard data sequence, which serves as the reference for the degree of abnormality of the data sequence to be analyzed; that is, the degree of abnormality of the data sequence to be analyzed is judged by analyzing its similarity to the standard data sequence.

Step S002: according to the vector sets and the reference vectors respectively corresponding to the first temporary sequence and the second temporary sequence, a first path length, a second path length, a side of the first temporary sequence and a side of the second temporary sequence are obtained, and then a path length difference and a time difference distance are obtained; and combining the trend change times of the first temporary sequence and the second temporary sequence to obtain initial change trend difference.

1. Obtain the correspondence between the data sequence to be analyzed and the standard data sequence using the DTW algorithm, and obtain the time periods to be analyzed in the two sequences by threshold segmentation.

When the factory equipment fails on the target date, the data sequence to be analyzed differs considerably from the standard data sequence. The DTW algorithm requires every data point to be matched using Euclidean distance and requires the matching to be continuous, so the matching of dissimilar parts of the time-series data is severely warped, and the similarity computed for those parts is necessarily biased. This embodiment therefore extracts the time periods in which the data sequence to be analyzed and the standard data sequence are dissimilar, takes them as the time periods to be analyzed, quantitatively analyzes the features within them, and adjusts the DTW distance according to the quantitative result to obtain a more accurate similarity degree. The specific process is as follows:

First, the standard data sequence and the data sequence to be analyzed are matched using the DTW algorithm. The DTW algorithm is a known technique: a distance matrix is constructed from the distance between each data point in one sequence and each data point in the other, and the minimum-distance path from the start points of the two sequences to their end points is then searched for in the distance matrix. The minimum-distance path between the standard data sequence and the data sequence to be analyzed is called the best matching path, and the Euclidean distances recorded along the best matching path form the distance set A, whose i-th element is the i-th distance value on the best matching path and where n is the number of distance values in the set.

Then, with the sequence number of each distance value in the distance set A as the horizontal axis and the magnitude of each distance value as the vertical axis, a curve is fitted to the distance values in A by the least squares method to obtain the distance change curve; the i-th data point on the curve thus has the sequence number i as its abscissa and the corresponding distance value as its ordinate. In time periods where the two time series change similarly, the corresponding part of the distance change curve approaches a straight line; in time periods where they change dissimilarly, the corresponding part appears as a peak. All peak points and valley points on the distance change curve are therefore obtained, the average of all data in the distance set A is taken as the distance threshold, and the peak region containing each peak point whose ordinate is larger than the distance threshold is taken as a time period in which the standard data sequence and the data sequence to be analyzed change dissimilarly, that is, a time period to be analyzed in this embodiment.

The peak region is obtained as follows: for any peak point, the two valley points closest to it are taken as the two end points of the peak region, and all data points between the two end points form the peak region; every peak point is processed in this way to obtain its peak region.
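The threshold segmentation described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: it works directly on the distance values rather than on a least-squares fitted curve, it finds peaks and valleys by simple neighbor comparison, and it treats the two ends of the curve as valleys so that every peak has two end points. The function name `peak_regions` is hypothetical and does not come from the source.

```python
import numpy as np

def peak_regions(distances):
    """Return (start, end) index pairs of peak regions whose peak exceeds the mean distance."""
    d = np.asarray(distances, dtype=float)
    threshold = d.mean()  # distance threshold: average of all values in the distance set
    inner = range(1, len(d) - 1)
    # Local maxima and minima found by comparing each interior point with its neighbors.
    peaks = [i for i in inner if d[i - 1] < d[i] >= d[i + 1]]
    valleys = [i for i in inner if d[i - 1] > d[i] <= d[i + 1]]
    # Assumption: the curve ends count as valleys so boundary peaks still get a region.
    valleys = [0] + valleys + [len(d) - 1]
    regions = []
    for p in peaks:
        if d[p] <= threshold:
            continue  # peak not above the threshold: not a period to be analyzed
        left = max(v for v in valleys if v < p)   # nearest valley on the left
        right = min(v for v in valleys if v > p)  # nearest valley on the right
        regions.append((left, right))
    return regions

print(peak_regions([2, 1, 3, 8, 9, 4, 1, 2, 1, 2]))
```

In this example the mean distance is 3.3; the peak at index 4 exceeds it and is bounded by the valleys at indices 1 and 6, while the small peak at index 7 is ignored.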

The data acquisition times in the data sequence to be analyzed that correspond to the peak regions whose peak points have an ordinate larger than the distance threshold form the time periods to be analyzed of this embodiment. Within each time period to be analyzed, the corresponding data in the data sequence to be analyzed forms a first temporary sequence, and the corresponding data in the standard data sequence forms a second temporary sequence.
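The DTW matching and the mapping from a region on the best matching path back to the two sequences can be sketched as follows. This is a minimal textbook DTW, not necessarily the implementation used in the embodiment. It assumes scalar samples (so the Euclidean distance reduces to an absolute difference) and assumes both sequences are sampled at the same times of day, so the same index range can be cut from each. The names `dtw_path` and `temporary_sequences` are hypothetical.

```python
import numpy as np

def dtw_path(x, y):
    """Classic DTW between 1-D sequences; returns the warping path and its per-step distances."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    cost = np.abs(x[:, None] - y[None, :])     # pairwise distance matrix
    acc = np.full((n + 1, m + 1), np.inf)      # accumulated cost with a padded border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from the end points of both sequences to their start points.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    distances = [float(cost[a, b]) for a, b in path]  # the distance set
    return path, distances

def temporary_sequences(x, y, path, region):
    """Cut both sequences over the times that a path region covers in x (the sequence to be analyzed)."""
    start, end = region
    idx = [a for a, _ in path[start:end + 1]]
    t0, t1 = min(idx), max(idx)
    return list(x[t0:t1 + 1]), list(y[t0:t1 + 1])

to_analyze = [0, 0, 5, 0, 0]   # data sequence to be analyzed (toy values)
standard = [0, 0, 0, 0, 0]     # standard data sequence (toy values)
path, distances = dtw_path(to_analyze, standard)
first, second = temporary_sequences(to_analyze, standard, path, (1, 3))
```

Here `distances` plays the role of the distance set, and `first` and `second` play the roles of the first and second temporary sequences for one region of the path.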

2. Quantitatively analyze the features within the time period to be analyzed.

Because the DTW algorithm performs distortion matching in the standard data sequence and dissimilar time periods in the data sequence to be analyzed to forcibly reduce the DTW distance of the region so that the final similarity result is higher, the embodiment obtains the similarity degree corresponding to each time period to be analyzed according to the difference between the change trends of the first temporary sequence and the second temporary sequence and the difference between the data values in each time period to be analyzed, and takes the similarity degree as the correction coefficient of the DTW distance.

Consider the first temporary sequence and the second temporary sequence corresponding to any one time period to be analyzed. The Euclidean distance between the start points of the standard data sequence and the data sequence to be analyzed in this time period is small, and so is the Euclidean distance between their end points; that is, each of the two pairs of data points coincides or nearly coincides. Consequently, within the time period to be analyzed, the sum of the vectors between all adjacent data points of the standard data sequence is consistent with the sum of the vectors between all adjacent data points of the data sequence to be analyzed, and the differences between the two sequences can be analyzed on the basis of this consistency.

First, the vectors between adjacent data points of the data sequence to be analyzed within the time period to be analyzed are acquired, that is, the vectors formed by adjacent data points in the first temporary sequence. For example, for the k-th data point of the first temporary sequence, the adjacent data point is the (k+1)-th data point, and the vector corresponding to the k-th data point takes the k-th data point as its start point and the (k+1)-th data point as its end point. This yields the vector set of the data sequence to be analyzed within the time period to be analyzed, namely the vector set of the first temporary sequence. The vector that takes the first data point of the first temporary sequence as its start point and the last data point as its end point is taken as the reference vector of the first temporary sequence. The sum of all vectors in the vector set is then equal to the reference vector, and the vectors in the set can be divided into two classes: those moving away from the reference vector and those approaching the reference vector.

This embodiment therefore clusters the vectors in the vector set with the K-means clustering algorithm, setting the number of clusters to 2, so that two clustering results are obtained. The sum of all vectors in each clustering result is then calculated, and the two sums are recorded as the two accumulated vectors. The larger the included angle between the two accumulated vectors, the more data points deviate strongly from the reference vector on the way from the first data point to the last data point of the first temporary sequence, and the longer the path traveled from the first data point to the last. The included angle between the two accumulated vectors can therefore represent the path length traveled by the data points in the first temporary sequence, and this embodiment records it as the first path length. Similarly, the second path length is obtained from the second temporary sequence corresponding to the standard data sequence within the time period to be analyzed.
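The path length computation can be illustrated with the following sketch. Several choices here are assumptions rather than details given in the source: each data point is treated as a 2-D point (time index, value), the vectors are clustered directly on their two components, and a tiny hand-written 2-means with a deterministic start replaces a library K-means so that the example has no dependency beyond NumPy. The names `two_means` and `path_length` are hypothetical.

```python
import numpy as np

def two_means(vectors, iters=20):
    """Tiny 2-cluster k-means (Lloyd's algorithm) with a deterministic start."""
    v = np.asarray(vectors, dtype=float)
    # Assumption: seed with the vectors having the smallest and largest value component.
    centers = np.array([v[np.argmin(v[:, 1])], v[np.argmax(v[:, 1])]])
    labels = np.zeros(len(v), dtype=int)
    for _ in range(iters):
        dist = np.linalg.norm(v[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in (0, 1):
            if np.any(labels == c):
                centers[c] = v[labels == c].mean(axis=0)
    return labels

def path_length(times, values):
    """Angle between the two accumulated cluster vectors of a temporary sequence."""
    pts = np.column_stack([times, values]).astype(float)
    vectors = np.diff(pts, axis=0)        # vector from each data point to the next one
    reference = pts[-1] - pts[0]          # first data point -> last data point
    labels = two_means(vectors)
    acc0 = vectors[labels == 0].sum(axis=0)   # accumulated vector of cluster 0
    acc1 = vectors[labels == 1].sum(axis=0)   # accumulated vector of cluster 1
    norms = np.linalg.norm(acc0) * np.linalg.norm(acc1)
    if norms == 0.0:
        angle = 0.0                       # one cluster is empty: no turn at all
    else:
        angle = float(np.arccos(np.clip(acc0 @ acc1 / norms, -1.0, 1.0)))
    return angle, labels, vectors, reference

angle, labels, vectors, reference = path_length([0, 1, 2, 3, 4], [0, 2, 4, 2, 0])
```

For this triangular toy sequence the rising vectors and the falling vectors end up in different clusters, their sums are (2, 4) and (2, -4), and the vectors in the set add up to the reference vector (4, 0), as the text states.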

Then, with the straight line on which the reference vector of the first temporary sequence lies as the dividing line, the number of data points of the first temporary sequence in the area below the dividing line and the number in the area above it are counted, and the area with the larger count is taken as the belonging side $S_1$ of the first temporary sequence. Similarly, the belonging side $S_2$ of the second temporary sequence is obtained from the reference vector of the second temporary sequence. The path length difference D between the standard data sequence and the data sequence to be analyzed within the time period to be analyzed is then:

$$D = \begin{cases} |L_1 - L_2|, & S_1 = S_2 \\ L_1 + L_2, & S_1 \neq S_2 \end{cases}$$

wherein $L_1$ and $L_2$ are the first path length and the second path length, respectively, and $S_1$ and $S_2$ represent the belonging side of the first temporary sequence and the belonging side of the second temporary sequence, respectively.

When the two belonging sides are the same, most data points of the first temporary sequence and of the second temporary sequence lie on the same side of their respective dividing lines within the time period to be analyzed, so the overall trends of the two sequences are the same and the absolute value of the difference between the first path length and the second path length represents the difference in path length traveled. When the two belonging sides differ, most data points of the two sequences lie on different sides of their respective dividing lines, so the overall trends of the two sequences are opposite and the sum of the first path length and the second path length represents the difference in path length traveled.
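The belonging side and the path length difference can be sketched as follows. The encoding of the side as +1 (above) or -1 (below) and the tie rule are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def belonging_side(times, values):
    """Return +1 if most points lie above the first-to-last line, -1 if most lie below."""
    t, v = np.asarray(times, dtype=float), np.asarray(values, dtype=float)
    # Height of the dividing line (through the first and last data points) at each time.
    line = v[0] + (v[-1] - v[0]) * (t - t[0]) / (t[-1] - t[0])
    above = int(np.sum(v > line))
    below = int(np.sum(v < line))
    return 1 if above >= below else -1   # assumption: a tie counts as "above"

def path_length_difference(len1, len2, side1, side2):
    """|len1 - len2| when both sequences are on the same side, len1 + len2 otherwise."""
    return abs(len1 - len2) if side1 == side2 else len1 + len2

s1 = belonging_side([0, 1, 2, 3, 4], [0, 2, 4, 2, 0])      # bulges upward
s2 = belonging_side([0, 1, 2, 3, 4], [0, -1, -3, -1, 0])   # bulges downward
d_same = path_length_difference(2.0, 0.5, s1, s1)
d_opposite = path_length_difference(2.0, 0.5, s1, s2)
```

With the toy path lengths 2.0 and 0.5, sequences on the same side give a difference of 1.5, while sequences on opposite sides give 2.5.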

The vectors in the vector set of the first temporary sequence have been divided into two clustering results. The number of vectors in this vector set that do not belong to the same clustering result as their adjacent vector is called the trend change times of the first temporary sequence; it represents the number of turns made by the data points of the first temporary sequence within the time period to be analyzed and characterizes their change trend. For example, for a vector and the vector adjacent to it: if the two do not belong to the same clustering result, the vector is considered a vector at which the trend changes; otherwise it is not. Similarly, the trend change times of the second temporary sequence within the time period to be analyzed characterizes the change trend of the data points in the second temporary sequence. The initial change trend difference F between the first temporary sequence corresponding to the data sequence to be analyzed and the second temporary sequence corresponding to the standard data sequence within the time period to be analyzed may be expressed as:

wherein D represents the path length difference between the first temporary sequence and the second temporary sequence, and the two count terms are the trend change times of the first temporary sequence and the second temporary sequence, respectively. D represents the difference between the overall change trends of the data points in the two sequences, and the difference between the two trend change times represents the difference in how often their change trends turn. The larger these two values are, the more obvious the difference between the change trends of the data points of the standard data sequence and the data sequence to be analyzed within the time period to be analyzed; the initial change trend difference therefore combines the path length difference of the first temporary sequence and the second temporary sequence with the difference between their trend change times.
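The trend change times can be counted from the cluster labels of consecutive vectors, as in the short sketch below. It assumes the labels are given in the order of the vectors and that a vector is compared with the vector that follows it; the function names are hypothetical. The sketch stops at the counts and does not compute F itself.

```python
def trend_change_indices(labels):
    """Indices of vectors whose cluster label differs from that of the next vector."""
    return [k for k in range(len(labels) - 1) if labels[k] != labels[k + 1]]

def trend_change_count(labels):
    """Trend change times: how many vectors differ in cluster from their neighbor."""
    return len(trend_change_indices(labels))

labels_first = [1, 1, 0, 0, 1]    # toy cluster labels for the first temporary sequence
labels_second = [1, 1, 1, 0, 0]   # toy cluster labels for the second temporary sequence
count_first = trend_change_count(labels_first)
count_second = trend_change_count(labels_second)
```

In this toy case the first sequence turns twice and the second turns once, so the two trend change times differ by one.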

Step S003: obtaining a final variation trend difference corresponding to the time period to be analyzed according to the time difference distance and the initial variation trend difference; and obtaining correction coefficients according to the final variation trend difference, the time length, the average value of all data in the first temporary sequence and the average value of all data in the second temporary sequence of all the time periods to be analyzed, and further obtaining the similarity degree.

Because the initial change trend difference F is obtained only by considering the times of the overall change trend and the local change trend change of the data in the first temporary sequence and the second temporary sequence corresponding to the time period to be analyzed, and the time information corresponding to the standard data sequence and the trend change in the data sequence to be analyzed is not considered, the initial change trend difference F of the first temporary sequence and the second temporary sequence still has errors, and the F needs to be corrected according to the time change of the standard data sequence and each local change trend change in the data sequence to be analyzed.

The acquisition time of each data point at which a trend change occurs is obtained from the first temporary sequence and from the second temporary sequence corresponding to the time period to be analyzed, giving a first trend change sequence and a second trend change sequence. The time difference distance between the first trend change sequence and the second trend change sequence is then obtained with the DTW algorithm and is denoted G in this embodiment. The final change trend difference between the first temporary sequence and the second temporary sequence is then:

wherein F represents the initial change trend difference between the first temporary sequence and the second temporary sequence, and G represents the time difference distance between the first temporary sequence and the second temporary sequence. The larger the value of G, the larger the time difference between the data points at which trend changes occur in the first temporary sequence and the second temporary sequence. This embodiment therefore uses the time difference distance G as the correction coefficient of the initial change trend difference F to obtain the real change trend difference between the first temporary sequence and the second temporary sequence, that is, the final change trend difference.
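As a minimal sketch of how the time difference distance G could be computed, the following Python function applies the classic dynamic-programming form of DTW to two lists of trend-change acquisition times. The function name `dtw_distance`, the use of plain numeric timestamps, the sample values, and the absolute difference as the local cost are assumptions made for illustration; the embodiment only states that a DTW algorithm is used.

```python
from math import inf
from typing import Sequence


def dtw_distance(first: Sequence[float], second: Sequence[float]) -> float:
    """Classic DTW distance between two numeric sequences.

    Assumption: the local cost is the absolute difference of two timestamps.
    """
    if not first or not second:
        raise ValueError("both trend change sequences must be non-empty")
    n, m = len(first), len(second)
    # acc[i][j]: minimal accumulated cost of aligning first[:i] with second[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(first[i - 1] - second[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Hypothetical acquisition times (e.g. minutes since midnight) of trend changes.
first_trend_change_sequence = [10.0, 25.0, 40.0]
second_trend_change_sequence = [12.0, 25.0, 47.0]
G = dtw_distance(first_trend_change_sequence, second_trend_change_sequence)
```

Identical sequences give a distance of 0, and the value grows as the trend changes of the two sequences drift apart in time.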

All time periods to be analyzed in the standard data sequence and the data sequence to be analyzed are processed in the same way to obtain the final variation trend difference corresponding to each time period to be analyzed.

Because the DTW algorithm performs distortion matching within the dissimilar time periods of the standard data sequence and the data sequence to be analyzed, the DTW distance in those regions is forcibly reduced and the final similarity result is too high. This embodiment therefore calculates the degree of dissimilarity between the standard data sequence and the data sequence to be analyzed in each time period to be analyzed, and uses the obtained degree of dissimilarity as a correction coefficient for the DTW distance. The correction coefficient P for the DTW distance between the standard data sequence and the data sequence to be analyzed is:

wherein P is the correction coefficient; the expression takes, for the j-th time period to be analyzed, the time length of that time period and the average values of all data in the corresponding first temporary sequence and second temporary sequence, and the exponential function takes the natural constant as its base.

The final variation trend difference term is the final variation trend difference between the first temporary sequence and the second temporary sequence corresponding to the j-th time period to be analyzed, namely the final variation trend difference corresponding to the j-th time period to be analyzed.

The data value difference term represents the difference between the data values in the first temporary sequence and the second temporary sequence corresponding to the j-th time period to be analyzed, namely the data value difference corresponding to the j-th time period to be analyzed. The product of the two expresses the degree of dissimilarity between the first temporary sequence and the second temporary sequence corresponding to the j-th time period to be analyzed. The larger this product, the lower the reliability of the DTW distance in that time period, that is, the greater the degree of distortion matching and the larger the correction coefficient required. In this embodiment, the correction coefficient of the DTW distance between the standard data sequence and the data sequence to be analyzed is obtained from the final variation trend differences and the corresponding data value differences of all the time periods to be analyzed. According to the correction coefficient and each distance value on the optimal matching path, the degree of similarity R between the standard data sequence and the data sequence to be analyzed is:

wherein R is the degree of similarity between the data sequence to be analyzed and the standard data sequence, P is the correction coefficient of the DTW distance between the standard data sequence and the data sequence to be analyzed, and the exponential function takes the natural constant as its base. The DTW distance term is the DTW distance between the standard data sequence and the data sequence to be analyzed; the smaller it is, the greater the similarity between the two sequences. When distortion matching exists, the DTW distance between the first temporary sequence and the second temporary sequence corresponding to a time period to be analyzed becomes smaller, so the degree of similarity is inflated by the distortion matching. The larger the correction coefficient, the less reliable the uncorrected degree of similarity, the stronger the correction, and the lower the resulting degree of similarity. The correction coefficient and the degree of similarity are therefore negatively correlated, and the DTW distance and the degree of similarity are also negatively correlated.
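The relationships described above can be illustrated with a short Python sketch. Neither expression below is asserted to be the one used in the embodiment: `period_dissimilarity` assumes that the data value difference is the absolute difference of the two sequence averages, and `similarity_degree` uses exp(-P · mean distance) purely because that form decreases as either the correction coefficient or the DTW distance values increase. Both function names, both functional forms, and the sample values are hypothetical.

```python
from math import exp
from statistics import fmean
from typing import Sequence


def period_dissimilarity(
    final_trend_difference: float,
    first_temporary: Sequence[float],
    second_temporary: Sequence[float],
) -> float:
    """Degree of dissimilarity for one time period to be analyzed.

    The text defines it as the product of the final change trend difference
    and the data value difference. Assumption: the data value difference is
    the absolute difference between the averages of the two sequences.
    """
    value_difference = abs(fmean(first_temporary) - fmean(second_temporary))
    return final_trend_difference * value_difference


def similarity_degree(correction: float, distances: Sequence[float]) -> float:
    """Illustrative similarity, in (0, 1] for non-negative inputs.

    Assumption: R = exp(-P * mean(d_i)). This form is chosen only because it
    is negatively correlated with both P and the distance values.
    """
    return exp(-correction * fmean(distances))


# Hypothetical distance values taken from an optimal matching path.
distances = [0.2, 0.1, 0.4, 0.3]
r_weak = similarity_degree(1.0, distances)
r_strong = similarity_degree(2.0, distances)  # larger P, lower similarity
```

With the same distance values, doubling the correction coefficient lowers the resulting similarity, which is the negative correlation the text describes.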

Step S004: obtaining a clearance coefficient according to the similarity and the storage days; and clearing the data sequence to be analyzed according to the clearing coefficient.

The smaller the degree of similarity between the data sequence to be analyzed and the standard data sequence, the more likely the data sequence to be analyzed is abnormal, the more important it is, and the longer it needs to be stored. Considering also that existing database management modes automatically clear data that has been retained for a long time, the clearing coefficient T of the data sequence to be analyzed is:

wherein T represents the clearing coefficient; R is the degree of similarity between the data sequence to be analyzed and the standard data sequence, and the smaller R is, the more important the data sequence is and the longer it needs to be stored; K is the number of days the data sequence to be analyzed has been retained, and the smaller K is, the lower the probability that the data sequence needs to be cleared.

In this embodiment, the clearing threshold is set to 0.9; practitioners can set it according to actual requirements. When T > 0.9, the data sequence to be analyzed is cleared; otherwise it is retained. Whether the time series data corresponding to other dates needs to be cleared is judged in the same way, thereby achieving efficient data management for the application service system of the Internet of things.
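The clearing rule can be sketched as follows. The sketch does not assume any particular expression for T, so the coefficient is supplied by the caller as a function of R and K. The names `should_clear` and `select_for_clearing`, the record layout, and the stand-in coefficient in the usage lines are all hypothetical; only the threshold of 0.9 and the rule T > 0.9 come from the embodiment.

```python
from typing import Callable, Dict, List, Tuple

CLEARING_THRESHOLD = 0.9  # value used in this embodiment; adjustable in practice


def should_clear(clearing_coefficient: float,
                 threshold: float = CLEARING_THRESHOLD) -> bool:
    """Return True when the data sequence should be cleared (T > threshold)."""
    return clearing_coefficient > threshold


def select_for_clearing(
    records: Dict[str, Tuple[float, int]],
    coefficient: Callable[[float, int], float],
    threshold: float = CLEARING_THRESHOLD,
) -> List[str]:
    """Return the keys of the sequences to clear.

    records maps a date label to (R, K): similarity degree and days stored.
    coefficient is a caller-supplied function computing T from R and K.
    """
    return [
        key
        for key, (similarity, days) in records.items()
        if should_clear(coefficient(similarity, days), threshold)
    ]


# Stand-in coefficient for demonstration only (NOT the embodiment's formula):
# it merely grows with both R and K, as the text describes.
def demo_coefficient(similarity: float, days: int) -> float:
    return similarity * min(days / 30.0, 1.0)


records = {
    "2023-01-01": (0.95, 60),  # very similar to the standard, stored long
    "2023-01-02": (0.40, 60),  # dissimilar (likely abnormal), stored long
    "2023-03-01": (0.95, 3),   # very similar, stored only a few days
}
to_clear = select_for_clearing(records, demo_coefficient)
```

Under the stand-in coefficient, only the sequence that is both highly similar to the standard and long-stored exceeds the threshold; the dissimilar sequence is retained regardless of its age.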

In this embodiment, an optimal matching path between the data sequence to be analyzed and the standard data sequence is first obtained with the DTW algorithm, and a distance set is obtained from the optimal matching path. The dissimilar time periods in the standard data sequence and the data sequence to be analyzed, namely the time periods to be analyzed, are obtained from the distance set. Within these time periods, however, the DTW algorithm performs distortion matching based only on Euclidean distance, so the DTW distance between the data sequence to be analyzed and the standard data sequence is forcibly reduced and the resulting degree of similarity is higher than the actual degree of similarity. The embodiment therefore analyzes, based on the characteristics of the two sequences within each time period to be analyzed, the change trend difference and the actual data value difference between the standard data sequence and the data sequence to be analyzed, obtains the actual degree of similarity within that time period, and uses it to form the correction coefficient of the DTW distance, so that the degree of similarity between the standard data sequence and the data sequence to be analyzed is obtained accurately. Finally, the degree of similarity is combined with the number of storage days to judge whether the data sequence to be analyzed needs to be cleared, so that important abnormal data can be stored for a long time and redundant invalid data can be cleared in a timely manner.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. The data management method based on the application service system of the Internet of things is characterized by comprising the following steps of:

obtaining a clearance coefficient according to the similarity and the storage days; clearing the data sequence to be analyzed according to the clearing coefficient;

the method for acquiring the path length difference comprises the following steps:

when the belonging side of the first temporary sequence is the same as the belonging side of the second temporary sequence, the path length difference is the absolute value of the difference between the first path length and the second path length; otherwise, the path length difference is the result of adding the first path length and the second path length;

the acquisition expression of the initial variation trend difference is as follows:

wherein F is the initial variation trend difference; d represents the path length difference; and the two remaining terms respectively represent the numbers of trend changes of the first temporary sequence and the second temporary sequence;

the acquisition expression of the correction coefficient is as follows:

wherein P is the correction coefficient; the terms of the expression are, for the j-th time period to be analyzed, its final variation trend difference, its time length, and the average values of all data in the corresponding first temporary sequence and second temporary sequence; m is the number of time periods to be analyzed; and the exponential function takes the natural constant as its base;

the obtaining expression of the similarity degree is as follows:

wherein R is the degree of similarity between the data sequence to be analyzed and the standard data sequence; P represents the correction coefficient; the distance term is the i-th distance value in the distance set; n is the number of distance values contained in the distance set; and the exponential function takes the natural constant as its base;

the clearance coefficient calculation formula is:

the method for clearing the data sequence to be analyzed according to the clearing coefficient comprises the following steps:

2. The method for data management based on an application service system of the internet of things according to claim 1, wherein the step of obtaining the time period to be analyzed according to each data point on the distance change curve comprises:

3. The method for data management based on the application service system of the internet of things according to claim 2, wherein the step of obtaining the peak area comprises: for any one peak point, two valley points closest to the peak point are taken as two end points of a peak region, and all data points between the two end points form the peak region.

4. The data management method based on the internet of things application service system according to claim 1, wherein the vector sets of the first temporary sequence and the second temporary sequence and the reference vector acquiring method are as follows:

5. The data management method based on the application service system of the internet of things according to claim 1, wherein the acquiring method of the first path length and the second path length is as follows:

6. The data management method based on the application service system of the internet of things according to claim 1, wherein the acquiring method of the belonging side of the first temporary sequence and the belonging side of the second temporary sequence is: