CN115994137A - Data management method based on application service system of Internet of things - Google Patents


Info

Publication number
CN115994137A
Authority
CN
China
Prior art keywords
sequence
data
analyzed
temporary sequence
temporary
Prior art date
Legal status
Granted
Application number
CN202310287966.7A
Other languages
Chinese (zh)
Other versions
CN115994137B (en)
Inventor
秦少卿
张梓韦
Current Assignee
Wuxi Hongding Software Technology Co ltd
Original Assignee
Wuxi Hongding Software Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuxi Hongding Software Technology Co ltd
Priority to CN202310287966.7A
Publication of CN115994137A
Application granted
Publication of CN115994137B
Legal status: Active (current)
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of data processing, and in particular to a data management method based on an Internet of Things (IoT) application service system. The method comprises the following steps: obtaining a distance set from the data sequence to be analyzed and the standard data sequence, and from it the time periods to be analyzed; obtaining a first temporary sequence and a second temporary sequence for each time period to be analyzed; obtaining an initial variation trend difference from the path length difference and the trend change counts of the first and second temporary sequences, and obtaining a final variation trend difference by combining it with the time difference distance; determining a correction coefficient and, from it, a similarity degree; and obtaining a clearing coefficient from the similarity degree and the number of storage days, and clearing the data sequence to be analyzed accordingly. The method enables an IoT application service system to retain important abnormal data for a long time while clearing redundant, invalid data promptly and accurately.

Description

Data management method based on application service system of Internet of things
Technical Field
The invention relates to the field of data processing, and in particular to a data management method based on an Internet of Things application service system.
Background
The Internet of Things connects various information sensing devices to the Internet through identification, sensing and network communication, forming a huge network that links people, machines and objects in various scenarios to enable intelligent identification, positioning, tracking, supervision and similar functions. As IoT technology is increasingly widely applied in fields such as green agriculture, industrial monitoring and environmental monitoring, the number of data acquisition devices of different types, such as sensors and cameras, is growing rapidly, and so is the volume of data stored in databases.
To keep a certain amount of storage capacity available at all times, the database automatically clears expired data according to a set retention period. However, in many scenarios the data collected daily by the data acquisition devices, such as workshop temperature and equipment operating data, are highly repetitive; only when an anomaly occurs do the collected data change greatly, and such abnormal data are often the important data that need to be analyzed. If data are cleared only according to the retention period, important abnormal data are cleared, while large amounts of normal data that recur every day are retained. Accurately clearing redundant, invalid data is therefore one of the keys to data management in IoT application service systems.
Disclosure of Invention
The invention provides a data management method based on an Internet of Things application service system to solve the problems described above.
The data management method based on an Internet of Things application service system adopts the following technical solution:
An embodiment of the invention provides a data management method based on an Internet of Things application service system, comprising the following steps:
acquiring a data sequence to be analyzed and a standard data sequence; obtaining the best matching path between the data sequence to be analyzed and the standard data sequence, and obtaining a distance set from the best matching path; obtaining a distance change curve from the distance set, and obtaining the time periods to be analyzed from the data points on the distance change curve; and, for each time period to be analyzed, calling the sequences formed by the corresponding data of the data sequence to be analyzed and of the standard data sequence the first temporary sequence and the second temporary sequence, respectively;
obtaining a vector set and a reference vector for each of the first and second temporary sequences from their data points; obtaining a first path length and a second path length from the clustering results of the vectors in the vector sets of the first and second temporary sequences; obtaining the side to which the first temporary sequence belongs and the side to which the second temporary sequence belongs from their reference vectors; obtaining the path length difference of the time period to be analyzed from the first path length, the second path length and the two belonging sides; obtaining the time difference distance and the trend change counts of the first and second temporary sequences from the clustering results of each vector and its adjacent vector in the corresponding vector sets; and obtaining the initial variation trend difference from the path length difference and the trend change counts of the two temporary sequences;
obtaining the final variation trend difference of each time period to be analyzed from the time difference distance and the initial variation trend difference; obtaining a correction coefficient from the final variation trend difference, the time length, and the means of all data in the first temporary sequence and in the second temporary sequence of every time period to be analyzed; and obtaining the similarity degree from the correction coefficient and the distance set;
obtaining a clearing coefficient from the similarity degree and the number of storage days, and clearing the data sequence to be analyzed according to the clearing coefficient.
Preferably, obtaining the time periods to be analyzed from the data points on the distance change curve comprises:
acquiring all peak points of the distance change curve, taking the mean of all data in the distance set as a distance threshold, and taking each peak region whose peak point has an ordinate greater than the distance threshold as a time period to be analyzed.
Preferably, a peak region is obtained as follows: for any peak point, the two valley points closest to it, one on either side, are taken as the two end points of the peak region, and all data points between the two end points form the peak region.
Preferably, the vector sets and reference vectors of the first and second temporary sequences are obtained as follows:
each data point in the first temporary sequence is taken as a vector start point and the next (adjacent) data point as the vector end point, giving the vector corresponding to that data point; the vectors of all data points in the first temporary sequence form the vector set of the first temporary sequence; and the vector from the first data point to the last data point of the first temporary sequence is taken as the reference vector of the first temporary sequence;
each data point in the second temporary sequence is taken as a vector start point and the next (adjacent) data point as the vector end point, giving the vector corresponding to that data point; the vectors of all data points in the second temporary sequence form the vector set of the second temporary sequence; and the vector from the first data point to the last data point of the second temporary sequence is taken as the reference vector of the second temporary sequence.
Preferably, the first path length and the second path length are acquired as follows:
the sum of all vectors in each clustering result of the first temporary sequence is taken as the accumulated vector of that clustering result, and the included angle between the accumulated vectors of the two clustering results is taken as the first path length;
the sum of all vectors in each clustering result of the second temporary sequence is taken as the accumulated vector of that clustering result, and the included angle between the accumulated vectors of the two clustering results is taken as the second path length.
Preferably, the sides to which the first and second temporary sequences belong are acquired as follows:
taking the straight line on which the reference vector of the first temporary sequence lies as a boundary line, the data points of the first temporary sequence lying above the boundary line and those lying below it are counted, and the region containing more data points is taken as the side to which the first temporary sequence belongs;
taking the straight line on which the reference vector of the second temporary sequence lies as a boundary line, the data points of the second temporary sequence lying above the boundary line and those lying below it are counted, and the region containing more data points is taken as the side to which the second temporary sequence belongs.
Preferably, the path length difference is obtained as follows:
when the first and second temporary sequences belong to the same side, the path length difference is the absolute value of the difference between the first path length and the second path length; otherwise, it is the sum of the first path length and the second path length.
Preferably, the initial variation trend difference is obtained by the expression:
Figure SMS_1
where F is the initial variation trend difference, D is the path length difference, and the remaining two symbols are the trend change counts of the first temporary sequence and of the second temporary sequence, respectively.
Preferably, the correction coefficient is obtained by the expression:
Figure SMS_5
where P is the correction coefficient; for the j-th time period to be analyzed, the expression uses its final variation trend difference, its time length, and the means of all data in its first temporary sequence and in its second temporary sequence; m is the number of time periods to be analyzed; and exp(·) is the exponential function with the natural constant e as its base.
Preferably, the similarity degree is obtained by the expression:
Figure SMS_11
where R is the similarity degree between the data sequence to be analyzed and the standard data sequence, P is the correction coefficient, the expression uses the i-th distance value in the distance set, n is the number of distance values in the distance set, and exp(·) is the exponential function with the natural constant e as its base.
Preferably, the clearing coefficient is calculated as:
Figure SMS_14
where T is the clearing coefficient; R is the similarity degree between the data sequence to be analyzed and the standard data sequence, and the smaller R is, the more important the data sequence is and the more it needs long-term storage; and K is the number of days the data sequence to be analyzed has been retained;
preferably, clearing the data sequence to be analyzed according to the clearing coefficient comprises:
clearing the data sequence to be analyzed when its clearing coefficient is greater than a clearing threshold.
The beneficial effects of the invention are as follows: the method analyzes the standard data sequence and the data sequence to be analyzed jointly. The DTW algorithm is used to obtain the best matching path between the two sequences, from which the time periods in which they are dissimilar, i.e. the time periods to be analyzed, are obtained. Within these periods, DTW performs warped matching based only on Euclidean distances, which forcibly reduces the DTW distance between the data sequence to be analyzed and the standard data sequence and makes their similarity appear higher than it actually is. The method therefore analyzes the actual similarity of the two sequences in the time periods to be analyzed from their variation characteristics and their actual data value differences, and uses it as a correction coefficient for the DTW distance, so that the similarity degree between the standard data sequence and the data sequence to be analyzed is obtained accurately. Finally, whether the data sequence to be analyzed needs to be cleared is judged from this similarity degree together with the number of storage days, so that the database retains important data for a long time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of steps of a data management method based on an application service system of the internet of things.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of a specific implementation, structure, characteristics and effects of a data management method based on an application service system of the internet of things according to the invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the data management method based on the application service system of the internet of things provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a data management method based on an application service system of the internet of things according to an embodiment of the present invention is shown, where the method includes the following steps:
Step S001: obtain a standard data sequence and a data sequence to be analyzed.
In IoT applications, the data acquired by data acquisition devices recur every day, for example temperature and humidity in agricultural greenhouses or plant equipment operating data. Taking plant equipment operating data as an example, when no fault occurs the operating data of any two days vary similarly; if a fault occurs, the operating data of that day change greatly, and such abnormal data are usually the important data that need to be analyzed in this embodiment.
If data are cleared only according to the retention period, important abnormal data are cleared while large amounts of normal data that recur every day are retained. Therefore, this embodiment sets a clearing threshold based on the degree of abnormality of each day's collected data and the number of days retained, so that important abnormal data are stored long-term and redundant, invalid data are cleared promptly and accurately.
The data acquired from IoT devices are streaming data, i.e. time series data that are generated freely and continuously. To judge whether the time series data generated by the plant equipment on a certain date need to be cleared, that date is taken as the target date, and the time series data generated on the target date are acquired and called the data sequence to be analyzed. Time series data generated on a day of normal equipment operation are then selected manually as the standard data sequence, which serves as the reference for the degree of abnormality of the data sequence to be analyzed; that is, the degree of abnormality is judged by analyzing the similarity between the data sequence to be analyzed and the standard data sequence.
Step S002: obtain the first path length, the second path length and the sides to which the first and second temporary sequences belong from their respective vector sets and reference vectors, and from these obtain the path length difference and the time difference distance; then combine them with the trend change counts of the two temporary sequences to obtain the initial variation trend difference.
1. Obtain the correspondence between the data sequence to be analyzed and the standard data sequence with the DTW algorithm, and obtain the time periods to be analyzed by threshold segmentation.
When the plant equipment fails on the target date, the data sequence to be analyzed differs greatly from the standard data sequence. Because the DTW algorithm requires every data point to be matched by Euclidean distance and the matching to be continuous, the dissimilar parts of the time series are matched with severe warping, which inevitably biases the similarity computed for them. This embodiment therefore extracts the time periods in which the data sequence to be analyzed and the standard data sequence are dissimilar as the time periods to be analyzed, quantitatively analyzes the features within these periods, and adjusts the DTW distance according to the result to obtain a more accurate similarity degree. The specific process is as follows:
First, the standard data sequence and the data sequence to be analyzed are matched with the DTW algorithm, which is a known technique: a distance matrix is built from the distance between each data point of one sequence and each data point of the other, and the minimum-distance path from the start points of the two sequences to their end points is searched in this matrix. The minimum-distance path between the standard data sequence and the data sequence to be analyzed is called the best matching path, and the Euclidean distances along it are recorded to form the distance set $A = \{d_1, d_2, \dots, d_n\}$, where $d_i$ is the i-th distance value on the best matching path and n is the number of distance values in the distance set.
Then, with the index of each distance value in the distance set A on the horizontal axis and its magnitude on the vertical axis, a curve is fitted to the distance values in A by the least squares method to obtain the distance change curve, whose i-th data point corresponds to the i-th distance value. Over time periods in which the two time series vary similarly, the distance change curve is close to a straight line, whereas over periods in which they vary dissimilarly it shows a peak. Therefore, all peak points and valley points of the distance change curve are obtained, the mean of all data in the distance set A is taken as the distance threshold, and each peak region whose peak point has an ordinate greater than the distance threshold is taken as a time period in which the standard data sequence and the data sequence to be analyzed vary dissimilarly, i.e. a time period to be analyzed in this embodiment.
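The DTW matching that produces the best matching path and the distance set A can be sketched as follows. This is a minimal sketch rather than the patent's implementation: it assumes one-dimensional sequences, uses the absolute difference (the one-dimensional Euclidean distance) as the local cost, allows the standard three DTW steps, and breaks backtracking ties in favor of the diagonal step. The function name `dtw_best_path` is illustrative.

```python
import numpy as np


def dtw_best_path(query, reference):
    """Classic DTW between two 1-D sequences.

    Returns the best matching path as (i, j) index pairs, from the start points
    to the end points, and the local distances along it (the distance set A).
    """
    q = np.asarray(query, dtype=float)
    r = np.asarray(reference, dtype=float)
    n, m = len(q), len(r)
    cost = np.abs(q[:, None] - r[None, :])  # distance matrix (1-D Euclidean)

    # Accumulated cost with an infinite border so the path must start at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the end points to the start points.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): acc[i - 1, j - 1],  # diagonal listed first, so it wins ties
            (i - 1, j): acc[i - 1, j],
            (i, j - 1): acc[i, j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    distances = [float(cost[a, b]) for a, b in path]
    return path, distances
```

For a target-day sequence `x` and a standard-day sequence `y` (placeholder names), `path, A = dtw_best_path(x, y)` yields the distance set, and `sum(A)` equals the accumulated DTW distance.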
A peak region is obtained as follows: for any peak point, the two valley points closest to it are taken as the two end points of the peak region, and all data points between them form the peak region; processing every peak point in this way yields all peak regions.
The data acquisition times in the data sequence to be analyzed that correspond to the peak regions whose peak points have ordinates greater than the distance threshold form the time periods to be analyzed in this embodiment; within each time period to be analyzed, the corresponding data of the data sequence to be analyzed and of the standard data sequence form the first temporary sequence and the second temporary sequence, respectively.
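The curve fitting and threshold segmentation can be sketched as below, under several assumptions the text leaves open: the least-squares curve is taken to be a polynomial fitted with NumPy's `Polynomial.fit`, with a hypothetical `degree` parameter; peaks and valleys are simple discrete local extrema; and when a peak has no valley on one side, the region is extended to that end of the curve. The returned index pairs are positions along the best matching path.

```python
import numpy as np
from numpy.polynomial import Polynomial


def distance_curve(distances, degree=8):
    """Least-squares polynomial fit of the distance values against their index.

    `degree` is a hypothetical parameter; the source does not specify it.
    """
    d = np.asarray(distances, dtype=float)
    x = np.arange(1, len(d) + 1, dtype=float)
    deg = min(degree, len(d) - 1)  # cannot fit more coefficients than points
    return Polynomial.fit(x, d, deg)(x)


def periods_to_analyze(curve, threshold):
    """Return (start, end) indices of peak regions whose peak exceeds `threshold`."""
    c = np.asarray(curve, dtype=float)
    n = len(c)
    peaks = [k for k in range(1, n - 1) if c[k] > c[k - 1] and c[k] >= c[k + 1]]
    valleys = [k for k in range(1, n - 1) if c[k] < c[k - 1] and c[k] <= c[k + 1]]
    periods = []
    for p in peaks:
        if c[p] <= threshold:
            continue  # peak is not above the distance threshold
        # The nearest valley on each side bounds the peak region.
        left = max((v for v in valleys if v < p), default=0)
        right = min((v for v in valleys if v > p), default=n - 1)
        periods.append((left, right))
    return periods
```

With `A` the distance set from the DTW step, a call such as `periods_to_analyze(distance_curve(A), threshold=float(np.mean(A)))` follows the rule above of using the mean of the distance set as the threshold.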
2. Quantitatively analyze the features within the time periods to be analyzed.
Because the DTW algorithm performs warped matching over the time periods in which the standard data sequence and the data sequence to be analyzed are dissimilar, it forcibly reduces the DTW distance there and makes the final similarity result too high. This embodiment therefore obtains, for each time period to be analyzed, a similarity degree from the difference between the variation trends and the difference between the data values of the first and second temporary sequences, and uses it as the correction coefficient of the DTW distance.
For the first and second temporary sequences of any time period to be analyzed, the Euclidean distance between the start points of the standard data sequence and of the data sequence to be analyzed in that period, and likewise between their end points, is small; that is, the two pairs of data points coincide or nearly coincide. Consequently, within the time period to be analyzed, the vector sum over all adjacent data points of the standard data sequence is consistent with that of the data sequence to be analyzed, so the difference features of the two sequences can be analyzed on the basis of this consistency.
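The consistency argument rests on a telescoping sum. Writing, for illustration only, $\mathbf{p}_1, \dots, \mathbf{p}_N$ for the data points of the first temporary sequence and $\mathbf{q}_1, \dots, \mathbf{q}_M$ for those of the second (these symbols are not from the source), the sum of the adjacent-point vectors depends only on the end points:

```latex
\sum_{k=1}^{N-1} (\mathbf{p}_{k+1} - \mathbf{p}_k) = \mathbf{p}_N - \mathbf{p}_1
\approx \mathbf{q}_M - \mathbf{q}_1 = \sum_{k=1}^{M-1} (\mathbf{q}_{k+1} - \mathbf{q}_k)
```

The approximation holds because the start points and the end points of the two sequences coincide or nearly coincide.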
First, the vectors between adjacent data points of the data sequence to be analyzed within the time period to be analyzed, i.e. of the first temporary sequence, are obtained. For example, the data point adjacent to the k-th data point of the first temporary sequence is the (k+1)-th data point, and the vector corresponding to the k-th data point starts at the k-th data point and ends at the (k+1)-th data point. This yields the vector set of the first temporary sequence, $V = \{\vec{r}_1, \vec{r}_2, \dots\}$. The vector from the first data point to the last data point of the first temporary sequence is taken as its reference vector $\vec{r}_0$. The sum of all vectors in $V$ then equals $\vec{r}_0$, and the vectors in $V$ can be divided into two classes: those deviating away from $\vec{r}_0$ and those approaching $\vec{r}_0$. Therefore, this embodiment clusters the vectors in the vector set with the K-means clustering algorithm, with the number of clusters set to 2, to obtain two clustering results, and computes the sum of all vectors in each clustering result; these two sums are recorded as the accumulated vectors $\vec{u}_1$ and $\vec{u}_2$. The larger the included angle between $\vec{u}_1$ and $\vec{u}_2$, the more data points deviate strongly from the reference vector on the way from the first to the last data point of the first temporary sequence, and the longer the path traveled from the first to the last data point. The included angle between $\vec{u}_1$ and $\vec{u}_2$ can therefore represent the path length traveled by the data points of the first temporary sequence, and this embodiment records it as the first path length $L_1$. Similarly, the second path length $L_2$ is obtained from the second temporary sequence, which corresponds to the standard data sequence within the time period to be analyzed.
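The first-path-length computation can be sketched as follows. Assumptions not stated in the source: data points are (time index, value) pairs with a unit time step, so the angles depend on how the value axis is scaled; and instead of a library K-means call, a small deterministic Lloyd's k-means with k = 2 is used, initialized with the first vector and the vector farthest from it. The second path length comes from calling the same function on the second temporary sequence. All function names are hypothetical.

```python
import numpy as np


def segment_vectors(values):
    """Vector set (consecutive point differences) and reference vector (last - first)."""
    v = np.asarray(values, dtype=float)
    pts = np.column_stack([np.arange(len(v), dtype=float), v])  # (time, value)
    return np.diff(pts, axis=0), pts[-1] - pts[0]


def two_means(vectors, iters=100):
    """Deterministic Lloyd's k-means with k = 2; returns a 0/1 label per vector."""
    x = np.asarray(vectors, dtype=float)
    far = np.argmax(np.linalg.norm(x - x[0], axis=1))
    centers = np.stack([x[0], x[far]])
    labels = None
    for _ in range(iters):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for k in range(2):
            members = x[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels


def path_length(values):
    """Included angle (radians) between the accumulated vectors of the two clusters."""
    vectors, _ = segment_vectors(values)
    labels = two_means(vectors)
    acc0 = vectors[labels == 0].sum(axis=0)
    acc1 = vectors[labels == 1].sum(axis=0)  # acc0 + acc1 equals the reference vector
    norm = np.linalg.norm(acc0) * np.linalg.norm(acc1)
    if norm == 0:
        return 0.0  # a single non-empty cluster: straight path, zero angle
    cos = np.clip(acc0 @ acc1 / norm, -1.0, 1.0)
    return float(np.arccos(cos))
```

A monotone straight segment gives a path length of 0, while a rise-and-fall segment such as `[0, 3, 6, 3, 0]` gives a large angle, matching the idea that stronger deviation from the reference vector means a longer path.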
Then, taking the straight line on which the reference vector of the first temporary sequence lies as a boundary line, the number of data points of the first temporary sequence in the region below the boundary line, $N_{\text{below}}$, and in the region above it, $N_{\text{above}}$, are counted, and the region corresponding to the larger of the two is taken as the side to which the first temporary sequence belongs, $S_1$. Similarly, the side to which the second temporary sequence belongs, $S_2$, is obtained from the reference vector of the second temporary sequence. The path length difference D between the data points of the standard data sequence and of the data sequence to be analyzed within the time period to be analyzed is then:
$$D = \begin{cases} |L_1 - L_2|, & S_1 = S_2 \\ L_1 + L_2, & S_1 \neq S_2 \end{cases}$$
where $L_1$ and $L_2$ are the first path length and the second path length, respectively, and $S_1$ and $S_2$ denote the sides to which the first temporary sequence and the second temporary sequence belong, respectively.
When the two temporary sequences belong to the same side, the data points of the first and second temporary sequences lie on the same side of their respective boundary lines within the time period to be analyzed, so the overall trends of the two sequences are the same, and the absolute difference between the first and second path lengths expresses the difference in the path lengths traveled. When they belong to different sides, most data points of the two sequences lie on opposite sides of their respective boundary lines, meaning their overall trends are opposite, so the sum of the first and second path lengths expresses the difference in the path lengths traveled.
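The belonging side and the piecewise path length difference can be sketched as below. Assumptions: points are (time index, value) pairs with a unit time step; "above" means a positive cross product with the reference vector, which points forward in time; points exactly on the boundary line, including the first and last points, are not counted; and a tie is resolved as "above", a rule the source does not specify. Function names are hypothetical.

```python
import numpy as np


def belonging_side(values, tol=1e-12):
    """Return 'above' or 'below': the side of the first-to-last line holding more points."""
    v = np.asarray(values, dtype=float)
    t = np.arange(len(v), dtype=float)
    ref = np.array([t[-1] - t[0], v[-1] - v[0]])     # reference vector
    rel = np.column_stack([t - t[0], v - v[0]])       # points relative to the start point
    cross = ref[0] * rel[:, 1] - ref[1] * rel[:, 0]   # > 0: left of ref, i.e. above the line
    above = int(np.sum(cross > tol))
    below = int(np.sum(cross < -tol))
    return "above" if above >= below else "below"  # tie rule is an assumption


def path_length_difference(l1, l2, side1, side2):
    """|L1 - L2| when both sequences lie on the same side, otherwise L1 + L2."""
    return abs(l1 - l2) if side1 == side2 else l1 + l2
```

Using the sum for opposite sides means that two segments bending in opposite directions are always judged more different than two segments bending the same way with the same path lengths.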
The vectors in the vector set of the first temporary sequence are known to be divided into two clustering results. The number of vectors in this vector set that do not belong to the same clustering result as their adjacent vector is called the trend change count of the first temporary sequence; it represents the number of turns made by the data points of the first temporary sequence within the time period to be analyzed and characterizes their variation trend. For example, for the k-th vector $\vec{r}_k$ and its adjacent vector $\vec{r}_{k+1}$, if the two do not belong to the same clustering result, $\vec{r}_k$ is regarded as a vector at which the trend changes; otherwise it is not. Similarly, the trend change count of the second temporary sequence within the time period to be analyzed is obtained, characterizing the variation trend of its data points. The initial variation trend difference F between the first temporary sequence (from the data sequence to be analyzed) and the second temporary sequence (from the standard data sequence) within the time period to be analyzed can be expressed as:
Figure SMS_60
wherein D represents a path length difference between the first temporary sequence and the second temporary sequence;
Figure SMS_61
and
Figure SMS_62
the trend change times of the first temporary sequence and the second temporary sequence are respectively shown.
D represents the difference between the overall trend of change of the data points in the first temporary sequence and the second temporary sequence,
Figure SMS_63
the difference between the change trend change times of the data points in the first temporary sequence and the second temporary sequence is represented, the larger the values of the first temporary sequence and the second temporary sequence are, the more obvious the difference between the change trends of the data points in the standard data sequence and the data sequence to be analyzed in the time period to be analyzed is, so the difference between the change trends of the data points in the standard data sequence and the data sequence to be analyzed in the time period to be analyzed is represented by integrating the difference between the path length of the first temporary sequence and the second temporary sequence and the difference between the change trend change times.
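The vector construction, two-way clustering, path length (claim 5) and trend-change count can be sketched in Python as below. This passage does not name the clustering algorithm, so a small two-centroid k-means is used as a stand-in; the unit time step between samples, the centroid initialisation, and returning a path length of zero when one cluster is empty are likewise assumptions. The initial variation trend difference itself is not computed, since its expression is an image in the source.

```python
import numpy as np


def segment_vectors(values):
    """Vector from each data point to the next one; unit time step assumed."""
    dy = np.diff(np.asarray(values, dtype=float))
    return np.column_stack([np.ones_like(dy), dy])


def two_means(vectors, iters=20):
    """Stand-in two-cluster k-means; centroids start at the lowest- and
    highest-slope vectors (an assumption, not from the source)."""
    slopes = vectors[:, 1]
    centroids = np.stack([vectors[np.argmin(slopes)], vectors[np.argmax(slopes)]])
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for c in range(2):
            if np.any(labels == c):
                centroids[c] = vectors[labels == c].mean(axis=0)
    return labels


def path_length(vectors, labels):
    """Angle between the accumulated (summed) vectors of the two clusters."""
    sums = [vectors[labels == c].sum(axis=0) for c in range(2)]
    n0, n1 = np.linalg.norm(sums[0]), np.linalg.norm(sums[1])
    if n0 == 0 or n1 == 0:
        return 0.0  # one cluster empty: no angle defined (assumption)
    cos = np.dot(sums[0], sums[1]) / (n0 * n1)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def trend_change_count(labels):
    """Number of adjacent vector pairs that fall in different clusters."""
    return int(np.sum(labels[1:] != labels[:-1]))


vecs = segment_vectors([0, 1, 2, 1, 0, 1, 2])
labels = two_means(vecs)
print(trend_change_count(labels), path_length(vecs, labels))  # 2 changes, angle pi/2
```

In the example the rising and falling segments form two clusters whose summed vectors are perpendicular, and the trend turns twice.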
Step S003: obtaining the final variation trend difference corresponding to each time period to be analyzed according to the time difference distance and the initial variation trend difference; obtaining a correction coefficient according to the final variation trend difference, the time length, and the means of all data in the first temporary sequence and in the second temporary sequence for all time periods to be analyzed; and then obtaining the degree of similarity.
The initial variation trend difference F accounts only for the overall trend and the number of local trend changes of the data in the first and second temporary sequences within the time period to be analyzed; it ignores when those trend changes occur in the standard data sequence and in the data sequence to be analyzed. F therefore still carries error and must be corrected according to the timing of each local trend change in the two sequences.
The acquisition time of each data point at which the trend changes is collected from the first and second temporary sequences within the time period to be analyzed, giving a first trend change sequence and a second trend change sequence. The time difference distance between these two sequences is computed with the DTW algorithm and is denoted G in this embodiment. The final variation trend difference $H$ between the first temporary sequence and the second temporary sequence is then:
[equation image not reproduced]
where F is the initial variation trend difference and G is the time difference distance between the first temporary sequence and the second temporary sequence. The larger G is, the larger the time offset between the corresponding trend-change points of the two temporary sequences. This embodiment therefore uses G as the correction coefficient of F to obtain the actual trend difference between the two temporary sequences, i.e., their final variation trend difference.
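The time difference distance G can be illustrated with a plain dynamic time warping routine over the two trend-change time sequences. This is a textbook DTW with an absolute-difference cost, not necessarily the exact variant used in the embodiment; taking the start of the second vector of each differently clustered pair as the turning point, and requiring both sequences to be non-empty, are assumptions. The helper names are hypothetical.

```python
def trend_change_times(timestamps, labels):
    """Acquisition times of the points where consecutive vectors change cluster;
    vector k runs from timestamps[k] to timestamps[k + 1]."""
    return [timestamps[k + 1] for k in range(len(labels) - 1)
            if labels[k] != labels[k + 1]]


def dtw_distance(a, b):
    """Accumulated DTW cost between two non-empty numeric lists."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


first = trend_change_times([0, 1, 2, 3, 4, 5, 6], [1, 1, 0, 0, 1, 1])
second = [3, 5]  # hypothetical trend-change times of the second temporary sequence
print(first, dtw_distance(first, second))  # prints: [2, 4] 2.0
```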
The same processing is applied to every time period to be analyzed in the standard data sequence and the data sequence to be analyzed, yielding the final variation trend difference for each of them.
Because the DTW algorithm performs warped matching over the dissimilar time periods of the standard data sequence and the data sequence to be analyzed, it forcibly reduces the DTW distance in those regions and inflates the final similarity. This embodiment therefore computes the degree of dissimilarity between the standard data sequence and the data sequence to be analyzed in each time period to be analyzed, and uses it as a correction coefficient for the DTW distance. The correction coefficient P for the DTW distance between the standard data sequence and the data sequence to be analyzed is:
[equation image not reproduced]
where P is the correction coefficient; $H_j$ is the final variation trend difference of the j-th time period to be analyzed; $t_j$ is the time length of the j-th time period to be analyzed; $\bar{x}_j$ and $\bar{y}_j$ are the means of all data in the first temporary sequence and the second temporary sequence corresponding to the j-th time period to be analyzed, respectively; m is the number of time periods to be analyzed; and $\exp(\cdot)$ is the exponential function with base e.
$H_j$ is the final variation trend difference between the first temporary sequence and the second temporary sequence of the j-th time period to be analyzed, i.e., the final variation trend difference of that period. $\left|\bar{x}_j-\bar{y}_j\right|$ is the difference between the data values of the first and second temporary sequences of the j-th time period to be analyzed, i.e., the data value difference of that period. Their product is the degree of dissimilarity between the two temporary sequences in the j-th period. The larger this product, the less reliable the DTW distance in that period: the DTW distance there is understated because of warped matching, and a larger correction coefficient is required. In this embodiment, the correction coefficient of the DTW distance between the standard data sequence and the data sequence to be analyzed is obtained from the final variation trend differences and the corresponding data value differences of all time periods to be analyzed. From the correction coefficient and the distance values along the optimal matching path, the degree of similarity R between the standard data sequence and the data sequence to be analyzed is:
[equation image not reproduced]
where R is the degree of similarity between the data sequence to be analyzed and the standard data sequence; P is the correction coefficient of the DTW distance between the standard data sequence and the data sequence to be analyzed; $d_i$ is the i-th distance value in the distance set; n is the number of distance values in the distance set; and $\exp(\cdot)$ is the exponential function with base e.
$\sum_{i=1}^{n} d_i$ is the DTW distance between the standard data sequence and the data sequence to be analyzed: the smaller it is, the more similar the two sequences. When warped matching occurs, the DTW distance between the two sequences over the time period to be analyzed (i.e., between the corresponding first and second temporary sequences) is understated, so the degree of similarity is inflated. A larger correction coefficient means the uncorrected similarity is less reliable and must be corrected more strongly, which lowers the degree of similarity. Both the correction coefficient and the DTW distance are therefore negatively correlated with the degree of similarity.
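The relationship between the optimal matching path, the distance set and the DTW distance (the sum of the distance values) can be made concrete with the sketch below. It uses the absolute difference between scalar samples as the local distance and breaks ties in favour of the diagonal step; both choices are assumptions. It stops at the uncorrected DTW distance and does not compute P or R. The function name is hypothetical.

```python
import numpy as np


def dtw_path_distances(x, y):
    """Return the optimal matching path, the distance set along it, and the DTW distance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from the last matched pair to the first, recording each local distance.
    i, j = n, m
    path, dists = [], []
    while True:
        path.append((i - 1, j - 1))
        dists.append(float(abs(x[i - 1] - y[j - 1])))
        if i == 1 and j == 1:
            break
        steps = [(acc[i - 1, j - 1], i - 1, j - 1),  # diagonal listed first: wins ties
                 (acc[i - 1, j], i - 1, j),
                 (acc[i, j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    dists.reverse()
    return path, dists, float(acc[n, m])


path, dists, total = dtw_path_distances([0, 1, 2], [0, 2])
print(path, dists, total)  # sum(dists) equals the DTW distance
```

Because each step of the backtrack follows the predecessor that produced the accumulated cost, the distance values collected along the path always add up to the DTW distance.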
Step S004: obtaining a clearing coefficient according to the degree of similarity and the number of storage days; and clearing the data sequence to be analyzed according to the clearing coefficient.
The lower the similarity between the data sequence to be analyzed and the standard data sequence, the more likely the data sequence to be analyzed is abnormal; such a sequence is more important and should be stored long-term. Following the common database-management practice of automatically clearing data that has been retained for a long time, the clearing coefficient T of the data sequence to be analyzed is:
[equation image not reproduced]
where T is the clearing coefficient; R is the degree of similarity between the data sequence to be analyzed and the standard data sequence (the smaller R is, the more important the sequence and the more it needs long-term storage); and K is the number of days the data sequence to be analyzed has been retained (the smaller K is, the less the sequence needs to be cleared).
In this embodiment the clearing threshold is set to 0.9; practitioners may set it according to actual requirements. When T > 0.9, the data sequence to be analyzed is cleared; otherwise it is retained. Whether the time series data of other dates need to be cleared is determined in the same way, achieving efficient data management for the Internet of Things application service system.
In this embodiment, the optimal matching path between the data sequence to be analyzed and the standard data sequence is first obtained with the DTW algorithm, and a distance set is derived from that path to locate the time periods in which the two sequences are dissimilar, i.e., the time periods to be analyzed. Within these periods, however, the DTW algorithm performs warped matching based only on Euclidean distance, which forcibly reduces the DTW distance between the two sequences and makes their computed similarity higher than the actual similarity. The embodiment therefore analyzes, for each time period to be analyzed, the trend change difference and the actual data value difference between the standard data sequence and the data sequence to be analyzed, obtains the actual degree of dissimilarity in that period, and uses it as the correction coefficient of the DTW distance, so that the similarity between the two sequences is obtained accurately. Finally, the similarity is combined with the storage time to decide whether the data sequence to be analyzed should be cleared, so that important data is stored long-term while redundant and invalid data is cleared promptly.
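Claims 2 and 3 describe how the time periods to be analyzed are located on the distance change curve: peaks whose value exceeds the mean of the distance set are kept, and each peak region extends to the nearest valley on either side. A minimal sketch follows. It indexes the curve by position in the distance set, considers only interior peaks, and treats the first and last points as region boundaries when no valley is found before them; all of these are assumptions, and the function name is hypothetical.

```python
def periods_to_analyze(dists):
    """(start, end) index pairs of peak regions whose peak exceeds the mean distance."""
    d = [float(v) for v in dists]
    threshold = sum(d) / len(d)  # distance threshold: mean of the distance set
    regions = []
    for p in range(1, len(d) - 1):
        is_peak = d[p] > d[p - 1] and d[p] >= d[p + 1]
        if is_peak and d[p] > threshold:
            left = p
            while left > 0 and d[left - 1] <= d[left]:  # walk down to the left valley
                left -= 1
            right = p
            while right < len(d) - 1 and d[right + 1] <= d[right]:  # right valley
                right += 1
            if (left, right) not in regions:
                regions.append((left, right))
    return regions


print(periods_to_analyze([0, 1, 4, 1, 0, 0.5, 0, 2, 5, 2, 0]))  # prints: [(0, 4), (6, 10)]
```

The small peak at index 5 stays below the mean distance and is therefore not treated as a time period to be analyzed.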
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (6)

1. The data management method based on the application service system of the Internet of things is characterized by comprising the following steps of:
acquiring a data sequence to be analyzed and a standard data sequence; obtaining an optimal matching path of a data sequence to be analyzed and a standard data sequence, and obtaining a distance set according to the optimal matching path; obtaining a distance change curve according to the distance set, and obtaining a time period to be analyzed according to each data point on the distance change curve; respectively calling a sequence formed by corresponding data in a data sequence to be analyzed and a standard data sequence in a time period to be analyzed as a first temporary sequence and a second temporary sequence;
respectively obtaining the vector sets and reference vectors of the first temporary sequence and the second temporary sequence according to their data points; obtaining a first path length and a second path length according to the clustering results of the vectors in the vector sets of the first temporary sequence and the second temporary sequence; obtaining the belonging side of the first temporary sequence and the belonging side of the second temporary sequence according to their reference vectors; obtaining the path length difference corresponding to the time period to be analyzed according to the first path length, the second path length, the belonging side of the first temporary sequence and the belonging side of the second temporary sequence; obtaining a time difference distance and the numbers of trend changes of the first temporary sequence and the second temporary sequence according to the clustering results of each vector and its adjacent vector in the corresponding vector sets; obtaining an initial variation trend difference according to the path length difference and the numbers of trend changes of the first temporary sequence and the second temporary sequence;
obtaining a final variation trend difference corresponding to the time period to be analyzed according to the time difference distance and the initial variation trend difference; obtaining correction coefficients according to the final variation trend difference, the time length, the average value of all data in the first temporary sequence and the average value of all data in the second temporary sequence of all the time periods to be analyzed; obtaining the similarity degree according to the correction coefficient and the distance set;
obtaining a clearing coefficient according to the degree of similarity and the number of storage days; clearing the data sequence to be analyzed according to the clearing coefficient;
the method for acquiring the path length difference comprises the following steps:
when the belonging side of the first temporary sequence is the same as the belonging side of the second temporary sequence, the path length difference is the absolute value of the difference between the first path length and the second path length; otherwise, the path length difference is the result of adding the first path length and the second path length;
the initial variation trend difference is obtained by the expression:
[equation image not reproduced]
where $F$ is the initial variation trend difference; $D$ is the path length difference; and $N_1$ and $N_2$ are the numbers of trend changes of the first temporary sequence and the second temporary sequence, respectively;
the correction coefficient is obtained by the expression:
[equation image not reproduced]
where P is the correction coefficient; $H_j$ is the final variation trend difference of the j-th time period to be analyzed; $t_j$ is the time length of the j-th time period to be analyzed; $\bar{x}_j$ and $\bar{y}_j$ are the means of all data in the first temporary sequence and the second temporary sequence corresponding to the j-th time period to be analyzed, respectively; m is the number of time periods to be analyzed; and $\exp(\cdot)$ is the exponential function with base e;
the degree of similarity is obtained by the expression:
[equation image not reproduced]
where R is the degree of similarity between the data sequence to be analyzed and the standard data sequence; P is the correction coefficient; $d_i$ is the i-th distance value in the distance set; n is the number of distance values in the distance set; and $\exp(\cdot)$ is the exponential function with base e;
the clearing coefficient is calculated as:
[equation image not reproduced]
where T is the clearing coefficient; R is the degree of similarity between the data sequence to be analyzed and the standard data sequence, a smaller value indicating a more important sequence that needs long-term storage; and K is the number of days the data sequence to be analyzed has been retained;
the step of clearing the data sequence to be analyzed according to the clearing coefficient comprises:
clearing the data sequence to be analyzed when its clearing coefficient is greater than a clearing threshold.
2. The method for data management based on an application service system of the internet of things according to claim 1, wherein the step of obtaining the time period to be analyzed according to each data point on the distance change curve comprises:
acquiring all peak points of the distance change curve, taking the mean of all data in the distance set as a distance threshold, and taking the peak region of each peak point whose ordinate is greater than the distance threshold as a time period to be analyzed.
3. The method for data management based on the application service system of the internet of things according to claim 2, wherein the step of obtaining the peak area comprises: for any one peak point, two valley points closest to the peak point are taken as two end points of a peak region, and all data points between the two end points form the peak region.
4. The data management method based on the internet of things application service system according to claim 1, wherein the vector sets and the reference vectors of the first temporary sequence and the second temporary sequence are acquired as follows:
each data point in the first temporary sequence is taken as a vector starting point, and adjacent data points of each data point are taken as vector end points to obtain vectors corresponding to each data point; vectors corresponding to all data points in the first temporary sequence form a vector set of the first temporary sequence; taking a vector formed by taking a first data point in the first temporary sequence as a vector starting point and a last data point as a vector end point as a reference vector of the first temporary sequence;
each data point in the second temporary sequence is taken as a vector starting point, and adjacent data points of each data point are taken as vector end points to obtain vectors corresponding to each data point; vectors corresponding to all data points in the second temporary sequence form a vector set of the second temporary sequence; the vector formed by taking the first data point in the second temporary sequence as a vector starting point and the last data point as a vector end point is taken as a reference vector of the second temporary sequence.
5. The data management method based on the application service system of the internet of things according to claim 1, wherein the acquiring method of the first path length and the second path length is as follows:
taking the sum of all vectors contained in each clustering result of the first temporary sequence as the accumulated vector of that clustering result, and taking the included angle between the accumulated vectors of the clustering results as the first path length;
taking the sum of all vectors contained in each clustering result of the second temporary sequence as the accumulated vector of that clustering result, and taking the included angle between the accumulated vectors of the clustering results as the second path length.
6. The data management method based on the application service system of the internet of things according to claim 1, wherein the acquiring method of the belonging side of the first temporary sequence and the belonging side of the second temporary sequence is:
taking the straight line on which the reference vector of the first temporary sequence lies as a boundary line, counting the number of data points of the first temporary sequence in the area above and in the area below the boundary line, and taking the area containing more data points as the belonging side of the first temporary sequence;
taking the straight line on which the reference vector of the second temporary sequence lies as a boundary line, counting the number of data points of the second temporary sequence in the area above and in the area below the boundary line, and taking the area containing more data points as the belonging side of the second temporary sequence.
CN202310287966.7A 2023-03-23 2023-03-23 Data management method based on application service system of Internet of things Active CN115994137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310287966.7A CN115994137B (en) 2023-03-23 2023-03-23 Data management method based on application service system of Internet of things

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310287966.7A CN115994137B (en) 2023-03-23 2023-03-23 Data management method based on application service system of Internet of things

Publications (2)

Publication Number Publication Date
CN115994137A (en) 2023-04-21
CN115994137B (en) 2023-06-23

Family

ID=85995350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310287966.7A Active CN115994137B (en) 2023-03-23 2023-03-23 Data management method based on application service system of Internet of things

Country Status (1)

Country Link
CN (1) CN115994137B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445832A (en) * 2019-08-28 2021-03-05 北京达佳互联信息技术有限公司 Data anomaly detection method and device, electronic equipment and storage medium
CN112788066A (en) * 2021-02-26 2021-05-11 中南大学 Abnormal flow detection method and system for Internet of things equipment and storage medium
CN114911846A (en) * 2022-05-17 2022-08-16 河海大学 FAD and DTW-based hydrological time sequence similarity searching method

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116185306A (en) * 2023-04-24 2023-05-30 山东爱福地生物股份有限公司 Sewage treatment system data storage method using potamogeton crispus
CN116400870A (en) * 2023-06-08 2023-07-07 西安品盛互联网技术有限公司 On-site construction on-line management system based on Internet of things
CN116400870B (en) * 2023-06-08 2023-08-18 西安品盛互联网技术有限公司 On-site construction on-line management system based on Internet of things
CN116405532A (en) * 2023-06-09 2023-07-07 深圳市乗名科技有限公司 Industrial control and automation method and device based on Internet of things and electronic equipment
CN116405532B (en) * 2023-06-09 2023-08-18 深圳市乗名科技有限公司 Industrial control and automation method and device based on Internet of things and electronic equipment
CN116843368A (en) * 2023-07-17 2023-10-03 杭州火奴数据科技有限公司 Marketing data processing method based on ARMA model
CN116843368B (en) * 2023-07-17 2024-01-26 杭州火奴数据科技有限公司 Marketing data processing method based on ARMA model
CN116756493A (en) * 2023-08-15 2023-09-15 湖南湘江智慧科技股份有限公司 Data management method for security and fire control finger collecting platform
CN116756493B (en) * 2023-08-15 2023-10-27 湖南湘江智慧科技股份有限公司 Data management method for security and fire control finger collecting platform
CN116821940B (en) * 2023-08-23 2024-02-13 青岛阿斯顿工程技术转移有限公司 Intelligent training assessment data acquisition method
CN117762106A (en) * 2023-12-23 2024-03-26 济宁市铠铠食品有限公司 Method for monitoring processing process of poultry blood product based on Internet of things
CN117975742A (en) * 2024-03-29 2024-05-03 大连禾圣科技有限公司 Smart city traffic management system and method based on big data
CN117975742B (en) * 2024-03-29 2024-06-18 大连禾圣科技有限公司 Smart city traffic management system and method based on big data

Also Published As

Publication number Publication date
CN115994137B (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant