CN116910595B - Efficient storage method for hydraulic circular ecological restoration data - Google Patents


Info

Publication number
CN116910595B
Authority
CN
China
Prior art keywords
data
dimensional
environmental data
time sequence
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311185325.7A
Other languages
Chinese (zh)
Other versions
CN116910595A (en
Inventor
陈燕
鲁震
谢筱建
赵宇辉
吕晶
姜晓芬
韩子晨
于巾萃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
No 801 Hydrogeological Engineering Geology Brigade of Shandong Bureau of Geology and Mineral Resources
Original Assignee
No 801 Hydrogeological Engineering Geology Brigade of Shandong Bureau of Geology and Mineral Resources
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by No 801 Hydrogeological Engineering Geology Brigade of Shandong Bureau of Geology and Mineral Resources filed Critical No 801 Hydrogeological Engineering Geology Brigade of Shandong Bureau of Geology and Mineral Resources
Priority to CN202311185325.7A priority Critical patent/CN116910595B/en
Publication of CN116910595A publication Critical patent/CN116910595A/en
Application granted granted Critical
Publication of CN116910595B publication Critical patent/CN116910595B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2291 User-Defined Types; Storage management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The application relates to the technical field of electronic data processing, and in particular to an efficient storage method for hydraulic circular ecological restoration data. For any type of hydraulic circular ecological restoration data in a hydraulic circular ecological environment, the method acquires, based on a sampling frequency, the N-dimensional environmental data corresponding to that data at each sampling time, forming time sequence environmental data; calculates a local information trend optimization factor for each N-dimensional environmental data in the time sequence environmental data; calculates the difference degree of the two sides of each N-dimensional environmental data from the local information trend optimization factors, and obtains all storage break points in the time sequence environmental data from those difference degrees; and divides the time sequence environmental data into segments at the storage break points and stores the fitted straight line of each environmental data segment, thereby improving the accuracy and efficiency of time sequence environmental data storage.

Description

Efficient storage method for hydraulic circular ecological restoration data
Technical Field
The application relates to the technical field of electronic data processing, and in particular to an efficient storage method for hydraulic circular ecological restoration data.
Background
Hydraulic circular ecological restoration is an important field of water resources and water environment engineering, and involves ecological, hydrological, water quality, soil, vegetation and microbial data, among others. Because monitoring records many kinds of data at multiple time scales, the recorded data must be compressed before storage. Storing hydraulic circular ecological restoration data efficiently therefore requires storing each kind of ecological data produced during restoration efficiently.
In the prior art, data is generally stored by clustering. For the monitoring data of the various species recorded in the ecological restoration data, each tag is clustered separately to obtain a cluster assignment for every time sequence data point under that tag. For the data points in each cluster, the cluster center point is used as the recorded value. During storage, only the cluster number of each data point is recorded; during reading and extraction, the recorded value of a data point is looked up in the cluster model through its recorded cluster number.
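The cluster-number storage scheme described above can be illustrated with a minimal sketch. It is one-dimensional for brevity, the cluster centers are assumed to be precomputed, and the function names and sample values are hypothetical; it is not taken from any specific prior-art system.

```python
def nearest_center(value, centers):
    """Return the index (cluster number) of the center closest to value."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))


def encode(series, centers):
    """Storage: keep only the cluster number of each data point."""
    return [nearest_center(v, centers) for v in series]


def decode(codes, centers):
    """Reading: replace each cluster number by its cluster center."""
    return [centers[k] for k in codes]


centers = [10.0, 20.0]                                # hypothetical cluster centers
series = [8.0, 9.0, 10.0, 11.0, 12.0, 19.0, 21.0]     # hypothetical readings
codes = encode(series, centers)
restored = decode(codes, centers)
```

In this example the first five readings rise steadily from 8 to 12, but all of them are restored as 10, so the rising trend is no longer visible in the stored data.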
In this cluster-based storage process, the cluster center point used as the recorded value deviates numerically from the data points in the same cluster. For hydraulic circular ecological restoration data that carries time sequence information, the trend change information of the data points in the recorded time sequence is therefore lost, which biases subsequent analysis of the recorded data.
Therefore, how to improve the accuracy of hydraulic circular ecological restoration data containing time sequence information during data storage is a problem to be solved.
Disclosure of Invention
In view of the above, the embodiment of the application provides an efficient storage method for hydraulic circular ecological restoration data, so as to improve the accuracy of hydraulic circular ecological restoration data containing time sequence information during data storage.
The embodiment of the application provides an efficient storage method for hydraulic circular ecological restoration data, which comprises the following steps:
for any type of hydraulic circular ecological restoration data in the hydraulic circular ecological environment, acquiring, based on a sampling frequency, the N-dimensional environmental data corresponding to that data at each sampling time, wherein N is greater than 0, and forming time sequence environmental data from all the N-dimensional environmental data within a preset time period in sampling order;
for any N-dimensional environmental data in the time sequence environmental data, constructing a local window of the N-dimensional environmental data, obtaining the Euclidean distance between any two N-dimensional environmental data in the local window using the Euclidean distance formula, and calculating the local information trend optimization factor of the N-dimensional environmental data from all the Euclidean distances of the local window;
calculating the difference degree of the two sides of each N-dimensional environmental data in the time sequence environmental data according to the change of the time sequence environmental data over time and the local information trend optimization factor of each N-dimensional environmental data, and taking each N-dimensional environmental data whose difference degree of the two sides is greater than a difference degree threshold as a storage break point, to obtain all storage break points in the time sequence environmental data;
and dividing the time sequence environmental data into data segments at the storage break points to obtain at least one environmental data segment, linearly fitting all the N-dimensional environmental data in each environmental data segment to obtain a corresponding fitted straight line, and storing the fitted straight lines of all the environmental data segments of the time sequence environmental data.
Further, calculating the local information trend optimization factor of the N-dimensional environmental data from all the Euclidean distances of the local window includes:
for any target N-dimensional environmental data in the local window, obtaining a minimum Euclidean distance from the Euclidean distances between the target N-dimensional environmental data and the other N-dimensional environmental data in the local window, and calculating the local traversal path distance of the N-dimensional environmental data from the minimum Euclidean distances of all the N-dimensional environmental data in the local window;
and obtaining the local traversal path distance of each N-dimensional environmental data in the local window, and normalizing the local traversal path distance of the N-dimensional environmental data against the local traversal path distances of all the N-dimensional environmental data in the local window, the normalization result being the local information trend optimization factor of the N-dimensional environmental data.
Further, the calculation formula of the local traversal path distance of the N-dimensional environment data is as follows:
wherein the formula (not reproduced in this text) relates the following quantities: the local traversal path distance of the N-dimensional environmental data at the t-th sampling time in the time sequence environmental data; the number of N-dimensional environmental data contained in the local window corresponding to the N-dimensional environmental data at the t-th sampling time; the i-th N-dimensional environmental data in the local window; and the minimum Euclidean distance of the i-th N-dimensional environmental data in the local window.
Further, the calculation formula of the local information trend optimization factor of the N-dimensional environment data is as follows:
wherein the formula (not reproduced in this text) relates the following quantities: the local information trend optimization factor of the N-dimensional environmental data at the t-th sampling time in the time sequence environmental data; the local traversal path distance of the N-dimensional environmental data at the p-th sampling time in the local window; and Norm, a normalization function.
Further, calculating the difference degree of the two sides of each N-dimensional environmental data in the time sequence environmental data according to the change of the time sequence environmental data over time and the local information trend optimization factor of each N-dimensional environmental data includes:
clustering all the N-dimensional environmental data in the time sequence environmental data to obtain the cluster center point corresponding to each N-dimensional environmental data;
for the t-th N-dimensional environmental data in the time sequence environmental data, calculating a first distance between the cluster center point of the t-th N-dimensional environmental data and that of the (t-1)-th N-dimensional environmental data, calculating a second distance between the cluster center point of the t-th N-dimensional environmental data and that of the (t+1)-th N-dimensional environmental data, calculating the distance difference between the first distance and the second distance, and obtaining the absolute value of the sum of the constant 1 and the distance difference;
calculating a first difference between the local information trend optimization factor of the t-th N-dimensional environmental data and that of the (t-1)-th N-dimensional environmental data, calculating a second difference between the local information trend optimization factor of the t-th N-dimensional environmental data and that of the (t+1)-th N-dimensional environmental data, and calculating the difference between the first difference and the second difference;
obtaining, from the data change direction of each N-dimensional environmental data before the t-th N-dimensional environmental data, the number of N-dimensional environmental data corresponding to the same data change direction;
and obtaining the product of the absolute value of the sum, the difference and the number, and taking the normalized product as the difference degree of the two sides of the t-th N-dimensional environmental data.
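The steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the cluster center of each data point, the local information trend optimization factors and the same-direction counts are taken as precomputed inputs, the two end points (which lack a neighbour on one side) are given a raw value of 0, and min-max scaling stands in for the unspecified normalization. The function name `two_sided_difference` is hypothetical.

```python
import math


def two_sided_difference(centers, factors, counts):
    """Normalized difference degree of the two sides for every index t.

    centers[t]: cluster center (tuple) of the cluster containing point t
    factors[t]: local information trend optimization factor of point t
    counts[t]:  number of preceding points sharing one change direction
    """
    n = len(centers)
    raw = [0.0] * n                      # ASSUMPTION: end points get 0
    for t in range(1, n - 1):
        first_dist = math.dist(centers[t], centers[t - 1])
        second_dist = math.dist(centers[t], centers[t + 1])
        cluster_term = abs(1 + (first_dist - second_dist))
        first_diff = factors[t] - factors[t - 1]
        second_diff = factors[t] - factors[t + 1]
        factor_term = first_diff - second_diff
        raw[t] = cluster_term * factor_term * counts[t]
    lo, hi = min(raw), max(raw)
    if hi == lo:                         # flat input: nothing stands out
        return [0.0] * n
    # ASSUMPTION: min-max scaling as the normalization
    return [(v - lo) / (hi - lo) for v in raw]
```

For instance, with centers `[(0.0,), (0.0,), (0.0,), (5.0,), (5.0,)]`, factors `[0.1, 0.1, 0.2, 0.9, 0.9]` and counts `[0, 1, 2, 3, 1]`, the largest value falls on index 3, the first point assigned to the new cluster.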
Further, obtaining, from the data change direction of each N-dimensional environmental data before the t-th N-dimensional environmental data, the number of N-dimensional environmental data corresponding to the same data change direction includes:
obtaining the data mean of each N-dimensional environmental data before the t-th N-dimensional environmental data; for any N-dimensional environmental data before the t-th N-dimensional environmental data, obtaining the mean difference between its data mean and the data mean of the N-dimensional environmental data immediately preceding it; if the mean difference is greater than 0, confirming that the data change direction of the N-dimensional environmental data is the data increase direction, and if the mean difference is less than 0, confirming that the data change direction of the N-dimensional environmental data is the data decrease direction;
and counting the consecutive N-dimensional environmental data whose data change direction is the data increase direction, to obtain the number of N-dimensional environmental data corresponding to the same data change direction, or counting the consecutive N-dimensional environmental data whose data change direction is the data decrease direction, to obtain the number of N-dimensional environmental data corresponding to the same data change direction.
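A minimal sketch of this counting step is given below. It assumes that "continuous" means the unbroken run of identical change directions ending immediately before the t-th data point, and that a mean difference of exactly 0 ends the run; both are interpretations, and the function names are hypothetical.

```python
from statistics import mean


def change_direction(prev_point, point):
    """+1 if the data mean increased, -1 if it decreased, 0 if unchanged."""
    diff = mean(point) - mean(prev_point)
    return (diff > 0) - (diff < 0)


def same_direction_count(series, t):
    """Length of the run of identical change directions just before index t."""
    count, run_dir = 0, 0
    for k in range(t - 1, 0, -1):        # walk backwards from point t-1
        d = change_direction(series[k - 1], series[k])
        if d == 0 or (run_dir != 0 and d != run_dir):
            break                        # run of equal directions has ended
        run_dir = d
        count += 1
    return count
```

With the one-dimensional series `[(1.0,), (2.0,), (1.5,), (2.5,), (3.5,), (3.0,)]` and `t = 5`, the two points before index 5 both increase and the one before them decreases, so the count is 2.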
Further, obtaining, from the data change direction of each N-dimensional environmental data before the t-th N-dimensional environmental data, the number of N-dimensional environmental data corresponding to the same data change direction includes:
obtaining the data mean of each N-dimensional environmental data before the t-th N-dimensional environmental data; for any N-dimensional environmental data before the t-th N-dimensional environmental data, obtaining the mean difference between its data mean and the data mean of the N-dimensional environmental data immediately preceding it; if the mean difference is less than 0, confirming that the data change direction of the N-dimensional environmental data is the data decrease direction;
and counting the consecutive N-dimensional environmental data whose data change direction is the data decrease direction, to obtain the number of N-dimensional environmental data corresponding to the same data change direction.
Further, dividing the time sequence environmental data into data segments at the storage break points to obtain at least one environmental data segment includes:
for any two adjacent storage break points in the time sequence environmental data, forming all the N-dimensional environmental data between the two storage break points into one environmental data segment.
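The thresholding, segmentation and per-segment linear fitting can be sketched together as follows. This is an illustration under assumptions, not the patented implementation: each break point is taken to start a new segment, the start and end of the series act as implicit boundaries, each dimension is fitted independently against the sample index by ordinary least squares, and all function names are hypothetical.

```python
def break_points(degrees, threshold):
    """Indices whose difference degree exceeds the threshold."""
    return [t for t, d in enumerate(degrees) if d > threshold]


def split_segments(n, breaks):
    """Half-open index ranges; ASSUMPTION: a break point opens a new segment."""
    bounds = sorted(set([0] + [b for b in breaks if 0 < b < n] + [n]))
    return list(zip(bounds, bounds[1:]))


def fit_line(values, start):
    """Least-squares (slope, intercept) of values against indices start, start+1, ..."""
    m = len(values)
    ts = [start + k for k in range(m)]
    t_mean = sum(ts) / m
    v_mean = sum(values) / m
    sxx = sum((t - t_mean) ** 2 for t in ts)
    if sxx == 0:                         # single-point segment
        return 0.0, v_mean
    slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, values)) / sxx
    return slope, v_mean - slope * t_mean


def store_segments(series, degrees, threshold):
    """Return (start, end, [(slope, intercept) per dimension]) for each segment."""
    dims = len(series[0])
    stored = []
    for a, b in split_segments(len(series), break_points(degrees, threshold)):
        lines = [fit_line([series[t][d] for t in range(a, b)], a) for d in range(dims)]
        stored.append((a, b, lines))
    return stored
```

Only the segment boundaries and one slope and intercept per dimension are kept, so a segment of any length costs the same amount of storage; a value at sample index t is recovered as `slope * t + intercept`.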
Compared with the prior art, the embodiment of the application has the beneficial effects that:
According to the method, for any type of hydraulic circular ecological restoration data to be stored in the hydraulic circular ecological environment, the corresponding time sequence monitoring data is obtained, namely time sequence environmental data composed of the N-dimensional environmental data at each sampling time. Local information trend analysis is then performed on the time sequence environmental data to extract the local information trend optimization factor of each N-dimensional environmental data. These factors make it possible to optimize the distance measurement between data points and cluster center points in the subsequent data clustering compression process, so that data points with the same local change are assigned to the same cluster. The storage break points in the time sequence environmental data are then obtained from the local information trend optimization factor of each N-dimensional environmental data and the cluster distribution information of the local time sequence data points. The time sequence environmental data is stored in segments delimited by the storage break points, each environmental data segment being stored as a fitted straight line from which the data at different time stamps can be extracted, which improves the accuracy and efficiency of time sequence environmental data storage.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an efficient storage method for hydraulic circular ecological restoration data according to an embodiment of the present application.
Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present disclosure and are not to be construed as limiting the present disclosure.
It should be noted that the terms "first," "second," and the like in the description of the present disclosure and the above-described figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with aspects as disclosed herein.
The acquisition, storage, use and processing of data in the technical scheme of the application all comply with the relevant provisions of national laws and regulations.
In order to illustrate the technical scheme of the application, the following description is made by specific examples.
The specific scenario addressed by the application is the monitoring and storage of the ecological environment data to be monitored within each type of hydraulic circular ecological restoration data.
Referring to fig. 1, which is a flowchart of an efficient storage method for hydraulic circular ecological restoration data according to an embodiment of the present application, the method may include:
Step S101: for any type of hydraulic circular ecological restoration data in the hydraulic circular ecological environment, acquiring, based on a sampling frequency, the N-dimensional environmental data corresponding to that data at each sampling time, wherein N is greater than 0, and forming time sequence environmental data from all the N-dimensional environmental data within a preset time period in sampling order.
Specifically, to store hydraulic circular ecological restoration data, the data to be stored is first obtained during monitoring. The hydraulic circular ecological restoration data comprises several types of data monitored in the hydraulic circular ecological environment of the area to be restored, namely hydrological data, ecological data, water quality data and vegetation data.
For any type of data in the hydraulic circular ecological restoration data, the N-dimensional environmental data corresponding to that type of data is acquired at each sampling time based on the sampling frequency, and all the N-dimensional environmental data within a preset time period form the time sequence environmental data in sampling order. For example, for the vegetation data, the N-dimensional environmental data at each sampling time includes, but is not limited to, a vegetation tag with the corresponding vegetation count, the soil pH value and the soil moisture content; the N-dimensional environmental data at a plurality of consecutive sampling times then form the time sequence environmental data of the vegetation data.
It should be noted that the sampling frequency may be once a day, once every 2 days or once a week, and the preset time period may be one month or two months.
In the embodiment of the application, the monitoring data of poplars in the vegetation data is taken as an example: the number of poplars, the soil pH value and the soil moisture content at each sampling time form three-dimensional environmental data, and the three-dimensional environmental data at a plurality of consecutive sampling times form the time sequence environmental data.
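As a minimal sketch of this step, the poplar example can be represented as a list of three-dimensional tuples ordered by sampling time. The readings, the record layout `(sampling day, poplar count, soil pH, soil moisture content)` and the function name are hypothetical and serve only to illustrate the data shape.

```python
# Hypothetical readings: (sampling day, poplar count, soil pH, soil moisture content)
raw_samples = [
    (3, 118, 6.9, 0.23),
    (1, 120, 6.8, 0.21),
    (2, 121, 6.8, 0.22),
]


def build_time_series(samples):
    """Order samples by sampling time and keep only the N-dimensional vectors."""
    ordered = sorted(samples, key=lambda s: s[0])
    return [tuple(float(x) for x in s[1:]) for s in ordered]


series = build_time_series(raw_samples)   # one 3-dimensional tuple per sampling time
```

Here N is 3, and `series[t]` is the three-dimensional environmental data at the t-th sampling time.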
Step S102: for any N-dimensional environmental data in the time sequence environmental data, constructing a local window of the N-dimensional environmental data, obtaining the Euclidean distance between any two N-dimensional environmental data in the local window using the Euclidean distance formula, and calculating the local information trend optimization factor of the N-dimensional environmental data from all the Euclidean distances of the local window.
Specifically, the monitoring data of hydraulic circular ecological restoration changes over time. For example, when the number of poplars in an area is recorded, the data changes on the time scale, and this change reflects how the hydraulic circular ecological restoration is progressing. The changes between data points must therefore be preserved when the data is stored.
However, in existing clustering, the data points of one cluster in the data space (where the time order of the data points is discarded and they become discrete points) are recorded as the same data value. A run of consecutive data points in the time sequence that all fall in one cluster is thus recorded as a single value, even though the recorded data at adjacent time points may show different trend changes, for example an abnormal trend in the number of poplars in a certain area. Therefore, a local information trend optimization factor must be obtained for each data point from its local trend change in the time sequence; the factor is used to evaluate the change state of the data point in a clustering process that discards the time order of the data points.
For any N-dimensional environmental data in the time sequence environmental data, taking the N-dimensional environmental data at the t-th sampling time as an example, a local window of that N-dimensional environmental data is constructed, with the window length set to 20. The Euclidean distance between any two N-dimensional environmental data in the local window is obtained using the Euclidean distance formula, and the local information trend optimization factor of the N-dimensional environmental data at the t-th sampling time is then calculated from all the Euclidean distances of the local window. The specific process is as follows:
(1) For any target N-dimensional environmental data in the local window, a minimum Euclidean distance is obtained from the Euclidean distances between the target N-dimensional environmental data and the other N-dimensional environmental data in the local window, and the local traversal path distance of the N-dimensional environmental data is calculated from the minimum Euclidean distances of all the N-dimensional environmental data in the local window.
Specifically, the calculation formula of the local traversal path distance of the N-dimensional environment data is as follows:
wherein the formula (not reproduced in this text) relates the following quantities: the local traversal path distance of the N-dimensional environmental data at the t-th sampling time in the time sequence environmental data; the number of N-dimensional environmental data contained in the local window corresponding to the N-dimensional environmental data at the t-th sampling time (20 in this embodiment, given the window length); the i-th N-dimensional environmental data in the local window; and the minimum Euclidean distance of the i-th N-dimensional environmental data in the local window.
Starting from the N-dimensional environmental data at the t-th sampling time, the N-dimensional environmental data in the local window are traversed. Traversal takes one N-dimensional environmental data as the starting point and searches for the next one, the search criterion being the not-yet-traversed N-dimensional environmental data in the local window with the smallest Euclidean distance. Because the traversal path differs for each N-dimensional environmental data, the formula expresses the traversed path through the minimum Euclidean distance of each N-dimensional environmental data in the local window. The path weights differ during traversal: the closer a path is to the central data (the N-dimensional environmental data at the t-th sampling time), the higher its weight, and the farther away, the lower its weight. Therefore, from all the minimum Euclidean distances in the local window of length 20 around the N-dimensional environmental data at the t-th sampling time, the traversal distance between the other N-dimensional environmental data of the local window and the N-dimensional environmental data at the t-th sampling time is obtained as the measure for the local information trend optimization factor; that is, it is the shortest traversal path distance through the 20 N-dimensional environmental data of the local window, starting from the N-dimensional environmental data at the t-th sampling time.
In the time sequence environmental data, the traversal distance between one data and its local data captures the change information within the local range of that data. Because data in a time sequence are temporally correlated, the local traversal path distance is evaluated within a local window, so that whether the local data shows an obvious trend change is measured while the temporal correlation of the data is preserved. The larger the minimum Euclidean distances, the larger the local traversal path distance of the N-dimensional environmental data at the t-th sampling time, indicating an obvious trend change in the local data around it.
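The greedy nearest-neighbour traversal described above can be sketched as follows. Because the formula itself is not available in this text, two details are assumptions: the weight applied to each step (here the k-th step is divided by k, so steps nearer the central data count more) and the clipping of the window at the ends of the series. The function names are hypothetical.

```python
import math


def local_window(n, t, length=20):
    """Indices of a window of at most `length` points around t, clipped to [0, n)."""
    half = length // 2
    start = max(0, t - half)
    end = min(n, start + length)
    start = max(0, end - length)
    return list(range(start, end))


def local_traversal_path_distance(series, t, length=20):
    """Weighted length of the greedy nearest-neighbour path starting at point t."""
    unvisited = set(local_window(len(series), t, length))
    unvisited.discard(t)
    current, total, step = t, 0.0, 1
    while unvisited:
        # next point: the untraversed point with the smallest Euclidean distance
        # (ties broken by index so the result is deterministic)
        nxt = min(unvisited, key=lambda j: (math.dist(series[current], series[j]), j))
        total += math.dist(series[current], series[nxt]) / step   # ASSUMED weight 1/step
        unvisited.remove(nxt)
        current = nxt
        step += 1
    return total
```

For the series `[(0.0,), (1.0,), (3.0,), (6.0,)]` with `t = 0` and `length = 4`, the path visits the points in order with step distances 1, 2 and 3, which the assumed weights turn into 1 + 1 + 1 = 3. A constant series gives 0, matching the intuition that a window without trend change has a short traversal path.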
(2) The local traversal path distance of each N-dimensional environmental data in the local window is obtained, and the local traversal path distance of the N-dimensional environmental data is normalized against the local traversal path distances of all the N-dimensional environmental data in the local window; the normalization result is the local information trend optimization factor of the N-dimensional environmental data.
Specifically, the calculation formula of the local information trend optimization factor of the N-dimensional environment data is as follows:
wherein the formula (not reproduced in this text) relates the following quantities: the local information trend optimization factor of the N-dimensional environmental data at the t-th sampling time in the time sequence environmental data; the local traversal path distance of the N-dimensional environmental data at the p-th sampling time in the local window; and Norm, a normalization function.
It should be noted that, after the local traversal path distance of the N-dimensional environmental data at the t-th sampling time is obtained within its local window, the trend change of this N-dimensional environmental data relative to the surrounding N-dimensional environmental data is measured relatively, against the local traversal path distances that the other N-dimensional environmental data in the window have in their own local windows. This yields the local information trend optimization factor of the N-dimensional environmental data at the t-th sampling time in the time sequence environmental data.
Applying the above method to each N-dimensional environmental data in the time sequence environmental data yields the local information trend optimization factor of each. With these factors, the distance measurement between a data point and the cluster center point can be optimized in the subsequent data clustering process, so that data points with the same local change are assigned to the same cluster.
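One possible reading of this relative measurement is sketched below. The exact formula is not available in this text, so the choices here are assumptions: each local traversal path distance is divided by the mean of the path distances in its window, and min-max scaling over the whole series plays the role of the Norm function. The path distances are taken as precomputed inputs, and the function names are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def trend_factors(path_distances, length=20):
    """Local information trend optimization factor for every sampling time."""
    n = len(path_distances)
    half = length // 2
    ratios = []
    for t in range(n):
        start = max(0, t - half)
        end = min(n, start + length)
        start = max(0, end - length)
        window = path_distances[start:end]
        window_mean = sum(window) / len(window)
        # ASSUMPTION: relative measure = own distance / mean distance in window
        ratios.append(path_distances[t] / window_mean if window_mean > 0 else 0.0)
    return min_max(ratios)              # ASSUMPTION: Norm is min-max scaling
```

With path distances `[1.0, 1.0, 4.0, 1.0, 1.0]` and `length = 3`, only the middle point stands out from its neighbours, and the factors are `[0.0, 0.0, 1.0, 0.0, 0.0]`.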
Step S103: calculating the difference degree of the two sides of each N-dimensional environmental data in the time sequence environmental data according to the change of the time sequence environmental data over time and the local information trend optimization factor of each N-dimensional environmental data, and taking each N-dimensional environmental data whose difference degree of the two sides is greater than the difference degree threshold as a storage break point, to obtain all storage break points in the time sequence environmental data.
Specifically, once the local information trend optimization factor of each N-dimensional environmental data in the time sequence environmental data has been obtained, it can be used to optimize the distance measure of the N-dimensional environmental data during clustering. However, if the distance between a data point and the cluster center point is weighted only by the local information trend optimization factor, then, because the N-dimensional environmental data changes continuously over the time sequence, N-dimensional environmental data placed in the same cluster may still occupy different positions within that cluster, which affects the subsequent linear fitting. For example, when monitoring data of poplars in a region, sampled at different times, is clustered for compressed storage, N-dimensional environmental data that are similar and have similar local information trend optimization factors are assigned to the same cluster, yet they still differ in where they lie within the range of that cluster in the data space. Therefore, when storing the data, the storage break points in the time sequence environmental data must be judged from both the change of cluster across local data and the change of distribution within a cluster, so that the time sequence environmental data can be stored in segments.
To locate the storage break points in the time sequence environmental data, the two-sided difference degree of each N-dimensional environmental data is calculated from the data change of the time sequence environmental data over the time sequence and the local information trend optimization factor of each N-dimensional environmental data, as follows:
(1) Cluster all the N-dimensional environmental data in the time sequence environmental data to obtain the clustering center point of each N-dimensional environmental data.
Specifically, all N-dimensional environmental data in the time sequence environmental data are clustered with the DBSCAN clustering algorithm to obtain a plurality of clusters, and the cluster center of each cluster is taken as the clustering center point of every N-dimensional environmental data contained in that cluster.
(2) For the t-th N-dimensional environmental data in the time sequence environmental data, calculate a first distance between the clustering center point of the t-th N-dimensional environmental data and the clustering center point of the (t-1)-th N-dimensional environmental data, calculate a second distance between the clustering center point of the t-th N-dimensional environmental data and the clustering center point of the (t+1)-th N-dimensional environmental data, calculate the distance difference value between the first distance and the second distance, and obtain the absolute value of the sum of the constant 1 and the distance difference value.
Specifically, the calculation formula of the distance difference value is:
where the terms are, in order: the second distance, between the clustering center point of the t-th N-dimensional environmental data and the clustering center point of the (t+1)-th N-dimensional environmental data; the first distance, between the clustering center point of the t-th N-dimensional environmental data and the clustering center point of the (t-1)-th N-dimensional environmental data; and the distance difference value between the first distance and the second distance.
(3) Calculate a first difference value between the local information trend optimization factor of the t-th N-dimensional environmental data and that of the (t-1)-th N-dimensional environmental data, calculate a second difference value between the local information trend optimization factor of the t-th N-dimensional environmental data and that of the (t+1)-th N-dimensional environmental data, and calculate the difference value between the first difference value and the second difference value.
Specifically, the terms of this calculation are, in order: the second difference value, between the local information trend optimization factor of the t-th N-dimensional environmental data and that of the (t+1)-th N-dimensional environmental data; the first difference value, between the local information trend optimization factor of the t-th N-dimensional environmental data and that of the (t-1)-th N-dimensional environmental data; and the difference value between the first difference value and the second difference value.
(4) Obtain the number of N-dimensional environmental data that share the same data change direction, according to the data change direction of each N-dimensional environmental data before the t-th N-dimensional environmental data.
Specifically, the data mean of each N-dimensional environmental data before the t-th N-dimensional environmental data is obtained; for any such N-dimensional environmental data, the mean difference between its data mean and the data mean of the preceding N-dimensional environmental data is obtained, and if the mean difference is greater than 0, the data change direction of that N-dimensional environmental data is confirmed to be the data increase direction;
the consecutive N-dimensional environmental data whose data change direction is the data increase direction are then counted to obtain the number of N-dimensional environmental data corresponding to the same data change direction, or the consecutive N-dimensional environmental data whose data change direction is the data decrease direction are counted to obtain that number.
Alternatively, the data mean of each N-dimensional environmental data before the t-th N-dimensional environmental data is obtained; for any such N-dimensional environmental data, the mean difference between its data mean and the data mean of the preceding N-dimensional environmental data is obtained, and if the mean difference is less than 0, the data change direction of that N-dimensional environmental data is confirmed to be the data decrease direction;
the consecutive N-dimensional environmental data whose data change direction is the data decrease direction are then counted to obtain the number of N-dimensional environmental data corresponding to the same data change direction.
If the mean difference is equal to 0, the adjacent N-dimensional environmental data are unchanged, and the N-dimensional environmental data is not counted among the N-dimensional environmental data corresponding to the same data change direction.
(5) Obtain the product of the absolute value of the addition result, the difference value, and the number, and take the normalized product as the two-sided difference degree of the t-th N-dimensional environmental data.
Specifically, the calculation expression of the two-sided difference degree of the t-th N-dimensional environmental data is:
where the left-hand side is the two-sided difference degree of the t-th N-dimensional environmental data, and the count term is the number of N-dimensional environmental data corresponding to the same data change direction before the t-th N-dimensional environmental data.
It should be noted that the difference between the distances of the clustering center points of the clusters to which the data on the two sides of the t-th N-dimensional environmental data belong is used to measure how the cluster of the t-th N-dimensional environmental data differs from the clusters of the data on its two sides: the larger the distance difference value, the larger the two-sided difference degree of the t-th N-dimensional environmental data. Likewise, the difference between the local information trend optimization factor of the t-th N-dimensional environmental data and those of the data on its two sides measures the data difference between the t-th N-dimensional environmental data and its neighbors: the larger this difference value, the larger the data difference, and the larger the two-sided difference degree of the t-th N-dimensional environmental data. Finally, the more N-dimensional environmental data before the t-th N-dimensional environmental data share the same data change direction, the more pronounced the data change trend of the time sequence environmental data and the larger the difference between the corresponding data, so the larger the two-sided difference degree of the t-th N-dimensional environmental data.
(6) Based on steps (1)-(5), obtain the two-sided difference degree of each N-dimensional environmental data in the time sequence environmental data.
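Steps (2) to (5) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the definitive calculation: the clustering center points from step (1) are supplied as input, the distance difference value is taken as the first distance minus the second distance, the absolute value of the trend factor difference is used so that the score is non-negative, the data mean of a sample is the mean of its N components and is supplied as input, the two end samples (which have only one neighbor) are given a raw score of 0, and the normalization is min-max over the whole sequence. The function names are hypothetical.

```python
import math

def run_length_before(means, t):
    """Number of consecutive samples immediately before index t whose data
    mean changes in one and the same direction; a zero change ends the run."""
    count, direction = 0, 0
    j = t - 1
    while j >= 1:
        diff = means[j] - means[j - 1]
        sign = (diff > 0) - (diff < 0)
        if sign == 0 or (direction != 0 and sign != direction):
            break
        direction = sign
        count += 1
        j -= 1
    return count

def two_sided_difference(centers, xi, means):
    """Normalized two-sided difference degree for every sample.

    centers: clustering center point of each sample (tuple of floats)
    xi:      local information trend optimization factor of each sample
    means:   data mean of each sample
    """
    n = len(xi)
    raw = [0.0] * n  # assumption: end samples keep a raw score of 0
    for t in range(1, n - 1):
        first = math.dist(centers[t], centers[t - 1])
        second = math.dist(centers[t], centers[t + 1])
        dist_term = abs(1 + (first - second))  # |1 + distance difference value|
        # assumption: magnitude of (first difference - second difference)
        trend_term = abs((xi[t] - xi[t - 1]) - (xi[t] - xi[t + 1]))
        raw[t] = dist_term * trend_term * run_length_before(means, t)
    lo, hi = min(raw), max(raw)
    return [0.0 if hi == lo else (r - lo) / (hi - lo) for r in raw]
```

The min-max normalization keeps every score between 0 and 1, which makes the scores directly comparable with a fixed difference degree threshold.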
To preserve local change trend information when storing the time sequence environmental data, several data with a high degree of change must be identified in the time sequence environmental data to serve as break points for storage; because these data points change markedly, they must be read accurately in subsequent analysis. Each N-dimensional environmental data is therefore evaluated as a candidate storage break point according to its local data change in the time sequence environmental data. Specifically, after the two-sided difference degree of each N-dimensional environmental data in the time sequence environmental data is obtained, the two-sided difference degree of any N-dimensional environmental data is compared with a preset difference degree threshold, and each N-dimensional environmental data whose two-sided difference degree is greater than the difference degree threshold is taken as a storage break point, which yields all storage break points in the time sequence environmental data. Preferably, the preset difference degree threshold is 0.7.
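The threshold comparison itself is simple; a short Python sketch is given here for completeness. The function name is hypothetical, and the scores are assumed to be normalized two-sided difference degrees indexed by sampling order.

```python
def storage_break_points(scores, threshold=0.7):
    """Indices of the samples whose two-sided difference degree is strictly
    greater than the threshold (0.7 is the preferred value in the text)."""
    return [t for t, score in enumerate(scores) if score > threshold]
```

Note that the comparison is strict, so a sample whose difference degree equals the threshold exactly is not treated as a storage break point.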
Step S104: divide the time sequence environmental data into data segments according to all the storage break points to obtain at least one environmental data segment; for any environmental data segment, linearly fit all the N-dimensional environmental data in the segment to obtain the corresponding linear fitting straight line; and store the linear fitting straight lines of all environmental data segments of the time sequence environmental data.
Specifically, after the storage break points in the time sequence environmental data are obtained, the time sequence environmental data can be segmented by them as follows: for any two adjacent storage break points in the time sequence environmental data, all N-dimensional environmental data between the two storage break points form one environmental data segment. All N-dimensional environmental data in each environmental data segment are then linearly fitted to obtain the corresponding linear fitting straight line; that is, the straight line fitted to a segment is recorded as the record of that segment. Once the linear fitting straight line of each environmental data segment has been obtained, only these lines are stored. Because each line represents the N-dimensional environmental data of its segment, the N-dimensional environmental data can be extracted directly from the line using the corresponding timestamps.
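A minimal Python sketch of the segmentation, fitting, and read-back is shown below. It rests on assumptions not fixed by the text: each segment runs from one storage break point (inclusive) to the next (exclusive), the start and end of the sequence act as implicit boundaries, each of the N dimensions is fitted separately against the timestamps by ordinary least squares, and a stored record holds the first and last timestamp of the segment together with one (slope, intercept) pair per dimension. All names are hypothetical.

```python
def split_segments(n, break_points):
    """Half-open index ranges [a, b) delimited by the storage break points."""
    bounds = sorted(set([0, *break_points, n]))
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]

def fit_line(ts, ys):
    """Ordinary least squares slope and intercept for y = slope * t + intercept."""
    m = len(ts)
    t_mean = sum(ts) / m
    y_mean = sum(ys) / m
    var = sum((t - t_mean) ** 2 for t in ts)
    if var == 0:
        return 0.0, y_mean  # single-sample segment: store a constant
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = cov / var
    return slope, y_mean - slope * t_mean

def compress(timestamps, series, break_points):
    """Replace every segment by its fitted lines, one per dimension."""
    dims = len(series[0])
    stored = []
    for a, b in split_segments(len(series), break_points):
        ts = timestamps[a:b]
        lines = [fit_line(ts, [row[d] for row in series[a:b]]) for d in range(dims)]
        stored.append((ts[0], ts[-1], lines))
    return stored

def read_back(stored, t):
    """Evaluate the fitted lines of the segment whose time range contains t."""
    for start, end, lines in stored:
        if start <= t <= end:
            return tuple(slope * t + intercept for slope, intercept in lines)
    raise KeyError(t)

# Example: one break point at index 3 splits six samples into two segments.
timestamps = [0, 1, 2, 3, 4, 5]
series = [(0.0,), (1.0,), (2.0,), (10.0,), (8.0,), (6.0,)]
stored = compress(timestamps, series, [3])
```

In this sketch the values returned by `read_back` are the fitted values, so they match the original samples exactly only when a segment is perfectly linear, as in the example data.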
In summary, for any type of hydraulic circular ecological restoration data in the hydraulic circular ecological environment to be stored, the embodiment of the application obtains the corresponding time sequence monitoring data, that is, the time sequence environmental data composed of the N-dimensional environmental data at each sampling time. Local information trend analysis is then performed on the time sequence environmental data to extract the local information trend optimization factor of each N-dimensional environmental data, which allows the distance measure between data points and cluster center points to be optimized in the subsequent clustering compression, so that data points with the same local change are assigned to one cluster. Storage break points are then obtained from the local information trend optimization factor of each N-dimensional environmental data and the cluster distribution information of the local time sequence data points, the time sequence environmental data is stored in segments delimited by the storage break points, and each environmental data segment is stored as a linear fitting straight line, from which data can be extracted using different timestamps. This improves the accuracy and efficiency of storing the time sequence environmental data.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. An efficient storage method for hydraulic circular ecological restoration data, characterized by comprising the following steps:
for any type of hydraulic circular ecological restoration data in the hydraulic circular ecological environment, acquiring, based on a sampling frequency, the N-dimensional environmental data corresponding to the hydraulic circular ecological restoration data at each sampling time, wherein N is greater than 0, and forming time sequence environmental data from all the N-dimensional environmental data within a preset time period in sampling order;
for any N-dimensional environmental data in the time sequence environmental data, constructing a local window of the N-dimensional environmental data, acquiring the Euclidean distance between every two N-dimensional environmental data in the local window using the Euclidean distance formula, and calculating the local information trend optimization factor of the N-dimensional environmental data according to all the Euclidean distances of the local window;
according to the data change of the time sequence environmental data over the time sequence and the local information trend optimization factor of each N-dimensional environmental data, calculating the two-sided difference degree of each N-dimensional environmental data in the time sequence environmental data, and taking each N-dimensional environmental data whose two-sided difference degree is greater than a difference degree threshold as a storage break point, to obtain all storage break points in the time sequence environmental data;
and dividing the time sequence environmental data into data segments according to all the storage break points to correspondingly obtain at least one environmental data segment, linearly fitting, for any environmental data segment, all the N-dimensional environmental data in the environmental data segment to obtain a corresponding linear fitting straight line, and storing the linear fitting straight lines of all the environmental data segments of the time sequence environmental data.
2. The efficient storage method according to claim 1, wherein the calculating the local information trend optimization factor of the N-dimensional environmental data according to all the Euclidean distances of the local window comprises:
for any target N-dimensional environmental data in the local window, acquiring a minimum Euclidean distance from the Euclidean distances between the target N-dimensional environmental data and the other N-dimensional environmental data in the local window, and calculating the local traversal path distance of the N-dimensional environmental data from the minimum Euclidean distances of all the N-dimensional environmental data in the local window;
and obtaining the local traversal path distance of each N-dimensional environmental data in the local window, and normalizing the local traversal path distance of the N-dimensional environmental data according to the local traversal path distance of each N-dimensional environmental data in the local window, the normalization result being the local information trend optimization factor of the N-dimensional environmental data.
3. The efficient storage method according to claim 2, wherein the calculation formula of the local traversal path distance of the N-dimensional environment data is:
where the left-hand side is the local traversal path distance of the N-dimensional environmental data at the t-th sampling time in the time sequence environmental data; $N_k(H_t)$ is the number of N-dimensional environmental data contained in the local window corresponding to the N-dimensional environmental data at the t-th sampling time in the time sequence environmental data; $i$ denotes the i-th N-dimensional environmental data in the local window; and $\mathrm{dist}(e_i)$ is the minimum Euclidean distance of the i-th N-dimensional environmental data in the local window.
4. The efficient storage method of claim 3 wherein the calculation formula of the local information trend optimization factor of the N-dimensional environmental data is:
where $\xi_t$ is the local information trend optimization factor of the N-dimensional environmental data at the t-th sampling time in the time sequence environmental data; the path-distance term is the local traversal path distance of the N-dimensional environmental data at the p-th sampling time in the local window; and Norm is a normalization function.
5. The efficient storage method according to claim 1, wherein the calculating the degree of difference between the two sides of each N-dimensional environmental data in the time-series environmental data according to the data change of the time-series environmental data in time series and the local information trend optimization factor of each N-dimensional environmental data comprises:
clustering all N-dimensional environment data in the time sequence environment data to obtain a clustering center point of each N-dimensional environment data;
for the t-th N-dimensional environmental data in the time sequence environmental data, calculating a first distance between the clustering center point of the t-th N-dimensional environmental data and the clustering center point of the (t-1)-th N-dimensional environmental data, calculating a second distance between the clustering center point of the t-th N-dimensional environmental data and the clustering center point of the (t+1)-th N-dimensional environmental data, calculating a distance difference value between the first distance and the second distance, and obtaining the absolute value of the sum of the constant 1 and the distance difference value;
calculating a first difference value between the local information trend optimization factor of the t-th N-dimensional environmental data and the local information trend optimization factor of the (t-1)-th N-dimensional environmental data, calculating a second difference value between the local information trend optimization factor of the t-th N-dimensional environmental data and the local information trend optimization factor of the (t+1)-th N-dimensional environmental data, and calculating a difference value between the first difference value and the second difference value;
acquiring the number of N-dimensional environmental data corresponding to the same data change direction according to the data change direction of each N-dimensional environmental data before the t-th N-dimensional environmental data;
and obtaining the product of the absolute value of the addition result, the difference value, and the number, and taking the normalized product as the two-sided difference degree of the t-th N-dimensional environmental data.
6. The efficient storage method according to claim 5, wherein the obtaining the number of N-dimensional environmental data corresponding to the same data change direction according to the data change direction of each N-dimensional environmental data before the t-th N-dimensional environmental data comprises:
obtaining the data mean of each N-dimensional environmental data before the t-th N-dimensional environmental data; for any N-dimensional environmental data before the t-th N-dimensional environmental data, obtaining the mean difference between its data mean and the data mean of the preceding N-dimensional environmental data, and, if the mean difference is greater than 0, confirming that the data change direction of the N-dimensional environmental data is the data increase direction;
and counting the N-dimensional environmental data with the data change direction being the data increase direction and the data being continuous, so as to obtain the number of the N-dimensional environmental data corresponding to the same data change direction, or counting the N-dimensional environmental data with the data change direction being the data decrease direction and the data being continuous, so as to obtain the number of the N-dimensional environmental data corresponding to the same data change direction.
7. The efficient storage method according to claim 5, wherein the obtaining the number of N-dimensional environmental data corresponding to the same data change direction according to the data change direction of each N-dimensional environmental data before the t-th N-dimensional environmental data comprises:
obtaining the data mean of each N-dimensional environmental data before the t-th N-dimensional environmental data; for any N-dimensional environmental data before the t-th N-dimensional environmental data, obtaining the mean difference between its data mean and the data mean of the preceding N-dimensional environmental data, and, if the mean difference is less than 0, confirming that the data change direction of the N-dimensional environmental data is the data decrease direction;
and counting the N-dimensional environment data with the data change direction being the data reduction direction and continuous data, and obtaining the number of the N-dimensional environment data corresponding to the same data change direction.
8. The efficient storage method according to claim 1, wherein the dividing the time sequence environmental data into data segments according to all the storage break points to correspondingly obtain at least one environmental data segment comprises:
and aiming at any two adjacent storage break points in the time sequence environment data, forming all N-dimensional environment data between the two storage break points into an environment data segment.
CN202311185325.7A 2023-09-14 2023-09-14 Efficient storage method for hydraulic circular ecological restoration data Active CN116910595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311185325.7A CN116910595B (en) 2023-09-14 2023-09-14 Efficient storage method for hydraulic circular ecological restoration data

Publications (2)

Publication Number Publication Date
CN116910595A CN116910595A (en) 2023-10-20
CN116910595B true CN116910595B (en) 2023-12-08

Family

ID=88367390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311185325.7A Active CN116910595B (en) 2023-09-14 2023-09-14 Efficient storage method for hydraulic circular ecological restoration data

Country Status (1)

Country Link
CN (1) CN116910595B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant