Time-space domain model modeling method and system for ecological environment monitoring
Technical Field
The invention relates to a data processing method and a data processing system for a prediction purpose, in particular to a time-space domain model modeling method and a time-space domain model modeling system for ecological environment monitoring.
Background
Monitoring the ecological environment yields a series of monitoring data that can be used for analysis, prediction and treatment in environmental protection. At present, deep learning methods have begun to be applied to ecological environment monitoring data in the field of environmental protection. However, deep learning needs big data as support, and a deep learning model is difficult to train to convergence on sparse data. In the field of environmental protection, a considerable part of the monitoring data of the ecological environment is discrete and sparse; for example, atmospheric pollutants may appear randomly and may disappear soon after appearing. Such discrete, sparse data are therefore not suitable as an input training set for deep learning.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a time-space domain model for ecological environment monitoring and a modeling method thereof. Monitoring data are converted by the model into ordered and dense data, which can be provided to deep learning for training. This solves the technical problem in the prior art that, in the field of environmental protection, ecological environment monitoring data are discrete and sparse and cannot serve as an input training set suitable for deep learning.
The technical scheme adopted by the invention is a time-space domain model modeling method for ecological environment monitoring, comprising the following steps:
S1, collecting pollutant data at regular intervals to obtain a plurality of data frames;
S2, superimposing the plurality of data frames by spatial superposition in chronological order to obtain a time-space domain data structure;
S3, performing feature extraction on the time-space domain data structure, taking the change of the time-space domain data as the feature quantity, to obtain a data structure without null values;
S4, performing dimensionality reduction on the data structure without null values by using a data dimensionality reduction method to obtain a time-space domain model;
and S5, inputting sparse monitoring data into the time-space domain model, converting the monitoring data into dense data through the time-space domain model, and using the dense data as the input of deep learning training.
Further, the data dimensionality reduction method in S4 is a PCA algorithm, and the PCA algorithm projects the data onto the projection dimensions along which the data variance is largest.
Further, the data dimensionality reduction method in S4 is a Laplacian eigenmap algorithm, and the Laplacian eigenmap algorithm implements data dimensionality reduction by constructing the relationships between data according to the local neighborhoods of the data vectors.
Further, the Laplacian eigenmap algorithm specifically includes the following steps:
S41, constructing all points into a graph by using a KNN algorithm, and connecting each point with its K nearest points;
S42, determining the weight between every two points according to whether the two points are connected;
and S43, applying the Laplacian transformation to the weighted graph, and taking the data vectors of the points whose corresponding weights are not 0 as the output result after dimensionality reduction.
Further, the spatial superimposition in S2 is superimposition based on raster data.
Further, in step S3, the change of the time-space domain data is obtained by projecting two data frames of the time-space domain data into the same constant subspace and then reading the rotation angle difference between the two data frames.
Further, the input of the deep learning training is a 512 × 512 matrix grid or a 224 × 224 matrix grid.
Further, the data frame described in S1 includes 4 dimensions of information, where 3 dimensions are spatial coordinate information and 1 dimension is time axis information.
The invention also provides a time-space domain model modeling system for ecological environment monitoring, which comprises a data input unit, a data superposition unit, a data feature extraction unit and a data dimension reduction processing unit;
the data input unit is used for inputting a plurality of data frames obtained according to the collected pollutant information;
the data superposition unit is used for superposing a plurality of data frames in a space superposition mode to obtain a time-space domain data structure;
the data feature extraction unit is used for extracting features of the time-space domain data structure by taking the change of the time-space domain data as a feature quantity to obtain a data structure without null values;
and the data dimension reduction processing unit is used for performing dimension reduction processing on the data structure without the null value by using a data dimension reduction method.
According to the technical scheme, the beneficial technical effects of the invention are as follows:
1. The modeling method stacks discrete and sparse monitoring data of the ecological environment in the time-space domain, extracts feature quantities so as to delete null-value region data, and performs data dimensionality reduction to obtain the time-space domain model. Sparse monitoring data input into the time-space domain model are converted into dense data, giving input suitable for deep learning.
2. In the data dimensionality reduction process, the Laplacian eigenmap algorithm is used, so that a good classification effect is obtained while the dimensionality of the data is reduced, which facilitates the construction of a deep learning training set and a deep learning validation set.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a block diagram of the system architecture of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
Example 1
As shown in FIG. 1, the invention provides a time-space domain model modeling method for ecological environment monitoring, which comprises the following steps:
S1, collecting pollutant data at regular intervals to obtain a plurality of data frames;
S2, superimposing the plurality of data frames by spatial superposition in chronological order to obtain a time-space domain data structure;
S3, performing feature extraction on the time-space domain data structure, taking the change of the time-space domain data as the feature quantity, to obtain a data structure without null values;
S4, performing dimensionality reduction on the data structure without null values by using a data dimensionality reduction method to obtain a time-space domain model;
and S5, inputting sparse monitoring data into the time-space domain model, converting the monitoring data into dense data through the time-space domain model, and using the dense data as the input of deep learning training.
The working principle of Example 1 is explained in detail below.
The time-space domain model modeling method for ecological environment monitoring of the invention is implemented by the following steps:
1. Collecting pollutant data at regular intervals to obtain a plurality of data frames
In the present embodiment, atmospheric pollutants are taken as an example of the pollutants, and include but are not limited to PM2.5, PM10, sulfur oxides, nitrogen oxides, carbon monoxide, hydrocarbons, and the like. The fixed monitoring stations and mobile monitoring stations that have already been built can collect data on the atmospheric pollutants. The monitoring data are acquired at regular intervals, preferably once every 10 minutes. The collected data contain the following information: the time information and location information at which the atmospheric pollutants are monitored, and the components and concentrations of the atmospheric pollutants. The location information includes 3 kinds of spatial coordinate information: the longitude, latitude and altitude of the location. Therefore, in one monitoring period, a plurality of monitoring data can be collected and taken as a plurality of data frames; each data frame has 4 dimensions of information in terms of time and space, of which 3 dimensions are spatial coordinate information and 1 dimension is time axis information. The time axis information represents the chronological order of the data frames.
If no atmospheric pollutants are detected at a monitoring time point, the time information, the location information, and the components and concentrations of the atmospheric pollutants for that time point are all defined as null values. In the actual monitoring process, because the generation of atmospheric pollutants is sudden, random, periodic and non-continuous, generally only a small part of the data frames contain atmospheric pollutant information, and most of the data frames are null values.
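As a non-limiting illustration, the data frames of this step may be represented as in the following minimal Python sketch. The names Reading, MonitoringFrame, build_frames and null_fraction, the slot index used to keep null frames in chronological order, and the sample values are all assumptions introduced for illustration and do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional

SAMPLING_PERIOD_S = 600  # one acquisition every 10 minutes, as preferred in the embodiment


@dataclass(frozen=True)
class Reading:
    """One non-null observation: 3 spatial dimensions plus 1 time dimension."""
    longitude: float
    latitude: float
    altitude: float
    time_s: float          # seconds since the start of the monitoring period
    component: str         # e.g. "PM2.5"
    concentration: float


@dataclass(frozen=True)
class MonitoringFrame:
    """A data frame; reading is None when nothing was detected (null frame)."""
    slot: int                       # hypothetical sampling-slot index
    reading: Optional[Reading]


def build_frames(detections: List[Reading], n_slots: int) -> List[MonitoringFrame]:
    """Place each detection in its sampling slot and fill empty slots with null frames."""
    by_slot = {int(r.time_s // SAMPLING_PERIOD_S): r for r in detections}
    return [MonitoringFrame(slot, by_slot.get(slot)) for slot in range(n_slots)]


def null_fraction(frames: List[MonitoringFrame]) -> float:
    """Fraction of frames that carry no pollutant information."""
    return sum(f.reading is None for f in frames) / len(frames)


# Illustrative values only: two detections in a one-hour monitoring period.
detections = [
    Reading(116.40, 39.90, 50.0, 1200.0, "PM2.5", 85.0),
    Reading(116.41, 39.91, 50.0, 1800.0, "PM2.5", 60.0),
]
frames = build_frames(detections, n_slots=6)
```

In this sketch four of the six frames are null, which mirrors the sparsity described above.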
In this embodiment, a digital counter may be used in combination with a reference signal of a clock reference source to implement a function of timing acquisition control.
2. Superimposing the plurality of data frames in chronological order to obtain a time-space domain data structure
For the plurality of data frames obtained in step 1, in each data frame with non-null values, the 3 kinds of spatial coordinate information (longitude, latitude and altitude) are embodied as one point in a spatial coordinate system. Combining the time axis information of the data frames, superimposing the data frames in chronological order is equivalent to spatially superimposing these spatial coordinate points in the same spatial coordinate system. In the superposition process, each superposition layer is one data frame.
From the perspective of the data structure and the spatial stereoscopic view, each superposition layer may be expressed either as a point-line-polygon layer file of a vector structure or as a layer file of a raster structure. In superposition based on vector data, the two layers participating in the analysis are both vector data; the storage volume of the data is small, but the operation process is complex. In superposition based on raster data, the two layers participating in the analysis are both raster data; the storage volume of the data is large, but the operation process is simpler. In this embodiment, the longitude, latitude and altitude of each data frame are embodied as one point in the spatial coordinate system, and one point in the spatial coordinate system can be regarded as one raster cell, so the spatial superposition in this embodiment is preferably performed based on raster data.
According to the graphic characteristics of the superimposed objects, superposition can be divided into the superposition of points and lines, of lines and lines, of points and polygons, of lines and polygons, and of polygons and polygons. In this embodiment, when the spatial coordinates of the data frames are spatially superimposed in chronological order, the first two data frames containing atmospheric pollutants are superimposed as points; as more data frames are added, points are superimposed into a line in the spatial coordinate system, different lines are further superimposed into a polygon, and finally polygons are superimposed with polygons. By superimposing the data frames within a time period, a data structure is obtained. This is the time-space domain data structure, a model reflecting how the atmospheric pollutants vary over time in the spatial coordinate system, for example at which spatial points the atmospheric pollutants are generated, to which spatial points they diffuse, and from which spatial points they fade.
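As a non-limiting illustration, the raster-based superposition of this step may be sketched in Python with NumPy as follows. The function name stack_frames, the tuple layout of a frame, the grid resolution, the coordinate bounds and the use of NaN to mark null cells are all assumptions made for this sketch.

```python
import numpy as np


def stack_frames(frames, grid_shape, bounds):
    """Stack time-ordered frames into a (T, nx, ny, nz) raster array.

    frames     : list ordered by time; each item is None (null frame) or a
                 tuple (longitude, latitude, altitude, concentration)
    grid_shape : number of raster cells along longitude, latitude, altitude
    bounds     : ((lon_min, lon_max), (lat_min, lat_max), (alt_min, alt_max))
    Cells with no observation are left as NaN (assumed null marker).
    """
    cube = np.full((len(frames),) + tuple(grid_shape), np.nan)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    shape = np.array(grid_shape)
    for t, frame in enumerate(frames):
        if frame is None:
            continue  # a null frame contributes an all-null layer
        lon, lat, alt, conc = frame
        frac = (np.array([lon, lat, alt]) - lo) / (hi - lo)
        idx = np.clip((frac * shape).astype(int), 0, shape - 1)
        cube[t, idx[0], idx[1], idx[2]] = conc
    return cube


# Illustrative values only: three frames, the first of which is null.
frames = [None, (116.40, 39.90, 50.0, 85.0), (116.65, 39.90, 50.0, 60.0)]
bounds = ((116.0, 117.0), (39.0, 40.0), (0.0, 100.0))
cube = stack_frames(frames, grid_shape=(4, 4, 2), bounds=bounds)
```

Each layer cube[t] corresponds to one superposition layer, so the first axis of the array plays the role of the time axis and the remaining three axes play the role of the spatial coordinate system.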
3. Feature extraction for time-space domain data structure
In the time-space domain data structure obtained in step 2, only a few data frames contain atmospheric pollution information and most of the data are null values. From the perspective of the spatial stereoscopic view, most regions in the space are therefore null-value regions, which are not effective input for deep learning. To improve the training efficiency of deep learning, the null-value regions are deleted in a preprocessing step, and only the regions containing atmospheric pollution information are retained.
Feature extraction is performed on the time-space domain data structure, taking the change of the time-space domain data as the feature quantity. Specifically, when a certain region of the time-space domain data never changes, the region is determined to be a null-value region; if a certain region of the time-space domain data changes, this indicates the generation, diffusion or disappearance of atmospheric pollutants in the region, and the data of the region and the atmospheric pollutant information are retained. The change of the time-space domain data structure can be embodied by the geometric transformation between adjacent data frames. The two data frames are each projected into the same constant subspace, and the rotation angle difference between them is read out, which gives the geometric transformation information between the two adjacent data frames. A non-zero angle difference indicates a change; an angle difference of 0 indicates no change.
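As a non-limiting illustration, the deletion of null-value regions may be sketched in Python with NumPy as follows. This sketch uses a simplified per-cell comparison of adjacent frames as a stand-in for the rotation angle difference described above; it is not the angle-based comparison itself. The function names, the NaN null marker and the treatment of a null cell as zero concentration are assumptions.

```python
import numpy as np


def changed_cells(cube):
    """Boolean mask of spatial cells whose value differs between at least
    one pair of adjacent frames.

    cube : array of shape (T, nx, ny, nz), NaN marking null cells.
    Assumption: a null cell is compared as zero concentration.
    """
    filled = np.nan_to_num(cube, nan=0.0)
    diffs = np.diff(filled, axis=0)       # change between adjacent frames
    return np.any(diffs != 0, axis=0)


def extract_features(cube):
    """Keep only the cells that change; return their spatial indices and
    their time series, which contain no null values."""
    mask = changed_cells(cube)
    coords = np.argwhere(mask)                        # (n_cells, 3)
    series = np.nan_to_num(cube, nan=0.0)[:, mask].T  # (n_cells, T)
    return coords, series


# Illustrative values only: a 3-frame structure with two active cells.
cube = np.full((3, 2, 2, 1), np.nan)
cube[1, 0, 1, 0] = 85.0
cube[2, 1, 1, 0] = 60.0
coords, series = extract_features(cube)
```

Cells that never change are dropped, so the returned structure keeps only the regions in which pollutants were generated, diffused or disappeared.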
4. Performing dimensionality reduction on the data structure subjected to feature extraction to obtain a time-space domain model
For the time-space domain data structure after feature extraction, a PCA algorithm can be selected to perform dimensionality reduction on the data. PCA is a linear dimensionality reduction method: it maps high-dimensional data into a low-dimensional space through a linear projection, choosing the projection dimensions along which the data variance is largest, so that fewer data dimensions are used while more of the characteristics of the original data points are retained. Among linear dimensionality reduction methods, the PCA algorithm loses the least original data information, in the sense of squared reconstruction error.
In the data dimensionality reduction process, the original data are n-dimensional; in this embodiment the data have 4 dimensions, namely 3 dimensions of space and 1 dimension of time. First, the first coordinate axis is selected as the direction in which the data variance is largest; the second coordinate axis is then selected as the direction of largest variance orthogonal to the first coordinate axis, and the third coordinate axis as the direction of largest variance orthogonal to the first and second coordinate axes. This process is repeated until the number of dimensions of the new coordinate system reaches a given value. In this embodiment, the data are finally reduced to 2 dimensions, and the time-space domain model is obtained when the data have been reduced to 2 dimensions.
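As a non-limiting illustration, the PCA reduction from 4 dimensions to 2 dimensions may be sketched in Python with NumPy as follows. The function name pca_reduce and the randomly generated sample points are assumptions made for this sketch; the embodiment does not prescribe a particular implementation.

```python
import numpy as np


def pca_reduce(points, n_components=2):
    """Project points onto the n_components orthogonal directions of
    largest variance.

    points : array of shape (n_samples, n_dims), here n_dims = 4
             (3 spatial dimensions and 1 time dimension)
    Returns the projected data, the projection axes and their variances.
    """
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]   # largest variance first
    components = eigvecs[:, order]                     # mutually orthogonal axes
    return centered @ components, components, eigvals[order]


# Illustrative random data only: 50 points with unequal spread per dimension.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 0.5, 0.1])
reduced, components, variances = pca_reduce(points, n_components=2)
```

The stored mean and projection axes can be reused to project further monitoring data into the same 2-dimensional coordinate system.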
5. Inputting the sparse monitoring data into a time-space domain model to be converted into dense data which is used as the input of deep learning training
For the established time-space domain model, other ecological environment monitoring data, such as monitoring data of other regions or monitoring data of other time periods of the same region, are input into the time-space domain model, and the discrete and sparse monitoring data are converted into dense data through the time-space domain model to be used as input of deep learning training.
For the output 2-dimensional data, a 512 × 512 matrix grid or a 224 × 224 matrix grid is preferably used, which balances the efficiency and accuracy of calculation in subsequent data processing: if the matrix grid is too small, too much information is lost; if it is too large, the abstraction level of the information is not high enough and the amount of computation is greater.
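As a non-limiting illustration, placing the 2-dimensional output onto a fixed-size matrix grid may be sketched in Python with NumPy as follows. The function name to_grid, the min-max scaling of the coordinates and the summation of values that fall into the same cell are assumptions made for this sketch.

```python
import numpy as np


def to_grid(points_2d, values, size=224):
    """Accumulate values onto a size x size matrix grid.

    points_2d : array of shape (n, 2), the 2-dimensional output data
    values    : array of shape (n,), e.g. pollutant concentrations
    Assumption: coordinates are min-max scaled to the grid, and values
    landing in the same cell are summed.
    """
    pts = np.asarray(points_2d, dtype=float)
    lo = pts.min(axis=0)
    span = pts.max(axis=0) - lo
    span[span == 0] = 1.0                 # avoid division by zero
    idx = np.clip(((pts - lo) / span * size).astype(int), 0, size - 1)
    grid = np.zeros((size, size))
    np.add.at(grid, (idx[:, 0], idx[:, 1]), values)
    return grid


# Illustrative values only.
points_2d = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
values = np.array([1.0, 2.0, 3.0])
grid = to_grid(points_2d, values, size=224)
```

Passing size=512 gives the other preferred grid size.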
Example 2
The PCA algorithm has no significant classification effect during data dimensionality reduction, which may not be ideal for constructing the input of deep learning, because the data for deep learning need to be separated into a training set and a validation set.
To solve this technical problem, a Laplacian eigenmap algorithm is selected for the dimensionality reduction of the data.
The working principle is as follows: the Laplacian eigenmap algorithm constructs the relationships between data according to the local neighborhoods of the data vectors; if two data frames are very similar, they should be as close as possible in the target subspace after dimensionality reduction. The specific steps are as follows. First, all points are constructed into a graph by using a KNN algorithm, connecting each point with its K nearest points. In this embodiment, the data to be reduced have 4 dimensions, and the value of K is selected to be 4. Then, the weight between every two points is determined; in this embodiment, to simplify the setting and speed up calculation, the weight W is set to 1 when two points are connected and to 0 when they are not. Finally, the Laplacian transformation is applied, and the data vectors between the points whose connecting weights are not 0 are taken as the output result after dimensionality reduction. Specifically, in this embodiment, within the time-space domain data, the data corresponding to the generation and diffusion process of the atmospheric pollutants are classified into one class in chronological order, and the data corresponding to the dissipation process of the atmospheric pollutants are classified into another class.
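As a non-limiting illustration, the three steps above may be sketched in Python with NumPy as follows. The sketch follows the standard formulation of Laplacian eigenmaps, in which the low-dimensional output is taken from the eigenvectors of the graph Laplacian with the smallest non-trivial eigenvalues; reading the final step of this embodiment in that way is an assumption. The function name laplacian_eigenmap, the symmetrization of the KNN graph and the random sample points are likewise assumptions.

```python
import numpy as np


def laplacian_eigenmap(points, k=4, n_components=2):
    """Reduce points to n_components dimensions with a KNN graph and
    binary weights (1 if connected, 0 otherwise)."""
    n = len(points)
    # Squared Euclidean distances between all pairs of points.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]    # K nearest points of each point

    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, neighbors.ravel()] = 1.0
    W = np.maximum(W, W.T)                       # assumed: undirected graph

    degree = W.sum(axis=1)                       # at least k, so never zero
    L = np.diag(degree) - W                      # graph Laplacian
    # Solve L y = lambda D y through the symmetric normalized Laplacian.
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)     # ascending eigenvalues
    # Skip the first (trivial) eigenvector; assumes a connected graph.
    Y = d_inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]
    return Y, W


# Illustrative random data only: 30 points with 4 dimensions, K = 4.
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 4))
Y, W = laplacian_eigenmap(points, k=4, n_components=2)
```

Points that are connected in the graph tend to stay close together in the output Y, which is the property relied on above for separating the generation and diffusion data from the dissipation data.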
Example 3
The present invention also provides a time-space domain model modeling system for ecological environment monitoring which, as shown in FIG. 2, comprises a data input unit, a data superposition unit, a data feature extraction unit and a data dimension reduction processing unit;
the data input unit is used for inputting a plurality of data frames obtained according to the collected pollutant information;
the data superposition unit is used for superposing a plurality of data frames in a space superposition mode to obtain a time-space domain data structure;
the data feature extraction unit is used for extracting features of the time-space domain data structure by taking the change of the time-space domain data as a feature quantity to obtain a data structure without null values;
and the data dimension reduction processing unit is used for performing dimension reduction processing on the data structure without the null value by using a data dimension reduction method to obtain deep learning input data.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.