CN114138857A - Big data mining method and device based on watershed water environment - Google Patents

Info

Publication number
CN114138857A
Authority
CN
China
Prior art keywords
data
mining
index
evaluation
water quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111329268.6A
Other languages
Chinese (zh)
Inventor
王国强
薛宝林
王溥泽
彭岩波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/29 Geographical information databases
    • G06F16/345 Summarisation for human users
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Fuzzy Systems (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Remote Sensing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of environmental data processing, and in particular to a big data mining method and device for the watershed water environment. The method comprises: acquiring raw data from each application through an interface access layer; preprocessing the raw data through a data acquisition ETL platform to obtain input data, and feeding the input data into a pre-trained data mining model; calculating data model indexes based on a first data mining module; based on a second data mining module, extracting keywords and summaries from text data and structuring the information in the text data; acquiring and storing the mining data obtained through the data mining model; and, on receiving a query request for the mining data, returning the mining data through a data encapsulation exchange interface. The invention enables cross-domain knowledge discovery across socio-economic, meteorological, hydrological and water environment data through big data mining, and provides technical support for intelligent watershed management.

Description

Big data mining method and device based on watershed water environment
Technical Field
The invention relates to the technical field of environmental data processing, in particular to a method and a device for mining big data based on a watershed water environment.
Background
Data mining is the process of extracting hidden, previously unknown but potentially useful information and knowledge from massive, incomplete, noisy, fuzzy and random real-world data. It is an important means of discovering knowledge in databases and obtaining key data for decision support. Data mining algorithms have been studied in depth in China and abroad, including association rules, data classification and clustering. For data classification, a variety of methods such as decision trees and neural networks have been developed.
At present, the volume of water-related environmental management data (such as water environment monitoring data, environmental statistics and wastewater discharge monitoring data), together with the related social, economic, hydrological, water resource and meteorological data, keeps growing. However, because many departments are involved in water management and overall coordination is lacking, traditional informatization has been carried out separately and independently by each department, producing numerous data silos. Data resources are not processed in sufficient depth, and the statistical, logical and even mechanistic associations among the various types of data have not been discovered. Building on big data collection and integration, there is an urgent need for cross-domain knowledge discovery across socio-economic, meteorological, hydrological and water environment data through big data mining, so as to provide technical support for intelligent watershed management.
Disclosure of Invention
The embodiments of the invention provide a big data mining method and device for the watershed water environment. The technical solution is as follows:
In one aspect, a big data mining method based on a watershed water environment is provided. The method is implemented by a big data mining platform and comprises:
acquiring raw data from each application through an interface access layer;
preprocessing the raw data through a data acquisition ETL platform to obtain input data that meets the model standard, and feeding the input data into a pre-trained data mining model, wherein the data mining model is divided into a first data mining module for business evaluation and a second data mining module for text analysis;
calculating data model indexes based on the first data mining module, wherein the data model indexes fall into 4 types: a section water quality evaluation type, a water quality index calculation type, a water environment carrying capacity evaluation type and a water ecological safety evaluation type;
based on the second data mining module, extracting keywords and summaries from text data and structuring the information in the text data;
acquiring and storing the mining data obtained through the data mining model; and
on receiving a query request for the mining data, returning the mining data through a data encapsulation exchange interface.
Optionally, preprocessing the raw data through the data acquisition ETL platform comprises:
performing data cleaning, data format conversion, data completion and data quality management on the raw data through the data acquisition ETL platform.
Optionally, the section water quality evaluation type includes river water quality evaluation, lake and reservoir eutrophication evaluation, surface water drinking water quality evaluation, groundwater drinking water quality evaluation, nearshore sea area water quality evaluation and regional water quality evaluation.
Optionally, the water quality index calculation type includes water quality index calculation, comprehensive water quality pollution index calculation, urban water quality index calculation and calculation of the regional comprehensive exceedance index for the Yangtze River Economic Belt.
Optionally, the water environment carrying capacity evaluation type includes Yangtze River Economic Belt water environment carrying capacity evaluation, ecological environment pressure evaluation, ecosystem health evaluation, ecological service function evaluation and ecological risk evaluation.
Optionally, the water ecological safety evaluation type includes water ecological safety evaluation.
Optionally, extracting keywords from the text data based on the second data mining module comprises:
based on the TextRank algorithm, dividing the text into several constituent units, building a graph model, ranking the important components of the text with a voting mechanism, and extracting keywords from the text data.
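As a rough illustration of the TextRank step, the sketch below builds a word co-occurrence graph and lets each word "vote" for its neighbours with a PageRank-style iteration. It assumes the text has already been segmented into tokens; the function name, window size, damping factor and iteration count are illustrative choices, not values given in the source.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style vote over a co-occurrence graph."""
    # Graph model: link every pair of different words that occur
    # fewer than `window` positions apart.
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Voting: each word shares its current score equally among its neighbours.
    score = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        score = {
            word: (1 - damping)
            + damping * sum(score[n] / len(neighbours[n]) for n in neighbours[word])
            for word in neighbours
        }

    # Highest score first; ties are broken alphabetically.
    return sorted(score, key=lambda w: (-score[w], w))[:top_k]
```

For example, `textrank_keywords(["water", "quality", "water", "pollution", "water", "monitoring"], window=2, top_k=1)` ranks "water" first because every other word is linked to it.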
Optionally, extracting a summary from the text data based on the second data mining module comprises:
searching the data according to a Query statement of the text data to obtain several search results;
performing morpheme analysis on the text data to generate several morphemes;
for each search result, calculating the relevance score of each morpheme with respect to that search result; and
computing a weighted sum of the morphemes' relevance scores for each search result to obtain the relevance score between the Query statement and that search result, and extracting the summary of the text data according to these relevance scores.
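The scoring scheme described above (a per-morpheme relevance score, combined by weighted summation) matches the shape of BM25, although the source does not name a specific formula. The sketch below therefore uses BM25 as an assumed stand-in: the weight of each morpheme is its inverse document frequency, and `k1` and `b` are conventional default parameters rather than values from the source. Documents are assumed to be non-empty token lists.

```python
import math


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score each tokenised document against the query morphemes."""
    n = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n
    scores = []
    for doc in documents:
        total = 0.0
        for term in query_terms:
            # Weight of the morpheme: rarer morphemes count for more.
            df = sum(1 for d in documents if term in d)
            weight = math.log((n - df + 0.5) / (df + 0.5) + 1)
            # Relevance of the morpheme to this document (saturating in tf).
            tf = doc.count(term)
            relevance = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
            # Weighted summation over all morphemes of the query.
            total += weight * relevance
        scores.append(total)
    return scores
```

A summary could then be assembled from the highest-scoring results, for instance by treating each sentence of a report as one "document".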
Optionally, structuring the information in the text data comprises:
structuring the information in the text data; using a water environment word segmentation dictionary and a word segmentation technique that combines rules and statistics, finding geographic location information in the mining data and locating it on an electronic map; and
displaying the mining data by category according to screening conditions.
In another aspect, a big data mining device based on a watershed water environment is provided. The device is applied to the above big data mining method and comprises:
an acquisition module, configured to acquire raw data from each application through an interface access layer;
a preprocessing module, configured to preprocess the raw data through a data acquisition ETL platform to obtain input data that meets the model standard, and to feed the input data into a pre-trained data mining model, wherein the data mining model is divided into a first data mining module for business evaluation and a second data mining module for text analysis;
a calculation module, configured to calculate data model indexes based on the first data mining module, wherein the data model indexes fall into 4 types: a section water quality evaluation type, a water quality index calculation type, a water environment carrying capacity evaluation type and a water ecological safety evaluation type;
an extraction module, configured to extract keywords and summaries from text data based on the second data mining module and to structure the information in the text data;
a storage module, configured to acquire and store the mining data obtained through the data mining model; and
a query module, configured to return the mining data through a data encapsulation exchange interface on receiving a query request for the mining data.
In another aspect, a big data mining platform is provided. The platform comprises a processor and a memory; the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the above big data mining method based on a watershed water environment.
In another aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the above big data mining method based on a watershed water environment.
The technical solution provided by the embodiments of the invention has at least the following beneficial effects:
Raw data is acquired from each application through an interface access layer; the raw data is preprocessed through a data acquisition ETL platform to obtain input data that meets the model standard, and the input data is fed into a pre-trained data mining model, which is divided into a first data mining module for business evaluation and a second data mining module for text analysis; data model indexes are calculated based on the first data mining module and fall into 4 types (section water quality evaluation, water quality index calculation, water environment carrying capacity evaluation and water ecological safety evaluation); based on the second data mining module, keywords and summaries are extracted from text data and the information in the text data is structured; the mining data obtained through the data mining model is acquired and stored; and, on receiving a query request for the mining data, the mining data is returned through a data encapsulation exchange interface. In this way, the method centers on water environment management goals and takes big data on hydrology, water resources, water environment, meteorology, society and the economy as its objects of analysis. It summarizes and analyzes the data mining requirements of the watershed water environment from the perspectives of evaluation and decision-making and of business management, determines the data mining topics and goals in light of the temporal and spatial characteristics of water environment management, and builds data mining business models for application scenarios such as status analysis, cause analysis, source tracing, potential evaluation, anomaly identification and trend early warning, thereby realizing data mining.
Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of the implementation environment of a big data mining method based on a watershed water environment according to an embodiment of the invention;
FIG. 2 is a flow chart of a big data mining method based on a watershed water environment according to an embodiment of the invention;
FIG. 3 is a block diagram of a big data mining device based on a watershed water environment according to an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a big data mining platform according to an embodiment of the invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
An embodiment of the invention provides an implementation environment for a big data mining method based on a watershed water environment. As shown in FIG. 1, the implementation environment comprises at least a big data mining platform, which may comprise an interface access layer, a data base layer, a data service layer, a data calculation layer, a data acquisition layer and a source data storage layer:
the interface access layer acquires raw data from multiple applications;
the data base layer performs permission verification, security verification, resource management and service management;
the data service layer calculates data model indexes, performs semantic analysis and manages metadata;
the data calculation layer performs data computation based on Hadoop, Spark and HBase;
the data acquisition layer performs data acquisition, cleaning and conversion based on Sqoop and Kafka;
the source data storage layer stores data based on SQL Server, MySQL and Oracle.
Metadata connects the data sources, the data warehouse and the data applications, and records the complete path of data from generation to consumption. It comprises static table, column and partition information (i.e., the MetaStore); dynamic task and table dependency mappings; the model definitions and data life cycle of the data warehouse; and ETL task scheduling information, inputs and outputs. Metadata is the basis of data management, data content and data applications. The whole big data system is built on metadata; without a complete metadata design, data becomes hard to trace, permissions hard to control, resources hard to manage and data hard to share.
ETL (Extract-Transform-Load) describes the process of extracting data from a source, transforming it and loading it into a destination. The term is most common in data warehousing, but its scope is not limited to data warehouses. The ETL platform plays an important role in data cleaning, data format conversion, data completion and data quality management. As an important intermediate layer for data cleaning, ETL should support a variety of data sources, such as messaging systems and file systems.
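The four ETL duties named above (cleaning, format conversion, completion and quality management) can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the record layout, the `nh3n` field (an ammonia nitrogen reading) and the mean-based gap filling are hypothetical, and a production pipeline would run on the platform's own acquisition tools rather than plain Python.

```python
from datetime import datetime
from statistics import mean


def preprocess(records):
    """Clean, convert, complete and quality-check raw monitoring records."""
    cleaned, seen = [], set()
    for rec in records:
        # Data cleaning: skip records without a station or date, and duplicates.
        key = (rec.get("station"), rec.get("date"))
        if None in key or key in seen:
            continue
        seen.add(key)
        # Format conversion: ISO date string -> date, numeric string -> float.
        raw = rec.get("nh3n")
        value = float(raw) if raw not in (None, "") else None
        cleaned.append({
            "station": key[0],
            "date": datetime.strptime(key[1], "%Y-%m-%d").date(),
            "nh3n": value,
        })

    # Data completion: fill gaps with the mean of the observed values.
    observed = [r["nh3n"] for r in cleaned if r["nh3n"] is not None]
    fill = mean(observed) if observed else None
    for r in cleaned:
        r["filled"] = r["nh3n"] is None
        if r["filled"]:
            r["nh3n"] = fill
        # Data quality management: flag physically implausible values.
        r["valid"] = r["nh3n"] is not None and r["nh3n"] >= 0
    return cleaned
```

Keeping the `filled` and `valid` flags on each record lets downstream index calculations decide whether to include completed or suspect values.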
A data model is established according to the business scenario and business rules; the collected multi-source data sets are converted and cleaned into input data that meets the data model standard; the model is computed, trained and optimized; and the index calculation results are output. For the complex business data of the water environment, and according to the data mining goals, 20 data model indexes in 5 types are established for the data mining tool: section water quality evaluation, water quality index calculation, water environment carrying capacity evaluation, water ecological safety evaluation and semantic analysis.
Most data queries are requirement-driven: for each requirement, one or more interfaces are developed, interface documentation is written, and the interfaces are opened for business parties to call.
An embodiment of the invention provides a big data mining method based on a watershed water environment, which can be implemented by a big data mining platform. As shown in the flow chart of FIG. 2, the processing flow of the method may include the following steps:
Step 201: acquire raw data from each application through an interface access layer.
Step 202: preprocess the raw data through a data acquisition ETL platform to obtain input data that meets the model standard, and feed the input data into a pre-trained data mining model.
The data mining model is divided into a first data mining module for business evaluation and a second data mining module for text analysis.
Optionally, preprocessing the raw data through the data acquisition ETL platform comprises:
performing data cleaning, data format conversion, data completion and data quality management on the raw data through the data acquisition ETL platform.
Step 203: calculate the data model indexes based on the first data mining module. The data model indexes fall into 4 types: a section water quality evaluation type, a water quality index calculation type, a water environment carrying capacity evaluation type and a water ecological safety evaluation type.
Optionally, the section water quality evaluation type includes river water quality evaluation, lake and reservoir eutrophication evaluation, surface water drinking water quality evaluation, groundwater drinking water quality evaluation, nearshore sea area water quality evaluation and regional water quality evaluation.
Optionally, the water quality index calculation type includes water quality index calculation, comprehensive water quality pollution index calculation, urban water quality index calculation and calculation of the regional comprehensive exceedance index for the Yangtze River Economic Belt.
Optionally, the water environment carrying capacity evaluation type includes Yangtze River Economic Belt water environment carrying capacity evaluation, ecological environment pressure evaluation, ecosystem health evaluation, ecological service function evaluation and ecological risk evaluation.
Optionally, the water ecological safety evaluation type includes water ecological safety evaluation.
Each index is explained specifically below:
1. River water quality evaluation
(1) Calculating the evaluation indicator concentration exceedance index (R)
The evaluation uses the 22 indicators in Table 1 of the "Environmental Quality Standards for Surface Water" (GB3838-2002) other than water temperature and fecal coliform: pH, dissolved oxygen, permanganate index, biochemical oxygen demand, ammonia nitrogen, petroleum, volatile phenols, mercury, lead, total nitrogen (not evaluated for river sections), total phosphorus, chemical oxygen demand, copper, zinc, fluoride, selenium, arsenic, cadmium, chromium (hexavalent), cyanide, anionic surfactants and sulfides.
The evaluation indicator concentration exceedance index (R) compares the monitored concentration of an evaluation indicator with the standard concentration limit that corresponds to the section's water quality target; the calculation is given in formulas (2-1) to (2-5). Following the short-board (weakest-link) principle, the largest exceedance index among the section's evaluation indicators is taken as the exceedance index (R) of the section, and the corresponding indicator is the section's primary pollution indicator.
① Exceedance index of a single water quality indicator (R):
[Formula (2-1) is an image in the original and is not reproduced here]
where R is the exceedance index; C is the measured concentration, mg/L; S is the evaluation standard limit, mg/L.
② Exceedance index of dissolved oxygen (R_DO):
When C_DO ≥ S_DO:
[Formula (2-2) is an image in the original and is not reproduced here]
When C_DO < S_DO:
[Formula (2-3) is an image in the original and is not reproduced here]
where R_DO is the exceedance index of dissolved oxygen; C_DO is the measured dissolved oxygen concentration, mg/L; S_DO is the evaluation standard limit for dissolved oxygen, mg/L; C_DO,f is the saturated dissolved oxygen concentration, mg/L.
③ Exceedance index of pH (R_pH):
When pH ≤ 7:
[Formula (2-4) is an image in the original and is not reproduced here]
When pH > 7:
[Formula (2-5) is an image in the original and is not reproduced here]
where R_pH is the exceedance index of pH; C_pH is the measured pH; S_pHd is the lower pH limit in the evaluation standard; S_pHu is the upper pH limit in the evaluation standard.
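The formula images for (2-1) to (2-5) are not available in this text. For orientation only, one form that is consistent with the symbol definitions above and with type thresholds centred on zero is the familiar standard-index form with 1 subtracted. This is an assumption, not a confirmed reproduction of the patent's formulas:

```latex
\begin{aligned}
R &= \frac{C}{S} - 1 \\[4pt]
R_{DO} &= \frac{\lvert C_{DO,f} - C_{DO} \rvert}{C_{DO,f} - S_{DO}} - 1, && C_{DO} \ge S_{DO} \\[4pt]
R_{DO} &= 10 - 9\,\frac{C_{DO}}{S_{DO}} - 1, && C_{DO} < S_{DO} \\[4pt]
R_{pH} &= \frac{7.0 - C_{pH}}{7.0 - S_{pHd}} - 1, && \mathrm{pH} \le 7 \\[4pt]
R_{pH} &= \frac{C_{pH} - 7.0}{S_{pHu} - 7.0} - 1, && \mathrm{pH} > 7
\end{aligned}
```

Under this assumed form, an indicator measured exactly at its limit gives R = 0, and a concentration 20% above the limit gives R = 0.2.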
(2) Exceedance type determination
According to the ranges R > 0.2, 0 < R ≤ 0.2, -0.2 < R ≤ 0 and R ≤ -0.2, the water environment quality evaluation result of a section (monitoring point) is divided into four types: pollution indicator concentration severely exceeding the standard, exceeding the standard, close to exceeding the standard, and not exceeding the standard.
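A minimal sketch of this step follows, assuming the exceedance index of each indicator has already been computed. It applies the short-board rule to pick the section's index and then maps the four R ranges to a type; the function names and the English type labels are illustrative.

```python
def section_exceedance(indices):
    """Return (indicator, R) for the largest exceedance index of a section."""
    name = max(indices, key=indices.get)
    return name, indices[name]


def exceedance_type(r):
    """Map an exceedance index R to one of the four exceedance types."""
    if r > 0.2:
        return "severely exceeding"
    if r > 0:
        return "exceeding"
    if r > -0.2:
        return "close to exceeding"
    return "not exceeding"
```

For a section with indices `{"ammonia_nitrogen": 0.35, "total_phosphorus": -0.1}`, ammonia nitrogen is the primary pollution indicator and the section is classed as severely exceeding.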
(3) Early warning level determination
To make surface water environment quality early warning and the resulting management measures more accurate, a process evaluation that reflects changes in water quality is added to the section exceedance type, and early warning levels are assigned according to whether the concentration of the primary pollution indicator is rising or falling.
Severely exceeding sections, and exceeding sections whose primary pollution indicator exceedance index is rising, are given a red warning; exceeding sections whose index is not rising are given an orange warning; close-to-exceeding sections whose index is rising are given a yellow warning; close-to-exceeding sections whose index is not rising are given a blue warning; and sections that do not exceed the standard are given no warning.
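The warning rules above reduce to a small decision table. The sketch below encodes it directly; the type labels and the boolean `primary_index_rising` flag are illustrative names, and how "rising" is measured (for example against the previous monitoring period) is left open because the source does not specify it.

```python
def warning_level(section_type, primary_index_rising):
    """Return the early warning colour for a section, or "none"."""
    if section_type == "severely exceeding":
        return "red"
    if section_type == "exceeding":
        return "red" if primary_index_rising else "orange"
    if section_type == "close to exceeding":
        return "yellow" if primary_index_rising else "blue"
    return "none"
```

Note that a severely exceeding section is red regardless of the trend, so the trend only matters for the two middle types.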
2. Lake and reservoir water quality evaluation
The lake and reservoir water quality evaluation algorithm is based on the "Environmental Quality Standards for Surface Water" (GB3838-2002). The 22 indicators in Table 1 other than water temperature and fecal coliform are selected as evaluation indicators, and a single-factor evaluation method is used: each of the 22 indicators is graded against the standard limits, and the highest (worst) class among the graded indicators is taken as the water quality class of the section. This yields the water quality class of surface lake and reservoir monitoring sections; the evaluation result includes the class of each individual indicator and the class of the section.
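The single-factor method can be sketched as "grade each indicator, then take the worst grade". The example below covers only three indicators for which a lower concentration is better; the class limits are illustrative values in mg/L that should be checked against GB3838-2002 before any real use, and indicators such as dissolved oxygen and pH would need their own comparison rules.

```python
# Illustrative Class I-V upper limits (mg/L); verify against the standard.
LIMITS = {
    "permanganate_index": [2, 4, 6, 10, 15],
    "ammonia_nitrogen": [0.15, 0.5, 1.0, 1.5, 2.0],
    "total_phosphorus": [0.02, 0.1, 0.2, 0.3, 0.4],
}
CLASSES = ["I", "II", "III", "IV", "V", "worse than V"]


def indicator_rank(name, value):
    """Return 0..5, the position of the first class whose limit is met."""
    for rank, limit in enumerate(LIMITS[name]):
        if value <= limit:
            return rank
    return len(LIMITS[name])


def section_class(measurements):
    """Grade every indicator and take the worst class as the section class."""
    ranks = {name: indicator_rank(name, v) for name, v in measurements.items()}
    per_indicator = {name: CLASSES[r] for name, r in ranks.items()}
    return per_indicator, CLASSES[max(ranks.values())]
```

With permanganate index 5.0, ammonia nitrogen 1.2 and total phosphorus 0.08, the indicators grade as III, IV and II, so the section is Class IV.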
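3. Lake and reservoir eutrophication evaluation
(1) The trophic state index of each parameter is calculated as follows:
$$\mathrm{TLI}(\mathrm{chla}) = 10\,(2.5 + 1.086 \ln \mathrm{chla}) \qquad (2\text{-}6)$$
$$\mathrm{TLI}(\mathrm{TP}) = 10\,(9.436 + 1.624 \ln \mathrm{TP}) \qquad (2\text{-}7)$$
$$\mathrm{TLI}(\mathrm{TN}) = 10\,(5.453 + 1.694 \ln \mathrm{TN}) \qquad (2\text{-}8)$$
$$\mathrm{TLI}(\mathrm{SD}) = 10\,(5.118 - 1.94 \ln \mathrm{SD}) \qquad (2\text{-}9)$$
$$\mathrm{TLI}(\mathrm{COD_{Mn}}) = 10\,(0.109 + 2.661 \ln \mathrm{COD_{Mn}}) \qquad (2\text{-}10)$$
where chla is in mg/m³, SD is in m, and the other parameters are in mg/L.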
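Formulas (2-6) to (2-10) share one shape, 10 × (a + b · ln x), so they can be driven from a small coefficient table. The sketch below does that; the dictionary keys and function name are illustrative, and inputs are assumed to be positive and in the units stated above.

```python
import math

# (a, b) in TLI = 10 * (a + b * ln(x)), taken from formulas (2-6) to (2-10).
COEFFICIENTS = {
    "chla": (2.5, 1.086),
    "TP": (9.436, 1.624),
    "TN": (5.453, 1.694),
    "SD": (5.118, -1.94),
    "CODMn": (0.109, 2.661),
}


def tli(parameter, value):
    """Trophic state index of one parameter for a positive measured value."""
    a, b = COEFFICIENTS[parameter]
    return 10 * (a + b * math.log(value))
```

The negative slope for SD (transparency) reflects that clearer water, with a larger Secchi depth, lowers the index, whereas higher concentrations of the other parameters raise it.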
(2) The comprehensive trophic state index is calculated as follows:
$$\mathrm{TLI}(\Sigma) = \sum_{j=1}^{m} W_j \cdot \mathrm{TLI}(j) \qquad (2\text{-}11)$$
where TLI(Σ) is the comprehensive trophic state index; W_j is the relative weight of the trophic state index of the j-th parameter; and TLI(j) is the trophic state index of the j-th parameter.
With chla as the reference parameter, the normalized correlation weight of the j-th parameter is:
$$W_j = \frac{r_{ij}^{2}}{\sum_{j=1}^{m} r_{ij}^{2}} \qquad (2\text{-}12)$$
where r_ij is the correlation coefficient between the j-th parameter and the reference parameter chla, and m is the number of evaluation parameters.
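As a sketch of the weighting step, the code below normalizes squared correlation coefficients into weights and forms the weighted sum of the per-parameter indices. Two things are assumptions rather than content of the source: that the normalization uses the squared coefficients, and the example coefficients in `R_CHLA`, which are values often quoted for Chinese lakes and reservoirs and should be replaced with locally validated ones.

```python
def correlation_weights(r):
    """Normalise squared correlation coefficients so the weights sum to 1."""
    squares = {name: value ** 2 for name, value in r.items()}
    total = sum(squares.values())
    return {name: sq / total for name, sq in squares.items()}


def composite_tli(tli_values, r):
    """Weighted sum of the per-parameter trophic state indices."""
    weights = correlation_weights(r)
    return sum(weights[name] * tli_values[name] for name in tli_values)


# Assumed example coefficients relative to chla (not given in the source).
R_CHLA = {"chla": 1.0, "TP": 0.84, "TN": 0.82, "SD": -0.83, "CODMn": 0.83}
```

Because the weights sum to 1, a lake whose five per-parameter indices are all 50 also has a composite index of 50; with the example coefficients, chla carries the largest weight (about 0.27).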
4. Surface water drinking water quality evaluation
The surface water drinking water quality evaluation algorithm is based on the "Environmental Quality Standards for Surface Water" (GB3838-2002). The evaluation indicators are 22 of the 24 basic items in Table 1 (excluding water temperature and fecal coliform; total nitrogen is not evaluated for rivers), the 5 supplementary items for centralized drinking water surface water sources in Table 2, and the 80 specific items for centralized drinking water surface water sources in Table 3. A single-factor evaluation method is used: each of the 107 indicators is graded against the standard limits, and the highest (worst) class among the graded indicators is taken as the water quality class of the section. This yields the water quality class of monitoring sections of river-type and lake/reservoir-type surface drinking water sources; the evaluation result includes the class of each individual indicator and the class of the section.
5. Groundwater drinking water quality evaluation
The groundwater drinking water quality evaluation algorithm is based on the "Standard for Groundwater Quality" (GB/T 14848-2017). The 39 conventional indicators in Table 1 and the 54 unconventional indicators in Table 2 are selected as evaluation indicators, and a single-factor evaluation method is used: each of the 93 indicators is graded against the standard limits, and the highest (worst) class among them is taken as the water quality class of the section. This yields the water quality class of monitoring sections of groundwater-type drinking water sources; the evaluation result includes the class of each individual indicator and the class of the section.
6. Nearshore sea area water quality evaluation
The nearshore sea area water quality evaluation algorithm is based on the "Sea Water Quality Standard" (GB3097-1997). The 38 indicators in Table 1 other than water temperature are selected as evaluation indicators, and a single-factor evaluation method is used: each of the 38 indicators is graded against the standard limits, and the highest (worst) class among them is taken as the water quality class of the section. This yields the water quality class of nearshore sea area monitoring sections; the evaluation result includes the class of each individual indicator and the class of the section.
7. Regional water quality assessment
(1) Regional water quality assessment
When the total number of the sections in the evaluation area is less than 5, calculating the arithmetic mean value of the evaluation index concentrations of all the sections, then evaluating the water quality of the sections, and determining the water quality condition of the area according to the water quality types of the sections in the following table.
TABLE-1 Water quality classes
Figure BDA0003348090210000091
When the total number of sections in the evaluation area is 5 or more, the section water quality category proportion method is adopted: the water quality condition is evaluated according to the percentage of sections in each water quality category relative to the total number of evaluated sections in the area. The evaluation criteria are shown in the following table:
TABLE-2 Water quality evaluation criteria
Figure BDA0003348090210000092
Figure BDA0003348090210000101
(2) Primary pollution index determination
The main pollution indexes of a section are determined as follows:
When the water quality of the section is excellent or good, no main pollution index is evaluated.
When the water quality of the section exceeds the class III standard, the indexes are ranked by water quality category, and the three indexes with the worst categories are taken as the main pollution indexes. When different indexes fall in the same water quality category, their exceedance multiples are calculated and ranked, and the three indexes with the largest exceedance multiples are taken as the main pollution indexes. When cyanide or heavy metals such as lead and chromium exceed the standard, they are given priority as main pollution indexes.
When a main pollution index is determined, the multiple by which its concentration exceeds the class III water quality standard (the exceedance multiple) is noted alongside it, for example for the permanganate index. Exceedance multiples are not calculated for water temperature, pH value and dissolved oxygen.
Figure BDA0003348090210000102
The main pollution indexes of a region are determined as follows:
The indexes whose water quality exceeds the class III standard are ranked by section exceedance rate, and the three indexes with the largest section exceedance rates are generally taken as the main pollution indexes. For rivers and basins (water systems) with fewer than 5 sections, the main pollution indexes of each section are determined according to the section method described above.
Figure BDA0003348090210000103
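The section-level selection rule can be sketched as below. Several details are assumptions made for the example: the exceedance multiple is taken as (concentration − class III limit) / class III limit because the formula image is not recoverable here, the priority set is an illustrative subset, the sample values are invented, and indexes such as water temperature, pH and dissolved oxygen are assumed to have been removed before the call.

```python
from typing import List, NamedTuple, Tuple


class IndexResult(NamedTuple):
    name: str
    water_class: int       # 1..5 for classes I..V, 6 = worse than class V
    concentration: float   # mg/L
    class3_limit: float    # class III limit, mg/L


# Illustrative subset of the indexes that the text gives priority to.
PRIORITY = {"cyanide", "lead", "chromium"}


def exceedance_multiple(r: IndexResult) -> float:
    # Assumed definition: relative excess over the class III limit.
    return (r.concentration - r.class3_limit) / r.class3_limit


def main_pollution_indexes(results: List[IndexResult], top_n: int = 3) -> List[Tuple[str, float]]:
    """Pick up to top_n indexes worse than class III: priority pollutants first,
    then worst class, then largest exceedance multiple."""
    exceeding = [r for r in results if r.water_class > 3]
    ranked = sorted(
        exceeding,
        key=lambda r: (r.name in PRIORITY, r.water_class, exceedance_multiple(r)),
        reverse=True,
    )
    return [(r.name, round(exceedance_multiple(r), 2)) for r in ranked[:top_n]]


sample = [
    IndexResult("permanganate_index", 4, 7.2, 6.0),
    IndexResult("ammonia_nitrogen", 5, 1.8, 1.0),
    IndexResult("total_phosphorus", 4, 0.3, 0.2),
    IndexResult("lead", 5, 0.06, 0.05),
    IndexResult("fluoride", 2, 0.5, 1.0),
]
selected = main_pollution_indexes(sample)
print(selected)
```

Sorting on a single tuple key keeps the three tie-breaking rules in one place, so changing their order only means reordering the tuple.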
8. Water quality index calculation
(1) Water quality index of single index
The water quality index of a single index is calculated by dividing the concentration value of the index by the class III surface water standard limit corresponding to that index, as shown in the following formula:
$$\mathrm{CWQI}(i) = \frac{C(i)}{C_s(i)}$$
In the formula: $C(i)$ is the concentration value of the ith water quality index; $C_s(i)$ is the class III surface water standard limit of the ith water quality index; $\mathrm{CWQI}(i)$ is the water quality index of the ith water quality index.
In addition:
① Calculation method for dissolved oxygen
$$\mathrm{CWQI}(DO) = \frac{C_s(DO)}{C(DO)}$$
In the formula: $C(DO)$ is the concentration value of dissolved oxygen; $C_s(DO)$ is the class III surface water standard limit for dissolved oxygen; $\mathrm{CWQI}(DO)$ is the water quality index of dissolved oxygen.
② Calculation method for pH value
If pH ≤ 7, the calculation formula is:
$$\mathrm{CWQI}(pH) = \frac{7.0 - pH}{7.0 - pH_{sd}}$$
If pH > 7, the calculation formula is:
$$\mathrm{CWQI}(pH) = \frac{pH - 7.0}{pH_{su} - 7.0}$$
In the formula: $pH_{sd}$ is the lower limit value of pH in the 'surface water environment quality standard' (GB3838-2002); $pH_{su}$ is the upper limit value of pH in the same standard; $\mathrm{CWQI}(pH)$ is the water quality index of pH.
(2) Section water quality index
The CWQI of a section is the sum of the CWQI values of its single indexes, calculated as follows:
$$\mathrm{CWQI}_{section} = \sum_{i=1}^{n} \mathrm{CWQI}(i)$$
where n is the number of water quality indexes.
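A short sketch of the single-index and section calculations follows. The ordinary ratio comes from the text; the inverted ratio for dissolved oxygen and the distance-from-7 form for pH are assumptions about the two special cases, as are the function names, the index names and the sample values. The pH bounds 6 and 9 are the limits given in GB3838-2002.

```python
from typing import Dict


def cwqi(concentration: float, class3_limit: float) -> float:
    """Ordinary index: concentration divided by the class III limit."""
    return concentration / class3_limit


def cwqi_do(concentration: float, class3_limit: float) -> float:
    """Dissolved oxygen (assumed form): inverted ratio, since more oxygen is better."""
    return class3_limit / concentration


def cwqi_ph(ph: float, lower: float = 6.0, upper: float = 9.0) -> float:
    """pH (assumed form): distance from neutral, scaled by the distance to the limit."""
    if ph <= 7.0:
        return (7.0 - ph) / (7.0 - lower)
    return (ph - 7.0) / (upper - 7.0)


def section_cwqi(sample: Dict[str, float], class3_limits: Dict[str, float]) -> float:
    """Sum the single-index values to obtain the section index."""
    total = 0.0
    for name, value in sample.items():
        if name == "ph":
            total += cwqi_ph(value)
        elif name == "dissolved_oxygen":
            total += cwqi_do(value, class3_limits[name])
        else:
            total += cwqi(value, class3_limits[name])
    return total


limits = {"permanganate_index": 6.0, "ammonia_nitrogen": 1.0, "dissolved_oxygen": 5.0}
sample = {"permanganate_index": 6.0, "ammonia_nitrogen": 0.5, "dissolved_oxygen": 10.0, "ph": 8.0}
section_value = section_cwqi(sample, limits)
print(section_value)
```

A single index at exactly its class III limit contributes 1.0, so the section value can be read as a count of "class III equivalents" across the indexes.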
9. Water quality comprehensive pollution index
The water quality comprehensive pollution index algorithm is based on the single-index water quality index evaluation method in the technical specification (trial) of urban surface water environment quality ranking. It uses the 21 indexes in table 1 of the 'surface water environment quality standard' (GB3838-2002) other than water temperature, faecal coliform and total nitrogen. The water quality index of each single index is calculated, and the values are summed to give the water quality index of the surface water monitoring section. The evaluation result comprises the water quality index of each single index and the section water quality index.
10. Urban water quality index calculation
(1) Water quality index of river
First, the arithmetic mean of the concentration of each single index over all monitoring sections of the river is calculated and the CWQI of each single index is computed from it; the sum of the CWQI values of all single indexes is then taken as the CWQI of the river, as shown in the following formula:
$$\mathrm{CWQI}_{river} = \sum_{i=1}^{n} \mathrm{CWQI}(i)$$
In the formula: $\mathrm{CWQI}_{river}$ is the water quality index of the river; $\mathrm{CWQI}(i)$ is the water quality index of the ith water quality index; n is the number of water quality indexes.
(2) Water quality index of lake or reservoir
The method for calculating the water quality index of the lake and the reservoir is consistent with that of a river, the arithmetic mean value of the concentrations of all the single indexes of the monitoring points of the lake and the reservoir is calculated, the water quality index of the single index is calculated, and then the water quality index of the lake and the reservoir is comprehensively calculated. In addition, when calculating the water quality index of a single index, the class III standard limit of the total phosphorus in lakes and reservoirs is 0.05mg/L, unlike that in rivers.
(3) Water quality index of city
The CWQI of a city is the weighted mean of the CWQI values of the rivers and of the lakes and reservoirs in the urban district, calculated as shown in the following formula:
$$\mathrm{CWQI}_{city} = \frac{\mathrm{CWQI}_{river} \times M + \mathrm{CWQI}_{lake} \times N}{M + N}$$
In the formula: $\mathrm{CWQI}_{city}$ is the water quality index of the city; $\mathrm{CWQI}_{river}$ is the water quality index of the rivers; $\mathrm{CWQI}_{lake}$ is the water quality index of the lakes and reservoirs; M is the number of river sections in the city; N is the number of lake and reservoir monitoring points in the city.
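The river, lake-and-reservoir and city calculations can be sketched together. The weighting by number of river sections M and number of lake points N is an assumption about the exact form of the weighted mean; the function names and sample concentrations are invented for illustration, and the dissolved oxygen and pH special cases are left out to keep the example short.

```python
from statistics import mean
from typing import Dict, List


def water_body_cwqi(points: List[Dict[str, float]], class3_limits: Dict[str, float]) -> float:
    """Average each index over all monitoring points, then sum concentration/limit."""
    return sum(mean(p[name] for p in points) / limit for name, limit in class3_limits.items())


def city_cwqi(river: float, lake: float, m: int, n: int) -> float:
    """Assumed weighting: by number of river sections (m) and lake points (n)."""
    return (river * m + lake * n) / (m + n)


# Total phosphorus uses 0.05 mg/L for lakes and reservoirs, unlike rivers.
river_limits = {"permanganate_index": 6.0, "total_phosphorus": 0.2}
lake_limits = {"permanganate_index": 6.0, "total_phosphorus": 0.05}

river_sections = [
    {"permanganate_index": 4.0, "total_phosphorus": 0.1},
    {"permanganate_index": 8.0, "total_phosphorus": 0.3},
]
lake_points = [{"permanganate_index": 3.0, "total_phosphorus": 0.05}]

river_value = water_body_cwqi(river_sections, river_limits)
lake_value = water_body_cwqi(lake_points, lake_limits)
city_value = city_cwqi(river_value, lake_value, len(river_sections), len(lake_points))
print(river_value, lake_value, city_value)
```

Passing the limit table as a parameter is what lets the same function serve both rivers and lakes while honoring the different total phosphorus limit.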
11. Calculation of the comprehensive exceedance index for the Yangtze River Economic Belt region
The regional water pollutant concentration exceedance index is calculated as follows:
$$R_{water,jk} = \max_i\left(R_{water,ijk}\right) \quad (2\text{-}22)$$
Figure BDA0003348090210000123
In the formula, $R_{water,jk}$ is the water pollutant concentration exceedance index of the kth section of region j, and $R_{water,j}$ is the water pollutant concentration exceedance index of region j.
Thresholds and parameters:
According to the comprehensive pollutant concentration exceedance index, the evaluation result is divided into three types: when R > 0, the environment is in an overload state; when R is between -0.2 and 0, the environment is in a critical overload state; when R < -0.2, the environment is in a non-overload state. The smaller the pollutant concentration exceedance index, the stronger the capacity of the regional environmental system to support the socio-economic system.
12. Assessment of water environment bearing capacity in the Yangtze River Economic Belt
The water environment quality evaluation index method is adopted. The water environment quality evaluation index (R) is calculated in three steps.
① Calculate the containment index of 6 pollutants at each state-controlled section: $COD_{Cr}$, $BOD_5$, ammonia nitrogen, TP, TN (TN is not calculated for rivers) and $COD_{Mn}$. The containment index is the ratio of the current pollutant concentration to the class III surface water standard limit, i.e.
$$R_{ij} = \frac{C_{ij}}{S_i}$$
② Calculate the maximum containment index among the pollutants at each state-controlled section, i.e.
$$R_j = \max_i\left(R_{ij}\right)$$
③ Calculate the arithmetic mean of the maximum containment indexes of all state-controlled sections in the area to be evaluated. The combined calculation formula is:
$$R = \frac{1}{N}\sum_{j=1}^{N}\max_i\left(\frac{C_{ij}}{S_i}\right)$$
In the formula, $C_{ij}$ is the annual average monitored concentration of water pollutant i at state-controlled section j, mg/L; $S_i$ is the class III surface water standard limit of pollutant i, mg/L; i = 1, 2, …, 6 corresponds to $COD_{Cr}$, $BOD_5$, ammonia nitrogen, TP, TN and $COD_{Mn}$; j = 1, 2, …, N, where N is the number of state-controlled sections.
According to the water environment quality evaluation index of the evaluation area, the evaluation result is divided into three types: overloaded, critically overloaded and not overloaded. Generally, when R ≤ 0.7, the water environment is not overloaded; when 0.7 < R ≤ 1.0, the water environment has reached its maximum bearing capacity and is critically overloaded; when R > 1.0, the water environment is overloaded.
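The three steps and the grading thresholds can be sketched as follows. The limit values are the class III limits for rivers as commonly cited from GB3838-2002 and should be checked against the standard; the function names, pollutant keys and sample concentrations are assumptions for the example. River sections simply omit TN from their input.

```python
from typing import Dict, List

# Class III limits in mg/L (to be verified against the standard in force).
CLASS3_LIMITS = {
    "COD_Cr": 20.0, "BOD5": 4.0, "NH3_N": 1.0,
    "TP": 0.2, "TN": 1.0, "COD_Mn": 6.0,
}


def section_index(concentrations: Dict[str, float]) -> float:
    """Steps 1 and 2: containment index per pollutant, then the maximum."""
    return max(c / CLASS3_LIMITS[p] for p, c in concentrations.items())


def region_index(sections: List[Dict[str, float]]) -> float:
    """Step 3: arithmetic mean of the per-section maxima."""
    return sum(section_index(s) for s in sections) / len(sections)


def classify(r: float) -> str:
    if r <= 0.7:
        return "not overloaded"
    if r <= 1.0:
        return "critical overload"
    return "overloaded"


sections = [
    {"COD_Cr": 16.0, "NH3_N": 0.5, "TP": 0.1},  # maximum ratio 0.8
    {"COD_Cr": 10.0, "NH3_N": 1.1, "TP": 0.1},  # maximum ratio 1.1
]
r_value = region_index(sections)
print(r_value, classify(r_value))
```

Because each section contributes only its worst pollutant, one badly polluted indicator is enough to raise the regional index even when the other indicators are well within limits.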
13. Ecological environment stress assessment
(1) Evaluation index score calculation method
The type of each evaluation index is determined from its raw data and the scoring standard, and its score is calculated by formula. All evaluation index scores lie in the range 0 to 100.
The evaluation indexes fall into 3 types, and their scores are calculated as follows:
1) For indexes whose evaluation value is a fixed value, the median of the level is taken directly as the score:
Figure BDA0003348090210000134
2) For larger-is-better indexes, the score is assigned as follows:
Segmented indexes:
Figure BDA0003348090210000135
Indexes without an upper limit:
Figure BDA0003348090210000136
When $V_i > 100$, $V_i$ is taken as 100.
3) For smaller-is-better indexes, the score is assigned as follows:
Segmented indexes:
Figure BDA0003348090210000141
Indexes without an upper limit:
Figure BDA0003348090210000142
When $V_i < 0$, $V_i$ is taken as 0.
In the above formulas, $V_i$ is the score of evaluation index i; $V_{il}$ is the lower limit value of the category standard for evaluation index i; $V_{ih}$ is the upper limit value of the category standard for evaluation index i; $I_i$ is the raw data of evaluation index i; $I_{il}$ is the lower limit of the grade in which the raw data $I_i$ falls; $I_{ih}$ is the upper limit of the grade in which the raw data $I_i$ falls.
(2) Method for calculating subentry index
The scores of the 6 subentry indexes (population pressure, land utilization, town pollution emission, rural non-point source emission, water resource utilization and basin external pressure) are each calculated from the scores of the single evaluation indexes by weighted summation:
$$E_j = \sum_{i=1}^{n} w_{ji} V_{ji}$$
In the formula, $E_j$ is the score of the jth subentry index; $w_{ji}$ is the weight of the ith evaluation index in the jth subentry index; $V_{ji}$ is the score of the ith evaluation index in the jth subentry index; n is the number of evaluation indexes in the jth subentry index.
(3) Ecological environment pressure special comprehensive assessment
The score of the ecological environment pressure special index is calculated from the scores of the subentry indexes by weighted summation. The special index is then graded according to its score to obtain the comprehensive evaluation result of the pressure that human activities in the basin place on the river ecological environment.
$$C = \sum_{j=1}^{n} W_j E_j$$
In the formula, C is the score of the special index; $W_j$ is the weight of the jth subentry index; $E_j$ is the score of the jth subentry index; n is the number of subentry indexes.
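The two-level weighted summation (evaluation indexes into a subentry index, subentry indexes into the special index) can be sketched as below. The weights, scores and index names are invented for illustration, and the check that the weights sum to 1 is an assumption that the text does not state.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of scores; weights are assumed to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Level 1: evaluation index scores -> subentry index score E_j.
population_pressure = weighted_score(
    {"population_density": 60.0, "urbanization_rate": 80.0},
    {"population_density": 0.5, "urbanization_rate": 0.5},
)

# Level 2: subentry index scores -> special index score C.
# Only two of the six subentry indexes are shown to keep the example short.
special_index = weighted_score(
    {"population_pressure": population_pressure, "land_utilization": 50.0},
    {"population_pressure": 0.4, "land_utilization": 0.6},
)
print(population_pressure, special_index)
```

Since every input score lies in 0 to 100 and the weights are normalized, both levels stay in the same 0 to 100 range and can be graded against a common scale.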
The pressure of the river ecological environment by the human activities in the drainage basin is divided into five levels: light, normal, heavy and severe.
TABLE-3 river pressure class description of basin human activities
Figure BDA0003348090210000145
Figure BDA0003348090210000151
14. Ecosystem health assessment
(1) Evaluation index score calculation
Comprehensive water quality condition B1:
the water quality comprehensive score B1-1:
Figure BDA0003348090210000152
physical habitat integrated conditions B2:
annual ecological base flow satisfaction rate B2-1:
April to September:
Figure BDA0003348090210000153
October to March:
Figure BDA0003348090210000154
Connectivity B2-2:
Figure BDA0003348090210000155
natural shoreline ratio B2-3:
Figure BDA0003348090210000156
vegetation coverage of the riverbank B2-4:
Figure BDA0003348090210000157
the ratio of the wetland area to the total area B2-5: expert scoring
Comprehensive condition of aquatic organisms B3:
algal integrity B3-1:
Figure BDA0003348090210000161
integrity of Large benthonic animals B3-2:
Figure BDA0003348090210000162
fish integrity B3-3:
Figure BDA0003348090210000163
aquatic plant integrity B3-4:
Figure BDA0003348090210000164
After each index has been calculated, the type of the evaluation index is determined according to the scoring standard, and its score is calculated by formula. All evaluation index scores lie in the range 0 to 100. The evaluation indexes fall into 3 types, and their scores are calculated as follows:
Indexes whose evaluation value is a fixed value:
Figure BDA0003348090210000165
Larger-is-better indexes:
Segmented indexes:
Figure BDA0003348090210000166
Indexes without an upper limit:
Figure BDA0003348090210000167
Smaller-is-better indexes:
Segmented indexes:
Figure BDA0003348090210000168
Indexes without an upper limit:
Figure BDA0003348090210000171
In the above formulas, $V_i$ is the score of evaluation index i; $V_{il}$ is the lower limit value of the category standard for evaluation index i; $V_{ih}$ is the upper limit value of the category standard for evaluation index i; $I_i$ is the raw data of evaluation index i; $I_{il}$ is the lower limit of the grade in which the raw data $I_i$ falls; $I_{ih}$ is the upper limit of the grade in which the raw data $I_i$ falls.
(2) Subentry index score
The score of the population pressure subentry index is calculated from the scores of the single evaluation indexes by weighted summation:
$$E_j = \sum_{i=1}^{n} w_{ji} V_{ji}$$
In the formula, $E_j$ is the score of the jth subentry index; $w_{ji}$ is the weight of the ith evaluation index in the jth subentry index; $V_{ji}$ is the score of the ith evaluation index in the jth subentry index; n is the number of evaluation indexes in the jth subentry index.
(3) Itemized index grading
TABLE-4 rating Scale
Figure BDA0003348090210000173
15. Ecological service function assessment
(1) Evaluation index score calculation
Drinking water service function C1:
water quality standard reaching rate of a centralized drinking water source:
Figure BDA0003348090210000181
water source conservation function C2:
water source conservation index:
C2-1 = mudflat wetland and marsh coverage × 0.5 + forest land coverage × 0.35 + grassland coverage × 0.15 (2-48)
Water environment purification function C3:
the runoff-to-dirt ratio:
Figure BDA0003348090210000182
In the formula, $Q_{runoff}$ is the design flow of the river channel, determined from the flow of the driest month in the past 10 years, and $Q_{inflow}$ is the amount of sewage discharged into the river.
Data source: the related data mainly come from environmental protection statistical data, hydrological yearbook data and water conservancy general survey data.
Biodiversity function C4:
representative rare species habitat C4-1: expert scoring
Invasion of foreign species C4-2: expert scoring
Aquatic product supply function C5:
Fishing amount per unit water area C5-1:
Figure BDA0003348090210000183
age of fish C5-2: expert scoring
Protection zone function C6:
natural protected area level C6-1: expert scoring
After each index has been calculated, the type of the evaluation index is determined according to the scoring standard, and its score is calculated by formula. All evaluation index scores lie in the range 0 to 100. The evaluation indexes fall into 3 types, and their scores are calculated as follows:
Indexes whose evaluation value is a fixed value:
Figure BDA0003348090210000184
Larger-is-better indexes:
Segmented indexes:
Figure BDA0003348090210000191
Indexes without an upper limit:
Figure BDA0003348090210000192
Smaller-is-better indexes:
Segmented indexes:
Figure BDA0003348090210000193
Indexes without an upper limit:
Figure BDA0003348090210000194
In the above formulas, $V_i$ is the score of evaluation index i; $V_{il}$ is the lower limit value of the category standard for evaluation index i; $V_{ih}$ is the upper limit value of the category standard for evaluation index i; $I_i$ is the raw data of evaluation index i; $I_{il}$ is the lower limit of the grade in which the raw data $I_i$ falls; $I_{ih}$ is the upper limit of the grade in which the raw data $I_i$ falls.
(2) Subentry index score
The score of the population pressure subentry index is calculated from the scores of the single evaluation indexes by weighted summation:
$$E_j = \sum_{i=1}^{n} w_{ji} V_{ji}$$
In the formula, $E_j$ is the score of the jth subentry index; $w_{ji}$ is the weight of the ith evaluation index in the jth subentry index; $V_{ji}$ is the score of the ith evaluation index in the jth subentry index; n is the number of evaluation indexes in the jth subentry index.
(3) Itemized index grading
TABLE-5 itemized index rankings
Figure BDA0003348090210000196
Figure BDA0003348090210000201
16. Ecological risk assessment
(1) Evaluation index score calculation
Risk of outbreak D1:
critical ratio of chemicals D1-1:
Figure BDA0003348090210000202
production process D1-2:
Calculation method: the enterprise production process is evaluated by a scoring method, based on the enterprise risk unit questionnaire and with reference to the method of the 'enterprise emergency environment incident risk evaluation guideline'. Each aspect of the production process is scored according to table 5, and the scores are added to determine the risk degree of the unit. The score is divided into five grades: 0 to 5, 5 to 10, 10 to 20, 20 to 30, and 30 to 40 points.
Exposure population D1-3:
Calculation method: the grading standard for the size of the exposed population follows the provisions of the 'enterprise emergency environment incident risk evaluation guideline' and the corresponding research papers. With reference to the sensitive environment protection target questionnaire, the population of residential areas and of medical and health, cultural and educational, scientific research, administrative office and similar institutions within 5 km of the enterprise is counted. The result is divided into five grades: 0 to 0.1, 0.1 to 1, 1 to 5, 5 to 10, and >10 (unit: ten thousand people).
Sensitive Environment object D1-4:
Calculation method: the different environmentally sensitive zones are assigned values (tables 5-3), and the values are summed to give the index score. The score is divided into five grades: 0 to 5, 5 to 10, 10 to 15, 15 to 20, and >20 points.
Security management and risk prevention D1-5:
Calculation method: the evaluation is based on the enterprise security management and risk prevention questionnaire. Each item scores 1 point for 'yes' and 0 points for 'no', and the full score is 80 points. The score is divided into five grades: 75 to 80, 70 to 75, 65 to 70, 60 to 65, and 55 to 60 points.
Risk of land and shipping D1-6:
Calculation method: mobile risk sources are evaluated with reference to the 'guidelines for environmental protection of centralized drinking water sources' (tables 5-4). The total score R is taken as the basis for the final index grading, where R = f1 + f2 + f3. R is divided into five grades: 0 to 3, 3 to 7, 7 to 9, 9 to 15, and >15 points.
Cumulative risk indicator D2:
toxic and harmful organic matters D2-1:
Figure BDA0003348090210000211
heavy metal D2-2:
Figure BDA0003348090210000212
mine nonmetal D2-3:
Figure BDA0003348090210000213
(2) Subentry index score
The score of the population pressure subentry index is calculated from the scores of the single evaluation indexes by weighted summation:
$$E_j = \sum_{i=1}^{n} w_{ji} V_{ji}$$
In the formula, $E_j$ is the score of the jth subentry index; $w_{ji}$ is the weight of the ith evaluation index in the jth subentry index; $V_{ji}$ is the score of the ith evaluation index in the jth subentry index; n is the number of evaluation indexes in the jth subentry index.
(3) Itemized index grading
TABLE-6 itemized index rankings
Figure BDA0003348090210000215
Figure BDA0003348090210000221
17. Ecological safety assessment
A weighted summation method is selected as the basic algorithm of the model.
(1) The scheme layer is calculated by the formula:
$$B_i = \sum_{j} x_{ij}\, w_j$$
In the formula, $B_i$ is the calculated result of the ith scheme layer; $x_{ij}$ is the value of the jth index of the ith scheme layer; $w_j$ is the weight of the jth index of the ith scheme layer.
(2) The Ecological Safety Index (ESI) of the target layer is calculated by the following formula, and the result is a value between 0 and 100:
Figure BDA0003348090210000223
In the formula, ESI is the ecological safety index; $B_i$ is the value of the ith scheme layer.
(3) Weight determination
Based on the re-screened evaluation index system, the index weights, that is, the judgment matrix, need to be determined again, which can be done by expert consultation. The evaluation index weights given by each expert can be obtained by the AHP method. However, the judgments of different experts are often highly inconsistent and are influenced by the experts' preferences, so a multi-criteria group decision model is introduced to obtain a more objective comprehensive judgment matrix.
(4) Ecological safety assessment standard and grade
The Ecological Safety Index (ESI) is taken as the result of the longitudinal comparison between the current state of each river and the standard state, and reflects the degree to which each river deviates from the standard state. An ESI of 100 represents the non-deviation state; the smaller the ESI, the less safe the river.
TABLE-7 Evaluation standard for ecological safety of rivers
Grade | Representative color | Score value
Safe | Blue | (80, 100]
Relatively safe | Green | (60, 80]
Average | Yellow | (40, 60]
Unsafe | Red | (20, 40]
Very unsafe | Black | [0, 20]
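The grading in TABLE-7 maps directly onto a small lookup. The sketch below is illustrative only: the function name is hypothetical and the English grade labels are paraphrased. Note that every interval is open on the left, so a score of exactly 80 falls in the second grade, and only the lowest grade includes its lower bound of 0.

```python
from typing import Tuple


def esi_grade(esi: float) -> Tuple[str, str]:
    """Map an Ecological Safety Index in [0, 100] to (grade, representative color)."""
    if not 0 <= esi <= 100:
        raise ValueError("ESI is expected to lie between 0 and 100")
    if esi > 80:
        return ("Safe", "Blue")
    if esi > 60:
        return ("Relatively safe", "Green")
    if esi > 40:
        return ("Average", "Yellow")
    if esi > 20:
        return ("Unsafe", "Red")
    return ("Very unsafe", "Black")


for value in (92.5, 80, 41, 20, 0):
    print(value, esi_grade(value))
```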
Step 204, based on a second data mining module, performing keyword extraction and abstract extraction on the text data, and performing structured processing on the information of the text data.
Optionally, based on the second data mining module, performing keyword extraction on the text data, including: based on the TextRank algorithm, the text is divided into a plurality of composition units, a graph model is established, important components in the text are sequenced by using a voting mechanism, and keyword extraction is carried out on the text data.
In one possible implementation, the TextRank algorithm is a graph-based ranking algorithm for text. Its basic idea derives from Google's PageRank algorithm: a text is divided into several constituent units (words or sentences), a graph model is built, and the important components of the text are ranked by a voting mechanism, so that keyword extraction and abstract extraction can be achieved using only the information in a single document. Unlike models such as LDA and HMM, TextRank does not need to be trained on multiple documents in advance, and it is widely used because it is simple and effective.
The general TextRank model can be expressed as a directed weighted graph G = (V, E), consisting of a point set V and an edge set E, where E is a subset of V × V. The weight of the edge between any two points $V_i$ and $V_j$ in the graph is $w_{ji}$. For a given point $V_i$, $In(V_i)$ is the set of points that point to it, and $Out(V_i)$ is the set of points that $V_i$ points to. The score of point $V_i$ is defined as follows:
$$WS(V_i) = (1 - d) + d \times \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}}\, WS(V_j)$$
where d is the damping coefficient, with a value range of 0 to 1; it represents the probability of moving from a given point in the graph to any other point, and its value is generally 0.85. When the TextRank algorithm is used to calculate the score of each point in the graph, the points are given arbitrary initial values and the scores are computed iteratively until convergence, that is, until the error of every point in the graph is less than a given limit value, which is generally 0.0001.
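The iterative scoring can be sketched in a few lines. This is a minimal illustration under stated assumptions: the graph is passed as a dictionary of positive edge weights keyed by (source, target), the function name is hypothetical, and the formula used is the standard weighted TextRank update, which may differ in detail from the formula image in the original. The defaults d = 0.85 and tolerance 0.0001 come from the text.

```python
from typing import Dict, Tuple


def textrank_scores(
    edges: Dict[Tuple[str, str], float],
    d: float = 0.85,
    tol: float = 1e-4,
    max_iter: int = 200,
) -> Dict[str, float]:
    """Iterate the weighted TextRank update until no score changes by more than tol."""
    nodes = {n for edge in edges for n in edge}
    out_weight = {n: 0.0 for n in nodes}   # total outgoing weight of each node
    incoming = {n: [] for n in nodes}      # (source, weight) pairs pointing at each node
    for (src, dst), w in edges.items():
        out_weight[src] += w
        incoming[dst].append((src, w))

    score = {n: 1.0 for n in nodes}        # any initial value works
    for _ in range(max_iter):
        new = {
            n: (1 - d) + d * sum(w / out_weight[src] * score[src] for src, w in incoming[n])
            for n in nodes
        }
        change = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if change < tol:
            break
    return score


# A three-node chain a <-> b <-> c; the middle node collects votes from both ends.
chain = {("a", "b"): 1.0, ("b", "a"): 1.0, ("b", "c"): 1.0, ("c", "b"): 1.0}
scores = textrank_scores(chain)
print(scores)
```

Each node passes its score to its neighbors in proportion to edge weight, which is the "voting" the text refers to; the damping term keeps the iteration a contraction, so it converges from any starting values.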
The task of keyword extraction is to automatically extract several meaningful words or phrases from a given text. The TextRank algorithm ranks candidate keywords using the relations between nearby words (a co-occurrence window) and extracts the keywords directly from the text itself. The main steps are as follows:
① The given text T is split into complete sentences, i.e.
$T = [S_1, S_2, \ldots, S_m]$ (2-65)
② For each sentence $S_i \in T$, word segmentation and part-of-speech tagging are performed, stop words are filtered out, and only words with specified parts of speech, such as nouns, verbs and adjectives, are retained, i.e. $S_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,n}]$, where $t_{i,j} \in S_i$ are the retained candidate keywords.
③ A candidate keyword graph G = (V, E) is constructed, where V is the node set consisting of the candidate keywords generated in step ②. Edges between nodes are then built from the co-occurrence relation: an edge exists between two nodes only when the corresponding words co-occur within a window of length K, where K is the window size, that is, at most K words co-occur.
④ The weight of each node is propagated iteratively according to the TextRank score formula until convergence.
⑤ The node weights are sorted in descending order, and the T most important words are taken as candidate keywords.
⑥ The T most important words obtained in step ⑤ are marked in the original text; if they form adjacent phrases, they are combined into multi-word keywords. For example, if the text contains the sentence "Matlab code for plotting ambiguity function" and both "Matlab" and "code" are candidate keywords, they are combined into "Matlab code" and added to the keyword sequence.
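The keyword steps above can be sketched end to end. To stay self-contained, this illustration makes several simplifying assumptions: sentences are split on punctuation, words are found with a regular expression, a small hard-coded stop-word list stands in for part-of-speech filtering, and the co-occurrence graph is undirected and unweighted. The function name and the stop-word list are hypothetical.

```python
import re
from collections import defaultdict
from typing import List

STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def extract_keywords(text: str, window: int = 2, top_t: int = 3, d: float = 0.85) -> List[str]:
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    tokenized = [re.findall(r"[a-z]+", s) for s in sentences]

    # Build the co-occurrence graph over the filtered words of each sentence.
    graph = defaultdict(set)
    for tokens in tokenized:
        words = [t for t in tokens if t not in STOP_WORDS]
        for i, word in enumerate(words):
            for other in words[i + 1 : i + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    if not graph:
        return []

    # Propagate node weights until the largest change drops below 1e-4.
    score = {w: 1.0 for w in graph}
    for _ in range(200):
        new = {w: (1 - d) + d * sum(score[n] / len(graph[n]) for n in graph[w]) for w in graph}
        change = max(abs(new[w] - score[w]) for w in graph)
        score = new
        if change < 1e-4:
            break

    # Keep the top T words, then merge runs that are adjacent in the original text.
    top = set(sorted(score, key=score.get, reverse=True)[:top_t])
    phrases, current = [], []
    for tokens in tokenized:
        for token in tokens + [""]:          # "" flushes the run at sentence end
            if token in top:
                current.append(token)
            elif current:
                phrases.append(" ".join(current))
                current = []
    return list(dict.fromkeys(phrases))      # drop duplicates, keep first-seen order


print(extract_keywords("Matlab code for plotting ambiguity function", top_t=5))
```

With `top_t=5` every candidate word in the sample sentence is kept, so the stop word "for" is the only thing separating the two merged phrases.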
Optionally, based on the second data mining module, performing summary extraction on the text data, including: searching in the data according to a Query statement of the text data to obtain a plurality of search results; performing morpheme analysis on the text data to generate a plurality of morphemes; for each search result, calculating a relevance score of each morpheme and each search result; and carrying out weighted summation on the correlation scores of the morphemes relative to the search results to obtain the correlation scores of the Query sentences and the search results, and carrying out abstract extraction on the text data according to the correlation scores of the Query sentences and the search results.
In one possible implementation, automatic summarization uses an algorithm that is typically used for search relevance scoring. The main idea is to perform morpheme analysis on the Query to generate morphemes $q_i$; then, for each search result d, to calculate the relevance score of each morpheme $q_i$ with d; and finally to take the weighted sum of the relevance scores of the $q_i$ with respect to d, which gives the relevance score of the Query and d.
The general formula is as follows:
$$Score(Q, d) = \sum_{i}^{n} W_i \cdot R(q_i, d)$$
where Q represents the Query; $q_i$ represents a morpheme obtained by parsing Q; d represents a search result document; $W_i$ represents the weight of morpheme $q_i$; $R(q_i, d)$ represents the relevance score of morpheme $q_i$ and document d.
Taking IDF as the definition of $W_i$, the formula is as follows:
$$IDF(q_i) = \log\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
where N is the number of all documents in the index and $n(q_i)$ is the number of documents containing $q_i$.
The relevance score $R(q_i, d)$ of morpheme $q_i$ and document d is calculated as follows:
$$R(q_i, d) = \frac{f_i \cdot (k_1 + 1)}{f_i + K} \cdot \frac{qf_i \cdot (k_2 + 1)}{qf_i + k_2}$$
$$K = k_1 \cdot \left(1 - b + b \cdot \frac{dl}{avgdl}\right)$$
where $k_1$, $k_2$ and b are adjustment factors, usually set empirically, with $k_1 = 2$ and b = 0.75 in general; $f_i$ is the frequency of $q_i$ in d, and $qf_i$ is the frequency of $q_i$ in the Query. dl is the length of document d, and avgdl is the average length of all documents.
The parameter b adjusts how strongly document length affects relevance: the larger b is, the greater the influence of document length on the relevance score, and vice versa. The longer the document is relative to the average, the larger K becomes and the smaller the relevance score. This can be understood as follows: a longer document has a greater chance of containing $q_i$, so for the same $f_i$, the relevance of a long document to $q_i$ should be weaker than that of a short document.
In summary, taking $qf_i = 1$ (a morpheme usually occurs only once in the Query), the formula can be summarized as:
$$Score(Q, d) = \sum_{i}^{n} IDF(q_i) \cdot \frac{f_i \cdot (k_1 + 1)}{f_i + k_1 \cdot \left(1 - b + b \cdot \frac{dl}{avgdl}\right)}$$
As can be seen from the formula, different search relevance scoring methods can be derived by using different morpheme analysis methods, morpheme weight determination methods and morpheme-document relevance determination methods, which gives great flexibility in designing the algorithm.
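The scoring described above has the form of the Okapi BM25 ranking function, and can be sketched as follows. The sketch assumes that documents are already tokenized into lists of morphemes, that each query morpheme is counted once (so the query-frequency factor drops out), and that $k_1 = 2$ and b = 0.75 as suggested in the text. The function names and the tiny corpus are invented for the example.

```python
import math
from typing import List


def idf(n_docs: int, n_containing: int) -> float:
    """Morpheme weight; it turns negative when over half the documents contain the term."""
    return math.log((n_docs - n_containing + 0.5) / (n_containing + 0.5))


def relevance_score(
    query: List[str], doc: List[str], corpus: List[List[str]], k1: float = 2.0, b: float = 0.75
) -> float:
    """Sum of IDF-weighted, length-normalized term frequency scores."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    big_k = k1 * (1 - b + b * len(doc) / avgdl)
    total = 0.0
    for q in set(query):
        f = doc.count(q)
        n_q = sum(1 for d in corpus if q in d)
        total += idf(len(corpus), n_q) * f * (k1 + 1) / (f + big_k)
    return total


corpus = [
    ["water", "quality", "river"],
    ["lake", "algae", "bloom"],
    ["air", "quality", "index"],
    ["soil", "erosion", "report"],
    ["river", "flood", "warning"],
]
query = ["water", "river"]
doc_scores = [relevance_score(query, doc, corpus) for doc in corpus]
best = max(range(len(corpus)), key=doc_scores.__getitem__)
print(doc_scores, best)
```

For summarization, the same scoring could be applied with sentences in place of documents, keeping the highest-scoring sentences as the abstract; that usage is a suggestion rather than something stated in the text.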
Optionally, the information of the text data is structured, including:
The information of the text data is structured: based on a water environment word segmentation dictionary, a word segmentation technique combining rules and statistics is used to retrieve geographic position information from the mining data, and the positions are located on an electronic map; the mining data is then displayed by category according to the screening conditions.
In one feasible implementation, this combined application structures the information contained in a text. Using a water environment word segmentation dictionary that is built up step by step, together with a word segmentation technique combining rules and statistics, geographic position information is retrieved from the content and located on an electronic map, and the viewed content can be displayed by category according to the screening conditions.
Step 205, acquiring and storing the mining data obtained through the data mining model.
In a possible implementation manner, after the standardized data mining result is output by the data mining model, the mining data can be stored in a storage area of the data mining platform, so that it can be provided to users in subsequent queries.
Step 206, feeding back the mining data through the data encapsulation exchange interface when a query request corresponding to the mining data is received.
In a possible implementation manner, the data mining tool encapsulation technology adopts a network address and interface encapsulation technology, and adopts Web Service as a data encapsulation exchange interface. The overall exchange method comprises the following steps: the data supplier provides Web Service interface to issue data, and the data demander calls the Web Service interface to obtain data.
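The provider and consumer roles in steps 205 and 206 can be sketched in a few lines of Python. This is only an in-process illustration of the store-then-query exchange: it uses JSON strings in place of a real Web Service stack, and the class name, method names, request fields and sample record are all hypothetical.

```python
import json


class MiningDataService:
    """Toy stand-in for the data encapsulation exchange interface."""

    def __init__(self):
        self._store = {}  # storage area for mining data, keyed by topic

    def publish(self, topic, records):
        """Provider side: store mining results so they can be queried later."""
        self._store[topic] = list(records)

    def handle_query(self, request_json):
        """Consumer side: answer a JSON query request with a JSON response."""
        request = json.loads(request_json)
        topic = request.get("topic")
        if topic not in self._store:
            return json.dumps({"status": "not_found", "topic": topic, "data": []})
        return json.dumps({"status": "ok", "topic": topic, "data": self._store[topic]})


if __name__ == "__main__":
    service = MiningDataService()
    # Hypothetical mining result for one monitoring section.
    service.publish("section_water_quality", [{"section": "S1", "class": "III"}])
    print(service.handle_query(json.dumps({"topic": "section_water_quality"})))
```

In a deployment, `handle_query` would sit behind the published Web Service endpoint, so that the consumer only needs the network address and the interface contract.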
In the embodiment of the invention, original data is acquired from each application through an interface access layer; the original data is preprocessed through a data acquisition ETL platform to obtain input data meeting the model standard, and the input data is input into a pre-trained data mining model, which is divided into a first data mining module oriented to service evaluation and a second data mining module oriented to text analysis; data model indexes are calculated based on the first data mining module, the data model indexes being divided into 4 types, namely a section water quality evaluation type, a water quality index calculation type, a water environment bearing capacity evaluation type and a water ecological safety evaluation type; based on the second data mining module, keyword extraction and abstract extraction are performed on the text data, and the information of the text data is subjected to structured processing; the mining data obtained through the data mining model is acquired and stored; and when a query request corresponding to the mining data is received, the mining data is fed back through the data encapsulation exchange interface. In this way, big data covering hydrology, water resources, water environment, meteorology, social economy and the like is taken as the analysis object around the water environment management target; the mining requirements of watershed water environment data are summarized and analyzed from the perspectives of evaluation decision-making and service management; the data mining themes and targets are determined by combining the temporal and spatial characteristics of the water environment management service; and a data mining service model is constructed for application scenarios such as current state analysis, cause analysis, traceability analysis, potential evaluation, anomaly identification and trend early warning, thereby realizing data mining.
FIG. 3 is a block diagram illustrating a big data mining device based on watershed water environment according to an exemplary embodiment. Referring to FIG. 3, the device 300 includes an obtaining module 310, a preprocessing module 320, a calculating module 330, an extracting module 340, a storing module 350, and a querying module 360, wherein:
the obtaining module 310 is configured to obtain original data from each application through an interface access layer;
the preprocessing module 320 is configured to preprocess the original data through a data acquisition ETL platform to obtain input data meeting the model standard, and to input the input data into a pre-trained data mining model; the data mining model is divided into a first data mining module oriented to service evaluation and a second data mining module oriented to text analysis;
the calculating module 330 is configured to calculate a data model index based on the first data mining module; the data model indexes are divided into 4 types, namely a section water quality evaluation type, a water quality index calculation type, a water environment bearing capacity evaluation type and a water ecological safety evaluation type;
the extracting module 340 is configured to perform keyword extraction and abstract extraction on the text data based on the second data mining module, and to perform structured processing on the information of the text data;
the storing module 350 is configured to obtain and store the mining data obtained through the data mining model;
and the querying module 360 is configured to feed back the mining data through the data encapsulation exchange interface when receiving a query request corresponding to the mining data.
FIG. 4 is a schematic structural diagram of a big data mining platform 400 according to an embodiment of the present invention. The big data mining platform 400 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 401 and one or more memories 402, wherein at least one instruction is stored in the memory 402, and the at least one instruction is loaded and executed by the processor 401 to implement the steps of the above big data mining method based on the watershed water environment.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including instructions that are executable by a processor in a terminal to perform the above big data mining method based on the watershed water environment. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A big data mining method based on watershed water environment, characterized by comprising the following steps:
acquiring original data from each application through an interface access layer;
preprocessing the original data through a data acquisition ETL platform to obtain input data meeting the model standard, and inputting the input data into a pre-trained data mining model; the data mining model is divided into a first data mining module oriented to service evaluation and a second data mining module oriented to text analysis;
calculating a data model index based on the first data mining module; the data model indexes are divided into 4 types, namely a section water quality evaluation type, a water quality index calculation type, a water environment bearing capacity evaluation type and a water ecological safety evaluation type;
based on the second data mining module, performing keyword extraction and abstract extraction on the text data, and performing structured processing on the information of the text data;
acquiring and storing mining data obtained through the data mining model;
and when receiving a query request corresponding to the mining data, feeding back the mining data through the data encapsulation exchange interface.
2. The method of claim 1, wherein said preprocessing the original data through a data acquisition ETL platform comprises:
performing data cleaning, data format conversion, data completion and data quality management on the original data through the data acquisition ETL platform.
3. The method according to claim 1, wherein the section water quality evaluation type comprises river water quality evaluation, lake and reservoir eutrophication evaluation, surface water drinking water quality evaluation, groundwater drinking water quality evaluation, coastal sea area water quality evaluation and regional water quality evaluation.
4. The method of claim 1, wherein the water quality index calculation type comprises water quality index calculation, comprehensive water quality pollution index calculation, urban water quality index calculation and Yangtze River Economic Belt comprehensive exceedance index calculation.
5. The method according to claim 1, wherein the water environment bearing capacity evaluation type comprises water environment bearing capacity assessment for the Yangtze River Economic Belt, ecological environment pressure assessment, ecosystem health assessment, ecological service function assessment and ecological risk assessment.
6. The method of claim 1, wherein the water ecological safety evaluation type comprises water ecological safety assessment.
7. The method of claim 1, wherein performing keyword extraction on the text data based on the second data mining module comprises:
based on the TextRank algorithm, dividing the text into a plurality of constituent units, establishing a graph model, ranking the important components in the text by using a voting mechanism, and extracting keywords from the text data.
8. The method of claim 1, wherein performing abstract extraction on the text data based on the second data mining module comprises:
searching the data according to a Query statement of the text data to obtain a plurality of search results;
performing morpheme analysis on the text data to generate a plurality of morphemes;
for each search result, calculating a relevance score between each morpheme and the search result;
and performing a weighted summation of the relevance scores of the morphemes with respect to each search result to obtain the relevance score between the Query statement and the search result, and performing abstract extraction on the text data according to the relevance scores between the Query statement and the search results.
9. The method according to claim 1, wherein performing structured processing on the information of the text data comprises:
performing structured processing on the information of the text data; based on a water environment word segmentation dictionary, retrieving geographic position information from the mining data by using a word segmentation technique that combines rules and statistics, and locating the position on an electronic map;
and displaying the mining data by category according to screening conditions.
10. A big data mining device based on watershed water environment, characterized by comprising:
an obtaining module, configured to obtain original data from each application through an interface access layer;
a preprocessing module, configured to preprocess the original data through a data acquisition ETL platform to obtain input data meeting the model standard, and to input the input data into a pre-trained data mining model; the data mining model is divided into a first data mining module oriented to service evaluation and a second data mining module oriented to text analysis;
a calculating module, configured to calculate a data model index based on the first data mining module; the data model indexes are divided into 4 types, namely a section water quality evaluation type, a water quality index calculation type, a water environment bearing capacity evaluation type and a water ecological safety evaluation type;
an extracting module, configured to perform keyword extraction and abstract extraction on the text data based on the second data mining module, and to perform structured processing on the information of the text data;
a storing module, configured to obtain and store the mining data obtained through the data mining model;
and a querying module, configured to feed back the mining data through the data encapsulation exchange interface when receiving a query request corresponding to the mining data.
CN202111329268.6A 2021-11-10 2021-11-10 Big data mining method and device based on watershed water environment Pending CN114138857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111329268.6A CN114138857A (en) 2021-11-10 2021-11-10 Big data mining method and device based on watershed water environment

Publications (1)

Publication Number Publication Date
CN114138857A true CN114138857A (en) 2022-03-04

Family

ID=80393516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111329268.6A Pending CN114138857A (en) 2021-11-10 2021-11-10 Big data mining method and device based on watershed water environment

Country Status (1)

Country Link
CN (1) CN114138857A (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2022-03-04)