CN115878599A - Sewage industry data cleaning method - Google Patents


Info

Publication number
CN115878599A
Authority
CN
China
Prior art keywords
data
industry
test
industry data
training
Legal status
Pending
Application number
CN202211320749.5A
Other languages
Chinese (zh)
Inventor
田志民
牛豫海
张自力
马景春
周晓萍
滕国宝
Current Assignee
Cangzhou Water Supply And Drainage Group Co ltd
Hebei Construction & Investment Water Investment Co ltd
Hebei Construction Investment Hengshui Water Affairs Co ltd
Korla Longrun Water Treatment Co ltd
Hebei Xiong'an Ruitian Technology Co ltd
Original Assignee
Cangzhou Water Supply And Drainage Group Co ltd
Hebei Construction & Investment Water Investment Co ltd
Hebei Construction Investment Hengshui Water Affairs Co ltd
Korla Longrun Water Treatment Co ltd
Hebei Xiong'an Ruitian Technology Co ltd
Application filed by Cangzhou Water Supply And Drainage Group Co ltd, Hebei Construction & Investment Water Investment Co ltd, Hebei Construction Investment Hengshui Water Affairs Co ltd, Korla Longrun Water Treatment Co ltd, Hebei Xiong'an Ruitian Technology Co ltd
Priority to CN202211320749.5A
Publication of CN115878599A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data cleaning method for the sewage industry, belonging to the field of data security. The cleaning method comprises the following steps: (1) receiving sewage industry data and carrying out risk investigation; (2) constructing an industry database and carrying out data quality detection; (3) constructing a data cleaning framework to clean the industry data; and (4) monitoring the operating efficiency of the server in real time and optimizing its performance. The invention improves the detection precision of the quality detection model and the efficiency of parameter search without requiring manual parameter setting. Its operation process is simple, which improves the user experience of staff. It also compresses memory at a large granularity, which improves compression efficiency, effectively improves server response efficiency, and reduces the time required to compress memory.

Description

Sewage industry data cleaning method
Technical Field
The invention relates to the field of data security, and in particular to a data cleaning method for the sewage industry.
Background
Data cleaning has become a key concern in the sewage industry as a means of streamlining sewage databases, so a dedicated data cleaning method for the sewage industry is particularly important.
A prior-art search found Chinese patent CN109783813A, which discloses a data cleaning method and system. That invention standardizes irregular industry data by combining word segmentation with Jaccard distance calculation. It cleans irregular enterprise industry data into the corresponding national-standard entries and so improves the usability of the industry data. However, its detection precision is low, it requires manual parameter setting, and its operation steps are complex. In addition, existing sewage industry data cleaning methods suffer from low server response efficiency and long memory compression times. A data cleaning method for the sewage industry is therefore proposed.
Disclosure of Invention
The invention aims to overcome the above shortcomings of the prior art by providing a data cleaning method for the sewage industry.
To achieve this purpose, the invention adopts the following technical scheme:
the sewage industry data cleaning method comprises the following steps:
(1) receiving sewage industry data and carrying out risk investigation;
(2) constructing an industry database and carrying out data quality detection;
(3) constructing a data cleaning framework to clean the industry data;
(4) monitoring the operating efficiency of the server in real time and optimizing its performance.
In a further embodiment of the invention, the risk investigation in step (1) comprises the following steps:
Step one: the server receives the industry data, converts any non-binary data in it into binary data, and maps each group of industry data into a specified detection interval by Min-Max normalization;
Step two: the server then connects to the virus database and the cloud virtual machine and parses each group of industry data. It searches the virus database for entries matching the parsing result and intercepts the corresponding industry data if a match is found;
Step three: if no match is found, the related industry data is uploaded to the cloud virtual machine for infection simulation. The server then performs virus analysis on the simulation result according to the infection criteria established by network virus definitions, and intercepts the industry data whose analysis result matches those criteria.
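Steps one and two can be illustrated with a minimal sketch. It assumes the virus database can be modeled as a set of SHA-256 digests of known-malicious payloads; the patent does not say how signatures are stored. The cloud-VM infection simulation of step three is out of scope and is only signaled by the `"sandbox"` return value. All function names here are hypothetical.

```python
import hashlib

def to_bytes(value) -> bytes:
    """Serialize a field value to bytes (stand-in for 'converting non-binary data to binary')."""
    return value if isinstance(value, bytes) else str(value).encode("utf-8")

def min_max_normalize(values, lo=0.0, hi=1.0):
    """Linearly map numeric values into the detection interval [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:  # constant group: avoid division by zero
        return [lo] * len(values)
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]

def screen_record(record, signature_db):
    """Step two: intercept on a signature match; otherwise hand off to step three."""
    digest = hashlib.sha256(to_bytes(record)).hexdigest()
    return "intercept" if digest in signature_db else "sandbox"

# Usage with a hypothetical one-entry signature database
known_bad = {hashlib.sha256(b"malicious-payload").hexdigest()}
print(min_max_normalize([2, 4, 6]))                   # [0.0, 0.5, 1.0]
print(screen_record("malicious-payload", known_bad))  # intercept
print(screen_record("COD=35.2", known_bad))           # sandbox
```

An exact-hash lookup only catches byte-identical payloads. That limitation is one reason the method falls back to sandbox simulation when no match is found.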
In a further embodiment of the invention, the data quality detection in step (2) comprises the following steps:
Step I: a quality detection model is constructed, then trained and optimized according to the data quality dimensions. The industry data is fed into the model in sequence, and the model classifies each group of industry data by enterprise;
Step II: feature dimension reduction is then performed on each group of industry data. This retains the feature parameters that express the quality of the industry data and discards those with poor characterization capability. The industry data is divided into a training set and a test set, and the training set is standardized to generate training samples;
Step III: the training samples are fed into the quality detection model, the optimal model parameters are set according to the optimization result, and the model is trained by long-term iteration. The test set is then input into the trained model, and curves of data accuracy, universality, completeness and consistency are plotted and analyzed. Industry data with missing values, near-duplicates, anomalies, logic errors or inconsistencies is marked and recorded.
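The per-enterprise classification and the training/test split in Steps I and II can be sketched briefly. The sketch assumes each record is a dict with a hypothetical `"enterprise"` field. The patent does not give a split ratio, so the 0.2 default and the fixed shuffle seed are assumptions.

```python
import random
from collections import defaultdict

def group_by_enterprise(records, key="enterprise"):
    """Classify records into per-enterprise groups (the field name is hypothetical)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of rows reproducibly and split off roughly test_ratio as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_ratio)) if rows else 0
    return rows[n_test:], rows[:n_test]

# Usage with hypothetical COD readings from two enterprises
records = [
    {"enterprise": "A", "cod": 30.1},
    {"enterprise": "B", "cod": 41.7},
    {"enterprise": "A", "cod": 29.8},
]
for name, group in group_by_enterprise(records).items():
    train, test = train_test_split(group, test_ratio=0.5)
    print(name, len(train), len(test))
```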
In a further embodiment, the data quality dimensions in Step I include data specification, data integrity criteria, data duplication, data accuracy, consistency and synchronization, timeliness and availability, ease of use and maintainability, data coverage, expression quality, data decay, utility and understandability, and relevance and credibility;
the feature dimension reduction in Step II uses the coefficient of variation:
$$CV = \frac{\sigma}{\mu}$$
where σ is the standard deviation of the feature parameter, μ is its mean, and CV is its coefficient of variation. The larger the CV, the more important the feature parameter is considered; feature parameters with a small CV are considered unimportant and are eliminated;
the standardization in Step II is:
$$x' = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)}$$
where x is an extracted feature parameter, mean(x) is its mean, and std(x) is its standard deviation.
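The two formulas can be sketched as follows: coefficient-of-variation feature screening, then z-score standardization. The patent does not state whether σ is the population or sample standard deviation, so the population form (`statistics.pstdev`) is an assumption. The CV threshold is also an assumed tuning value. The CV sketch assumes positive-valued features such as concentrations, since σ/μ is undefined for a zero mean.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sigma / mu (population standard deviation; assumes a positive mean)."""
    mu = statistics.fmean(values)
    if mu == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.pstdev(values) / mu

def select_features(columns, threshold):
    """Keep feature columns whose CV exceeds threshold; the rest are eliminated."""
    return {name: vals for name, vals in columns.items()
            if coefficient_of_variation(vals) > threshold}

def standardize(values, mean=None, std=None):
    """x' = (x - mean(x)) / std(x); pass training-set statistics to transform test data."""
    mean = statistics.fmean(values) if mean is None else mean
    std = statistics.pstdev(values) if std is None else std
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]

# Usage: COD varies strongly, pH barely varies, so pH is dropped at threshold 0.1
cols = {"cod": [10.0, 20.0, 30.0], "ph": [7.0, 7.0, 7.1]}
kept = select_features(cols, threshold=0.1)
print(list(kept))  # ['cod']
print(standardize(kept["cod"]))
```

Passing the training set's mean and standard deviation when standardizing the test set keeps both sets on the same scale.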
In a further embodiment, the training and optimization of the quality detection model in Step I comprises the following steps:
S1.1: the server receives the test data set and the data quality dimensions uploaded by staff. From the N groups of test data, one group is held out as validation data and the remaining groups are used to fit a test model. The validation data checks the model's precision, and its detection capability is measured by the root mean square error (RMSE). This is repeated N times, and the resulting precision values are used for parameter optimization;
S1.2: a parameter range is initialized, and the learning rate is set either by system default or manually. The data samples are divided into subsets; for each group of data, one subset is selected as the test set and the remaining subsets as the training set. After a test model is trained on the training set, it predicts the test set and the RMSE of the predictions is recorded;
S1.3: the test set is then replaced by another subset, with the remaining subsets as the training set, and the RMSE is recorded again until every sample has been predicted once. The parameter combination with the smallest RMSE is selected as the optimal parameter for the data interval and replaces the original parameter in the quality detection model;
S1.4: each group of data detected by the quality detection model is recorded together with its detection result and replaces the original data in the test data set for subsequent parameter updates. At the same time, the accuracy, detection rate and false alarm rate of the quality detection model are evaluated in real time, and the results are fed back to staff for review.
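S1.1 to S1.3 describe what amounts to k-fold cross-validation combined with a grid search that keeps the parameter with the smallest RMSE. The minimal sketch below makes three assumptions:
- A one-dimensional ridge regression without intercept stands in for the model, since the patent does not name one.
- The regularization strength `lam` plays the role of "the parameter" being searched.
- Folds are contiguous and unshuffled.

All names are hypothetical.

```python
import math

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def cv_rmse(xs, ys, lam, k):
    """Each fold serves once as the test set; return the RMSE over all held-out predictions."""
    sq_err = 0.0
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        sq_err += sum((ys[i] - w * xs[i]) ** 2 for i in test_idx)
    return math.sqrt(sq_err / len(xs))

def grid_search(xs, ys, grid, k=5):
    """Return the parameter with the smallest cross-validated RMSE, plus all scores."""
    scores = {lam: cv_rmse(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Usage on hypothetical data that roughly follows y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
best, scores = grid_search(xs, ys, grid=[0.0, 0.1, 1.0, 10.0], k=3)
print(best, scores)
```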
In a further embodiment, the data cleaning framework in step (3) cleans data as follows:
S2.1: the data cleaning framework intercepts every group of marked industry data and classifies it by defect type: missing data, near-duplicates, anomalies, logic errors and inconsistencies;
S2.2: industry data records with missing values are handled in one of three ways: the record is ignored, the attribute is removed, or the missing value is estimated from a system default value, the attribute mean, or the mean of similar samples and filled with the estimate;
S2.3: for near-duplicate industry data, the data cleaning framework selects key attribute fields and weights each one by its importance in characterizing a record, so that the key fields characterize records more accurately. An attribute field matching algorithm then re-checks the marked near-duplicate data, which is cleaned according to preset cleaning rules. Data that cannot be processed automatically is stored in a log table, and a corresponding cleaning result report is produced;
S2.4: for anomalous data, the data cleaning framework clusters similar industry data, treats values falling outside any cluster as isolated points, and removes them. It then bins the remaining anomalous data and smooths it by bin means;
S2.5: for industry data with logic errors, the data cleaning framework retrieves the relevant rules from the logic definition library to correct the erroneous attribute values; if no suitable rule exists, the data is stored in a log table for manual processing. For inconsistent industry data, the data cleaning framework cleans the data by transforming, formatting or aggregating it.
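S2.2 and S2.3 can be illustrated with a short sketch: mean-based filling of missing values, and a weighted key-field similarity for flagging near-duplicates. The patent does not name its "attribute field matching algorithm". Python's `difflib.SequenceMatcher` ratio is swapped in here as one possible per-field similarity. The field names, weights and any duplicate threshold are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import fmean

def fill_missing(records, attr, group_key=None, default=0.0):
    """Fill None values of attr with the mean of similar samples (same group_key),
    falling back to the attribute-wide mean, then to a system default."""
    present = [r[attr] for r in records if r[attr] is not None]
    global_mean = fmean(present) if present else default
    filled = []
    for r in records:
        r = dict(r)  # copy so the input records stay untouched
        if r[attr] is None:
            peers = [p[attr] for p in records
                     if p[attr] is not None and group_key and p[group_key] == r[group_key]]
            r[attr] = fmean(peers) if peers else global_mean
        filled.append(r)
    return filled

def weighted_similarity(a, b, weights):
    """Weighted average of per-field string similarity over the key attributes."""
    total = sum(weights.values())
    score = sum(w * SequenceMatcher(None, str(a[f]), str(b[f])).ratio()
                for f, w in weights.items())
    return score / total

# Usage with hypothetical records
rows = [
    {"plant": "A", "cod": 10.0},
    {"plant": "A", "cod": None},
    {"plant": "B", "cod": 30.0},
    {"plant": "A", "cod": 20.0},
]
print(fill_missing(rows, "cod", group_key="plant")[1])  # {'plant': 'A', 'cod': 15.0}

w = {"name": 2.0, "address": 1.0}  # heavier weight on the more characteristic field
a = {"name": "Cangzhou WWTP No.2", "address": "Jiefang Rd 12"}
b = {"name": "Cangzhou WWTP No 2", "address": "Jiefang Road 12"}
print(weighted_similarity(a, b, w))  # compare against an assumed duplicate threshold
```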
In a further embodiment, the server performance optimization in step (4) comprises the following steps:
P1: a performance optimization framework inside the server creates a start linked list for each server port, and the heads of these lists are in turn linked in the LRU order of the ports. The interaction frequency of each port is collected; the start linked list of the port with the lowest interaction frequency is placed at the head of the LRU list, and the rest follow in ascending order;
P2: before a port is started, the access bits of all updated page table entries are cleared. Before the port's start period ends, the performance optimization framework re-checks the access bits of all pages and, once the check is complete, updates the pages in each start linked list. It then takes the least active ports in turn from the head of the LRU list and selects victim pages from their start linked lists until enough pages have been obtained;
P3: the selected victim pages are merged into a block and the block is marked. A compression driver is woken to parse the marked block and obtain the physical pages belonging to it. The physical pages are copied into a buffer, a compression algorithm compresses them into a compressed block, and the compressed block is stored in the performance optimization framework's area.
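P1 and P2 can be modeled in user space as a sketch, not as the kernel-level mechanism itself. Ports are ordered by ascending interaction count, so the least active port is at the head. Victim pages are then taken from each port's start list, skipping pages whose access bit was set again after clearing. The assumptions are:
- The interaction log is a list of port ids.
- "Access bits" are modeled as a set of page ids accessed since the bits were cleared.
- The function names are hypothetical.

```python
from collections import Counter

def order_ports(interactions):
    """Order ports by ascending interaction count: head of the list = least active port."""
    counts = Counter(interactions)
    return sorted(counts, key=lambda port: counts[port])

def select_victims(start_lists, port_order, accessed, needed):
    """Walk ports from the head; take pages whose access bit stayed clear (not in accessed)."""
    victims = []
    for port in port_order:
        for page in start_lists.get(port, []):
            if page not in accessed:
                victims.append(page)
                if len(victims) == needed:
                    return victims
    return victims  # fewer than needed if there are not enough cold pages

# Usage with a hypothetical interaction log and start lists
log = ["p1", "p2", "p1", "p3", "p1", "p2"]
order = order_ports(log)  # ['p3', 'p2', 'p1']
start_lists = {"p1": [1, 2], "p2": [3, 4], "p3": [5, 6]}
accessed = {3, 6}         # pages whose access bit was set again
print(select_victims(start_lists, order, accessed, needed=3))  # [5, 4, 1]
```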
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with conventional cleaning methods, the method builds a quality detection model whose parameters are computed and selected from the test data set and data quality dimensions uploaded by staff. Feature dimension reduction is applied to each group of industry data, retaining the feature parameters that express data quality and discarding those with poor characterization capability. The data is split into a training set and a test set, and the training set is standardized into training samples. The training samples are fed to the model, the optimal parameters are set from the optimization result, and the model is trained by long-term iteration. The test set is then run through the trained model to plot curves of data accuracy, universality, completeness and consistency. This improves the detection precision of the quality detection model and the efficiency of parameter search, removes the need for manual parameter setting, keeps operation simple, and improves the user experience of staff;
2. The performance optimization framework inside the server creates a start linked list for each server port and links their heads in the LRU order of the ports, with the start linked lists sorted by interaction frequency in ascending order. Before a port's start period ends, the pages in each start linked list are updated. The least active ports are then taken in turn from the head of the LRU list, and victim pages are selected from their start linked lists. The victim pages are merged into a marked block, and a compression driver is woken to parse the block and obtain its physical pages. The pages are copied into a buffer and compressed into a compressed block, which is stored in the performance optimization framework's area. Compressing at this larger granularity improves compression efficiency, effectively improves server response efficiency, and reduces the time needed to compress memory.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flow block diagram of the sewage industry data cleaning method according to the invention.
Detailed Description
Example 1
Referring to FIG. 1, the sewage industry data cleaning method comprises the following steps:
Receiving sewage industry data and carrying out risk investigation.
Specifically, the server receives the industry data, converts any non-binary data in it into binary data, and maps each group of industry data into a specified detection interval by Min-Max normalization. The server then connects to the virus database and the cloud virtual machine, parses each group of industry data, and searches the virus database for entries matching the parsing result. If a match is found, the corresponding industry data is intercepted. If not, the related industry data is uploaded to the cloud virtual machine for infection simulation, and the simulation result is analyzed for viruses according to the infection criteria established by network virus definitions. Industry data whose analysis result matches those criteria is intercepted.
Constructing an industry database and carrying out data quality detection.
Specifically, a quality detection model is constructed, then trained and optimized according to the data quality dimensions, and the industry data is fed into it in sequence. The model classifies each group of industry data by enterprise. Feature dimension reduction is applied to each group, retaining the feature parameters that express the quality of the industry data and discarding those with poor characterization capability. The industry data is divided into a training set and a test set, and the training set is standardized to generate training samples, which are fed into the quality detection model. The optimal model parameters are set according to the optimization result, and the model is trained by long-term iteration. The test set is then input into the trained model, and curves of data accuracy, universality, completeness and consistency are plotted and analyzed. Industry data with missing values, near-duplicates, anomalies, logic errors or inconsistencies is marked and recorded.
The server receives the test data set and the data quality dimensions uploaded by staff. One group of the N groups of test data is held out as validation data, and the remaining data is used to fit a test model. The validation data checks the model's precision, with detection capability measured by the root mean square error. This is repeated N times, and the resulting precision values are used for parameter optimization. A parameter range is initialized, the learning rate is set by system default or manually, and the data samples are divided into subsets. One subset is selected as the test set and the remaining subsets as the training set; after the test model is trained on the training set, it predicts the test set and the RMSE of the result is recorded. The test set is then replaced by another subset, with the rest as the training set, and the RMSE is recorded again until every sample has been predicted once. The parameter combination with the smallest RMSE is selected as the optimal parameter for the data interval and replaces the original parameter in the quality detection model. Each group of data detected by the model is recorded together with its detection result and replaces the original data in the test data set for subsequent parameter updates. The accuracy, detection rate and false alarm rate of the quality detection model are evaluated in real time and fed back to staff for review.
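The patent names accuracy, detection rate and false alarm rate as evaluation metrics but does not define them. A minimal sketch, assuming the standard binary-classification definitions: detection rate = TP / (TP + FN) and false alarm rate = FP / (FP + TN). Here `True` means a record is flagged as a quality problem.

```python
def evaluate(y_true, y_pred):
    """Accuracy, detection rate and false alarm rate for binary quality flags."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,    # share of real problems caught
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,  # share of clean records flagged
    }

# Usage with hypothetical ground truth and model output
metrics = evaluate([True, True, False, False, True], [True, False, False, True, True])
print(metrics)
```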
In this embodiment, the data quality dimensions include data specification, data integrity criteria, data duplication, data accuracy, consistency and synchronization, timeliness and availability, ease of use and maintainability, data coverage, expression quality, data decay, utility and understandability, and relevance and credibility.
Specifically, the feature dimension reduction uses the coefficient of variation:
$$CV = \frac{\sigma}{\mu}$$
where σ is the standard deviation of the feature parameter, μ is its mean, and CV is its coefficient of variation. The larger the CV, the more important the feature parameter is considered; feature parameters with a small CV are considered unimportant and are eliminated.
The standardization is:
$$x' = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)}$$
where x is an extracted feature parameter, mean(x) is its mean, and std(x) is its standard deviation.
Example 2
Referring to FIG. 1, the sewage industry data cleaning method comprises the following steps:
Constructing a data cleaning framework to clean the industry data.
Specifically, the data cleaning framework intercepts every group of marked industry data and classifies it into missing data, near-duplicates, anomalies, logic errors and inconsistencies.

Records with missing data are handled by ignoring the record, removing the attribute, or estimating the missing value from a system default value, the attribute mean, or the mean of similar samples and filling it with the estimate.

For near-duplicate industry data, the framework selects key attribute fields and weights each one by its importance in characterizing a record, so that the key fields characterize records more accurately. An attribute field matching algorithm re-checks the marked near-duplicate data, which is then cleaned according to preset cleaning rules. Data that cannot be processed automatically is stored in a log table, and a corresponding cleaning result report is produced.

For anomalous data, the framework clusters similar industry data, treats values falling outside any cluster as isolated points, and removes them. It then bins the remaining anomalous data and smooths it by bin means.

For industry data with logic errors, the framework retrieves the relevant rules from the logic definition library to correct the erroneous attribute values; if no suitable rule exists, the data is stored in a log table for manual processing. For inconsistent industry data, the framework cleans the data by transforming, formatting or aggregating it.
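The anomaly handling can be sketched in two parts: isolated-point removal, then equal-frequency binning with bin-mean smoothing. The patent does not name its clustering algorithm. A simple radius-neighbor rule is swapped in here: a value with no other value within `eps` counts as an isolated point. The `eps` and bin-size values are assumptions.

```python
def isolated_points(values, eps):
    """Indices of values with no other value within eps (stand-in for cluster-based outlier detection)."""
    return [i for i, v in enumerate(values)
            if not any(j != i and abs(v - w) <= eps for j, w in enumerate(values))]

def smooth_by_bin_means(values, bin_size):
    """Equal-frequency binning over sorted values; each value is replaced by its bin mean.
    The result keeps the original positions of the values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        mean = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            smoothed[i] = mean
    return smoothed

# Usage with hypothetical readings containing one spike
readings = [10.0, 10.5, 11.0, 55.0, 12.0, 9.5]
outliers = set(isolated_points(readings, eps=2.0))
kept = [v for i, v in enumerate(readings) if i not in outliers]
print(outliers)                     # {3}
print(smooth_by_bin_means(kept, 2))  # [9.75, 10.75, 10.75, 12.0, 9.75]
```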
Monitoring the operating efficiency of the server in real time and optimizing its performance.
Specifically, a performance optimization framework inside the server creates a start linked list for each server port and links the heads of these lists in the LRU order of the ports. The interaction frequency of each port is collected; the start linked list of the port with the lowest interaction frequency is placed at the head of the LRU list, and the rest follow in order. Before a port is started, the access bits of all updated page table entries are cleared. Before the port's start period ends, the framework re-checks the access bits of all pages and then updates the pages in each start linked list. The least active ports are taken in turn from the head of the LRU list, and victim pages are selected from their start linked lists until enough pages have been obtained. The selected victim pages are merged into a block and marked, and a compression driver is woken to parse the marked block and obtain its physical pages. The physical pages are copied into a buffer, a compression algorithm compresses them into a compressed block, and the compressed block is stored in the performance optimization framework's area.
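The block-level compression can be sketched by copying several victim pages into one buffer and compressing them with a single call. This is the "large granularity" the text contrasts with page-by-page compression. The patent does not name its compression algorithm, so `zlib` is swapped in. The 4096-byte page size and the dict-of-bytes page store are also assumptions.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes

def compress_victims(pages, victim_ids):
    """Copy victim pages into one contiguous buffer and compress it as a single block."""
    buffer = bytearray()
    for pid in victim_ids:
        page = pages[pid]
        if len(page) != PAGE_SIZE:
            raise ValueError(f"page {pid} is not {PAGE_SIZE} bytes")
        buffer += page
    return {"pages": list(victim_ids), "data": zlib.compress(bytes(buffer))}

def decompress_block(block):
    """Inverse operation: split the decompressed buffer back into page-sized pieces."""
    raw = zlib.decompress(block["data"])
    return {pid: raw[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]
            for i, pid in enumerate(block["pages"])}

# Usage with hypothetical page contents
pages = {
    1: bytes(PAGE_SIZE),                        # zero-filled page
    2: b"ab" * (PAGE_SIZE // 2),                # repetitive page
    3: bytes(range(256)) * (PAGE_SIZE // 256),  # patterned page
}
block = compress_victims(pages, [1, 2, 3])
print(len(block["data"]), "compressed bytes for", 3 * PAGE_SIZE, "bytes of pages")
```

Compressing several pages together lets the compressor exploit redundancy across page boundaries and amortizes per-call overhead.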

Claims (7)

1. A sewage industry data cleaning method, characterized by comprising the following steps:
(1) receiving sewage industry data and carrying out risk investigation;
(2) constructing an industry database and carrying out data quality detection;
(3) constructing a data cleaning framework to clean the industry data;
(4) monitoring the operating efficiency of the server in real time and optimizing its performance.
2. The sewage industry data cleaning method according to claim 1, wherein the risk investigation in step (1) comprises the following steps:
step one: the server receives the industry data, converts any non-binary data in it into binary data, and maps each group of industry data into a specified detection interval by Min-Max normalization;
step two: the server then connects to the virus database and the cloud virtual machine, parses each group of industry data, searches the virus database for entries matching the parsing result, and intercepts the corresponding industry data if a match is found;
step three: if no match is found, the related industry data is uploaded to the cloud virtual machine for infection simulation; the server then performs virus analysis on the simulation result according to the infection criteria established by network virus definitions, and intercepts the industry data whose analysis result matches those criteria.
3. The sewage industry data cleaning method according to claim 1, wherein the data quality detection in step (2) comprises the following steps:
step I: constructing a quality detection model; training and optimizing it according to data quality dimensions; feeding the industry data into the model in sequence; and classifying each group of industry data by enterprise with the model;
step II: performing feature dimension reduction on each group of industry data, retaining the feature parameters that express the quality of the industry data and discarding those with poor characterization capability; dividing the industry data into a training set and a test set; and standardizing the training set to generate training samples;
step III: feeding the training samples into the quality detection model; setting the optimal model parameters according to the optimization result; training the model by long-term iteration; inputting the test set into the trained model; plotting and analyzing curves of data accuracy, universality, completeness and consistency; and marking and recording industry data with missing values, near-duplicates, anomalies, logic errors or inconsistencies.
4. The sewage industry data cleaning method according to claim 3, wherein the data quality dimensions in step I include data specification, data integrity criteria, data duplication, data accuracy, consistency and synchronization, timeliness and availability, ease of use and maintainability, data coverage, expression quality, data decay, utility and understandability, and relevance and credibility;
the feature dimension reduction in step II uses the formula:
$$CV = \frac{\sigma}{\mu}$$
wherein σ is the standard deviation of the feature parameter, μ is its mean, and CV is its coefficient of variation; the larger the CV, the more important the feature parameter, and feature parameters with a small CV are considered unimportant and eliminated;
the standardization in step II uses the formula:
$$x' = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)}$$
wherein x is an extracted feature parameter, mean(x) is its mean, and std(x) is its standard deviation.
5. The sewage industry data cleaning method according to claim 3, wherein the training and optimization of the quality detection model in step I comprises the following steps:
S1.1: the server receives the test data set and the data quality dimensions uploaded by staff; one of the N groups of test data is held out as validation data and the remaining groups are used to fit a test model; the validation data checks the model's precision, with detection capability measured by the root mean square error; this is repeated N times, and the resulting precision values are used for parameter optimization;
S1.2: a parameter range is initialized and the learning rate is set by system default or manually; the data samples are divided into subsets, and for each group of data one subset is selected as the test set and the remaining subsets as the training set; after a test model is trained on the training set, it predicts the test set and the root mean square error of the result is recorded;
S1.3: the test set is replaced by another subset, with the remaining subsets as the training set, and the root mean square error is recorded again until every sample has been predicted once; the parameter combination with the smallest root mean square error is selected as the optimal parameter for the data interval and replaces the original parameter in the quality detection model;
S1.4: each group of data detected by the quality detection model is recorded together with its detection result and replaces the original data in the test data set for subsequent parameter updates; meanwhile, the accuracy, detection rate and false alarm rate of the quality detection model are evaluated in real time and fed back to staff for review.
6. The sewage industry data cleaning method according to claim 3, wherein the data cleaning framework in step (3) comprises the following specific data cleaning steps:
s2.1: the data cleaning framework intercepts each marked group of industry data and classifies and processes the industry data with data loss, similar repetition, abnormity, logic errors and inconsistency;
s2.2: for industry data records with missing values, the record is either ignored or the affected data attribute is removed; otherwise the missing value is estimated from a system default value, the attribute mean or the mean of similar samples, and the estimate is used as the fill value;
s2.3: for near-duplicate industry data, the data cleaning framework selects key attribute fields and assigns each key attribute a weight according to how important it is in characterizing a record, so that the key fields characterize records more accurately; it then applies a field-matching-degree algorithm to re-check the marked near-duplicate data, cleans that data according to the configured cleaning rules, stores any data that cannot be processed automatically in a log table, and issues a corresponding cleaning result report;
s2.4: for abnormal data, the data cleaning framework clusters similar industry data, treats values that fall outside the clusters as isolated points (outliers) and removes them, then bins the remaining abnormal data and smooths it using the bin means;
s2.5: for industry data with logical errors, the data cleaning framework retrieves the relevant rules from the logic definition library to correct the erroneous attribute values; if no suitable rule exists, the data is stored in a log table for manual processing. For inconsistent industry data, the data cleaning framework cleans the data by transforming, reformatting or aggregating it.
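Several of the cleaning operations in s2.2 to s2.4 are standard techniques that fit in a few lines. The sketch below assumes records are Python dicts. Mean imputation stands in for the s2.2 fill step, exact-equality weighted field matching stands in for the unspecified matching-degree algorithm of s2.3, and a DBSCAN-style neighbour-count rule stands in for the unspecified clustering of s2.4. All function names, field names and thresholds are hypothetical.

```python
from statistics import mean


def fill_missing(rows, attr, default=None):
    """s2.2 sketch: replace None in `attr` with the attribute mean (or `default` if no value is known)."""
    known = [r[attr] for r in rows if r[attr] is not None]
    fill = mean(known) if known else default
    # Build new dicts for filled rows so the input records are left untouched.
    return [{**r, attr: fill} if r[attr] is None else r for r in rows]


def match_score(a, b, weights):
    """s2.3 sketch: weighted share of key fields whose values are exactly equal."""
    total = sum(weights.values())
    return sum(w for key, w in weights.items() if a.get(key) == b.get(key)) / total


def near_duplicates(rows, weights, threshold=0.8):
    """Return index pairs whose weighted match score reaches `threshold`."""
    return [(i, j)
            for i in range(len(rows))
            for j in range(i + 1, len(rows))
            if match_score(rows[i], rows[j], weights) >= threshold]


def isolated_points(values, eps, min_pts):
    """s2.4 stand-in for clustering: a value with fewer than `min_pts` other values within `eps` is isolated."""
    return [v for i, v in enumerate(values)
            if sum(1 for j, u in enumerate(values) if j != i and abs(u - v) <= eps) < min_pts]


def smooth_by_bin_means(values, bin_size):
    """s2.4: equal-frequency bins over sorted values; each value becomes its bin mean (original order kept)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), bin_size):
        chunk = order[start:start + bin_size]
        bin_mean = mean(values[i] for i in chunk)
        for i in chunk:
            smoothed[i] = bin_mean
    return smoothed


if __name__ == "__main__":
    readings = [{"station": "A", "time": "08:00", "cod": 40.0},
                {"station": "A", "time": "09:00", "cod": None},
                {"station": "B", "time": "08:00", "cod": 50.0}]
    print(fill_missing(readings, "cod"))
    print(near_duplicates(readings, {"station": 0.6, "time": 0.4}, threshold=0.6))
    print(isolated_points([10, 11, 12, 11, 95], eps=2, min_pts=1))
    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24], 3))
```

Pairs returned by `near_duplicates` correspond to the "marked" records that s2.3 re-checks; in practice, pairs that no rule can resolve would go to the log table rather than being merged automatically.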
7. The sewage industry data cleaning method according to claim 1, wherein the server performance optimization in step (4) specifically comprises the following steps:
p1: a performance optimization framework inside the server generates a start-up linked list for each port of the server and links the heads of these lists in the LRU order of the ports; information on the port with the lowest interaction frequency is collected, and that port's start-up list is placed at the head of the LRU linked list, with the remaining lists ordered after it;
p2: before a port starts, the access bits of all updated page-table entries are cleared; before the port's start-up period ends, the performance optimization framework re-checks the access bits of all pages and, once the check is complete, updates the data of each page in the start-up list; it then takes the least active port from the head of the LRU linked list and selects victim pages from that port's start-up list, repeating until enough pages have been obtained;
p3: the selected victim pages are merged into a block and the block is marked; a compression driver is woken up to parse the marked block and obtain the physical pages belonging to it, copies those physical pages into a buffer, calls a compression algorithm to compress them into a compressed block, and stores the compressed block in the performance optimization framework's area.
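Steps p1 to p3 resemble LRU-based page reclamation followed by compression of the reclaimed pages. The following is a minimal sketch under stated assumptions: ports and pages are plain Python objects, a boolean field stands in for the hardware page-table access bit, and `zlib` stands in for the unspecified compression algorithm. The class and function names are hypothetical.

```python
import zlib
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    data: bytes
    accessed: bool = False  # stand-in for the page-table-entry access bit


class StartupLists:
    """p1 sketch: one start-up list of pages per port, least recently active port first."""

    def __init__(self):
        self.lru = OrderedDict()  # port -> list of Page; the first key is the LRU head

    def add_page(self, port, page):
        self.lru.setdefault(port, []).append(page)

    def touch(self, port):
        """Record activity on a port by moving its list to the most-recent end."""
        self.lru.move_to_end(port)

    def clear_access_bits(self, port):
        """p2: clear the access bits of the port's pages before the port starts."""
        for page in self.lru[port]:
            page.accessed = False

    def select_victims(self, needed):
        """p2: walk ports from the LRU head, taking pages not accessed since their bits were cleared."""
        victims = []
        for pages in self.lru.values():
            for page in pages:
                if not page.accessed:
                    victims.append(page)
                    if len(victims) == needed:
                        return victims
        return victims


def compress_victims(victims):
    """p3: merge victim pages into one block, copy it into a buffer and compress the buffer."""
    buffer = b"".join(page.data for page in victims)
    return [page.number for page in victims], zlib.compress(buffer)


if __name__ == "__main__":
    lists = StartupLists()
    for n in range(4):
        lists.add_page("port-a", Page(n, bytes([n]) * 4096))
    for n in range(4, 8):
        lists.add_page("port-b", Page(n, bytes([n]) * 4096))
    lists.touch("port-a")                    # port-b is now the least active port
    lists.clear_access_bits("port-b")
    lists.lru["port-b"][0].accessed = True   # page 4 was used during start-up
    numbers, block = compress_victims(lists.select_victims(3))
    print(numbers, len(block))
```

An `OrderedDict` keeps the LRU order cheaply: `move_to_end` marks a port as recently active, so iteration always starts from the least active port, as p2 requires.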
CN202211320749.5A 2022-10-26 2022-10-26 Sewage industry data cleaning method Pending CN115878599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211320749.5A CN115878599A (en) 2022-10-26 2022-10-26 Sewage industry data cleaning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211320749.5A CN115878599A (en) 2022-10-26 2022-10-26 Sewage industry data cleaning method

Publications (1)

Publication Number Publication Date
CN115878599A (en) 2023-03-31

Family

ID=85758972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211320749.5A Pending CN115878599A (en) 2022-10-26 2022-10-26 Sewage industry data cleaning method

Country Status (1)

Country Link
CN (1) CN115878599A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117061549A (en) * 2023-08-04 2023-11-14 江苏城乡建设职业学院 Intelligent security system based on cloud platform
CN117370331A (en) * 2023-12-08 2024-01-09 河北建投水务投资有限公司 Method and device for cleaning total water consumption data of cell, terminal equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN110443503A (en) * 2019-08-07 2019-11-12 成都九鼎瑞信科技股份有限公司 The training method and related system of water utilities system industrial gross output value analysis model
CN113010506A (en) * 2021-03-11 2021-06-22 江苏省生态环境监控中心(江苏省环境信息中心) Multi-source heterogeneous water environment big data management system
CN113378473A (en) * 2021-06-23 2021-09-10 中国地质科学院水文地质环境地质研究所 Underground water arsenic risk prediction method based on machine learning model
CN113656386A (en) * 2021-07-13 2021-11-16 华中科技大学 Industrial equipment data cleaning method, device, equipment and storage medium

Non-Patent Citations (1)

Title
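Zhang Juan (张娟), Zhang Zili (张自力), Niu Yuhai (牛豫海): "Application of an ultrafiltration-membrane short-process water treatment technology at the Shahe urban water plant" (超滤膜短流程水处理工艺在沙河城区水厂的应用), Water Conservancy Construction and Management (《水利建设与管理》), 28 February 2021 (2021-02-28) *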

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN117061549A (en) * 2023-08-04 2023-11-14 江苏城乡建设职业学院 Intelligent security system based on cloud platform
CN117370331A (en) * 2023-12-08 2024-01-09 河北建投水务投资有限公司 Method and device for cleaning total water consumption data of cell, terminal equipment and storage medium
CN117370331B (en) * 2023-12-08 2024-02-20 河北建投水务投资有限公司 Method and device for cleaning total water consumption data of cell, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2021184630A1 (en) Method for locating pollutant discharge object on basis of knowledge graph, and related device
CN110263230B (en) Data cleaning method and device based on density clustering
CN115878599A (en) Sewage industry data cleaning method
CN103513983B (en) method and system for predictive alert threshold determination tool
CN111639497B (en) Abnormal behavior discovery method based on big data machine learning
CN112148772A (en) Alarm root cause identification method, device, equipment and storage medium
CN105373894A (en) Inspection data-based power marketing service diagnosis model establishing method and system
WO2021159834A1 (en) Abnormal information processing node analysis method and apparatus, medium and electronic device
CN108052542B (en) Multidimensional data analysis method based on presto data
CN114757468B (en) Root cause analysis method for process execution abnormality in process mining
CN113778766B (en) Hard disk fault prediction model establishment method based on multidimensional characteristics and application thereof
CN112905380A (en) System anomaly detection method based on automatic monitoring log
WO2024108973A1 (en) Credit assessment method for construction enterprises
CN114880312B (en) Flexibly-set application system service data auditing method
CN116226103A (en) Method for detecting government data quality based on FPGrow algorithm
CN117453764A (en) Data mining analysis method
CN117411780A (en) Network log anomaly detection method based on multi-source data characteristics
CN114710344B (en) Intrusion detection method based on traceability graph
CN113033694B (en) Data cleaning method based on deep learning
CN114722960A (en) Method and system for detecting incomplete track of event log in business process
CN109582806B (en) Personal information processing method and system based on graph calculation
CN118331952B (en) Financial data cleaning management system and method based on big data
CN111291376A (en) Web vulnerability verification method based on crowdsourcing and machine learning
CN118070281A (en) Malicious code detection method based on log information and graph neural network
CN113822048B (en) Social media text denoising method based on space-time burst characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination