WO2021184630A1 - Method and related equipment for locating pollutant discharge objects based on a knowledge graph - Google Patents

Method and related equipment for locating pollutant discharge objects based on a knowledge graph

Info

Publication number
WO2021184630A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
feature
preset
knowledge graph
Prior art date
Application number
PCT/CN2020/104753
Other languages
English (en)
French (fr)
Inventor
陈功
Original Assignee
平安国际智慧城市科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安国际智慧城市科技股份有限公司
Publication of WO2021184630A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Definitions

  • This application relates to the field of knowledge relationship analysis, and in particular to a method and related equipment for locating pollutant discharge objects based on a knowledge graph.
  • Existing products monitor only the final pollutants discharged by an enterprise. However, because monitoring equipment varies in quality and in operation and maintenance conditions, the measurement data is inaccurate; enterprises may also falsify data, so supervision remains coarse-grained.
  • The inventor realized that a one-size-fits-all management approach relying only on end-of-pipe emission monitoring makes it difficult to effectively identify and supervise abnormal discharge behavior of enterprises, resulting in low accuracy in locating pollutant discharge objects.
  • The main purpose of this application is to solve the technical problems that existing measurement equipment produces inaccurate measurement data and that abnormal pollutant discharge behavior of enterprises leads to low accuracy in locating pollutant discharge objects.
  • The first aspect of the present application provides a method for locating pollutant discharge objects based on a knowledge graph, which includes: extracting triples from preset data through a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, where the target knowledge graph is used to indicate the production standards, pollution discharge standards, and laws and regulations of the target enterprise; performing pollutant discharge monitoring on the target enterprise within a preset time period to obtain pollutant discharge monitoring data; preprocessing the pollutant discharge monitoring data to obtain a standard time series data set; performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; predicting the feature data to be identified through a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, where the prediction result is used to indicate a target enterprise with abnormal pollutant discharge; and obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending
  • The second aspect of the present application provides a device for locating pollutant discharge objects based on a knowledge graph, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps: extracting triples from preset data through a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, where the target knowledge graph is used to indicate the target enterprise's production standards, pollution discharge standards, and legal and regulatory basis clauses; monitoring the target enterprise's discharge within a preset time period to obtain pollutant discharge monitoring data; preprocessing the pollutant discharge monitoring data to obtain a standard time series data set; performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; predicting the feature data to be identified through a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, where the prediction result is used to indicate a target enterprise with abnormal pollutant discharge; according to the feature data to be identified
  • The third aspect of the present application provides a computer-readable storage medium storing computer instructions. When the computer instructions run on a computer, the computer executes the following steps: extracting triples from preset data through a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, where the target knowledge graph is used to indicate the production standards, pollution discharge standards, and laws and regulations of the target enterprise; performing pollutant discharge monitoring on the target enterprise within a preset time period to obtain pollutant discharge monitoring data; preprocessing the pollutant discharge monitoring data to obtain a standard time series data set; performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; predicting the feature data to be identified through a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, where the prediction result is used to indicate a target enterprise with abnormal pollutant discharge; according to the feature data to be identified and the prediction result, obtaining discrimination basis data from the target knowledge graph of the target enterprise
  • The fourth aspect of the present application provides a device for locating pollutant discharge objects based on a knowledge graph, including: an extraction unit for extracting triples from preset data through a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, where the target knowledge graph is used to indicate the production standards, pollution discharge standards, and laws and regulations of the target enterprise; a monitoring unit for monitoring the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data; a preprocessing unit for preprocessing the pollutant discharge monitoring data to obtain a standard time series data set; an extraction and fusion unit for performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; a prediction unit for predicting the feature data to be identified through a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, where the prediction result is used to indicate a target enterprise with abnormal pollutant discharge; the judgment and
  • Triples are extracted from preset data through a natural language processing algorithm and stored in a preset graph database to obtain a target knowledge graph, which is used to indicate the target enterprise's production standards, pollution discharge standards, and legal and regulatory basis clauses; the target enterprise's discharge is monitored within a preset time period to obtain pollutant discharge monitoring data; the pollutant discharge monitoring data is preprocessed to obtain a standard time series data set; feature extraction and feature fusion are performed on the standard time series data set to obtain feature data to be identified; the feature data to be identified is predicted through the trained model to obtain a prediction result, and a target label is set according to the prediction result and added to the target knowledge graph, where the prediction result is used to indicate a target enterprise with abnormal pollutant discharge; and discrimination basis data is obtained from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and early warning information is sent to the target enterprise with abnormal pollutant discharge, where the early warning information is used to instruct inspection of the target enterprise according to the discrimination basis
  • FIG. 1 is a schematic diagram of an embodiment of a method for locating pollutant discharge objects based on a knowledge graph in an embodiment of the application;
  • FIG. 2 is a schematic diagram of another embodiment of a method for locating pollutant discharge objects based on a knowledge graph in an embodiment of the application;
  • FIG. 3 is a schematic diagram of an embodiment of a device for locating pollutant discharge objects based on a knowledge graph in an embodiment of the application;
  • FIG. 4 is a schematic diagram of another embodiment of a device for locating pollutant discharge objects based on a knowledge graph in an embodiment of the application;
  • FIG. 5 is a schematic diagram of an embodiment of a device for locating pollutant discharge objects based on a knowledge graph in an embodiment of the application.
  • The embodiments of the application provide a method and related equipment for locating pollutant discharge objects based on a knowledge graph, which combine the knowledge graph with artificial intelligence technology to intelligently identify abnormal pollutant discharge by enterprises. The identification algorithm is also improved iteratively through verification of its conclusions, with the aim of accurately identifying abnormal pollutant discharge behavior, supervising polluting enterprises efficiently, and improving regional environmental quality.
  • An embodiment of the method for locating pollutant discharge objects based on the knowledge graph in the embodiments of the present application includes:
  • The execution subject of this application may be a device for locating pollutant discharge objects based on the knowledge graph, or it may be a terminal or a server, which is not specifically limited here.
  • The embodiments of the present application are described with the server as the execution subject.
  • The server extracts triples from the preset data through a natural language processing algorithm, and stores the triples in the preset graph database to obtain the target knowledge graph, which is used to indicate the production standards, pollution discharge standards, and laws and regulations of the target enterprise.
  • The preset data includes the original information of the target enterprise, environmental protection laws and regulations data, environmental protection industry standard data, and comprehensive sewage discharge standard data.
  • A knowledge graph is a semantic network that reveals the relationships between entities and can formally describe real-world things and their relationships.
  • Kinds of different entities, R r1, r2,..., r
  • The target knowledge graph of key industries is constructed mainly by analyzing the characteristics of those industries and building up the production standards, pollution discharge standards, and laws and regulations that apply within each industry.
  • The server monitors the pollutant discharge of the target enterprise within a preset time period and obtains pollutant discharge monitoring data.
  • The pollutant discharge monitoring data is a time series, that is, a sequence of numbers formed by arranging continuous monitoring values of the same phenomenon in chronological order. The sequence exhibits regularity.
  • The preset time period is a period set in advance, such as 15 days. Further, the server collects the pollutant discharge monitoring data of the target enterprise within the preset time period through a preset device.
  • The server preprocesses the pollutant discharge monitoring data to obtain a standard time series data set. Specifically, the server fills in the missing values in the pollutant discharge monitoring data; the server smooths the filled data, where smoothing is mainly used to handle random errors or deviations in the data; and the server deletes isolated data from the smoothed data to obtain the standard time series data set, where the isolated data is abnormal data.
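  • The three preprocessing steps above (fill missing values, smooth, delete isolated data) can be sketched as follows. This is a minimal illustration, not the method of the application: linear interpolation, a centered moving average, and a z-score cutoff are assumed stand-ins, and the function names, `window`, and `z_max` are hypothetical.

```python
from statistics import mean, pstdev

def fill_missing(values):
    """Fill None gaps by linear interpolation; edge gaps copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled

def smooth(values, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    return [mean(values[max(0, i - half): i + half + 1]) for i in range(len(values))]

def drop_isolated(values, z_max=3.0):
    """Delete points farther than z_max standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= z_max]

def preprocess(raw, window=3, z_max=3.0):
    """Apply the steps in the order given in the text: fill, smooth, delete."""
    return drop_isolated(smooth(fill_missing(raw), window), z_max)
```

  • Because smoothing runs before deletion, an isolated spike is partly averaged into its neighbors first; a real deployment would need to choose `window` and `z_max` with that in mind.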
  • The server performs feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified.
  • Sampling the standard time series data set on the time axis is also called feature extraction.
  • The corresponding sampled values are the feature values.
  • Feature extraction reduces the number of time samples before classification, which reduces the amount of data and improves classification accuracy.
  • Feature fusion refers to the fusion of multiple features into one feature.
  • The server performs feature extraction on the standard time series data set through a preset algorithm to obtain the first feature vector; the server performs feature fusion on the first feature vector to obtain the second feature vector; the server screens the first feature vector and the second feature vector according to the preset feature threshold to obtain the feature data to be identified.
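  • The extract, fuse, and screen sequence above can be sketched as follows. The specific features (mean, standard deviation, mean absolute slope), the fused feature (coefficient of variation), and all names and thresholds are assumptions chosen for illustration; the application does not fix them at this point.

```python
from statistics import mean, pstdev

def extract_features(series):
    """First feature vector: simple statistical features of one monitored series."""
    slopes = [b - a for a, b in zip(series, series[1:])]
    return {
        "mean": mean(series),
        "std": pstdev(series),
        "mean_abs_slope": mean(abs(s) for s in slopes),
    }

def fuse_features(first):
    """Second feature vector: one fused feature built from several first features."""
    # The coefficient of variation fuses "std" and "mean" into a single value.
    cv = first["std"] / first["mean"] if first["mean"] else 0.0
    return {"coeff_of_variation": cv}

def screen(features, thresholds):
    """Keep features whose value exceeds their preset threshold."""
    return {k: v for k, v in features.items() if v > thresholds.get(k, float("inf"))}

first = extract_features([2.0, 4.0, 2.0, 4.0])
second = fuse_features(first)
to_identify = screen({**first, **second},
                     {"mean_abs_slope": 1.0, "coeff_of_variation": 0.5})
```

  • Features without a configured threshold are dropped here; that default is a design choice of the sketch, not something the text specifies.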
  • The server predicts the feature data to be identified through the trained model, obtains the prediction result, and sets the target label according to the prediction result.
  • The server adds the target label to the target knowledge graph.
  • The prediction result is used to indicate a target enterprise with abnormal pollutant discharge. It can be understood that the trained model automatically extracts features from the feature data to be identified, calculates corresponding weights for the features, and calculates the prediction result from the features and the corresponding weights.
  • The prediction result is the output of a binary classification: the trained model determines whether the target enterprise is normal or abnormal.
  • 106. Obtain the discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and send early warning information to the target enterprise with abnormal pollutant discharge.
  • The early warning information is used to instruct a target inspector to inspect the target enterprise according to the discrimination basis data.
  • The server obtains the discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sends early warning information to the target enterprise with abnormal pollutant discharge.
  • The discrimination basis data includes production standards, pollution discharge standards, and laws and regulations.
  • The server determines the unique identifier of the target enterprise with abnormal pollutant discharge according to the prediction result; the server determines the target knowledge graph according to the unique identifier of the target enterprise; the server reads the discrimination basis data from the target knowledge graph according to the feature data to be identified and the prediction result, where the discrimination basis data includes production standards, pollution discharge standards, and laws and regulations; the server sends early warning information to the target enterprise with abnormal pollutant discharge, where the early warning information is used to instruct inspection of the target enterprise according to the discrimination basis data.
  • The on-site target inspector conducts an on-site survey of the target enterprise in accordance with the production standards, pollution discharge standards, and laws and regulations, and obtains the survey result.
  • The survey result may or may not be consistent with the prediction result.
  • Another embodiment of the method for locating pollutant discharge objects based on the knowledge graph in the embodiments of the present application includes:
  • the preset structured data includes environmental protection laws and regulations data, environmental protection industry standard data, and comprehensive sewage discharge standard data;
  • The server obtains preset structured data, and performs data integration on the preset structured data to obtain the first data.
  • The preset structured data includes environmental protection laws and regulations data, environmental protection industry standard data, and comprehensive sewage discharge standard data. Specifically, the server regularly collects environmental protection laws and regulations data, environmental protection industry standard data, and comprehensive sewage discharge standard data from preset webpages.
  • The preset webpages include environmental protection department webpages; the server sets the collected environmental protection laws and regulations data, environmental protection industry standard data, and comprehensive sewage discharge standard data as the preset structured data; the server performs data integration on the preset structured data to obtain the first data.
  • The server obtains the unique identifier of the target enterprise, and reads the original information of the target enterprise according to that unique identifier.
  • The original information includes basic information, discharge outlet information, production information, facility information, monitoring information, supervision information, and operating ledgers.
  • Basic information includes pollution permits, monitoring factors, emission standards, and emissions.
  • Discharge outlet information includes waste water outlets and exhaust gas outlets.
  • Production information includes products, capacity, raw materials, auxiliary materials, and fuels.
  • Facility information includes production facilities and wastewater.
  • Monitoring information includes real-time monitoring data and historical monitoring data.
  • Supervision information includes supervision and law enforcement information, letter and complaint information, and administrative punishment information.
  • Operating ledgers include production facility ledgers and treatment facility ledgers.
  • The knowledge extraction includes entity extraction, relationship extraction, and attribute extraction;
  • The server performs knowledge extraction on the original information of the target enterprise through a natural language processing algorithm to obtain the second data.
  • The knowledge extraction includes entity extraction, relationship extraction, and attribute extraction.
  • The second data is represented as triples, where a triple takes the form (entity 1, relationship, entity 2) or (entity, attribute, attribute value).
  • Natural language processing algorithms include named entity recognition, syntactic dependency parsing, and entity relationship recognition.
  • The elements of triples include entity 1, relationship, entity 2, concept, attribute, attribute value, and so on. An entity is the basic element in the knowledge graph, and different entities have different relationships; a concept refers to a collection, category, object type, or kind of thing, such as people or geography; an attribute refers to a property, characteristic, or parameter that an object may have, such as nationality or birthday; an attribute value refers to the value of a specified attribute of an object, such as China.
  • Each entity is represented by a globally unique identifier, each attribute and attribute value pair describes an internal characteristic of the entity, and a relationship connects two entities and describes the association between them.
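  • The two triple forms described above can be held in a small data structure like the one below. This is only a sketch: `TripleStore` is a hypothetical in-memory stand-in for the preset graph database, and the identifiers and predicate names are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # entity 1, or the entity that owns an attribute
    predicate: str  # relationship name or attribute name
    obj: str        # entity 2, or the attribute value

class TripleStore:
    """Tiny in-memory stand-in for the preset graph database (hypothetical)."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, triple):
        self._by_subject[triple.subject].add(triple)

    def query(self, subject, predicate=None):
        """Return triples for a subject, optionally restricted to one predicate."""
        return sorted(
            t for t in self._by_subject.get(subject, ())
            if predicate is None or t.predicate == predicate
        )

store = TripleStore()
# (entity 1, relationship, entity 2)
store.add(Triple("enterprise:001", "has_outlet", "outlet:W1"))
# (entity, attribute, attribute value)
store.add(Triple("outlet:W1", "monitoring_factor", "COD"))
```

  • Keying by a globally unique subject identifier mirrors the statement that each entity is represented by a unique identifier.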
  • Knowledge fusion includes ontology alignment, entity linking, and data fusion;
  • The server performs knowledge fusion on the first data and the second data.
  • Knowledge fusion includes ontology alignment, entity linking, and data fusion. Because the knowledge in a knowledge graph comes from a wide range of sources, there are problems such as uneven quality of knowledge, duplication of knowledge across data sources, and weak association between pieces of knowledge, so knowledge fusion must be carried out.
  • Knowledge fusion is a high-level form of knowledge organization in which knowledge from different sources undergoes heterogeneous data integration, disambiguation, processing, reasoning verification, and updating under the same framework and specifications, so that data, information, methods, experience, and human ideas are fused into a high-quality knowledge base.
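  • A very reduced illustration of the entity linking and de-duplication part of knowledge fusion is given below. It assumes an alias table that maps variant entity names to one canonical identifier; the table, the function name, and the sample identifiers are hypothetical, and ontology alignment and reasoning verification are not covered.

```python
def fuse(triples, aliases):
    """Map aliased entity names to canonical ones and drop duplicate triples."""
    def canon(name):
        return aliases.get(name, name)

    seen, fused = set(), []
    for s, p, o in triples:
        t = (canon(s), p, canon(o))
        if t not in seen:  # the same fact from two sources is kept once
            seen.add(t)
            fused.append(t)
    return fused

first_data = [("standard:S1", "limits", "COD")]
second_data = [("Std S1", "limits", "COD"), ("enterprise:001", "subject_to", "Std S1")]
merged = fuse(first_data + second_data, {"Std S1": "standard:S1"})
```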
  • The server matches the data after knowledge fusion against the preset enterprise portrait label model to obtain the label data of the target enterprise, and the label data is expressed in the form of triples. Further, the server inputs the data after knowledge fusion into the preset enterprise portrait label model; the server matches the data after knowledge fusion against the elements in the model to obtain the classification corresponding to that data; the server determines the label data of the target enterprise according to that classification. A piece of label data is generally represented by a set of triples, where the triple (a, b, c) indicates that target enterprise a is given label c for pollutant discharge behavior b.
  • The server generates a target knowledge graph of the target enterprise according to the label data of the target enterprise, and stores the target knowledge graph in a preset graph database. It can be understood that the knowledge graph of key industries is constructed mainly by analyzing the characteristics of those industries and building up the production standards, pollution discharge standards, and laws and regulations that apply within each industry.
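  • The label triples (a, b, c) described above could be produced by rule matching along the following lines. The rule format, behavior names, thresholds, and label texts are all hypothetical; the application does not describe the internals of the enterprise portrait label model.

```python
def tag_enterprise(enterprise_id, facts, rules):
    """Return (enterprise, behavior, label) triples for every rule that matches."""
    return [
        (enterprise_id, behavior, label)
        for behavior, matches, label in rules
        if behavior in facts and matches(facts[behavior])
    ]

# Hypothetical rules: (discharge behavior, predicate on its observed value, label).
RULES = [
    ("night_discharge_ratio", lambda v: v > 0.6, "mostly discharges at night"),
    ("daily_exceedances", lambda v: v >= 3, "frequent exceedances"),
]

labels = tag_enterprise(
    "enterprise:001",
    {"night_discharge_ratio": 0.8, "daily_exceedances": 1},
    RULES,
)
```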
  • The server monitors the pollutant discharge of the target enterprise within a preset time period and obtains pollutant discharge monitoring data.
  • The pollutant discharge monitoring data is a time series, that is, a sequence of numbers formed by arranging continuous monitoring values of the same phenomenon in chronological order. The sequence exhibits regularity.
  • The preset time period is a period set in advance, such as 7 days. Further, the server collects the pollutant discharge monitoring data of the target enterprise within the preset time period through a preset device.
  • The server preprocesses the pollutant discharge monitoring data to obtain a standard time series data set. Specifically, the server fills in the missing values in the pollutant discharge monitoring data; the server smooths the filled data, where smoothing is mainly used to handle random errors or deviations in the data; and the server deletes isolated data from the smoothed data to obtain the standard time series data set, where the isolated data is abnormal data.
  • The server performs feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified.
  • Feature extraction generates a lower-dimensional feature space from existing features, mapping the relevant information in the original features onto a few features and discarding irrelevant information.
  • The server performs feature extraction on the standard time series data set according to a preset algorithm to obtain the first feature vector.
  • The standard time series data set includes stationary series data and non-stationary series data.
  • The preset algorithms include statistical feature extraction algorithms, neural network feature extraction algorithms, and transform feature extraction algorithms.
  • When non-stationary series data is detected in the standard time series data set, the server applies a differencing operation (differential preprocessing) to the non-stationary series data to obtain stationary series data; the server then fits the stationary series data with an autoregressive moving average model to obtain the model coefficients, and sets the model coefficients as the first feature vector.
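  • The differencing and coefficient-as-feature idea above can be sketched with the standard library alone. Note the substitution: instead of a full autoregressive moving average fit, the sketch fits only an AR(1) model by least squares, which is a deliberate simplification; how non-stationarity is detected is not shown. Function names are hypothetical.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y[t] = x[t] - x[t-1]."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def fit_ar1(series):
    """Least-squares AR(1) fit x[t] = c + phi * x[t-1] + e[t]; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    if n < 2:
        raise ValueError("need at least three points")
    mp, mn = sum(x_prev) / n, sum(x_next) / n
    var = sum((p - mp) ** 2 for p in x_prev)
    if var == 0:
        return mn, 0.0  # constant series: no autoregressive signal
    phi = sum((p - mp) * (q - mn) for p, q in zip(x_prev, x_next)) / var
    return mn - phi * mp, phi

def first_feature_vector(series, order=1):
    """Difference the series, then use the fitted coefficients as features."""
    c, phi = fit_ar1(difference(series, order))
    return [c, phi]
```

  • With a statistics package available, the AR(1) fit would be replaced by a proper ARMA estimation and its coefficient vector used in the same way.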
  • The server performs feature fusion on the first feature vector to obtain the second feature vector. Further, the server combines two or more first feature vectors into a second feature vector according to a preset feature fusion algorithm.
  • The preset feature fusion algorithm includes a feature fusion algorithm based on Bayesian theory. It can be understood that fusing multiple first feature vectors generally gives better classification performance than a single first feature vector, and that at the same time the correlation among the fused first feature vectors is lower.
  • The server screens the first feature vector and the second feature vector according to the preset feature threshold to obtain the feature data to be identified. Further, the server sets a preset feature threshold; the server applies a chi-square test algorithm to the first feature vector and the second feature vector to obtain a feature check value; the server selects the first feature vectors and second feature vectors whose feature check value is greater than the preset feature threshold to obtain the feature data to be identified. For example, the server sets a preset feature threshold A for the average slope, and sets the first feature vectors and second feature vectors whose average slope is greater than the preset feature threshold A as feature data to be identified.
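  • One way to read the chi-square screening step is shown below: each feature is reduced to a binary flag per sample, a Pearson chi-square statistic is computed against known normal/abnormal labels, and features whose statistic exceeds the threshold are kept. The binarization, the availability of labels at this stage, and all names are assumptions of the sketch; the threshold 3.84 is the usual 5% critical value for one degree of freedom.

```python
def chi_square_2x2(feature_flags, labels):
    """Pearson chi-square statistic for two binary sequences (no continuity correction)."""
    n = len(labels)
    counts = [[0, 0], [0, 0]]
    for f, y in zip(feature_flags, labels):
        counts[int(bool(f))][int(bool(y))] += 1
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            row = counts[i][0] + counts[i][1]
            col = counts[0][j] + counts[1][j]
            expected = row * col / n
            if expected:
                stat += (counts[i][j] - expected) ** 2 / expected
    return stat

def screen_by_chi_square(features, labels, threshold):
    """Keep the names of features whose check value exceeds the preset threshold."""
    return [
        name for name, flags in features.items()
        if chi_square_2x2(flags, labels) > threshold
    ]

labels = [1, 1, 0, 0]                      # 1 = abnormal discharge (hypothetical)
features = {
    "steep_slope": [1, 1, 0, 0],           # tracks the label exactly
    "weekday": [1, 0, 1, 0],               # unrelated to the label
}
selected = screen_by_chi_square(features, labels, threshold=3.84)
```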
  • The server predicts the feature data to be identified through the trained model, obtains the prediction result, and sets the target label according to the prediction result.
  • The server adds the target label to the target knowledge graph.
  • The prediction result is used to indicate a target enterprise with abnormal pollutant discharge. It can be understood that the trained model automatically extracts features from the feature data to be identified, calculates corresponding weights for the features, and calculates the prediction result from the features and the corresponding weights.
  • The prediction result is the output of a binary classification: the trained model determines whether the target enterprise is normal or abnormal.
  • The server uses the trained model to label the feature data to be identified according to preset rules.
  • The preset rules specify binary labeling of the feature data to be identified, where the binary label distinguishes whether the feature data to be identified belongs to normal emission index data or abnormal emission index data.
  • The feature data to be identified includes index data that is more sensitive to abnormal data.
  • The server judges whether the target enterprise has abnormal pollutant discharge based on the labeled feature data to be identified, obtains the prediction result, and sets the target label based on the prediction result. For example, for feature data exceeding the preset feature threshold A, the server sets a “frequent mutation” label as the target label and adds the target label to the target knowledge graph.
  • The server selects sample data to be trained and test data from a preset training sample set; the server uses the sample data to be trained to iteratively train a preset learning model to obtain a trained model.
  • The preset learning model includes a random forest model and a neural network model; the server tests the trained model with the test data to obtain the final trained model.
  • The server randomly draws N sample subsets from the sample data to be trained to generate N decision trees; at each node the server randomly selects m variables, with m less than M, as candidate variables for splitting the node, and the number of variables is the same at every node.
  • M is a preset constant; the server generates a random forest model from the N decision trees, and performs secondary training on the generated random forest model to obtain the trained model. The secondary training is used to optimize the weight of each node of the different decision trees.
  • The category of a terminal node is determined by the mode (most frequent) category at that node. For new sample data to be classified, the server classifies it with all decision trees, and the category is decided by majority vote.
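  • The bootstrap sampling, random variable subset, and majority vote described above can be illustrated with a heavily simplified forest. Each "tree" here is a single-split decision stump rather than a full decision tree, and the secondary training of node weights is omitted; `n_trees` plays the role of N and `m` the number of candidate variables out of M features. All names and sample data are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_cls = Counter(left).most_common(1)[0][0]   # mode category of the leaf
            r_cls = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_cls for y in left) + sum(y != r_cls for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_cls, r_cls)
    if best is None:  # no valid split: always predict the majority class
        cls = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), cls, cls)
    return best[1:]

def train_forest(rows, labels, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample subset
        feats = rng.sample(range(n_features), m)         # m candidate variables
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Classify with every stump and return the majority category."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]

rows = [[1.0, 10.0], [2.0, 11.0], [1.5, 12.0], [8.0, 50.0], [9.0, 52.0], [8.5, 55.0]]
labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
forest = train_forest(rows, labels)
```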
  • The server obtains the discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sends early warning information to the target enterprise with abnormal pollutant discharge.
  • The early warning information is used to instruct the target inspector to inspect the target enterprise according to the discrimination basis data.
  • The discrimination basis data includes production standards, pollution discharge standards, and laws and regulations.
  • The server determines the unique identifier of the target enterprise with abnormal pollutant discharge according to the prediction result; the server determines the target knowledge graph according to the unique identifier of the target enterprise; the server reads the discrimination basis data from the target knowledge graph according to the feature data to be identified and the prediction result, where the discrimination basis data includes production standards, pollution discharge standards, and laws and regulations; the server sends early warning information to the target enterprise with abnormal pollutant discharge, where the early warning information is used to instruct inspection of the target enterprise according to the discrimination basis data.
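  • The lookup-and-warn sequence above might be sketched as follows, with the knowledge graph reduced to a list of (subject, predicate, object) tuples. The predicate names, the shape of the warning record, and the sample identifiers are hypothetical; how the warning is actually delivered is not shown.

```python
BASIS_PREDICATES = ("production_standard", "discharge_standard", "legal_basis")

def build_warning(enterprise_id, prediction, graph, triggered_features):
    """Return a warning record for an abnormal enterprise, or None when discharge is normal."""
    if prediction != "abnormal":
        return None
    # Read the discrimination basis data for this enterprise from the graph.
    basis = {
        wanted: sorted(o for s, p, o in graph if s == enterprise_id and p == wanted)
        for wanted in BASIS_PREDICATES
    }
    return {
        "enterprise": enterprise_id,
        "features": sorted(triggered_features),
        "basis": basis,
    }

graph = [
    ("enterprise:001", "discharge_standard", "standard:S1"),
    ("enterprise:001", "legal_basis", "law:L7"),
    ("enterprise:002", "discharge_standard", "standard:S2"),
]
warning = build_warning("enterprise:001", "abnormal", graph, {"mean_abs_slope"})
```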
  • After the server sends the early warning information, the on-site target inspectors survey the target enterprise on site according to the production standards, pollutant discharge standards, and legal and regulatory basis clauses, and obtain a survey result.
  • The survey result may or may not agree with the prediction result. For example, if the prediction result points to enterprise A but the survey determines that it is not enterprise A, the survey result is inconsistent with the prediction result.
  • The server obtains the returned survey result and compares it with the prediction result; if the two are inconsistent, the server relabels the feature data to be identified and sets it as new sample data; the server iteratively trains the trained model on the new sample data; and the server updates the target label according to the new sample data.
  • When the survey result is inconsistent with the prediction result, the new monitoring data is used to iteratively update the trained model, so that its predictions on monitoring data become more accurate.
  • One embodiment of the apparatus for locating pollutant-discharging objects based on a knowledge graph includes:
  • an extraction unit 301, configured to extract triples from preset data by a natural language processing algorithm and store the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of the target enterprise;
  • a monitoring unit 302, configured to monitor the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
  • a preprocessing unit 303, configured to preprocess the pollutant discharge monitoring data to obtain a standard time series data set;
  • an extraction and fusion unit 304, configured to perform feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
  • a prediction unit 305, configured to predict on the feature data to be identified with a trained model to obtain a prediction result, set a target label according to the prediction result, and add the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
  • a discrimination and early warning unit 306, configured to obtain discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and send early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.
  • Another embodiment of the apparatus for locating pollutant-discharging objects based on a knowledge graph in the embodiments of the present application includes:
  • an extraction unit 301, configured to extract triples from preset data by a natural language processing algorithm and store the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of the target enterprise;
  • a monitoring unit 302, configured to monitor the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
  • a preprocessing unit 303, configured to preprocess the pollutant discharge monitoring data to obtain a standard time series data set;
  • an extraction and fusion unit 304, configured to perform feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
  • a prediction unit 305, configured to predict on the feature data to be identified with a trained model to obtain a prediction result, set a target label according to the prediction result, and add the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
  • a discrimination and early warning unit 306, configured to obtain discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and send early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.
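  • Optionally, the extraction unit 301 may also be specifically configured to:
  • obtain preset structured data and integrate it to obtain first data, the preset structured data including environmental protection law and regulation data, environmental protection industry standard data, and integrated wastewater discharge standard data;
  • obtain the unique identifier of the target enterprise and read the original information of the target enterprise according to that identifier;
  • perform knowledge extraction on the original information of the target enterprise by a natural language processing algorithm to obtain second data, the knowledge extraction including entity extraction, relation extraction, and attribute extraction;
  • perform knowledge fusion on the first data and the second data, the knowledge fusion including ontology alignment, entity linking, and data fusion;
  • match the knowledge-fused data against a preset enterprise profile label model to obtain label data of the target enterprise, the label data being represented as triples;
  • generate the target knowledge graph of the target enterprise from the label data, and store the target knowledge graph in the preset graph database.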
  • the extraction and fusion unit 304 may further include:
  • the extraction subunit 3041 is used for feature extraction of the standard time series data set through a preset algorithm to obtain the first feature vector, and the standard time series data set includes stationary series data and non-stationary series data;
  • the fusion subunit 3042 is used to perform feature fusion on the first feature vector to obtain the second feature vector;
  • the screening subunit 3043 is configured to screen the first feature vector and the second feature vector according to a preset feature threshold to obtain feature data to be identified.
  • Optionally, the extraction subunit 3041 may also be specifically configured to:
  • when non-stationary series data is detected in the standard time series data set, difference the non-stationary series data to obtain stationary series data;
  • fit the stationary series data with an autoregressive moving average model to obtain model coefficients, and set the model coefficients as the first feature vector.
  • Optionally, the screening subunit 3043 may also be specifically configured to:
  • compute on the first feature vector and the second feature vector with a chi-square test algorithm to obtain a feature check value;
  • select the first feature vectors and second feature vectors whose feature check value is greater than the preset feature threshold to obtain the feature data to be identified.
  • Optionally, the apparatus for locating pollutant-discharging objects based on a knowledge graph further includes:
  • a selecting unit 307, configured to select sample data to be trained and test data from a preset training sample set;
  • a first training unit 308, configured to iteratively train a preset model with the sample data to be trained to obtain an initially trained model;
  • the preset model includes a random forest model and a neural network model;
  • a test unit 309, configured to test the initially trained model with the test data to obtain the trained model.
  • Optionally, the apparatus for locating pollutant-discharging objects based on a knowledge graph further includes:
  • a judging unit 310, configured to obtain the returned survey result and determine whether it is consistent with the prediction result;
  • a labeling unit 311, configured to relabel the feature data to be identified and set it as new sample data if the returned survey result is inconsistent with the prediction result;
  • a second training unit 312, configured to iteratively train the trained model on the new sample data;
  • an updating unit 313, configured to update the target label according to the new sample data.
  • The device 500 for locating pollutant-discharging objects based on a knowledge graph may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 501 and a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506.
  • The memory 509 and the storage medium 508 may provide transient or persistent storage.
  • The program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device for locating pollutant-discharging objects based on a knowledge graph.
  • The processor 501 may be configured to communicate with the storage medium 508 and to execute, on the device 500, the series of instruction operations in the storage medium 508.
  • The device 500 may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on.
  • Those skilled in the art will understand that the device structure shown in FIG. 5 does not limit the device for locating pollutant-discharging objects based on a knowledge graph, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer executes the following steps:
  • The prediction result is used to indicate the target enterprise with abnormal pollutant discharge; according to the feature data to be identified and the prediction result, discrimination basis data is obtained from the target knowledge graph of the target enterprise, and early warning information is sent to the target enterprise with abnormal pollutant discharge.
  • The early warning information is used for inspecting the target enterprise according to the discrimination basis data.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

Abstract

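A method for locating pollutant-discharging objects based on a knowledge graph, and a related device, which improve the accuracy of identifying and monitoring abnormal pollutant discharge by enterprises by constructing a knowledge graph of the enterprise. The method includes: extracting from preset data by a natural language processing algorithm and storing the result in a preset graph database to obtain a target knowledge graph; monitoring the pollutant discharge of a target enterprise within a preset time period to obtain pollutant discharge monitoring data; preprocessing the pollutant discharge monitoring data to obtain a standard time series data set; performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; predicting on the feature data to be identified with a trained model to obtain a prediction result; and obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge.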

Description

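Method for Locating Pollutant-Discharging Objects Based on a Knowledge Graph, and Related Device
This application claims priority to Chinese patent application No. 202010193960.X, filed with the China Patent Office on March 19, 2020 and entitled "Method for Locating Pollutant-Discharging Objects Based on a Knowledge Graph, and Related Device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of knowledge relationship analysis, and in particular to a method for locating pollutant-discharging objects based on a knowledge graph and a related device.
Background
Supervision of pollutant-discharging enterprises has always been a top priority of environmental protection work. When supervision falls short, abnormal discharge behavior such as covert discharge and data falsification directly affects regional environmental quality and people's quality of life. However, enterprises are numerous and supervisory staff are limited, and identifying abnormal discharge requires substantial manpower, demands experienced personnel, and is highly time-sensitive, so it is currently difficult to supervise pollutant-discharging enterprises effectively.
Existing products only monitor the pollutants an enterprise finally discharges. Because the quality and maintenance of monitoring equipment vary widely, the measured data is inaccurate, and enterprises also engage in falsification, which results in coarse management. The inventor realized that a one-size-fits-all approach relying solely on end-of-pipe discharge monitoring can hardly identify and supervise abnormal discharge behavior effectively, so the accuracy of locating pollutant-discharging objects is relatively low.
Summary
The main purpose of this application is to solve the technical problem that the accuracy of locating pollutant-discharging objects is relatively low because existing measuring equipment produces inaccurate measurement data and enterprises engage in abnormal discharge behavior.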
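To achieve the above purpose, a first aspect of this application provides a method for locating pollutant-discharging objects based on a knowledge graph, including: extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise; monitoring the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data; preprocessing the pollutant discharge monitoring data to obtain a standard time series data set; performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; predicting on the feature data to be identified with a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge; and obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used to instruct inspection of the target enterprise according to the discrimination basis data.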
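A second aspect of this application provides a device for locating pollutant-discharging objects based on a knowledge graph, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions: extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise; monitoring the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data; preprocessing the pollutant discharge monitoring data to obtain a standard time series data set; performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; predicting on the feature data to be identified with a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge; and obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.
A third aspect of this application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps: extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise; monitoring the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data; preprocessing the pollutant discharge monitoring data to obtain a standard time series data set; performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; predicting on the feature data to be identified with a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge; and obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.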
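A fourth aspect of this application provides an apparatus for locating pollutant-discharging objects based on a knowledge graph, including: an extraction unit, configured to extract triples from preset data by a natural language processing algorithm and store the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise; a monitoring unit, configured to monitor the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data; a preprocessing unit, configured to preprocess the pollutant discharge monitoring data to obtain a standard time series data set; an extraction and fusion unit, configured to perform feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; a prediction unit, configured to predict on the feature data to be identified with a trained model to obtain a prediction result, set a target label according to the prediction result, and add the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge; and a discrimination and early warning unit, configured to obtain discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and send early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.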
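In the technical solution provided by this application, triples are extracted from preset data by a natural language processing algorithm and stored in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise; the pollutant discharge of the target enterprise is monitored within a preset time period to obtain pollutant discharge monitoring data; the pollutant discharge monitoring data is preprocessed to obtain a standard time series data set; feature extraction and feature fusion are performed on the standard time series data set to obtain feature data to be identified; the feature data to be identified is predicted on with a trained model to obtain a prediction result, a target label is set according to the prediction result and added to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge; and discrimination basis data is obtained from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and early warning information is sent to the target enterprise with abnormal pollutant discharge, the early warning information being used to instruct inspection of the target enterprise according to the discrimination basis data. In the embodiments of this application, knowledge graphs are combined with artificial intelligence to identify abnormal enterprise discharge intelligently, and the identification algorithm is improved iteratively by verifying its conclusions, so as to identify abnormal discharge behavior accurately, supervise pollutant-discharging enterprises efficiently, and improve regional environmental quality.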
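Brief Description of the Drawings
FIG. 1 is a schematic diagram of one embodiment of the method for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application;
FIG. 2 is a schematic diagram of another embodiment of the method for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application;
FIG. 3 is a schematic diagram of one embodiment of the apparatus for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application;
FIG. 4 is a schematic diagram of another embodiment of the apparatus for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application;
FIG. 5 is a schematic diagram of one embodiment of the device for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application.
Detailed Description
The embodiments of this application provide a method for locating pollutant-discharging objects based on a knowledge graph and a related device, which combine knowledge graphs with artificial intelligence to identify abnormal enterprise discharge intelligently and improve the identification algorithm iteratively by verifying its conclusions, so as to identify abnormal discharge behavior accurately, supervise pollutant-discharging enterprises efficiently, and improve regional environmental quality.
To help those skilled in the art better understand the solution of this application, the embodiments of this application are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth", and so on (if present) in the specification, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than the one illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.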
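For ease of understanding, the specific flow of the embodiments of this application is described below. Referring to FIG. 1, one embodiment of the method for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application includes:
101. Extract triples from preset data by a natural language processing algorithm, and store the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise;
It can be understood that the executing entity of this application may be the apparatus for locating pollutant-discharging objects based on a knowledge graph, or a terminal or a server, which is not limited here. The embodiments of this application are described with a server as the executing entity.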
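The server extracts triples from the preset data by a natural language processing algorithm and stores the triples in the preset graph database to obtain the target knowledge graph, which is used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of the target enterprise. The preset data includes the original information of the target enterprise, environmental protection law and regulation data, environmental protection industry standard data, and integrated wastewater discharge standard data. A knowledge graph is a semantic network that reveals the relationships between entities and can formally describe real-world things and their interrelations. Triples are a general representation of a knowledge graph, namely $G=(E,R,S)$, where $E=\{e_1,e_2,\ldots,e_{|E|}\}$ is the set of entities in the knowledge base, containing $|E|$ distinct entities; $R=\{r_1,r_2,\ldots,r_{|R|}\}$ is the set of relations in the knowledge base, containing $|R|$ distinct relations; and $S\subseteq E\times R\times E$ represents the set of triples in the target knowledge graph.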
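It can be understood that constructing the target knowledge graph for a key industry mainly involves analyzing the characteristics of that industry and building the production standards, pollutant discharge standards, and legal and regulatory standards within it.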
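To make the triple representation concrete, here is a minimal sketch of storing and querying (head, relation, tail) triples. It is only an illustration: the `TripleStore` class is a hypothetical in-memory stand-in for the preset graph database, and the entity and relation names are invented for the example rather than taken from the application.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory stand-in for the preset graph database (hypothetical)."""

    def __init__(self):
        # head entity -> set of (relation, tail entity) pairs
        self._by_head = defaultdict(set)

    def add(self, head, relation, tail):
        """Store one triple (head, relation, tail)."""
        self._by_head[head].add((relation, tail))

    def tails(self, head, relation):
        """Return all tail entities linked to `head` by `relation`, sorted."""
        return sorted(t for r, t in self._by_head[head] if r == relation)


# Illustrative triples; every name below is an assumption made for the example.
kg = TripleStore()
kg.add("company:A", "subject_to_discharge_standard", "standard:wastewater-limit")
kg.add("company:A", "subject_to_production_standard", "standard:production-norm")
kg.add("standard:wastewater-limit", "based_on_clause", "clause:regulation-article")

print(kg.tails("company:A", "subject_to_discharge_standard"))
```

A production system would presumably use a real graph database; the dictionary only shows how a lookup keyed by an enterprise identifier can return the standards and clauses attached to it.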
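102. Monitor the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
The server monitors the target enterprise within the preset time period to obtain pollutant discharge monitoring data. The pollutant discharge monitoring data is a time series, that is, a sequence of numbers formed by arranging consecutive monitored values of the same phenomenon at different times, and the sequence exhibits regularity. The preset time period is a period set in advance, for example 15 days. Further, the server collects the pollutant discharge monitoring data of the target enterprise within the preset time period through preset equipment.
103. Preprocess the pollutant discharge monitoring data to obtain a standard time series data set;
The server preprocesses the pollutant discharge monitoring data to obtain the standard time series data set. Specifically, the server fills in missing values in the pollutant discharge monitoring data; the server smooths the filled data, where smoothing mainly handles random errors or deviating data in the pollutant discharge monitoring data; and the server deletes isolated data from the smoothed data to obtain the standard time series data set, where isolated data is abnormal data.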
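The three preprocessing steps above (fill gaps, smooth, delete isolated data) can be sketched as follows. The application does not name the specific techniques, so forward-fill, a centered moving average, and a median-absolute-deviation cut-off are assumptions chosen for illustration, as are the function names and the sample readings.

```python
from statistics import mean, median


def fill_missing(values):
    """Replace None with the previous valid reading (first valid one at the start)."""
    filled = list(values)
    previous = next(v for v in filled if v is not None)
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = previous
        else:
            previous = v
    return filled


def smooth(values, window=3):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    return [mean(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def drop_isolated(values, max_dev=3.0):
    """Drop points more than max_dev median absolute deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-9  # avoid division by zero
    return [v for v in values if abs(v - med) / mad <= max_dev]


# Invented readings with two gaps and one spike.
raw = [10.0, None, 10.4, 10.2, 55.0, 10.1, 9.9, None, 10.3, 10.0]
filled = fill_missing(raw)
smoothed = smooth(filled)
clean = drop_isolated(smoothed)
print(len(raw), len(clean))
```

Note the design consequence of smoothing before outlier removal, which follows the order given in the text: the moving average spreads a single spike over its neighbors, so the final step removes the whole affected window rather than one point.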
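104. Perform feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
The server performs feature extraction and feature fusion on the standard time series data set to obtain the feature data to be identified. Sampling the standard time series data set along the time axis is also called feature extraction, and the corresponding sampled values are the feature values. Feature extraction reduces the time-sampled values of the data before classification, which reduces the data volume while improving classification accuracy. Feature fusion means merging multiple features into one feature.
Specifically, the server performs feature extraction on the standard time series data set by a preset algorithm to obtain a first feature vector; the server performs feature fusion on the first feature vector to obtain a second feature vector; and the server screens the first feature vector and the second feature vector according to a preset feature threshold to obtain the feature data to be identified.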
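105. Predict on the feature data to be identified with a trained model to obtain a prediction result, set a target label according to the prediction result, and add the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
The server predicts on the feature data to be identified with the trained model to obtain the prediction result and sets the target label according to the prediction result; the server adds the target label to the target knowledge graph, and the prediction result is used to indicate the target enterprise with abnormal pollutant discharge. It can be understood that the trained model automatically extracts the features of the feature data to be identified, computes a weight for each feature, and computes the prediction result from the features and their weights. The prediction result is binary classification output: the trained model determines whether the target enterprise's discharge is normal or abnormal.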
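106. Obtain discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and send early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used by personnel to inspect the target enterprise according to the discrimination basis data.
The server obtains discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sends early warning information to the target enterprise with abnormal pollutant discharge; the early warning information instructs the target inspectors to inspect the target enterprise according to the discrimination basis data. The discrimination basis data includes production standards, pollutant discharge standards, and legal and regulatory basis clauses. Specifically, the server determines the unique identifier of the target enterprise with abnormal pollutant discharge according to the prediction result; the server determines the target knowledge graph according to that unique identifier; the server reads the discrimination basis data from the target knowledge graph according to the feature data to be identified and the prediction result; and the server sends early warning information to the target enterprise with abnormal pollutant discharge, the early warning information instructing that the target enterprise be inspected according to the discrimination basis data.
It can be understood that after the server sends the early warning information for the target enterprise, the on-site target inspectors survey the target enterprise on site according to the production standards, pollutant discharge standards, and legal and regulatory basis clauses and obtain a survey result, which may or may not agree with the prediction result.
In the embodiments of this application, knowledge graphs are combined with artificial intelligence to identify abnormal enterprise discharge intelligently, and the identification algorithm is improved iteratively by verifying its conclusions, so as to identify abnormal discharge behavior accurately, supervise pollutant-discharging enterprises efficiently, and improve regional environmental quality.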
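Referring to FIG. 2, another embodiment of the method for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application includes:
201. Obtain preset structured data and integrate the preset structured data to obtain first data, the preset structured data including environmental protection law and regulation data, environmental protection industry standard data, and integrated wastewater discharge standard data;
The server obtains the preset structured data and integrates it to obtain the first data, the preset structured data including environmental protection law and regulation data, environmental protection industry standard data, and integrated wastewater discharge standard data. Specifically, the server periodically collects environmental protection law and regulation data, environmental protection industry standard data, and integrated wastewater discharge standard data from preset web pages, where the preset web pages include the web pages of environmental protection authorities; the server sets the collected data as the preset structured data; and the server integrates the preset structured data to obtain the first data.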
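202. Obtain the unique identifier of the target enterprise, and read the original information of the target enterprise according to the unique identifier;
The server obtains the unique identifier of the target enterprise and reads the original information of the target enterprise according to it. The original information includes basic information, outlet information, production information, facility information, monitoring information, supervision information, and operating ledgers. The basic information includes the pollutant discharge permit, monitoring factors, discharge standards, and discharge volume; the outlet information includes wastewater outlets and exhaust gas outlets; the production information includes products, production capacity, raw materials, auxiliary materials, and fuel; the facility information includes production facilities, wastewater treatment facilities, and exhaust gas treatment facilities; the monitoring information includes real-time and historical monitoring data; the supervision information includes inspection and enforcement information, petition and complaint information, and administrative penalty information; and the operating ledgers include production facility ledgers and treatment facility ledgers.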
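203. Perform knowledge extraction on the original information of the target enterprise by a natural language processing algorithm to obtain second data, the knowledge extraction including entity extraction, relation extraction, and attribute extraction;
The server performs knowledge extraction on the original information of the target enterprise by a natural language processing algorithm to obtain the second data, the knowledge extraction including entity extraction, relation extraction, and attribute extraction. The second data is expressed as triples, which take the form (entity 1, relation, entity 2) or (entity, attribute, attribute value). The natural language processing (NLP) algorithms include named entity recognition, syntactic dependency parsing, and entity relation recognition.
It should be noted that triples involve entity 1, relation, entity 2, concept, attribute, attribute value, and so on. Entities are the basic elements of a knowledge graph, and different relations exist between different entities. A concept refers to a set, category, object type, or kind of thing, such as person or geography. An attribute refers to a property, feature, characteristic, or parameter that an object may have, such as nationality or birthday. An attribute value is the value of a specified attribute of an object, such as China. Each entity is represented by a globally unique identifier, each attribute and attribute value pair describes an intrinsic property of an entity, and relations connect two entities and indicate the association between attributes and relations.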
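204. Perform knowledge fusion on the first data and the second data, the knowledge fusion including ontology alignment, entity linking, and data fusion;
The server performs knowledge fusion on the first data and the second data, the knowledge fusion including ontology alignment, entity linking, and data fusion. Because the knowledge in a knowledge graph comes from a wide range of sources, there are problems such as uneven knowledge quality, duplicated knowledge from different data sources, and insufficiently explicit associations between pieces of knowledge, so knowledge fusion is necessary. Knowledge fusion is a high-level organization of knowledge: under a single framework specification, knowledge from different sources goes through heterogeneous data integration, disambiguation, processing, inference verification, and updating, so as to fuse data, information, methods, experience, and human ideas into a high-quality knowledge base.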
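205. Match the knowledge-fused data against a preset enterprise profile label model to obtain label data of the target enterprise, the label data being represented as triples;
The server matches the knowledge-fused data against the preset enterprise profile label model to obtain the label data of the target enterprise, the label data being represented as triples. Further, the server inputs the knowledge-fused data into the preset enterprise profile label model; the server matches the knowledge-fused data against the elements of the model to obtain the corresponding classification of the data; and the server determines the label data of the target enterprise according to that classification. A piece of label data is generally represented by a set of triples, where a triple (a, b, c) indicates that, for target enterprise a, pollutant discharge behavior b is tagged with label c.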
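206. Generate the target knowledge graph of the target enterprise from the label data of the target enterprise, and store the target knowledge graph in the preset graph database;
The server generates the target knowledge graph of the target enterprise from the label data of the target enterprise and stores the target knowledge graph in the preset graph database. It can be understood that constructing the knowledge graph for a key industry mainly involves analyzing the characteristics of that industry and building the production standards, pollutant discharge standards, and legal and regulatory standards within it.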
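207. Monitor the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
The server monitors the target enterprise within the preset time period to obtain pollutant discharge monitoring data. The pollutant discharge monitoring data is a time series, that is, a sequence of numbers formed by arranging consecutive monitored values of the same phenomenon at different times, and the sequence exhibits regularity. The preset time period is a period set in advance, for example 7 days. Further, the server collects the pollutant discharge monitoring data of the target enterprise within the preset time period through preset equipment.
208. Preprocess the pollutant discharge monitoring data to obtain a standard time series data set;
The server preprocesses the pollutant discharge monitoring data to obtain the standard time series data set. Specifically, the server fills in missing values in the pollutant discharge monitoring data; the server smooths the filled data, where smoothing mainly handles random errors or deviating data in the pollutant discharge monitoring data; and the server deletes isolated data from the smoothed data to obtain the standard time series data set, where isolated data is abnormal data.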
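209. Perform feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
The server performs feature extraction and feature fusion on the standard time series data set to obtain the feature data to be identified. Feature extraction generates a lower-dimensional feature space from the existing features, mapping the relevant information in the original features onto a small number of features and discarding irrelevant information.
Specifically, first, the server performs feature extraction on the standard time series data set by a preset algorithm to obtain a first feature vector, the standard time series data set including stationary series data and non-stationary series data. The preset algorithms include statistical feature extraction algorithms, neural network feature extraction algorithms, and transform-based feature extraction algorithms. Optionally, when non-stationary series data is detected in the standard time series data set, the server differences the non-stationary series data (differencing preprocessing) to obtain stationary series data; the server then fits the stationary series data with an autoregressive moving average model to obtain model coefficients and sets the model coefficients as the first feature vector.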
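The differencing step and the idea of using fitted model coefficients as features can be illustrated with a short sketch. As a deliberate simplification it fits only a first-order autoregressive coefficient by least squares, not the full autoregressive moving average model the text describes, which would normally be fitted with a statistics library. The function names and sample series are assumptions.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times to remove trend."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def ar1_coefficient(series):
    """Least-squares AR(1) coefficient of the mean-centered series.

    Simplification: a stand-in for the ARMA coefficients in the text.
    """
    avg = sum(series) / len(series)
    centered = [x - avg for x in series]
    num = sum(a * b for a, b in zip(centered, centered[1:]))
    den = sum(a * a for a in centered[:-1])
    return num / den if den else 0.0


# A series with a quadratic trend becomes constant after two differences.
trend = [1, 3, 6, 10, 15]
print(difference(trend), difference(trend, order=2))

# The fitted coefficient(s) play the role of the first feature vector.
oscillating = [1, -1, 1, -1, 1, -1]
first_feature_vector = [ar1_coefficient(oscillating)]
print(first_feature_vector)
```

A coefficient near -1 reflects a series that flips sign at every step, while a value near +1 would indicate strong persistence; that is the kind of compact summary the first feature vector is meant to carry.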
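Second, the server performs feature fusion on the first feature vector to obtain a second feature vector. Further, the server combines two or more first feature vectors into a second feature vector according to a preset feature fusion algorithm, where the preset feature fusion algorithms include feature fusion based on Bayesian theory. It can be understood that a fusion of multiple first feature vectors usually gives better classification performance than a single first feature vector, and the various first feature vectors that are fused have low correlation with one another.
Finally, the server screens the first feature vector and the second feature vector according to the preset threshold to obtain the feature data to be identified. Further, the server sets the preset feature threshold; the server applies the chi-square test algorithm to the first feature vector and the second feature vector to obtain a feature check value; and the server selects the first feature vectors and second feature vectors whose feature check value is greater than the preset feature threshold to obtain the feature data to be identified. For example, the server sets a preset feature threshold A for the mean slope, and sets the first feature vectors and second feature vectors whose mean slope is greater than the preset feature threshold A as the feature data to be identified.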
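A minimal sketch of chi-square screening follows. The text does not say how the chi-square statistic is computed from the feature vectors, so this example assumes each feature has been reduced to a binary flag per sample and is tested against a binary normal/abnormal label in a 2x2 contingency table. The feature names, data, and the threshold of 3.84 (the 5% critical value for one degree of freedom) are illustrative choices.

```python
def chi_square_2x2(feature_flags, labels):
    """Pearson chi-square statistic for two binary variables (no continuity correction)."""
    a = b = c = d = 0
    for flag, label in zip(feature_flags, labels):
        if flag and label:
            a += 1          # feature present, abnormal
        elif flag and not label:
            b += 1          # feature present, normal
        elif not flag and label:
            c += 1          # feature absent, abnormal
        else:
            d += 1          # feature absent, normal
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def screen_features(features, labels, threshold):
    """Keep the names of features whose check value exceeds the preset threshold."""
    return [name for name, flags in features.items()
            if chi_square_2x2(flags, labels) > threshold]


labels = [1, 1, 1, 1, 0, 0, 0, 0]              # 1 = abnormal discharge (invented)
features = {
    "frequent_spikes": [1, 1, 1, 0, 0, 0, 0, 0],
    "night_peak":      [1, 0, 1, 0, 1, 0, 1, 0],
}
print(screen_features(features, labels, threshold=3.84))
```

Features that are independent of the label score close to zero and are discarded, which matches the role of the preset feature threshold in the text.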
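210. Predict on the feature data to be identified with a trained model to obtain a prediction result, set a target label according to the prediction result, and add the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
The server predicts on the feature data to be identified with the trained model to obtain the prediction result and sets the target label according to the prediction result; the server adds the target label to the target knowledge graph, and the prediction result is used to indicate the target enterprise with abnormal pollutant discharge. It can be understood that the trained model automatically extracts the features of the feature data to be identified, computes a weight for each feature, and computes the prediction result from the features and their weights. The prediction result is binary classification output: the trained model determines whether the target enterprise's discharge is normal or abnormal.
Specifically, the server uses the trained model to label the feature data to be identified according to a preset rule. The preset rule indicates that the feature data to be identified is to be labeled with one of two classes, which distinguish whether the feature data belongs to normal discharge indicator data or abnormal discharge indicator data. The feature data to be identified includes indicator data that is sensitive to abnormal data. Further, the server determines from the labeled feature data whether the target enterprise's discharge is abnormal, obtains the prediction result, and sets the target label based on the prediction result. For example, the server labels preset feature threshold A with the tag "frequent abrupt changes", which is the target label, and adds the target label to the target knowledge graph.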
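Optionally, the server selects sample data to be trained and test data from a preset training sample set; the server iteratively trains a preset learning model with the sample data to be trained to obtain an initially trained model, the preset model including a random forest model and a neural network model; and the server tests the initially trained model with the test data to obtain the trained model.
Further, the server randomly draws N sample subsets from the sample data to be trained and generates N decision trees; at each node, the server randomly selects m variables (m less than M) as candidate variables for splitting that node, and the number of variables is the same at every node. M is a preset constant. The server generates a random forest model from the M decision trees and performs secondary training on it to obtain the initially trained model; the secondary training optimizes the weight of each node of the different decision trees. The class of a terminal node is the modal (most frequent) class at that node; for sample data of a new category, the server classifies it with all of the decision trees, and its class is decided by majority vote.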
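The bootstrap sampling, random variable selection, and majority vote described above can be sketched with a deliberately small forest. To stay short, each tree here is a single-split decision stump, so the per-node selection of m out of M variables happens once per tree, and the secondary training of node weights mentioned in the text is not modeled. All function names and the toy data are assumptions.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) among feature_ids with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            # Each leaf predicts its modal (most frequent) class.
            left_cls = Counter(left).most_common(1)[0][0]
            right_cls = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_cls for y in left) + sum(y != right_cls for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_cls, right_cls)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def train_forest(rows, labels, n_trees, m, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and m random features."""
    rng = random.Random(seed)
    total_features = len(rows[0])  # M in the text
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample with replacement
        feats = rng.sample(range(total_features), m)     # m < M candidate variables
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Classify with every tree and return the majority vote."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


rows = [[1, 1, 1], [2, 1, 2], [1, 2, 1], [2, 2, 2],
        [8, 9, 8], [9, 8, 9], [8, 8, 8], [9, 9, 9]]
labels = ["normal"] * 4 + ["abnormal"] * 4
forest = train_forest(rows, labels, n_trees=15, m=2, seed=0)
print(predict(forest, [1, 1, 1]), predict(forest, [9, 9, 9]))
```

Using an odd number of trees avoids ties in a two-class vote; a library implementation with full-depth trees would be the usual choice in practice.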
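211. Obtain discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and send early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.
The server obtains discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sends early warning information to the target enterprise with abnormal pollutant discharge, the early warning information instructing that the target enterprise be inspected according to the discrimination basis data. The discrimination basis data includes production standards, pollutant discharge standards, and legal and regulatory basis clauses. Specifically, the server determines the unique identifier of the target enterprise with abnormal pollutant discharge according to the prediction result; the server determines the target knowledge graph according to that unique identifier; the server reads the discrimination basis data from the target knowledge graph according to the feature data to be identified and the prediction result; and the server sends early warning information to the target enterprise with abnormal pollutant discharge, the early warning information instructing that the target enterprise be inspected according to the discrimination basis data.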
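It should be noted that after the server sends the early warning information for the target enterprise, the on-site target inspectors survey the target enterprise on site according to the production standards, pollutant discharge standards, and legal and regulatory basis clauses and obtain a survey result, which may or may not agree with the prediction result. For example, if the prediction result points to enterprise A but the survey determines that it is not enterprise A, the survey result is inconsistent with the prediction result.
Optionally, the server obtains the returned survey result and compares it with the prediction result; if the returned survey result is inconsistent with the prediction result, the server relabels the feature data to be identified and sets it as new sample data; the server iteratively trains the trained model on the new sample data; and the server updates the target label according to the new sample data.
It can be understood that when the survey result is inconsistent with the prediction result, the new monitoring data is used to iteratively update the trained model, so that its predictions on monitoring data become more accurate.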
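The verification loop above can be sketched as a small reconciliation step: when the on-site survey contradicts the model, the features are relabeled with the surveyed outcome and queued as new training data. The function name, the tuple layout of the training set, and the sample values are assumptions; retraining itself is left to whatever model is in use.

```python
def reconcile(prediction, survey_result, features, training_set):
    """Queue a relabeled sample when the survey contradicts the prediction.

    Returns True if a new sample was added (i.e. retraining is warranted).
    """
    if survey_result == prediction:
        return False
    # Relabel the feature data with the surveyed ground truth.
    training_set.append((features, survey_result))
    return True


training_set = []
needs_retraining = reconcile("abnormal", "normal", [0.4, 1.2], training_set)
unchanged = reconcile("abnormal", "abnormal", [2.5, 3.1], training_set)
print(needs_retraining, unchanged, training_set)
```

Only disagreements are added here, mirroring the text; a real pipeline might also keep confirmed predictions so the retraining set does not drift toward hard cases only.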
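In the embodiments of this application, knowledge graphs are combined with artificial intelligence to identify abnormal enterprise discharge intelligently, and the identification algorithm is improved iteratively by verifying its conclusions, so as to identify abnormal discharge behavior accurately, supervise pollutant-discharging enterprises efficiently, and improve regional environmental quality.
The method for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application has been described above; the apparatus for locating pollutant-discharging objects based on a knowledge graph is described below. Referring to FIG. 3, one embodiment of the apparatus in the embodiments of this application includes: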
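an extraction unit 301, configured to extract triples from preset data by a natural language processing algorithm and store the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of the target enterprise;
a monitoring unit 302, configured to monitor the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
a preprocessing unit 303, configured to preprocess the pollutant discharge monitoring data to obtain a standard time series data set;
an extraction and fusion unit 304, configured to perform feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
a prediction unit 305, configured to predict on the feature data to be identified with a trained model to obtain a prediction result, set a target label according to the prediction result, and add the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
a discrimination and early warning unit 306, configured to obtain discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and send early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.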
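Referring to FIG. 4, another embodiment of the apparatus for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application includes:
an extraction unit 301, configured to extract triples from preset data by a natural language processing algorithm and store the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of the target enterprise;
a monitoring unit 302, configured to monitor the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
a preprocessing unit 303, configured to preprocess the pollutant discharge monitoring data to obtain a standard time series data set;
an extraction and fusion unit 304, configured to perform feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
a prediction unit 305, configured to predict on the feature data to be identified with a trained model to obtain a prediction result, set a target label according to the prediction result, and add the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
a discrimination and early warning unit 306, configured to obtain discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and send early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.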
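Optionally, the extraction unit 301 may also be specifically configured to:
obtain preset structured data and integrate the preset structured data to obtain first data, the preset structured data including environmental protection law and regulation data, environmental protection industry standard data, and integrated wastewater discharge standard data;
obtain the unique identifier of the target enterprise, and read the original information of the target enterprise according to the unique identifier;
perform knowledge extraction on the original information of the target enterprise by a natural language processing algorithm to obtain second data, the knowledge extraction including entity extraction, relation extraction, and attribute extraction;
perform knowledge fusion on the first data and the second data, the knowledge fusion including ontology alignment, entity linking, and data fusion;
match the knowledge-fused data against a preset enterprise profile label model to obtain label data of the target enterprise, the label data being represented as triples;
generate the target knowledge graph of the target enterprise from the label data of the target enterprise, and store the target knowledge graph in the preset graph database.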
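Optionally, the extraction and fusion unit 304 may further include:
an extraction subunit 3041, configured to perform feature extraction on the standard time series data set by a preset algorithm to obtain a first feature vector, the standard time series data set including stationary series data and non-stationary series data;
a fusion subunit 3042, configured to perform feature fusion on the first feature vector to obtain a second feature vector;
a screening subunit 3043, configured to screen the first feature vector and the second feature vector according to a preset feature threshold to obtain the feature data to be identified.
Optionally, the extraction subunit 3041 may also be specifically configured to:
when non-stationary series data is detected in the standard time series data set, difference the non-stationary series data to obtain stationary series data;
fit the stationary series data with an autoregressive moving average model to obtain model coefficients, and set the model coefficients as the first feature vector.
Optionally, the screening subunit 3043 may also be specifically configured to:
compute on the first feature vector and the second feature vector with a chi-square test algorithm to obtain a feature check value;
select the first feature vectors and second feature vectors whose feature check value is greater than the preset feature threshold to obtain the feature data to be identified.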
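Optionally, the apparatus for locating pollutant-discharging objects based on a knowledge graph further includes:
a selecting unit 307, configured to select sample data to be trained and test data from a preset training sample set;
a first training unit 308, configured to iteratively train a preset model with the sample data to be trained to obtain an initially trained model, the preset model including a random forest model and a neural network model;
a test unit 309, configured to test the initially trained model with the test data to obtain the trained model.
Optionally, the apparatus for locating pollutant-discharging objects based on a knowledge graph further includes:
a judging unit 310, configured to obtain the returned survey result and determine whether the returned survey result is consistent with the prediction result;
a labeling unit 311, configured to relabel the feature data to be identified and set it as new sample data if the returned survey result is inconsistent with the prediction result;
a second training unit 312, configured to iteratively train the trained model on the new sample data;
an updating unit 313, configured to update the target label according to the new sample data.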
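FIG. 3 and FIG. 4 above describe the apparatus for locating pollutant-discharging objects based on a knowledge graph in the embodiments of this application in detail from the perspective of modular functional entities; the device for locating pollutant-discharging objects based on a knowledge graph is described in detail below from the perspective of hardware processing.
FIG. 5 is a schematic structural diagram of a device for locating pollutant-discharging objects based on a knowledge graph provided by an embodiment of this application. The device 500 may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 501 and a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506. The memory 509 and the storage medium 508 may provide transient or persistent storage. The program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device for locating pollutant-discharging objects based on a knowledge graph. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the device 500, the series of instruction operations in the storage medium 508.
The device 500 may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on. Those skilled in the art will understand that the device structure shown in FIG. 5 does not limit the device for locating pollutant-discharging objects based on a knowledge graph, which may include more or fewer components than shown, combine certain components, or arrange the components differently.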
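This application also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the following steps:
extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise; monitoring the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data; preprocessing the pollutant discharge monitoring data to obtain a standard time series data set; performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified; predicting on the feature data to be identified with a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge; and obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.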
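In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.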

Claims (20)

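  1. A method for locating pollutant-discharging objects based on a knowledge graph, wherein the method comprises:
    extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise;
    monitoring the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
    preprocessing the pollutant discharge monitoring data to obtain a standard time series data set;
    performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
    predicting on the feature data to be identified with a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
    obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.
  2. The method for locating pollutant-discharging objects based on a knowledge graph according to claim 1, wherein the extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise, comprises:
    obtaining preset structured data and integrating the preset structured data to obtain first data, the preset structured data including environmental protection law and regulation data, environmental protection industry standard data, and integrated wastewater discharge standard data;
    obtaining a unique identifier of the target enterprise, and reading original information of the target enterprise according to the unique identifier of the target enterprise;
    performing knowledge extraction on the original information of the target enterprise by a natural language processing algorithm to obtain second data, the knowledge extraction including entity extraction, relation extraction, and attribute extraction;
    performing knowledge fusion on the first data and the second data, the knowledge fusion including ontology alignment, entity linking, and data fusion;
    matching the knowledge-fused data against a preset enterprise profile label model to obtain label data of the target enterprise, the label data being represented as triples;
    generating the target knowledge graph of the target enterprise from the label data of the target enterprise, and storing the target knowledge graph in the preset graph database.
  3. The method for locating pollutant-discharging objects based on a knowledge graph according to claim 1, wherein the performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified comprises:
    performing feature extraction on the standard time series data set by a preset algorithm to obtain a first feature vector, the standard time series data set including stationary series data and non-stationary series data;
    performing feature fusion on the first feature vector to obtain a second feature vector;
    screening the first feature vector and the second feature vector according to a preset feature threshold to obtain the feature data to be identified.
  4. The method for locating pollutant-discharging objects based on a knowledge graph according to claim 3, wherein the performing feature extraction on the standard time series data set by a preset algorithm to obtain a first feature vector, the standard time series data set including stationary series data and non-stationary series data, comprises:
    when non-stationary series data is detected in the standard time series data set, differencing the non-stationary series data to obtain stationary series data;
    fitting the stationary series data with an autoregressive moving average model to obtain model coefficients, and setting the model coefficients as the first feature vector.
  5. The method for locating pollutant-discharging objects based on a knowledge graph according to claim 3, wherein the screening the first feature vector and the second feature vector according to a preset feature threshold to obtain the feature data to be identified comprises:
    computing on the first feature vector and the second feature vector with a chi-square test algorithm to obtain a feature check value;
    selecting the first feature vectors and second feature vectors whose feature check value is greater than the preset feature threshold to obtain the feature data to be identified.
  6. The method for locating pollutant-discharging objects based on a knowledge graph according to claim 1, wherein before the extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise, the method further comprises:
    selecting sample data to be trained and test data from a preset training sample set;
    iteratively training a preset model with the sample data to be trained to obtain an initially trained model, the preset model including a random forest model and a neural network model;
    testing the initially trained model with the test data to obtain the trained model.
  7. The method for locating pollutant-discharging objects based on a knowledge graph according to any one of claims 1-6, wherein after the obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data, the method further comprises:
    obtaining a returned survey result, and determining whether the returned survey result is consistent with the prediction result;
    if the returned survey result is inconsistent with the prediction result, relabeling the feature data to be identified and setting it as new sample data;
    iteratively training the trained model on the new sample data;
    updating the target label according to the new sample data.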
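  8. A device for locating pollutant-discharging objects based on a knowledge graph, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise;
    monitoring the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
    preprocessing the pollutant discharge monitoring data to obtain a standard time series data set;
    performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
    predicting on the feature data to be identified with a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
    obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.
  9. The device for locating pollutant-discharging objects based on a knowledge graph according to claim 8, wherein when the processor executes the computer-readable instructions to implement the extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise, the following steps are included:
    obtaining preset structured data and integrating the preset structured data to obtain first data, the preset structured data including environmental protection law and regulation data, environmental protection industry standard data, and integrated wastewater discharge standard data;
    obtaining a unique identifier of the target enterprise, and reading original information of the target enterprise according to the unique identifier of the target enterprise;
    performing knowledge extraction on the original information of the target enterprise by a natural language processing algorithm to obtain second data, the knowledge extraction including entity extraction, relation extraction, and attribute extraction;
    performing knowledge fusion on the first data and the second data, the knowledge fusion including ontology alignment, entity linking, and data fusion;
    matching the knowledge-fused data against a preset enterprise profile label model to obtain label data of the target enterprise, the label data being represented as triples;
    generating the target knowledge graph of the target enterprise from the label data of the target enterprise, and storing the target knowledge graph in the preset graph database.
  10. The device for locating pollutant-discharging objects based on a knowledge graph according to claim 8, wherein when the processor executes the computer-readable instructions to implement the performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified, the following steps are included:
    performing feature extraction on the standard time series data set by a preset algorithm to obtain a first feature vector, the standard time series data set including stationary series data and non-stationary series data;
    performing feature fusion on the first feature vector to obtain a second feature vector;
    screening the first feature vector and the second feature vector according to a preset feature threshold to obtain the feature data to be identified.
  11. The device for locating pollutant-discharging objects based on a knowledge graph according to claim 10, wherein when the processor executes the computer-readable instructions to implement the performing feature extraction on the standard time series data set by a preset algorithm to obtain a first feature vector, the standard time series data set including stationary series data and non-stationary series data, the following steps are included:
    when non-stationary series data is detected in the standard time series data set, differencing the non-stationary series data to obtain stationary series data;
    fitting the stationary series data with an autoregressive moving average model to obtain model coefficients, and setting the model coefficients as the first feature vector.
  12. The device for locating pollutant-discharging objects based on a knowledge graph according to claim 10, wherein when the processor executes the computer-readable instructions to implement the screening the first feature vector and the second feature vector according to a preset feature threshold to obtain the feature data to be identified, the following steps are included:
    computing on the first feature vector and the second feature vector with a chi-square test algorithm to obtain a feature check value;
    selecting the first feature vectors and second feature vectors whose feature check value is greater than the preset feature threshold to obtain the feature data to be identified.
  13. The device for locating pollutant-discharging objects based on a knowledge graph according to claim 8, wherein before the processor executes the computer-readable instructions to implement the extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise, the following steps are further included:
    selecting sample data to be trained and test data from a preset training sample set;
    iteratively training a preset model with the sample data to be trained to obtain an initially trained model, the preset model including a random forest model and a neural network model;
    testing the initially trained model with the test data to obtain the trained model.
  14. The device for locating pollutant-discharging objects based on a knowledge graph according to any one of claims 8-13, wherein after the processor executes the computer-readable instructions to implement the obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data, the following steps are included:
    obtaining a returned survey result, and determining whether the returned survey result is consistent with the prediction result;
    if the returned survey result is inconsistent with the prediction result, relabeling the feature data to be identified and setting it as new sample data;
    iteratively training the trained model on the new sample data;
    updating the target label according to the new sample data.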
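  15. A computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps:
    extracting triples from preset data by a natural language processing algorithm, and storing the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise;
    monitoring the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
    preprocessing the pollutant discharge monitoring data to obtain a standard time series data set;
    performing feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
    predicting on the feature data to be identified with a trained model to obtain a prediction result, setting a target label according to the prediction result, and adding the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
    obtaining discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and sending early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.
  16. The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, cause the computer to further perform the following steps:
    obtaining preset structured data and integrating the preset structured data to obtain first data, the preset structured data including environmental protection law and regulation data, environmental protection industry standard data, and integrated wastewater discharge standard data;
    obtaining a unique identifier of the target enterprise, and reading original information of the target enterprise according to the unique identifier of the target enterprise;
    performing knowledge extraction on the original information of the target enterprise by a natural language processing algorithm to obtain second data, the knowledge extraction including entity extraction, relation extraction, and attribute extraction;
    performing knowledge fusion on the first data and the second data, the knowledge fusion including ontology alignment, entity linking, and data fusion;
    matching the knowledge-fused data against a preset enterprise profile label model to obtain label data of the target enterprise, the label data being represented as triples;
    generating the target knowledge graph of the target enterprise from the label data of the target enterprise, and storing the target knowledge graph in the preset graph database.
  17. The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, cause the computer to further perform the following steps:
    performing feature extraction on the standard time series data set by a preset algorithm to obtain a first feature vector, the standard time series data set including stationary series data and non-stationary series data;
    performing feature fusion on the first feature vector to obtain a second feature vector;
    screening the first feature vector and the second feature vector according to a preset feature threshold to obtain the feature data to be identified.
  18. The computer-readable storage medium according to claim 17, wherein the computer instructions, when run on a computer, cause the computer to further perform the following steps:
    when non-stationary series data is detected in the standard time series data set, differencing the non-stationary series data to obtain stationary series data;
    fitting the stationary series data with an autoregressive moving average model to obtain model coefficients, and setting the model coefficients as the first feature vector.
  19. The computer-readable storage medium according to claim 17, wherein the computer instructions, when run on a computer, cause the computer to further perform the following steps:
    computing on the first feature vector and the second feature vector with a chi-square test algorithm to obtain a feature check value;
    selecting the first feature vectors and second feature vectors whose feature check value is greater than the preset feature threshold to obtain the feature data to be identified.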
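  20. An apparatus for locating pollutant-discharging objects based on a knowledge graph, wherein the apparatus comprises:
    an extraction unit, configured to extract triples from preset data by a natural language processing algorithm and store the triples in a preset graph database to obtain a target knowledge graph, the target knowledge graph being used to indicate the production standards, pollutant discharge standards, and legal and regulatory basis clauses of a target enterprise;
    a monitoring unit, configured to monitor the pollutant discharge of the target enterprise within a preset time period to obtain pollutant discharge monitoring data;
    a preprocessing unit, configured to preprocess the pollutant discharge monitoring data to obtain a standard time series data set;
    an extraction and fusion unit, configured to perform feature extraction and feature fusion on the standard time series data set to obtain feature data to be identified;
    a prediction unit, configured to predict on the feature data to be identified with a trained model to obtain a prediction result, set a target label according to the prediction result, and add the target label to the target knowledge graph, the prediction result being used to indicate the target enterprise with abnormal pollutant discharge;
    a discrimination and early warning unit, configured to obtain discrimination basis data from the target knowledge graph of the target enterprise according to the feature data to be identified and the prediction result, and send early warning information to the target enterprise with abnormal pollutant discharge, the early warning information being used for inspecting the target enterprise according to the discrimination basis data.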
PCT/CN2020/104753 2020-03-19 2020-07-27 基于知识图谱定位排污对象的方法及相关设备 WO2021184630A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010193960.X 2020-03-19
CN202010193960.XA CN111460167A (zh) 2020-03-19 2020-03-19 基于知识图谱定位排污对象的方法及相关设备

Publications (1)

Publication Number Publication Date
WO2021184630A1 true WO2021184630A1 (zh) 2021-09-23

Family

ID=71682902

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104753 WO2021184630A1 (zh) 2020-03-19 2020-07-27 基于知识图谱定位排污对象的方法及相关设备

Country Status (2)

Country Link
CN (1) CN111460167A (zh)
WO (1) WO2021184630A1 (zh)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
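CN113806370A (zh) * 2021-09-27 2021-12-17 平安国际智慧城市科技股份有限公司 Big data-based environmental data supervision method, apparatus, device, and storage medium
CN114724078A (zh) * 2022-03-28 2022-07-08 西南交通大学 Method for recognizing personnel behavior intention based on target detection network and knowledge reasoning
CN114841601A (zh) * 2022-05-24 2022-08-02 保定金迪地下管线探测工程有限公司 Dynamic source-tracing analysis method and system for outfall water pollution
CN114925833A (zh) * 2022-04-20 2022-08-19 中国人民解放军91977部队 Target state regularity knowledge mining method based on capability data base map
CN114969018A (zh) * 2022-08-01 2022-08-30 太极计算机股份有限公司 Data monitoring method and system
CN115792919A (zh) * 2023-01-19 2023-03-14 合肥中科光博量子科技有限公司 Method for identifying pollution hotspot areas by horizontal-scanning aerosol lidar monitoring
CN116166813A (zh) * 2022-12-15 2023-05-26 深圳银兴智能数据有限公司 Management method, system, device, and storage medium for automated big data operation and maintenance
CN116882494A (zh) * 2023-09-07 2023-10-13 山东山大鸥玛软件股份有限公司 Method and apparatus for unsupervised knowledge graph construction for professional text
CN117037073A (zh) * 2023-09-12 2023-11-10 天津君萌科技有限公司 Object locating method based on artificial intelligence visualization, and visual monitoring system
CN117076991A (zh) * 2023-10-16 2023-11-17 云境商务智能研究院南京有限公司 Method, apparatus, and computer device for monitoring abnormal power consumption of pollution control equipment
CN117312578A (zh) * 2023-11-28 2023-12-29 烟台云朵软件有限公司 Method and system for constructing an intangible cultural heritage inheritance graph
CN117421611A (zh) * 2023-12-19 2024-01-19 河北金隅鼎鑫水泥有限公司 Method and system for filtering exhaust gas components in a cement manufacturing plant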

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
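CN112131275B (zh) * 2020-09-23 2023-07-25 长三角信息智能创新研究院 Enterprise portrait construction method using holographic city big data model and knowledge graph
CN112344990A (zh) * 2020-10-21 2021-02-09 平安国际智慧城市科技股份有限公司 Environmental anomaly monitoring method, apparatus, device, and storage medium
CN112528040B (zh) * 2020-12-16 2024-03-19 平安科技(深圳)有限公司 Knowledge graph-based method for detecting inducing and abetting corpus, and related device
CN113449866B (zh) * 2021-06-28 2024-03-29 华东理工大学 Method for constructing industrial knowledge graph of fuel ethanol fermentation process
CN113655111A (zh) * 2021-08-17 2021-11-16 北京雪迪龙科技股份有限公司 Source-tracing method for atmospheric volatile organic compounds based on mobile monitoring
CN116360387B (zh) * 2023-01-18 2023-09-15 北京控制工程研究所 Fault localization method fusing Bayesian network and performance-fault relationship graph
CN116384158B (zh) * 2023-05-26 2023-08-18 广东合诚环境工程有限公司 Big data-based method and system for monitoring operation of sewage treatment equipment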

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180053110A1 (en) * 2016-08-22 2018-02-22 The Catholic University Of Korea Industry-Academic Cooperation Foundation Method of predicting crime occurrence in prediction target region using big data
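CN107945024A (zh) * 2017-12-12 2018-04-20 厦门市美亚柏科信息股份有限公司 Method, terminal device, and storage medium for identifying abnormal operation of Internet finance lending enterprises
CN109145123A (zh) * 2018-09-30 2019-01-04 国信优易数据有限公司 Knowledge graph model construction method, intelligent interaction method, system, and electronic device
CN110277167A (zh) * 2019-05-31 2019-09-24 南京邮电大学 Knowledge graph-based chronic non-communicable disease risk prediction system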

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
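CN113806370A (zh) * 2021-09-27 2021-12-17 平安国际智慧城市科技股份有限公司 Big data-based environmental data supervision method, apparatus, device, and storage medium
CN114724078A (zh) * 2022-03-28 2022-07-08 西南交通大学 Method for recognizing personnel behavior intention based on target detection network and knowledge reasoning
CN114925833A (zh) * 2022-04-20 2022-08-19 中国人民解放军91977部队 Target state regularity knowledge mining method based on capability data base map
CN114925833B (zh) * 2022-04-20 2023-07-21 中国人民解放军91977部队 Target state regularity knowledge mining method based on capability data base map
CN114841601A (zh) * 2022-05-24 2022-08-02 保定金迪地下管线探测工程有限公司 Dynamic source-tracing analysis method and system for outfall water pollution
CN114969018A (zh) * 2022-08-01 2022-08-30 太极计算机股份有限公司 Data monitoring method and system
CN114969018B (zh) * 2022-08-01 2022-11-08 太极计算机股份有限公司 Data monitoring method and system
CN116166813A (zh) * 2022-12-15 2023-05-26 深圳银兴智能数据有限公司 Management method, system, device, and storage medium for automated big data operation and maintenance
CN115792919B (zh) * 2023-01-19 2023-05-16 合肥中科光博量子科技有限公司 Method for identifying pollution hotspot areas by horizontal-scanning aerosol lidar monitoring
CN115792919A (zh) * 2023-01-19 2023-03-14 合肥中科光博量子科技有限公司 Method for identifying pollution hotspot areas by horizontal-scanning aerosol lidar monitoring
CN116882494A (zh) * 2023-09-07 2023-10-13 山东山大鸥玛软件股份有限公司 Method and apparatus for unsupervised knowledge graph construction for professional text
CN116882494B (zh) * 2023-09-07 2023-11-28 山东山大鸥玛软件股份有限公司 Method and apparatus for unsupervised knowledge graph construction for professional text
CN117037073A (zh) * 2023-09-12 2023-11-10 天津君萌科技有限公司 Object locating method based on artificial intelligence visualization, and visual monitoring system
CN117076991A (zh) * 2023-10-16 2023-11-17 云境商务智能研究院南京有限公司 Method, apparatus, and computer device for monitoring abnormal power consumption of pollution control equipment
CN117076991B (zh) * 2023-10-16 2024-01-02 云境商务智能研究院南京有限公司 Method, apparatus, and computer device for monitoring abnormal power consumption of pollution control equipment
CN117312578A (zh) * 2023-11-28 2023-12-29 烟台云朵软件有限公司 Method and system for constructing an intangible cultural heritage inheritance graph
CN117312578B (zh) * 2023-11-28 2024-02-23 烟台云朵软件有限公司 Method and system for constructing an intangible cultural heritage inheritance graph
CN117421611A (zh) * 2023-12-19 2024-01-19 河北金隅鼎鑫水泥有限公司 Method and system for filtering exhaust gas components in a cement manufacturing plant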

Also Published As

Publication number Publication date
CN111460167A (zh) 2020-07-28

Similar Documents

Publication Publication Date Title
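WO2021184630A1 (zh) Method for locating pollutant discharge object based on knowledge graph, and related device
CN106778259B (zh) Abnormal behavior discovery method and system based on big data machine learning
CN107528832B (zh) Baseline construction and unknown abnormal behavior detection method for system logs
CN102098180B (zh) Network security situation awareness method
CN111506478A (zh) Method for implementing alarm management and control based on artificial intelligence
CN108921301A (zh) Self-learning-based machine learning model updating method and system
CN110929918A (zh) 10 kV feeder fault prediction method based on CNN and LightGBM
CN115578015A (zh) Internet of Things-based method, system, and storage medium for whole-process supervision of sewage treatment
CN111507376A (zh) Single-indicator anomaly detection method based on fusion of multiple unsupervised methods
CN108304567B (zh) Method and system for working-condition pattern recognition and data classification of high-voltage transformers
CN111310139B (zh) Behavior data identification method, apparatus, and storage medium
CN110636066B (zh) Network security threat situation assessment method based on unsupervised generative inference
CN114385391A (zh) Method and apparatus for analyzing operating data of NFV virtualized devices
CN108470022A (zh) Intelligent work order quality inspection method based on operation and maintenance management
CN108985467A (zh) Lean management and control method for secondary equipment based on artificial intelligence
CN114742477B (zh) Enterprise order data processing method, apparatus, device, and storage medium
CN114048870A (zh) Power system anomaly monitoring method based on intelligent mining of log features
CN112906738B (zh) Water quality detection and treatment method
CN114201374A (zh) Method and system for anomaly detection in operation and maintenance time-series data based on hybrid machine learning
CN114185760A (zh) System risk assessment method and apparatus, and charging equipment operation and maintenance detection method
CN111126820A (zh) Electricity theft prevention method and system
CN112966259A (zh) Method and device for assessing security threats of operation and maintenance behavior in power monitoring systems
CN115719283A (zh) Intelligent accounting management system
CN112016769B (zh) Method and apparatus for risk prediction of administrative counterparts and information recommendation
CN114118524A (zh) Knowledge reasoning-based comprehensive analysis method for equipment status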

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20925421

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 110123)

122 Ep: pct application non-entry in european phase

Ref document number: 20925421

Country of ref document: EP

Kind code of ref document: A1