CN114860875A - Data integration system and method for fixed pollution source - Google Patents



Publication number
CN114860875A
Authority
CN
China
Legal status
Granted
Application number
CN202210443669.2A
Other languages
Chinese (zh)
Other versions
CN114860875B (en)
Inventor
毛庆国
尹�民
游勇
费新勇
彭胜巍
刘琳琳
黄为炜
蔡昌才
张德辉
何燕飞
张冬华
伍城
Current Assignee
Shenzhen Ecological Environment Intelligent Control Center
Original Assignee
Shenzhen Ecological Environment Intelligent Control Center
Application filed by Shenzhen Ecological Environment Intelligent Control Center
Priority to CN202210443669.2A
Publication of CN114860875A
Application granted
Publication of CN114860875B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing


Abstract

The invention provides a data integration system and method for fixed pollution sources. The method comprises: acquiring data to be integrated of a pollution source to be integrated, together with the data source of that data, and importing the data to be integrated in a data integration mode matched with the data source, wherein the data to be integrated comprises the name and/or identification code of the pollution source to be integrated; acquiring registered data of registered pollution sources, wherein the registered data comprises the name and/or identification code of each registered pollution source; performing keyword matching between the name and/or identification code of the pollution source to be integrated and those of the registered pollution sources, and judging whether a registered pollution source matches the pollution source to be integrated; if so, integrating the data to be integrated into the pollution source data of the matched pollution source; and if not, generating a newly added pollution source and newly added pollution source data from the data to be integrated. The invention can effectively improve data processing efficiency and reduce the difficulty of data management.

Description

Data integration system and method for fixed pollution source
Technical Field
The invention relates to the field of data processing, in particular to a data integration system and method for a fixed pollution source.
Background
With technological development, population growth and rising living standards, resource consumption keeps increasing, and so do pollutant emissions. To improve the quality of the ecological environment, protect regional ecological security and strengthen environmental supervision, the data of every fixed pollution source must be kept up to date and made available to the relevant departments for supervision, analysis and processing. However, fixed pollution source data is acquired through many channels, so the acquired data is disordered and prone to omissions. This places heavy data processing pressure on the relevant departments and lowers their processing efficiency.
Disclosure of Invention
In view of the above defects of the prior art, namely high data processing pressure and low processing efficiency, the technical problem to be solved by the present invention is to provide a data integration system and method for fixed pollution sources.
The technical scheme adopted by the invention to solve the technical problem is as follows: a fixed pollution source data integration method is provided, comprising the following steps:
acquiring data to be integrated of a pollution source to be integrated and the data source of the data to be integrated, and importing the data to be integrated in a data integration mode matched with the data source, wherein the data to be integrated comprises the name and/or identification code of the pollution source to be integrated; and acquiring registered data of registered pollution sources, wherein the registered data comprises the name and/or identification code of each registered pollution source;
performing keyword matching between the name and/or identification code of the pollution source to be integrated and the name and/or identification code of each registered pollution source, and judging whether the registered pollution source is a matched pollution source of the pollution source to be integrated;
if a registered pollution source is a matched pollution source of the pollution source to be integrated, integrating the data to be integrated into the matched pollution source data of that matched pollution source;
if no registered pollution source is a matched pollution source of the pollution source to be integrated, generating a newly added pollution source and newly added pollution source data from the data to be integrated.
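The match-or-create flow of the steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `PollutionSource` record and the `keyword_match` and `integrate` functions are hypothetical, and keyword matching is simplified to exact comparison (identification code first, name as fallback), whereas the method itself also allows partial matches.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PollutionSource:
    # Hypothetical record: at least one of name / id_code is expected to be set.
    name: Optional[str]
    id_code: Optional[str]
    data: list = field(default_factory=list)


def keyword_match(candidate: PollutionSource, registered: PollutionSource) -> bool:
    """Simplified stand-in for keyword matching: compare identification codes
    when both sides have one, otherwise compare names."""
    if candidate.id_code and registered.id_code:
        return candidate.id_code == registered.id_code
    if candidate.name and registered.name:
        return candidate.name == registered.name
    return False


def integrate(candidate: PollutionSource, registry: list) -> PollutionSource:
    """Integrate into the first matched registered source, or register a new one."""
    for registered in registry:
        if keyword_match(candidate, registered):
            registered.data.extend(candidate.data)  # integrate into matched data
            return registered
    # No match: the candidate becomes a newly added pollution source.
    new_source = PollutionSource(candidate.name, candidate.id_code, list(candidate.data))
    registry.append(new_source)
    return new_source
```

Returning the affected record lets a caller log which registered source absorbed the data, which fits the later requirement to record each integration for tracing.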
Wherein, the step of generating the newly added pollution source and the newly added pollution source data from the data to be integrated comprises:
obtaining the name and/or identification code of the newly added pollution source according to a preset naming rule and/or a preset coding rule.
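The text does not specify the preset naming or coding rule. Purely as an illustration, the sketch below assumes a hypothetical code format `<district code>-<industry code>-<4-digit sequence>` that skips codes already held by registered sources, and a hypothetical name built from components the description later lists as possible name parts. The district code "440303" and industry code "C26" in the usage are made-up example values.

```python
from itertools import count
from typing import AbstractSet, Callable


def make_code_generator(district_code: str, industry_code: str,
                        taken: AbstractSet[str] = frozenset(),
                        start: int = 1) -> Callable[[], str]:
    """Return a function issuing identification codes under a HYPOTHETICAL rule:
    <district code>-<industry code>-<4-digit sequence number>, skipping codes
    already taken by registered pollution sources."""
    seq = count(start)

    def next_code() -> str:
        while True:
            code = f"{district_code}-{industry_code}-{next(seq):04d}"
            if code not in taken:
                return code

    return next_code


def make_name(enterprise: str, location: str, emission_type: str,
              main_pollutant: str) -> str:
    """HYPOTHETICAL naming rule joining components the description lists
    as possible parts of a pollution source name."""
    return "/".join([enterprise, location, emission_type, main_pollutant])
```

For example, with `taken={"440303-C26-0001"}` the first code issued would be `440303-C26-0002`.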
Wherein, after the step of generating the newly added pollution source and the newly added pollution source data from the data to be integrated, the method further comprises:
generating a fixed pollution source list from all current newly added and registered pollution sources, and judging whether the list contains combinable pollution sources, that is, entries that are substantially the same pollution source;
performing at least one of deduplication, overwriting and deletion on the data of the combinable pollution sources, and combining that data into combined pollution source data.
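One way to picture the combining step is the sketch below. It assumes, hypothetically, that "substantially the same" is detected by a normalised name key (lower-cased, whitespace and common punctuation removed); the patent does not define the test. Within each group, attributes are overwritten (later entries win), records are deduplicated, and the redundant list entries are deleted.

```python
import re


def normalize(name: str) -> str:
    """HYPOTHETICAL normalisation used to detect 'substantially the same' sources."""
    return re.sub(r"[\s\-_,.()]", "", name.lower())


def merge_combinable(sources: list) -> list:
    """Group sources by normalised name and combine each group into one entry."""
    groups = {}  # dicts keep insertion order, so the first-seen name is kept
    for src in sources:
        key = normalize(src["name"])
        if key not in groups:
            groups[key] = {"name": src["name"], "attrs": {}, "records": []}
        merged = groups[key]
        merged["attrs"].update(src.get("attrs", {}))   # overwrite
        for rec in src.get("records", []):
            if rec not in merged["records"]:           # deduplicate
                merged["records"].append(rec)
    return list(groups.values())                       # redundant entries deleted
```

A real system would likely use a stronger similarity test than string normalisation, for example the identification code or enterprise association discussed below.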
Wherein, after the step of generating the fixed pollution source list from all newly added and registered pollution sources, the method further comprises:
constructing a pollution source file for each fixed pollution source in the fixed pollution source list, wherein the pollution source file comprises at least one of a name, an address, an administrative district, an industry, an enterprise to which the source belongs, a code, a management attribute, a supervision flow chart, a pollutant emission amount, and monitoring information.
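The pollution source file can be modelled as a simple record. In this sketch the class `PollutionSourceFile`, its field names and the keying rule in `build_files` (code if present, otherwise name) are assumptions; every field is optional because the file holds "at least one" of the listed items, and the emission amount unit is left unspecified as in the text.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class PollutionSourceFile:
    """HYPOTHETICAL archive record mirroring the items listed for the file."""
    name: Optional[str] = None
    address: Optional[str] = None
    administrative_district: Optional[str] = None
    industry: Optional[str] = None
    enterprise: Optional[str] = None
    code: Optional[str] = None
    management_attribute: Optional[str] = None
    supervision_flowchart: Optional[str] = None  # e.g. a reference to a stored diagram
    emission_amount: Optional[float] = None      # unit not specified in the source
    monitoring_info: list = field(default_factory=list)

    def filled_fields(self) -> list:
        """Names of the items actually present in this file."""
        return [k for k, v in asdict(self).items() if v not in (None, [])]


def build_files(source_list: list) -> dict:
    """Build one file per fixed pollution source, keyed by code (or name if no code)."""
    files = {}
    for entry in source_list:
        f = PollutionSourceFile(**entry)
        files[f.code or f.name] = f
    return files
```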
Wherein, the data to be integrated further comprises the enterprise to be integrated, to which the pollution source to be integrated belongs, and the registered data further comprises the registered enterprise, to which the registered pollution source belongs;
the step of integrating the data to be integrated into the matched pollution source data of the matched pollution source comprises:
acquiring the association relation between the enterprise to be integrated and the registered enterprise, wherein the association relation is any one of a superior relation, a subordinate relation, a sibling relation (both enterprises are subordinate to the same superior), and an identity relation (the two are substantially the same enterprise);
integrating the data to be integrated into the matched pollution source data according to the association relation, and adding corresponding association identifiers and association links to both the data to be integrated and the matched pollution source data.
Wherein, the step of acquiring the association relation between the enterprise to be integrated and the registered enterprise comprises:
establishing, according to the association relation, an enterprise data knowledge graph covering the matched pollution source data and the data to be integrated.
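A minimal in-memory sketch of such an enterprise knowledge graph is shown below, assuming enterprises are nodes and the four association relations are typed edges. The `Relation` and `EnterpriseGraph` names are hypothetical; a production system would more likely use a dedicated graph store. Each edge is stored in both directions (superior and subordinate are inverses; sibling and identity are symmetric) so that relations can be queried from either enterprise.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    SUPERIOR = "superior_of"        # A is the superior of B
    SUBORDINATE = "subordinate_of"  # A is subordinate to B
    SIBLING = "sibling_of"          # A and B share the same superior
    SAME = "same_as"                # A and B are substantially the same enterprise


_INVERSE = {Relation.SUPERIOR: Relation.SUBORDINATE,
            Relation.SUBORDINATE: Relation.SUPERIOR}


class EnterpriseGraph:
    """HYPOTHETICAL knowledge graph: node -> set of (relation, other node)."""

    def __init__(self) -> None:
        self.edges = defaultdict(set)

    def add(self, a: str, rel: Relation, b: str) -> None:
        self.edges[a].add((rel, b))
        # Store the reverse edge so the relation is queryable from b as well.
        self.edges[b].add((_INVERSE.get(rel, rel), a))

    def related(self, node: str) -> set:
        return {other for _, other in self.edges[node]}
```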
Wherein, the data to be integrated comprises the data source of the pollution source to be integrated, the registered data comprises the data source of the registered pollution source, and a data source comprises at least one of the providing unit, the system to which the providing unit belongs, and the sharing mode;
the step of judging whether the registered pollution source is a matched pollution source of the pollution source to be integrated comprises:
if the name and/or identification code of the pollution source to be integrated is identical to the name and/or identification code of the registered pollution source, taking the registered pollution source as the matched pollution source;
if the name and/or identification code of the pollution source to be integrated is partially consistent with that of the registered pollution source, acquiring the data sources of both pollution sources and judging, according to the data sources, whether the registered pollution source is a matched pollution source of the pollution source to be integrated.
Wherein, the step of judging according to the data sources whether the registered pollution source is a matched pollution source of the pollution source to be integrated comprises:
judging whether the data source of the pollution source to be integrated is consistent with the data source of the registered pollution source;
if the two data sources are completely consistent, acquiring the data characteristics of the data to be integrated and of the registered data, and judging whether they are consistent;
if the data characteristics of the data to be integrated are consistent with those of the registered data, taking the registered pollution source as the matched pollution source;
if the data characteristics are inconsistent, judging whether the reliability of the data to be integrated meets a preset requirement, and if it does, taking the pollution source to be integrated as a newly added pollution source.
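The decision procedure above can be written as a small function. This is a sketch under stated assumptions: `match_level` is a hypothetical keyword test ("partial" means the keys share any substring of at least four characters), data characteristics are compared by exact equality, and the cases the text leaves open (differing data sources, or unreliable data with inconsistent characteristics) are returned as `UNRESOLVED` rather than guessed.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    MATCHED = "matched"          # registered source is the matched pollution source
    NOT_MATCHED = "not_matched"  # no keyword overlap at all
    NEW_SOURCE = "new_source"    # candidate becomes a newly added pollution source
    UNRESOLVED = "unresolved"    # ASSUMPTION: cases the text leaves open


def match_level(a: str, b: str, min_common: int = 4) -> str:
    """HYPOTHETICAL keyword test: 'full' if identical, 'partial' if the strings
    share any substring of at least `min_common` characters, else 'none'."""
    if a == b:
        return "full"
    for i in range(len(a) - min_common + 1):
        if a[i:i + min_common] in b:
            return "partial"
    return "none"


def decide(candidate: dict, registered: dict,
           reliability_ok: Callable[[dict], bool]) -> Outcome:
    level = match_level(candidate["key"], registered["key"])
    if level == "full":
        return Outcome.MATCHED
    if level == "none":
        return Outcome.NOT_MATCHED
    # Partial match: fall back to the data sources.
    if candidate["source"] != registered["source"]:
        return Outcome.UNRESOLVED
    # Same data source: compare data characteristics (exact equality here).
    if candidate["features"] == registered["features"]:
        return Outcome.MATCHED
    return Outcome.NEW_SOURCE if reliability_ok(candidate) else Outcome.UNRESOLVED
```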
Wherein the data characteristics of the data to be integrated comprise at least one of emission peak, emission valley, emission period, emission trend, peak emission time, and valley emission time of the data to be integrated;
the data characteristics of the registered data include at least one of emission peak, emission valley, emission period, emission trend, peak emission time, valley emission time of the registered data.
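The listed characteristics can be computed from an emission time series. The definitions here are assumptions, since the text names the characteristics without defining them: "trend" is the sign of a least-squares slope over the sample index, "period" is the lag (in samples, up to half the series length) with the largest unnormalised autocorrelation, and `features_consistent` uses a hypothetical 10% relative tolerance for numeric characteristics.

```python
from statistics import mean


def emission_features(times: list, values: list) -> dict:
    """Characteristics of one emission series (definitions are ASSUMPTIONS)."""
    n = len(values)
    if n < 2 or len(times) != n:
        raise ValueError("need at least two samples with matching timestamps")
    peak_i = max(range(n), key=values.__getitem__)
    valley_i = min(range(n), key=values.__getitem__)
    mx, my = (n - 1) / 2, mean(values)  # mx is the mean sample index
    slope = (sum((x - mx) * (y - my) for x, y in enumerate(values))
             / sum((x - mx) ** 2 for x in range(n)))
    dev = [v - my for v in values]
    period = max(range(1, n // 2 + 1),
                 key=lambda lag: sum(dev[i] * dev[i + lag] for i in range(n - lag)))
    return {
        "peak": values[peak_i], "valley": values[valley_i], "period": period,
        "trend": "up" if slope > 0 else "down" if slope < 0 else "flat",
        "peak_time": times[peak_i], "valley_time": times[valley_i],
    }


def features_consistent(a: dict, b: dict, rel_tol: float = 0.1) -> bool:
    """HYPOTHETICAL rule: numeric features within rel_tol, others exactly equal."""
    for k in a:
        x, y = a[k], b[k]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            if abs(x - y) > rel_tol * max(abs(x), abs(y), 1e-9):
                return False
        elif x != y:
            return False
    return True
```

For an hourly series alternating 0, 10, 0, 10, ... the sketch reports a period of 2 samples and the first peak at 01:00.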
Wherein, after the step of generating the newly added pollution source and the newly added pollution source data from the data to be integrated, the method further comprises:
performing a data quality audit on the registered data, the newly added pollution source data or the integrated matched pollution source data, and deleting data that fails the audit, wherein the quality audit comprises a data integrity audit, a data validity audit and a data reliability audit.
Wherein, the data integrity audit comprises checking whether the data content includes the necessary data items;
the data validity audit comprises checking whether the data content is within its validity period;
the data reliability audit comprises checking whether the data source is an official data source.
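The three audits can be sketched as independent checks that each add a failure label. The required items in `REQUIRED_ITEMS`, the `valid_from`/`valid_to` fields and the source names in `OFFICIAL_SOURCES` are hypothetical placeholders; the patent does not list which items are necessary or which sources count as official.

```python
from datetime import date

REQUIRED_ITEMS = ("name", "code", "enterprise", "emission_amount")  # hypothetical
OFFICIAL_SOURCES = {"ecology_environment_bureau", "online_monitoring_platform"}  # hypothetical


def audit(record: dict, today: date) -> list:
    """Return the list of failed audits; an empty list means the record passes."""
    failures = []
    # Integrity: every necessary data item is present and non-empty.
    if any(record.get(item) in (None, "") for item in REQUIRED_ITEMS):
        failures.append("integrity")
    # Validity: the record is within its validity period.
    valid_from, valid_to = record.get("valid_from"), record.get("valid_to")
    if not (valid_from and valid_to and valid_from <= today <= valid_to):
        failures.append("validity")
    # Reliability: the data comes from an official source.
    if record.get("source") not in OFFICIAL_SOURCES:
        failures.append("reliability")
    return failures


def keep_qualified(records: list, today: date) -> list:
    """Delete (filter out) records that fail any audit."""
    return [r for r in records if not audit(r, today)]
```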
Wherein, the step of importing the data to be integrated in a data integration mode matched with the data source comprises:
entering the data to be integrated into a preset form matching the fixed pollution source data type, and collecting data from the completed preset form; and/or
acquiring the data to be integrated once every preset period, automatically reviewing it, and importing it by data copying; and/or
when data to be integrated of a newly added pollution source to be integrated is detected, acquiring that data through a standard data interface, converting it into a standard format, and importing it; and/or
importing the data to be integrated through a data import tool and a data management tool.
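Selecting an integration mode by data source amounts to a dispatch table. In this sketch the source-type keys, the importer functions and the interface field names `sourceCode`/`val` are all hypothetical, and each importer's "review" or "cleaning" is a trivial stand-in for the real processing.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def from_form(rows: List[Record]) -> List[Record]:
    # Manual entry: rows already follow the preset form for the data type.
    return [dict(r) for r in rows]


def from_periodic_copy(rows: List[Record]) -> List[Record]:
    # Online monitoring: automatic review (here: drop rows without a value), then copy.
    return [dict(r) for r in rows if r.get("value") is not None]


def from_standard_interface(rows: List[Record]) -> List[Record]:
    # Business data: convert a hypothetical interface payload into the standard format.
    return [{"code": r["sourceCode"], "value": r["val"]} for r in rows]


def from_etl(rows: List[Record]) -> List[Record]:
    # Historical / third-party data: cleaning stand-in (strip whitespace from strings).
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()} for r in rows]


IMPORTERS: Dict[str, Callable[[List[Record]], List[Record]]] = {
    "manual_form": from_form,
    "online_monitoring": from_periodic_copy,
    "business_system": from_standard_interface,
    "historical": from_etl,
}


def import_data(source_type: str, rows: List[Record]) -> List[Record]:
    """Import rows with the integration mode registered for their data source."""
    try:
        importer = IMPORTERS[source_type]
    except KeyError:
        raise ValueError(f"no integration mode registered for {source_type!r}") from None
    return importer(rows)
```

Keeping the modes in a table makes it straightforward to register a new data source without touching the matching logic.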
The technical scheme adopted by the invention for solving the technical problems is as follows: there is provided a fixed pollution source data integration system comprising:
the acquisition module is used for acquiring data to be integrated of pollution sources to be integrated and data sources of the data to be integrated, importing the data to be integrated in a data integration mode matched with the data sources, wherein the data to be integrated comprises names and/or identification codes of the pollution sources to be integrated, and acquiring registered data of registered pollution sources, and the registered data comprises names and/or identification codes of the registered pollution sources;
the matching module is used for performing keyword matching between the name and/or identification code of the pollution source to be integrated and those of the registered pollution sources, and judging whether a registered pollution source is a matched pollution source of the pollution source to be integrated;
the integration module is used for integrating the data to be integrated into the matched pollution source data of the matched pollution source if a registered pollution source is a matched pollution source of the pollution source to be integrated;
the newly added module is used for generating a newly added pollution source and newly added pollution source data from the data to be integrated if no registered pollution source is a matched pollution source of the pollution source to be integrated.
The technical scheme adopted by the invention for solving the technical problems is as follows: there is provided a fixed pollution source data integration system comprising a memory storing a computer program and a processor executing the computer program to carry out the steps of the method as described above.
Wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method as described above.
Compared with the prior art, the invention has the following beneficial effects: keyword matching is performed between the name and/or identification code of the pollution source to be integrated and those of the registered pollution sources to judge whether a registered pollution source matches the pollution source to be integrated; if so, the data to be integrated is integrated into the matched pollution source data, and if not, a newly added pollution source and newly added pollution source data are generated from the data to be integrated. As a result, data belonging to substantially the same fixed pollution source is integrated in one place, which solves the problems of disordered data and high management difficulty caused by multiple data sources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Wherein:
FIG. 1 is a schematic flow chart of a first embodiment of a fixed pollution source data integration method provided by the present invention;
FIG. 2 is a schematic flow chart of a second embodiment of the method for integrating data of a fixed pollution source provided by the present invention;
FIG. 3 is a schematic flow chart of a third embodiment of the method for integrating data of a fixed pollution source provided by the present invention;
FIG. 4 is a schematic flow chart of a fourth embodiment of the method for integrating data of a fixed pollution source provided by the present invention;
FIG. 5 is a schematic flow chart of a fifth embodiment of the method for integrating data of a fixed pollution source provided by the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of a fixed pollution source data integration system according to the present invention;
FIG. 7 is a schematic structural diagram of an embodiment of a fixed pollution source data integration system provided by the present invention;
FIG. 8 is a schematic structural diagram of an embodiment of a storage medium provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a fixed pollution source data integration method according to a first embodiment of the present invention. The method for integrating the fixed pollution source data provided by the invention comprises the following steps:
S101: acquiring the data to be integrated of the pollution source to be integrated and the data source of that data, importing the data to be integrated in a data integration mode matched with the data source, and acquiring the registered data of the registered pollution sources.
In a specific implementation scenario, the data to be integrated of a pollution source to be integrated and its data source are acquired, and the data is imported in a data integration mode matched with that data source. The data to be integrated can be collected, organized and reported by the enterprise where the fixed pollution source to be integrated is located, collected by the supervision department responsible for that fixed pollution source, or acquired automatically through equipment such as sensors.
In one implementation scenario, the data to be integrated is historical data, or data integrated or collected by other software. An ETL data import tool and the corresponding ETL management tool are used to acquire it automatically on a periodic basis, for example by importing it after it has been packaged. Because such data was collected or integrated by other software, or is historical, its format or collection standard may differ from those of the present system. Therefore, after the data to be integrated is obtained, it is cleaned and screened and its data quality is judged; if the quality is unqualified, the data can be improved through an artificial neural network, for example by supplementing missing content and correcting format-mapping errors.
In one implementation scenario, the data to be integrated is business data, such as environmental approval and permitting data, generated mainly during day-to-day business handling. A standard data interface is provided for this data: when data to be integrated of a newly added pollution source to be integrated is detected, it is acquired through the standard data interface, converted into a standard format and then imported. Furthermore, by adapting the business functions to the standard interface, the data to be integrated can be imported automatically in real time while staff handle routine business.
In one implementation scenario, the data to be integrated is online monitoring data. Because the collecting department, the collection mode and the data format are relatively fixed, no intermediate-format data standard needs to be formulated: after acquisition, the data to be integrated is automatically reviewed and then imported by data copying.
In one implementation scenario, the data to be integrated is data not supported by any business system, such as public codes, basic pollution source information, environmental quality monitoring points and monitoring section information. This data is entered into a preset form matching the fixed pollution source data type, and data is then collected from the completed form. For example, a neural network may be trained to select the matching preset form, into which the data to be integrated is then filled.
By acquiring the data to be integrated together with its data source and importing it in the integration mode matched with that source, an import method suited to the reliability and characteristics of the data from each source can be applied, which helps ensure the reliability and validity of the imported data.
The pollution source to be integrated may be a fixed pollution source already registered in the pollution source management system, or a fixed pollution source newly added by an enterprise. The data to be integrated includes the name and/or identification code of the pollution source to be integrated. When the pollution source to be integrated has an identification code, the identification code is preferably used; when it has none (for example, a newly added fixed pollution source that has not been registered or for which no identification code has been applied for), its name is acquired instead. The name may include at least one of the location, the emission type, the main pollutants, the enterprise, the industry and the administrative district of the pollution source to be integrated.
The registered data of the registered pollution sources is acquired. A registered pollution source is a fixed pollution source previously registered in the pollution source management system, and its registered data is the related data previously collected or organized for it, comprising at least one of its location, emission type, emission amount, main pollutants, enterprise, project, industry, administrative district, pollutant discharge permit, real-time monitoring data and supervision information. The identification code of a registered pollution source is assigned according to a preset coding rule when the fixed pollution source is registered. The name of a registered pollution source may include at least one of its location, emission type, main pollutants, enterprise, industry and administrative district.
S102: carrying out keyword matching on the name and/or the identification code of the pollution source to be integrated and the name and/or the identification code of the registered pollution source, and judging whether the registered pollution source is a matched pollution source matched with the pollution source to be integrated; if yes, step S103 is executed, and if no, step S104 is executed.
In a specific implementation scenario, keyword matching is performed between the name and/or identification code of the pollution source to be integrated and those of the registered pollution sources, to judge whether a registered pollution source matches the pollution source to be integrated. For example, it may be checked whether the two names and/or identification codes have identical content at one or more fixed positions, such as whether the second, third, fifth and tenth digits of the two identification codes are identical. As another example, it may be checked whether the parts of the two names that correspond to the enterprise, the emission type and the main pollutants are consistent.
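Both examples of keyword matching can be sketched directly. `positions_match` compares characters at 1-based positions, defaulting to the 2nd, 3rd, 5th and 10th digits from the example; codes too short to contain a position are treated as non-matching (an assumption). `name_fields_match` assumes the names have already been parsed into enterprise, emission type and main pollutant parts; how that parsing is done is not described in the text.

```python
def positions_match(code_a: str, code_b: str, positions=(2, 3, 5, 10)) -> bool:
    """Compare characters at fixed 1-based positions of two identification codes."""
    if any(len(c) < max(positions) for c in (code_a, code_b)):
        return False  # ASSUMPTION: a code lacking a position cannot match
    return all(code_a[p - 1] == code_b[p - 1] for p in positions)


def name_fields_match(name_a: dict, name_b: dict,
                      fields=("enterprise", "emission_type", "main_pollutants")) -> bool:
    """Compare the parsed name parts; parsing names into parts is ASSUMED."""
    return all(name_a.get(f) is not None and name_a.get(f) == name_b.get(f)
               for f in fields)
```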
S103: integrating the data to be integrated into the matched pollution source data of the matched pollution source.
In a specific implementation scenario, a registered pollution source is the matched pollution source of the pollution source to be integrated, so the data to be integrated is integrated into the matched pollution source data. The data to be integrated may first be screened and cleaned; it may then be appended to the matched pollution source data, may replace part of the content of the matched pollution source data, or only a selected part of it may be added. In one implementation scenario, the time and content of each integration, the unit that performed the integration and the responsible person are recorded for subsequent data tracing.
In one implementation scenario, the data format of the matched pollution source data is obtained, and the data to be integrated is mapped into that format. The overall data format thus stays consistent, which makes the data convenient to read and store.
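Data mapping into the matched format can be sketched as a field-renaming table with per-field converters. The incoming field names (`ENT_NAME`, `EMIS_QTY`, `SRC_CODE`) and target names in `FIELD_MAP` are hypothetical. Unknown fields are returned separately instead of being dropped, so nothing is silently lost during integration.

```python
from typing import Any, Callable, Dict, Tuple

# HYPOTHETICAL mapping: incoming field -> (target field, converter).
FIELD_MAP: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "ENT_NAME": ("enterprise", str.strip),
    "EMIS_QTY": ("emission_amount", float),
    "SRC_CODE": ("code", str.upper),
}


def map_to_format(record: Dict[str, Any],
                  field_map: Dict[str, Tuple[str, Callable[[Any], Any]]] = FIELD_MAP
                  ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Rename and convert fields into the matched source's format.
    Returns (mapped, unmapped)."""
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in field_map:
            target, convert = field_map[key]
            mapped[target] = convert(value)
        else:
            unmapped[key] = value
    return mapped, unmapped
```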
In one implementation scenario, the data items of the matching pollution source data and of the data to be integrated may be obtained and the data for each data item compared. If the data differ, the data is filled in; if they are the same, no change is made. Alternatively, when the data differ, the difference may be evaluated: if it is smaller than a preset threshold, the data is filled in; if it is larger, the data item is flagged and how to handle it is subsequently confirmed manually.
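The threshold variant of the item-by-item comparison can be sketched as follows. It assumes numeric items are compared by absolute difference against a single threshold, and that non-numeric items that differ always go to manual review; both are assumptions, since the text does not define how the difference is measured.

```python
def reconcile(matched: dict, incoming: dict, threshold: float):
    """Compare item by item; return (updated copy of matched, items flagged for review)."""
    updated = dict(matched)
    flagged = []
    for key, new in incoming.items():
        old = matched.get(key)
        if old is None:
            updated[key] = new            # item absent: fill it in
        elif old == new:
            continue                      # same data: no modification
        elif (isinstance(old, (int, float)) and isinstance(new, (int, float))
              and abs(new - old) < threshold):
            updated[key] = new            # small numeric difference: fill it in
        else:
            flagged.append(key)           # large or non-numeric difference: confirm manually
    return updated, flagged
```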
In one implementation scenario, the pollution source to be integrated and a registered pollution source are in fact the same pollution source but were collected and reported by different departments: the registered pollution source has an identification code assigned according to a preset coding rule, while the pollution source to be integrated has only a name. When keyword matching determines that the two match, the data to be integrated is integrated into the registered data of the matching registered pollution source. This resolves the data disorder and management difficulty caused by multiple data sources.
S104: Generate a newly added pollution source and newly added pollution source data from the data to be integrated of the pollution source to be integrated.
In a specific implementation scenario, if none of the registered pollution sources matches the pollution source to be integrated, the pollution source to be integrated has not been registered before. It is therefore registered in the pollution source management system as a newly added pollution source, and the data to be integrated is used as the newly added pollution source data. The data format of the registered data may be obtained and the newly added pollution source data mapped to it, so that the newly added pollution source data and the registered data share a uniform format, which facilitates data management.
In this implementation scenario, the name and/or identification code of the newly added pollution source is obtained according to a preset naming rule and/or a preset coding rule, so that newly added pollution sources can be managed uniformly together with registered pollution sources. At the macro level, assigning codes to fixed pollution sources under a unified rule addresses the multiple data sources, fragmented information, and siloed management of conventional pollution source data. At the same time, using the fixed pollution source code as the key for interfacing with the various pollution-source-related business systems across the city allows pollution source information to be updated dynamically and shared in real time.
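The text does not disclose the preset coding rule itself. Purely to illustrate how a unified rule can hand out collision-free codes, the sketch below assumes a hypothetical layout (a six-digit district code, a four-digit industry code, a four-digit serial number, and a digit-sum check digit); none of this layout comes from the source.

```python
def next_code(district: str, industry: str, existing: set) -> str:
    """Return the first unused code of the form district + industry + 4-digit serial + check digit.

    The layout and the check-digit rule (digit sum mod 10) are hypothetical.
    """
    if not (district.isdigit() and industry.isdigit()):
        raise ValueError("district and industry codes must be numeric")
    serial = 1
    while serial <= 9999:
        body = f"{district}{industry}{serial:04d}"
        code = body + str(sum(int(ch) for ch in body) % 10)
        if code not in existing:
            return code
        serial += 1
    raise RuntimeError("serial numbers exhausted for this district and industry")
```

Because every business system would derive the same code for the same source, such a code can serve as the join key for the city-wide data interfacing described above.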
As can be seen from the above description, in this embodiment the name and/or identification code of the pollution source to be integrated is keyword-matched against the name and/or identification code of the registered pollution source to determine whether the registered pollution source is a matching pollution source. If it is, the data to be integrated is integrated into the matching pollution source data; if not, a newly added pollution source and newly added pollution source data are generated from the data to be integrated. Data belonging to what is substantially the same fixed pollution source can thus be integrated, avoiding the data clutter and management difficulty caused by multiple data sources.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a second embodiment of the fixed pollution source data integration method according to the present invention. The fixed pollution source data integration method provided by the present invention comprises the following steps:
S201: Acquire the data to be integrated of the pollution source to be integrated and the data source of the data to be integrated, import the data to be integrated using a data integration mode matched to that data source, and acquire the registered data of the registered pollution source.
S202: Perform keyword matching between the name and/or identification code of the pollution source to be integrated and the name and/or identification code of the registered pollution source, and determine whether the registered pollution source is a matching pollution source for the pollution source to be integrated. If yes, go to step S203; otherwise, go to step S204.
S203: Integrate the data to be integrated into the matching pollution source data of the matching pollution source.
S204: Generate a newly added pollution source and newly added pollution source data from the data to be integrated.
In a specific implementation scenario, steps S201 to S204 are substantially the same as steps S101 to S104 in the first embodiment of the fixed pollution source data integration method provided by the present invention, and are not described herein again.
S205: Generate a fixed pollution source list from all current newly added pollution sources and registered pollution sources, and determine whether the list contains mergeable pollution sources that are substantially the same. If yes, go to step S206.
In a specific implementation scenario, a fixed pollution source list is generated from all current newly added and registered pollution sources. The list includes the name and identification code of each newly added or registered pollution source, together with its newly added pollution source data or registered data. It is then determined whether the list contains mergeable pollution sources that are substantially the same. For example, for a wastewater pollution source, different departments may collect emission data for different pollutants, so the same fixed pollution source may end up recorded as two fixed pollution sources. Whether mergeable pollution sources exist can be judged from each fixed pollution source's location, the enterprise it belongs to, its industry, its emission type, and its emission amount: if these are the same or close for two or more fixed pollution sources, those sources can be considered mergeable.
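A pairwise check for "same or close" attributes might look like the sketch below. It assumes that locations are (latitude, longitude) pairs in degrees, that enterprise, industry, and emission type must match exactly, and that "close" means within 50 m and within a 10 % emission-amount difference; these thresholds and field names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000


def distance_m(p: tuple, q: tuple) -> float:
    """Approximate distance in metres between two (lat, lon) points in degrees.

    Uses the equirectangular approximation, which is adequate at the short
    distances that matter when deciding whether two records are one site.
    """
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    return EARTH_RADIUS_M * math.hypot(x, lat2 - lat1)


def is_mergeable(a: dict, b: dict, max_distance_m: float = 50.0, amount_tol: float = 0.1) -> bool:
    """Same enterprise, industry and emission type; nearby; similar emission amount."""
    if any(a[k] != b[k] for k in ("enterprise", "industry", "emission_type")):
        return False
    if distance_m(a["location"], b["location"]) > max_distance_m:
        return False
    larger = max(abs(a["emission_amount"]), abs(b["emission_amount"]))
    return larger == 0 or abs(a["emission_amount"] - b["emission_amount"]) <= amount_tol * larger


def find_mergeable_pairs(sources: list) -> list:
    """Index pairs (i, j), i < j, of list entries that look like one physical source."""
    return [(i, j)
            for i in range(len(sources))
            for j in range(i + 1, len(sources))
            if is_mergeable(sources[i], sources[j])]
```

The all-pairs scan is quadratic; for a city-wide list one would normally group candidates by enterprise first, which this sketch does not do.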
S206: Perform at least one of deduplication, overwriting, and deletion on the data of the mergeable pollution sources, and combine it into merged pollution source data.
In a specific implementation scenario, the data of the mergeable pollution sources is combined into merged pollution source data through at least one of deduplication, overwriting, and deletion. For example, if the mergeable pollution sources are sewage discharge sources, each of them may carry sewage discharge data for the source, such as peak and trough discharge values, average discharge, discharge time, and discharge pause time. Duplicated data is deduplicated; clearly erroneous data (for example, values that differ markedly from the other data or are obviously unreasonable) is deleted; and where several values are similar or close, one of them is selected to overwrite the rest, for example an intermediate value may be chosen to overwrite the extreme values.
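One way to combine the three operations for numeric items is sketched below: exact duplicates are removed, values far from the median are deleted as erroneous, and the median of what remains overwrites the others. The 50 % outlier ratio and the "first value wins" rule for non-numeric items are assumptions, not taken from the text.

```python
from statistics import median


def merge_values(values: list, outlier_ratio: float = 0.5) -> float:
    """Deduplicate, delete values far from the median, then keep one representative."""
    unique = sorted(set(values))                       # de-duplication
    mid = median(unique)
    kept = [v for v in unique if mid == 0 or abs(v - mid) <= outlier_ratio * abs(mid)]  # deletion
    return median(kept) if kept else mid               # overwrite with an intermediate value


def merge_sources(records: list) -> dict:
    """Combine the records of several mergeable sources field by field."""
    merged = {}
    for key in sorted(set().union(*records)):
        values = [r[key] for r in records if key in r]
        numeric = all(isinstance(v, (int, float)) for v in values)
        merged[key] = merge_values(values) if numeric else values[0]  # non-numeric: first wins
    return merged
```

For instance, `merge_values([10, 10, 11, 12, 100])` drops the duplicate 10 and the outlier 100 and returns 11.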
S207: Construct a pollution source file for each fixed pollution source in the fixed pollution source list, the pollution source file comprising at least one of a name, an address, the administrative district, the industry, the enterprise, an identification code, management attributes, a supervision flowchart, pollution source emission amount, and monitoring information.
In a specific implementation scenario, a pollution source file containing at least one of the items listed above is constructed for each fixed pollution source in the list. After the file is generated, a corresponding link is added to the fixed pollution source list, so that a user browsing the list can open the pollution source file by clicking the link and obtain detailed information without having to search for and collect the data themselves, which greatly improves the efficiency of data management and lookup.
As can be seen from the above description, in this embodiment a fixed pollution source list is generated from all newly added and registered pollution sources; the data of substantially identical mergeable pollution sources in the list is combined into merged pollution source data through at least one of deduplication, overwriting, and deletion; and a pollution source file is constructed for each fixed pollution source in the list. This effectively removes erroneous or redundant data, saves storage space, improves data quality, reduces the time users spend organizing data, and improves data management efficiency.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a third embodiment of the fixed pollution source data integration method according to the present invention. The method comprises the following steps:
S301: Acquire the data to be integrated of the pollution source to be integrated and the data source of the data to be integrated, import the data to be integrated using a data integration mode matched to that data source, and acquire the registered data of the registered pollution source.
S302: Perform keyword matching between the name and/or identification code of the pollution source to be integrated and the name and/or identification code of the registered pollution source, and determine whether the registered pollution source is a matching pollution source for the pollution source to be integrated. If yes, go to step S303; otherwise, go to step S304.
S303: Integrate the data to be integrated into the matching pollution source data of the matching pollution source.
S304: Generate a newly added pollution source and newly added pollution source data from the data to be integrated.
In a specific implementation scenario, steps S301 to S304 are substantially the same as steps S101 to S104 in the first embodiment of the fixed pollution source data integration method provided by the present invention, and are not described herein again.
S305: Perform a data quality audit on the registered data, the newly added pollution source data, or the integrated matching pollution source data, and delete any data that fails the audit. The quality audit comprises a data integrity audit, a data validity audit, and a data reliability audit.
In a specific implementation scenario, a data quality audit is performed on all data in the current pollution source management system and data that fails the audit is deleted, ensuring the quality of the data in the system and facilitating data management. The quality audit comprises a data integrity audit, a data validity audit, and a data reliability audit. The data integrity audit checks whether the registered data, the newly added pollution source data, or the integrated matching pollution source data contains the required data items, which may be set by the user. Required data items may include the data collection time, the department that uploaded the data, the responsible person, the item the data relates to, and so on. If a record lacks a required data item, its reliability is questionable and it may be deleted. Further, different types of data may have different required data items; for example, for real-time monitored pollutant emission data, the required data items include at least one of the detection time, the detection position, and the detection standard of the detected pollutant.
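A per-type integrity audit can be sketched as a lookup of required items followed by a completeness check. The item names and the two data types are hypothetical, and for simplicity the real-time type here requires all three detection items, whereas the text only says "at least one of".

```python
# Hypothetical required items; the text says these may be configured by the user.
BASE_ITEMS = {"collected_at", "uploading_department", "responsible_person", "item"}
REQUIRED_ITEMS = {
    "default": BASE_ITEMS,
    "realtime_emission": BASE_ITEMS | {"detection_time", "detection_position", "detection_standard"},
}


def missing_items(record: dict) -> set:
    """Required items that are absent or empty for this record's data type."""
    required = REQUIRED_ITEMS.get(record.get("type", "default"), REQUIRED_ITEMS["default"])
    return {k for k in required if record.get(k) in (None, "")}


def integrity_audit(records: list):
    """Split records into (passed, failed) by completeness."""
    passed, failed = [], []
    for record in records:
        (failed if missing_items(record) else passed).append(record)
    return passed, failed
```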
The data validity audit checks whether the registered data, the newly added pollution source data, or the integrated matching pollution source data is within its validity period. For example, the treatment standards for some pollutants, and the items associated with some pollutants, have validity periods; data within the validity period can be retained, while data beyond it can be deleted to save storage space.
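A validity check reduces to a date-window test. The sketch assumes each record may carry hypothetical `valid_from` and `valid_until` fields as `datetime.date` values, with a missing bound treated as open-ended.

```python
from datetime import date


def within_validity(record: dict, today: date) -> bool:
    """True if `today` lies in [valid_from, valid_until]; a missing bound is open-ended."""
    start, end = record.get("valid_from"), record.get("valid_until")
    return (start is None or start <= today) and (end is None or today <= end)


def validity_audit(records: list, today: date):
    """Split records into (kept, expired)."""
    kept = [r for r in records if within_validity(r, today)]
    expired = [r for r in records if not within_validity(r, today)]
    return kept, expired
```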
The data reliability audit checks whether the registered data, the newly added pollution source data, or the integrated matching pollution source data comes from an officially certified data source. Uploading some data may require review by at least one superior authority or senior official; data that has not passed this review cannot be considered to come from an officially certified data source. In one implementation scenario, an identification code, watermark, label, or similar marker may be added to officially certified data so that it can be identified quickly.
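The review-then-label idea can be sketched as below. The `approvals` list format, the label string, and the one-approval minimum are hypothetical; the point is that the label is only attached after the review check passes, so later checks can rely on the label alone.

```python
OFFICIAL_LABEL = "OFFICIAL-CERTIFIED"  # hypothetical marker for certified data


def passed_review(record: dict, min_approvals: int = 1) -> bool:
    """True if at least `min_approvals` superior reviews approved the upload."""
    return sum(1 for a in record.get("approvals", []) if a.get("approved")) >= min_approvals


def certify(record: dict) -> dict:
    """Return a labelled copy of a record that passed review."""
    if not passed_review(record):
        raise ValueError("record has not passed the review process")
    return {**record, "label": OFFICIAL_LABEL}


def is_official(record: dict) -> bool:
    """Fast check using the label added at certification time."""
    return record.get("label") == OFFICIAL_LABEL
```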
In other implementation scenarios, the registered data, the newly added pollution source data, or the integrated matching pollution source data may also be audited in other respects, including data format, data permissions, data accuracy, and data correctness. The data format audit checks whether the data format meets preset requirements, for example whether the data has been mapped to the specified format or is in a format that is explicitly disallowed. The data permission audit checks whether each permission setting of the data is compliant, for example whether the permissions for viewing, modifying, downloading, uploading, and forwarding the data comply with relevant regulations such as data security management regulations. The data correctness audit checks whether the data is correct: for example, data of the same fixed pollution source for an adjacent or the same period can be obtained and compared, and if the difference is small, the data is judged to be correct. Alternatively, historical data of the same pollution source can be fed into a neural network to obtain predicted data, which is compared with the current data; if the difference is small, the data is judged to be correct.
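The comparison-based correctness check can be sketched with a simple baseline. This swaps in the mean of adjacent-period values of the same source as the reference (not the neural-network predictor the text also mentions), and the 30 % tolerance is an assumed parameter.

```python
from statistics import mean


def looks_correct(current: float, reference: list, max_rel_diff: float = 0.3) -> bool:
    """Compare a value with the mean of same-source data from adjacent periods.

    Returns False when there is nothing to compare against, so such data can be
    routed to manual review instead of being accepted silently.
    """
    if not reference:
        return False
    baseline = mean(reference)
    if baseline == 0:
        return current == 0
    return abs(current - baseline) <= max_rel_diff * abs(baseline)
```

A learned predictor could replace `mean(reference)` without changing the rest of the check, since only the baseline value is used.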
In other implementation scenarios, data that fails the quality audit may be flagged and the user left to decide whether to delete it. Alternatively, such data may be traced: its data source, the department that uploaded it, the responsible person, and so on are checked, and if data managed by a particular department or responsible person fails the quality audit repeatedly, a warning is issued to that department or person. If data from a particular data source fails the quality audit repeatedly, it is determined whether an alternative data source exists; if so, the original data source is removed and the data is provided by the alternative source instead, and the other data previously provided by the original source is audited again to ensure data reliability. In other implementation scenarios, the data source may be analyzed intelligently to find the cause of its poor data quality, for example a sensor that has been in service for a long time and produces inaccurate monitoring data, or a transmission method that is prone to encoding errors and omissions.
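Tracing repeated failures is essentially counting. The sketch assumes each failed record carries hypothetical `department`, `responsible_person`, and `source` fields, treats "many times" as a configurable count (3 by default), and takes known alternative sources as a plain mapping.

```python
from collections import Counter


def track_failures(failed: list, warn_after: int = 3, alternatives=None):
    """Return (parties to warn, data-source replacements) from failed audit records.

    Each record is assumed to carry "department", "responsible_person" and "source".
    """
    alternatives = alternatives or {}
    by_department = Counter(r["department"] for r in failed)
    by_person = Counter(r["responsible_person"] for r in failed)
    by_source = Counter(r["source"] for r in failed)
    to_warn = sorted(
        {d for d, n in by_department.items() if n >= warn_after}
        | {p for p, n in by_person.items() if n >= warn_after}
    )
    # Replace a repeatedly failing source only when an alternative is known;
    # the other data from the replaced source should then be re-audited.
    replacements = {s: alternatives[s] for s, n in by_source.items()
                    if n >= warn_after and s in alternatives}
    return to_warn, replacements
```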
As can be seen from the above description, in this embodiment the quality of the fixed pollution source data is audited and data that fails the audit is deleted, which effectively improves data quality and the efficiency of data management.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a fourth embodiment of the fixed pollution source data integration method according to the present invention. The method comprises the following steps:
S401: Acquire the data to be integrated of the pollution source to be integrated and the data source of the data to be integrated, import the data to be integrated using a data integration mode matched to that data source, and acquire the registered data of the registered pollution source.
S402: Perform keyword matching between the name and/or identification code of the pollution source to be integrated and the name and/or identification code of the registered pollution source, and determine whether the registered pollution source is a matching pollution source for the pollution source to be integrated. If yes, go to step S403.
In a specific implementation scenario, steps S401 to S402 are substantially the same as steps S101 to S102 in the first embodiment of the fixed pollution source data integration method provided by the present invention, and are not described herein again.
S403: Acquire the association relationship between the enterprise to be integrated and the registered enterprise, the association relationship being any one of a superior-subordinate relationship, a sibling relationship (both subordinate to the same superior), and a substantially identical relationship.
In a specific implementation scenario, the data to be integrated further includes the enterprise to be integrated, to which the pollution source to be integrated belongs, and the registered data further includes the registered enterprise, to which the registered pollution source belongs. The association relationship between the two enterprises may be provided by either of them, or obtained by searching for relevant information online. The association relationship is any one of a superior-subordinate relationship, a sibling relationship under the same superior, and a substantially identical relationship. For example, if company B is a subsidiary of company A, companies A and B have a superior-subordinate relationship. If companies B and C are both subsidiaries of company A, they have a sibling relationship under the same superior. If company D and company C have the same address and the same fixed pollution source, they can be considered substantially identical.
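Given a child-to-parent map of enterprises and a set of pairs already known to be substantially identical, the three relationships can be classified as sketched below. The input formats are hypothetical, only direct parents are considered, and an `unrelated` result is added as a fallback that the text does not mention.

```python
def relationship(a: str, b: str, parent: dict, same_as: set) -> str:
    """Classify enterprises `a` and `b`.

    parent:  maps a subsidiary to its direct superior (hypothetical input format)
    same_as: set of frozenset pairs known to be substantially identical
    """
    if a == b or frozenset((a, b)) in same_as:
        return "substantially_identical"
    if parent.get(a) == b or parent.get(b) == a:
        return "superior_subordinate"
    if parent.get(a) is not None and parent.get(a) == parent.get(b):
        return "siblings_same_superior"
    return "unrelated"  # none of the three relationships in the text applies
```

With `parent = {"B": "A", "C": "A"}` and `same_as = {frozenset(("C", "D"))}`, the four companies in the example above are classified as described there.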
S404: Integrate the data to be integrated into the matching pollution source data of the matching pollution source according to the association relationship, and add corresponding association identifiers and association links to both the data to be integrated and the matching pollution source data.
In a specific implementation scenario, the data to be integrated is integrated into the matching pollution source data according to the association relationship. For example, if the enterprise to be integrated and the registered enterprise of the matching pollution source are substantially identical, the data to be integrated is added to the matching pollution source data and any part identical to or duplicating the matching pollution source data is deleted. If the two enterprises have a superior-subordinate relationship, the data to be integrated is added to the matching pollution source data as a single block representing the enterprise to be integrated. If the two enterprises are siblings under the same superior, the data to be integrated is kept as separate data for the enterprise to be integrated and stored in parallel with the matching pollution source data.
In this implementation scenario, corresponding association identifiers and/or association links are added to the data to be integrated and the matching pollution source data. When viewing either set of data, the user can see which other data is related to it and open that related data through the association link, which makes it easier to obtain information and manage data.
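The three integration rules and the two-way association links can be sketched over a simple in-memory store. The store layout (`records`, `sub_units`, `links` per enterprise) and the relationship labels are hypothetical choices made for illustration.

```python
def _entry(store: dict, enterprise: str) -> dict:
    """Get or create the storage entry for an enterprise."""
    return store.setdefault(enterprise, {"records": [], "sub_units": {}, "links": set()})


def integrate_by_relationship(store: dict, matched_ent: str, incoming_ent: str,
                              incoming_records: list, rel: str) -> None:
    """Integrate incoming records according to the enterprise relationship (in place)."""
    target = _entry(store, matched_ent)
    if rel == "substantially_identical":
        for record in incoming_records:              # add, skipping duplicates
            if record not in target["records"]:
                target["records"].append(record)
    elif rel == "superior_subordinate":
        target["sub_units"][incoming_ent] = list(incoming_records)  # one block for the unit
    elif rel == "siblings_same_superior":
        _entry(store, incoming_ent)["records"].extend(incoming_records)  # stored in parallel
    else:
        raise ValueError(f"unsupported relationship: {rel}")
    # Association identifiers/links on both sides, pointing at each other.
    target["links"].add((incoming_ent, rel))
    _entry(store, incoming_ent)["links"].add((matched_ent, rel))
```

Storing links on both entries means a user who opens either enterprise's data can follow the association back to the other, as the paragraph above describes.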
S405: Construct a pollution knowledge graph of the matching pollution source data and the data to be integrated according to the association relationship.
In a specific implementation scenario, a pollution knowledge graph of the matching pollution source data and the data to be integrated is constructed according to the association relationship. The graph may take as nodes each data item (including the data to be integrated and the matching pollution source data), each enterprise (including the enterprises corresponding to the pollution source to be integrated and to the matching pollution source), and each pollution source (including the pollution source to be integrated and the matching pollution source). Through the graph, the user can see the relationships among nodes more clearly, and when searching only needs to enter the content of any one node to obtain the data of the other nodes related to it, without searching and organizing manually. Further, the pollution relationships between pollution sources and enterprises can be derived from the knowledge graph to identify key industries and end-point enterprises for pollution control.
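At its simplest, such a graph is an adjacency structure over typed nodes, and "enter any one node, get the related nodes" is a bounded breadth-first search. The sketch below uses hypothetical `(kind, id)` tuples as node identifiers and a hypothetical class name; a production system would more likely use a graph database.

```python
from collections import defaultdict


class PollutionGraph:
    """Undirected graph whose nodes are data records, enterprises and pollution sources."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        """Connect two nodes, e.g. an enterprise and a pollution source it owns."""
        self.adj[u].add(v)
        self.adj[v].add(u)

    def related(self, node, depth: int = 1) -> set:
        """All nodes reachable from `node` within `depth` hops (breadth-first search)."""
        seen = {node}
        frontier = [node]
        for _ in range(depth):
            next_frontier = []
            for current in frontier:
                for neighbour in self.adj.get(current, ()):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        next_frontier.append(neighbour)
            frontier = next_frontier
        seen.discard(node)
        return seen
```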
As can be seen from the above description, in this embodiment the association relationship between the enterprise to be integrated and the registered enterprise is acquired, the data to be integrated is integrated into the matching pollution source data according to that relationship, and corresponding association identifiers and association links are added to both sets of data. Users can thus obtain related data quickly and conveniently, improving the efficiency of data management and search.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of a fifth embodiment of the fixed pollution source data integration method according to the present invention. The method comprises the following steps:
S501: Acquire the data to be integrated of the pollution source to be integrated and the data source of the data to be integrated, import the data to be integrated using a data integration mode matched to that data source, and acquire the registered data of the registered pollution source.
In a specific implementation scenario, the data to be integrated includes the data source of the pollution source to be integrated, and the registered data includes the data source of the registered pollution source. A data source includes at least one of the providing unit, the system to which the providing unit belongs, and the sharing mode. The providing unit is the enterprise or department to which the pollution source to be integrated or the registered pollution source belongs, and the system is the system of that enterprise or department. Sharing modes include unconditional sharing, conditional sharing, and no sharing.
The data to be integrated includes the name and/or identification code of the pollution source to be integrated. The identification code is preferred when the pollution source to be integrated has one. If it does not (for example, a newly added fixed pollution source that has not been registered or has not applied for an identification code), its name may include at least one of its location, emission type, main pollutant, the enterprise it belongs to, and the administrative district it belongs to. The identification code of a registered pollution source is obtained according to a preset coding rule when the fixed pollution source is registered. The name of a registered pollution source may likewise include at least one of its location, emission type, main pollutant, the enterprise it belongs to, and the administrative district it belongs to.
S502: Perform keyword matching between the name and/or identification code of the pollution source to be integrated and that of the registered pollution source, and determine whether they are completely identical. If so, go to step S503; if not, go to step S504.
In a specific implementation scenario, step S502 is substantially the same as step S102 of the first embodiment of the fixed pollution source data integration method provided by the present invention, and details thereof are not repeated here.
S503: Take the registered pollution source as the matching pollution source.
In this implementation scenario, if the name and/or identification code of the pollution source to be integrated is completely identical to that of the registered pollution source, the two are considered substantially the same pollution source, and the registered pollution source is taken as the matching pollution source.
S504: Determine whether the data source of the pollution source to be integrated is consistent with the data source of the registered pollution source. If so, go to step S505; if not, go to step S507.
In a specific implementation scenario, the name and/or identification code of the pollution source to be integrated only partially coincides with that of the registered pollution source. Partial coincidence at preset designated positions indicates that the information of the two sources is consistent, so they may match. It is then determined whether their data sources are consistent. If the data sources are consistent, the two are likely to match. If they are not, for example if the two sources belong to different enterprises and different systems, the two do not match and the pollution source to be integrated can be taken as a newly added pollution source.
S505: Acquire the data characteristics of the data to be integrated and of the registered pollution source, and determine whether they are consistent. If so, go to step S503; if not, go to step S506.
In a specific implementation scenario, the data characteristics of the data to be integrated and of the registered pollution source are acquired. The data characteristics of the data to be integrated include at least one of its emission peak, emission trough, emission period, emission trend, peak emission time, and trough emission time; the data characteristics of the registered data include the same kinds of items. If the data to be integrated and the registered data relate to the same pollutant and their data characteristics are the same, the two are likely to match; otherwise, they do not match.
S506: Determine whether the reliability of the data to be integrated meets a preset requirement. If so, go to step S507.
In a specific implementation scenario, if the data characteristics of the data to be integrated are inconsistent with those of the registered pollution source, the two do not match, and it is determined whether the reliability of the data to be integrated meets a preset requirement. The reliability check may verify whether the data was uploaded by designated personnel of a designated unit through a designated channel; whether the data carries a preset data-inspection identifier, which shows that it has passed a data quality check and is therefore more reliable; or whether the sensor that collected the data is a designated sensor.
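A few of these characteristics can be computed from an emission time series and compared with a tolerance, as sketched below. The series format (`(hour, value)` pairs), the use of a least-squares slope sign as the "emission trend", the 10 % tolerance, and the omission of the emission period are all assumptions made for illustration.

```python
def features(series: list) -> dict:
    """Characteristics of an emission series given as (hour, value) samples.

    Needs at least two samples with different hours (for the trend slope).
    """
    hours = [h for h, _ in series]
    values = [v for _, v in series]
    peak_i = max(range(len(values)), key=values.__getitem__)
    trough_i = min(range(len(values)), key=values.__getitem__)
    mean_h = sum(hours) / len(hours)
    mean_v = sum(values) / len(values)
    # Least-squares slope; only its sign is kept as the emission trend.
    slope = (sum((h - mean_h) * (v - mean_v) for h, v in series)
             / sum((h - mean_h) ** 2 for h in hours))
    return {
        "peak": values[peak_i], "trough": values[trough_i],
        "peak_time": hours[peak_i], "trough_time": hours[trough_i],
        "trend": (slope > 0) - (slope < 0),
    }


def features_consistent(a: dict, b: dict, rel_tol: float = 0.1) -> bool:
    """Same timing and trend; peak and trough values within a relative tolerance."""
    if any(a[k] != b[k] for k in ("peak_time", "trough_time", "trend")):
        return False
    return all(abs(a[k] - b[k]) <= rel_tol * max(abs(a[k]), abs(b[k]))
               for k in ("peak", "trough"))
```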
S507: Take the pollution source to be integrated as a newly added pollution source.
In a specific implementation scenario, if the reliability of the data to be integrated meets the preset requirement, the pollution source to be integrated is taken as a newly added pollution source and the data to be integrated as newly added pollution source data. In other implementation scenarios, if the reliability of the data to be integrated does not meet the preset requirement, the data to be integrated is deleted.
As can be seen from the above description, in this embodiment, whether the registered pollution source matches the pollution source to be integrated is determined from the data source and from whether the data characteristics of the registered pollution source match those of the data to be integrated, which effectively improves the accuracy and reliability of the determination.
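The branching of steps S502 to S507 can be condensed into a single decision function, assuming each check has already been evaluated to a boolean. The function and return labels are hypothetical. The text does not say what step S506 leads to when the reliability requirement is not met, so the `discard` outcome below is an assumption.

```python
def classify_candidate(name_code_exact: bool, same_data_source: bool,
                       features_consistent: bool, reliable: bool) -> str:
    """Follow steps S502-S507; return 'match', 'new', or 'discard'."""
    if name_code_exact:          # S502 -> S503: exact name/code match
        return "match"
    if not same_data_source:     # S504 -> S507: different data source, new source
        return "new"
    if features_consistent:      # S505 -> S503: same data characteristics
        return "match"
    if reliable:                 # S506 -> S507: reliable data becomes a new source
        return "new"
    return "discard"             # assumed handling of unreliable data
```

Keeping each check as a separate boolean lets the name matching, data-source comparison, feature comparison, and reliability check be implemented and tested independently.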
Referring to fig. 6, fig. 6 is a schematic structural diagram of a fixed pollution source data integration system according to an embodiment of the present invention. The fixed pollution source data integration system 10 comprises an acquisition module 11, a matching module 12, an integration module 13 and a newly added module 14.
The acquisition module 11 is configured to acquire data to be integrated of a pollution source to be integrated and a data source of the data to be integrated, import the data to be integrated in a data integration manner matched with the data source, acquire registered data of a registered pollution source, where the registered data includes a name and/or an identification code of the registered pollution source; the matching module 12 is used for performing keyword matching on the name and/or the identification code of the pollution source to be integrated and the name and/or the identification code of the registered pollution source, and judging whether the registered pollution source is a matched pollution source matched with the pollution source to be integrated; the integration module 13 is configured to integrate the data to be integrated into the matching pollution source data of the matching pollution source if the registered pollution sources are matching pollution sources matching with the pollution source to be integrated; the newly added module 14 is configured to generate newly added pollution sources and newly added pollution source data according to the pollution source data to be integrated if the registered pollution sources are not matched pollution sources matched with the pollution sources to be integrated.
The newly added module 14 is further configured to obtain the name and/or identification code of each newly added pollution source according to a preset naming rule and/or encoding rule.
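The text leaves the naming and encoding rules open. As a hypothetical example only, the sketch below assumes codes of the form district-industry-sequence and names of the form "enterprise - outlet"; every format string and parameter here is an assumption.

```python
from itertools import count
from typing import Callable


def make_code_generator(district: str, industry: str, width: int = 4) -> Callable[[], str]:
    """Return a function yielding hypothetical codes such as 'NS-C26-0001'."""
    sequence = count(1)

    def next_code() -> str:
        # Zero-pad the running sequence number to `width` digits.
        return f"{district}-{industry}-{next(sequence):0{width}d}"

    return next_code


def make_name(enterprise: str, outlet: str) -> str:
    """Hypothetical naming rule: '<enterprise> - <outlet>', outer whitespace trimmed."""
    return f"{enterprise.strip()} - {outlet.strip()}"
```

The counter lives in memory here for brevity; a system that assigns codes across sessions would presumably persist it.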
The newly added module 14 is further configured to generate a fixed pollution source list from all current newly added pollution sources and registered pollution sources, to judge whether the list contains combinable pollution sources that are substantially the same, to perform at least one of de-duplication, overwriting and deletion on the pollution source data of the combinable pollution sources, and to combine that data into combined pollution source data.
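A possible reading of the combining step, assuming two list entries are "substantially the same" when they share an identification code (or, lacking one, a normalized name). Overwriting is modeled as later non-empty attribute values replacing earlier ones, de-duplication as keeping each emission record once, and deletion as dropping empty attribute values. These interpretations and the dictionary layout are illustrative, not taken from the patent.

```python
from typing import Dict, List


def merge_key(source: dict) -> str:
    """Assumed 'substantially the same' test: same code, else same normalized name."""
    return source.get("code") or "".join(source["name"].split()).lower()


def merge_combinable(sources: List[dict]) -> Dict[str, dict]:
    merged: Dict[str, dict] = {}
    for src in sources:
        target = merged.setdefault(
            merge_key(src),
            {"name": src["name"], "code": src.get("code"), "attributes": {}, "records": []},
        )
        # Overwriting: a later non-empty value replaces an earlier one;
        # deletion: empty values (None or "") are never carried over.
        for attr, value in src.get("attributes", {}).items():
            if value not in (None, ""):
                target["attributes"][attr] = value
        # De-duplication: each emission record is kept once, in first-seen order.
        for record in src.get("records", []):
            if record not in target["records"]:
                target["records"].append(record)
    return merged
```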
The newly added module 14 is further configured to construct a pollution source file for each fixed pollution source in the fixed pollution source list, where the pollution source file includes at least one of the name, address, administrative district, industry, enterprise, identification code, management attribute, supervision flowchart, pollution source emission amount and monitoring information of that source.
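The pollution source file can be pictured as a simple record type. The sketch below mirrors the field list in the text; the Python types, the optional defaults, the tonnes-per-year unit and the `build_files` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class PollutionSourceFile:
    # Field list follows the text; types, defaults and the unit are assumptions.
    name: str
    identification_code: Optional[str] = None
    address: Optional[str] = None
    administrative_district: Optional[str] = None
    industry: Optional[str] = None
    enterprise: Optional[str] = None
    management_attribute: Optional[str] = None
    supervision_flowchart: Optional[str] = None  # e.g. a link to a flowchart document
    emission_amount_t: Optional[float] = None  # assumed unit: tonnes per year
    monitoring_info: List[str] = field(default_factory=list)


def build_files(source_list: List[dict]) -> Dict[str, PollutionSourceFile]:
    """Create one file per list entry, ignoring keys the file structure does not define."""
    allowed = {f.name for f in fields(PollutionSourceFile)}
    return {
        s["name"]: PollutionSourceFile(**{k: v for k, v in s.items() if k in allowed})
        for s in source_list
    }
```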
The data to be integrated further includes the enterprise to be integrated, i.e., the enterprise to which the pollution source to be integrated belongs; the registered data further includes the registered enterprise to which the registered pollution source belongs. The integration module 13 is configured to acquire the association relationship between the enterprise to be integrated and the registered enterprise, where the association relationship is any one of a superior relationship, a subordinate relationship, a sibling relationship (both are subordinate to the same superior) and a substantially identical relationship; and to integrate the data to be integrated into the matching pollution source data of the matching pollution source according to the association relationship, adding a corresponding association identifier and/or association link to both the data to be integrated and the matching pollution source data.
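One way to classify the four relationships is from a child-to-parent map of enterprises. In this hypothetical sketch, "substantially identical" is approximated by equal enterprise IDs, the `parent` mapping and record fields are invented for the example, and the link stored on each record is simply the other record's `id`.

```python
from enum import Enum
from typing import Dict, Optional


class Relation(Enum):
    SUPERIOR = "superior"        # enterprise to be integrated is above the registered one
    SUBORDINATE = "subordinate"  # enterprise to be integrated is below the registered one
    SIBLING = "sibling"          # both are subordinate to the same superior
    SAME = "same"                # substantially the same enterprise (equal IDs here)


INVERSE = {
    Relation.SUPERIOR: Relation.SUBORDINATE,
    Relation.SUBORDINATE: Relation.SUPERIOR,
    Relation.SIBLING: Relation.SIBLING,
    Relation.SAME: Relation.SAME,
}


def relation_between(new_id: str, reg_id: str, parent: Dict[str, str]) -> Optional[Relation]:
    """Classify two enterprises from a hypothetical child -> parent mapping."""
    if new_id == reg_id:
        return Relation.SAME
    if parent.get(reg_id) == new_id:
        return Relation.SUPERIOR
    if parent.get(new_id) == reg_id:
        return Relation.SUBORDINATE
    if parent.get(new_id) is not None and parent.get(new_id) == parent.get(reg_id):
        return Relation.SIBLING
    return None


def tag_association(new_record: dict, matched_record: dict, relation: Relation) -> None:
    """Add an association tag plus a link (the other record's id) to both records."""
    new_record.setdefault("associations", []).append((relation.value, matched_record["id"]))
    matched_record.setdefault("associations", []).append(
        (INVERSE[relation].value, new_record["id"]))
```

Storing the inverse relation on the matched record keeps the two tags consistent when viewed from either side.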
The integration module 13 is further configured to construct, according to the association relationship, a pollution data knowledge graph of the matching pollution source data and the data to be integrated.
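The patent does not specify a graph schema or store. As a minimal sketch, the knowledge graph can be held as (subject, predicate, object) triples indexed by subject; the predicates `belongs_to`, `located_in`, `has_data` and `subordinate_of` and the record fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def source_triples(source: dict) -> List[Triple]:
    """Turn one pollution-source record into triples (hypothetical predicates)."""
    triples = [(source["id"], "belongs_to", source["enterprise"])]
    if source.get("district"):
        triples.append((source["id"], "located_in", source["district"]))
    for dataset in source.get("datasets", []):
        triples.append((source["id"], "has_data", dataset))
    return triples


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples as an adjacency map: subject -> list of (predicate, object)."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph: Dict[str, List[Tuple[str, str]]], subject: str, predicate: str) -> List[str]:
    """All objects linked from `subject` by `predicate` (empty if none)."""
    return [obj for pred, obj in graph.get(subject, []) if pred == predicate]
```

Enterprise relationships, such as the superior and subordinate links described above, can be added as further triples, e.g. `("E1", "subordinate_of", "E0")`.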
The data to be integrated includes the data source of the pollution source to be integrated, and the registered data includes the data source of the registered pollution source; a data source comprises at least one of the providing unit, the system to which it belongs and the sharing mode. The integration module 13 is configured to take the registered pollution source as the matching pollution source if the name and/or identification code of the pollution source to be integrated is completely consistent with that of the registered pollution source. The matching module 12 is configured to, if the name and/or identification code of the pollution source to be integrated is only partially consistent with that of the registered pollution source, acquire the data sources of both and determine from them whether the registered pollution source is a matching pollution source for the pollution source to be integrated.
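The text distinguishes "completely consistent" from "partially consistent" names or codes without defining the latter. The sketch below approximates a partial match by substring containment or a `difflib.SequenceMatcher` similarity ratio; both tests and the 0.6 threshold are assumptions, not values from the patent.

```python
from difflib import SequenceMatcher


def compare_identity(candidate: str, registered: str, threshold: float = 0.6) -> str:
    """Classify two names or codes as 'exact', 'partial' or 'none'."""
    a = "".join(candidate.split()).lower()
    b = "".join(registered.split()).lower()
    if not a or not b:
        return "none"  # an empty name or code cannot support a match
    if a == b:
        return "exact"
    # Assumed notion of partial consistency: containment or high similarity.
    if a in b or b in a or SequenceMatcher(None, a, b).ratio() >= threshold:
        return "partial"
    return "none"
```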
The matching module 12 is configured to judge whether the data source of the pollution source to be integrated is consistent with that of the registered pollution source and, if the two are completely consistent, to acquire the data characteristics of the data to be integrated and of the registered pollution source and judge whether they are consistent. The integration module 13 is configured to take the registered pollution source as the matching pollution source if the data characteristics of the data to be integrated are consistent with those of the registered pollution source. The newly added module 14 is configured to, if the data characteristics are inconsistent, judge whether the reliability of the data to be integrated meets a preset requirement and, if it does, take the pollution source to be integrated as a newly added pollution source.
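The decision order described above can be written as a small function. The text does not say what happens when the data sources differ or when the reliability requirement is not met; this sketch maps both cases to a hypothetical `"review"` outcome, and the 0.8 reliability threshold is likewise an assumption. The `identity` argument is assumed to be one of `"exact"`, `"partial"` or `"none"`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    # The three data-source attributes named in the text.
    providing_unit: str
    system: str
    sharing_mode: str


def decide(identity: str, src_new: DataSource, src_reg: DataSource,
           features_consistent: bool, reliability: float,
           min_reliability: float = 0.8) -> str:
    """Return 'match', 'new' or 'review' following the described decision order."""
    if identity == "exact":
        return "match"  # names/codes completely consistent
    if identity != "partial":
        return "new"  # no keyword match at all
    if src_new != src_reg:
        return "review"  # assumption: outcome not specified in the text
    if features_consistent:
        return "match"
    # Inconsistent features: accept as a new source only if reliable enough.
    return "new" if reliability >= min_reliability else "review"
```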
The data characteristics of the data to be integrated include at least one of its emission peak, emission valley, emission period, emission trend, peak emission time and valley emission time; the data characteristics of the registered data include at least one of its emission peak, emission valley, emission period, emission trend, peak emission time and valley emission time.
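The six characteristics can be computed from an emission time series with plain arithmetic. In this sketch the trend is taken as the least-squares slope against sample index and the period as the lag (up to half the series length) with the highest mean autocovariance; these estimators, and the tolerance rule in `features_consistent`, are assumptions chosen for illustration rather than the patent's definitions.

```python
import math
from typing import Dict, List, Sequence


def emission_features(times: Sequence, values: List[float]) -> Dict[str, object]:
    """Compute the six features named in the text from one emission time series."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    peak_i = max(range(n), key=values.__getitem__)
    valley_i = min(range(n), key=values.__getitem__)
    mean = sum(values) / n
    # Trend: least-squares slope of value against sample index.
    x_mean = (n - 1) / 2
    slope = sum((x - x_mean) * (v - mean) for x, v in enumerate(values)) / sum(
        (x - x_mean) ** 2 for x in range(n))
    # Period: lag (up to n // 2) with the highest mean autocovariance.
    dev = [v - mean for v in values]

    def autocov(lag: int) -> float:
        return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)

    period = max(range(1, n // 2 + 1), key=autocov)
    return {
        "peak": values[peak_i], "valley": values[valley_i],
        "peak_time": times[peak_i], "valley_time": times[valley_i],
        "trend": slope, "period": period,
    }


def features_consistent(a: Dict[str, object], b: Dict[str, object],
                        rel_tol: float = 0.1) -> bool:
    """Assumed rule: numeric features agree within rel_tol and the periods are equal."""
    numeric_ok = all(
        math.isclose(a[k], b[k], rel_tol=rel_tol, abs_tol=1e-9)
        for k in ("peak", "valley", "trend"))
    return numeric_ok and a["period"] == b["period"]
```

For the alternating series `[1, 5, 1, 5, 1, 5, 1, 5]` the estimated period is 2 samples, since lag 2 is the first lag at which the deviations line up.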
The integration module 13 is configured to perform a data quality audit on the registered data, the newly added pollution source data or the integrated matching pollution source data, and to delete data that fails the audit, where the quality audit includes a data integrity audit, a data validity audit and a data reliability audit.
The data integrity audit verifies whether the data content includes the necessary data items; the data validity audit checks whether the data content is within its validity period; and the data reliability audit checks whether the data source is an officially certified data source.
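The three audits translate directly into record filters. The required field names, the `valid_until` date field and the set of certified providers below are hypothetical placeholders; a real system would load them from its own data standard and certification list.

```python
from datetime import date
from typing import List

REQUIRED_FIELDS = ("source_id", "name", "emission", "valid_until", "provider")  # assumed
CERTIFIED_PROVIDERS = {"Municipal Ecology Bureau", "Online Monitoring System"}  # hypothetical


def integrity_ok(record: dict) -> bool:
    """Integrity audit: every necessary data item is present and non-empty."""
    return all(record.get(f) not in (None, "") for f in REQUIRED_FIELDS)


def validity_ok(record: dict, today: date) -> bool:
    """Validity audit: the record has not passed its validity date."""
    return record["valid_until"] >= today


def reliability_ok(record: dict) -> bool:
    """Reliability audit: the data comes from an officially certified source."""
    return record["provider"] in CERTIFIED_PROVIDERS


def audit(records: List[dict], today: date) -> List[dict]:
    """Keep records passing all three audits; the rest are deleted (dropped)."""
    # Integrity is checked first, so the later checks can rely on the fields existing.
    return [r for r in records
            if integrity_ok(r) and validity_ok(r, today) and reliability_ok(r)]
```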
The acquisition module 11 is configured to: enter the data to be integrated into a preset form matched with the fixed pollution source data type and collect the data from the completed form; and/or acquire the data to be integrated, automatically examine it, and import it by data copying; and/or, when new data to be integrated of a newly added pollution source to be integrated is detected, acquire that data through a standard data interface, convert it into a standard format and import it; and/or import the data to be integrated through a data import tool and a data management tool.
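Converting data received through a standard data interface into a standard format is essentially a field-mapping and unit-conversion step. The provider field names (`ent_name`, `outlet`, `uscc`, `so2_kg`), the target schema and the kilogram-to-tonne conversion below are all hypothetical examples.

```python
from typing import Dict, Optional

# Hypothetical mapping from one provider's field names to a standard schema.
FIELD_MAP = {"ent_name": "enterprise", "outlet": "name", "uscc": "code", "so2_kg": "so2_t"}
# Assumed unit conversion: the provider reports kilograms, the standard uses tonnes.
UNIT_DIVISORS = {"so2_kg": 1000}


def to_standard(raw: dict, field_map: Optional[Dict[str, str]] = None,
                divisors: Optional[Dict[str, float]] = None) -> dict:
    """Rename fields to the standard schema and convert units where configured."""
    field_map = FIELD_MAP if field_map is None else field_map
    divisors = UNIT_DIVISORS if divisors is None else divisors
    standard = {}
    for key, value in raw.items():
        if key not in field_map:
            continue  # fields outside the standard schema are dropped
        if key in divisors and value is not None:
            value = value / divisors[key]
        standard[field_map[key]] = value
    return standard
```

Keeping the mapping in data rather than code means each additional provider would only need its own `field_map` and `divisors`.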
Referring to fig. 7, fig. 7 is a schematic structural diagram of a fixed pollution source data integration system according to an embodiment of the present invention. The fixed pollution source data integration system 20 includes a processor 21 and a memory 22 coupled to the processor 21. The memory 22 stores a computer program that, when executed by the processor 21, implements the method shown in figs. 1-5. The detailed method is described above and is not repeated here.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a storage medium according to an embodiment of the present invention. The storage medium 30 stores at least one computer program 31 which, when executed by a processor, implements the method shown in figs. 1 to 5; for details, see the description above. In one embodiment, the storage medium 30 may be a memory chip in a terminal, a hard disk, a removable hard disk, a flash drive, an optical disc or another readable and writable storage medium, or may be a server or the like.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above examples express only several embodiments of the present application. Although they are described in relatively specific detail, they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method for integrating data of a fixed pollution source, comprising:
acquiring data to be integrated of a pollution source to be integrated and a data source of the data to be integrated, importing the data to be integrated in a data integration mode matched with the data source, wherein the data to be integrated comprises a name and/or an identification code of the pollution source to be integrated, acquiring registered data of a registered pollution source, and the registered data comprises the name and/or the identification code of the registered pollution source;
carrying out keyword matching on the name and/or the identification code of the pollution source to be integrated and the name and/or the identification code of the registered pollution source, and judging whether the registered pollution source is a matched pollution source matched with the pollution source to be integrated;
if the registered pollution source is a matched pollution source matched with the pollution source to be integrated, integrating the data to be integrated into the matched pollution source data of the matched pollution source;
and if the registered pollution source is not the matched pollution source matched with the pollution source to be integrated, generating a newly added pollution source and newly added pollution source data according to the pollution source data to be integrated.
2. The fixed pollution source data integration method according to claim 1, wherein the step of generating new pollution sources and new pollution source data according to the pollution source data to be integrated comprises:
acquiring the name and/or the identification code of the newly added pollution source according to a preset naming rule and/or coding rule.
3. The fixed pollution source data integration method according to claim 1, wherein the step of generating new pollution sources and new pollution source data according to the pollution source data to be integrated is followed by:
generating a fixed pollution source list from all current newly added pollution sources and registered pollution sources, and judging whether the fixed pollution source list contains combinable pollution sources that are substantially the same;
and performing at least one of de-duplication, overwriting and deletion on the pollution source data of the combinable pollution sources, and combining that data into combined pollution source data.
4. The fixed pollution source data integration method according to claim 3, wherein the step of generating the fixed pollution source list according to all the newly added pollution sources and the registered pollution sources comprises the following steps:
constructing a pollution source file for each fixed pollution source in the fixed pollution source list, wherein the pollution source file comprises at least one of a name, an address, an administrative district, an industry, an enterprise, an identification code, a management attribute, a supervision flowchart, a pollution source emission amount and monitoring information.
5. The fixed pollution source data integration method according to claim 1, wherein the data to be integrated further comprises an enterprise to be integrated to which the pollution source to be integrated belongs; the registered data further comprises a registered enterprise to which the registered pollution source belongs;
the step of integrating the data to be integrated into the matching pollution source data of the matching pollution source comprises:
acquiring an association relationship between the enterprise to be integrated and the registered enterprise, wherein the association relationship comprises any one of a superior relationship, a subordinate relationship, a sibling relationship in which both are subordinate to the same superior, and a substantially identical relationship;
and integrating the data to be integrated into the matched pollution source data of the matched pollution source according to the association relationship, and adding a corresponding association identifier and/or association link to the data to be integrated and the matched pollution source data.
6. The fixed pollution source data integration method according to claim 5, wherein the step of obtaining the association relationship between the enterprise to be integrated and the registered enterprise comprises:
constructing a pollution data knowledge graph of the matched pollution source data and the data to be integrated according to the association relationship.
7. The fixed pollution source data integration method according to claim 1, wherein the data to be integrated comprises a data source of the pollution source to be integrated, the registered data comprises a data source of the registered pollution source, and the data source comprises at least one of a providing unit, a system to which the data belongs, and a sharing mode;
the step of judging whether the registered pollution sources are matched pollution sources matched with the pollution sources to be integrated comprises the following steps:
if the name and/or the identification code of the pollution source to be integrated is completely consistent with the name and/or the identification code of the registered pollution source, taking the registered pollution source as the matched pollution source;
and if the name and/or the identification code of the pollution source to be integrated is partially consistent with the name and/or the identification code of the registered pollution source, acquiring data sources of the pollution source to be integrated and the registered pollution source, and judging whether the registered pollution source is a matched pollution source matched with the pollution source to be integrated according to the data sources.
8. The fixed pollution source data integration method according to claim 7, wherein the step of determining whether the registered pollution sources are matched pollution sources matched with the pollution source to be integrated according to the data sources comprises:
judging whether the data source of the pollution source to be integrated is consistent with the data source of the registered pollution source;
if the data source of the pollution source to be integrated is completely consistent with the data source of the registered pollution source, acquiring the data characteristics of the data to be integrated and the data characteristics of the registered pollution source, and judging whether the data characteristics of the data to be integrated and the data characteristics of the registered pollution source are consistent;
if the data characteristics of the data to be integrated are consistent with the data characteristics of the registered pollution sources, taking the registered pollution sources as the matched pollution sources;
and if the data characteristics of the data to be integrated are inconsistent with the data characteristics of the registered pollution sources, judging whether the reliability of the data to be integrated meets a preset requirement, and if the reliability of the data to be integrated meets the preset requirement, taking the pollution sources to be integrated as the newly added pollution sources.
9. The fixed pollution source data integration method according to claim 8, wherein the data characteristics of the data to be integrated include at least one of emission peak, emission valley, emission period, emission trend, peak emission time, valley emission time of the data to be integrated;
the data characteristics of the registered data include at least one of emission peak, emission valley, emission period, emission trend, peak emission time, valley emission time of the registered data.
10. The fixed pollution source data integration method according to claim 1, wherein the step of generating new pollution sources and new pollution source data according to the pollution source data to be integrated is followed by:
and performing a data quality audit on the registered data, the newly added pollution source data or the integrated matched pollution source data, and deleting data that fails the quality audit, wherein the quality audit comprises a data integrity audit, a data validity audit and a data reliability audit.
11. The fixed pollution source data integration method according to claim 10, wherein
the data integrity audit comprises: verifying whether the data content includes the necessary data items;
the data validity audit comprises: checking whether the data content is within the data validity period;
the data reliability audit comprises: checking whether the data source is an officially certified data source.
12. The fixed pollution source data integration method according to claim 1, wherein the step of importing the data to be integrated in a data integration manner matched with the data source comprises:
recording the data to be integrated into a preset form matched with the fixed pollution source data type, and collecting the data from the recorded preset form; and/or
acquiring the data to be integrated, automatically examining the data to be integrated, and importing it by data copying; and/or
when to-be-integrated data of a newly added pollution source to be integrated is detected, acquiring that data through a standard data interface, converting it into a standard format, and importing it; and/or
importing the data to be integrated through a data import tool and a data management tool.
13. A fixed pollution source data integration system, comprising:
the acquisition module is used for acquiring data to be integrated of pollution sources to be integrated and data sources of the data to be integrated, importing the data to be integrated in a data integration mode matched with the data sources, wherein the data to be integrated comprises names and/or identification codes of the pollution sources to be integrated, and acquiring registered data of registered pollution sources, and the registered data comprises names and/or identification codes of the registered pollution sources;
the matching module is used for carrying out keyword matching on the name and/or the identification code of the pollution source to be integrated and the name and/or the identification code of the registered pollution source and judging whether the registered pollution source is a matched pollution source matched with the pollution source to be integrated or not;
the integration module is used for integrating the data to be integrated into the matched pollution source data of the matched pollution source if the registered pollution sources are matched pollution sources matched with the pollution source to be integrated;
and the newly added module is used for generating newly added pollution sources and newly added pollution source data according to the pollution source data to be integrated if the registered pollution sources are not matched pollution sources matched with the pollution sources to be integrated.
14. A fixed pollution source data integration system comprising a memory storing a computer program and a processor executing the computer program to carry out the steps of the method according to any one of claims 1 to 13.
15. The fixed pollution source data integration system according to claim 14, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 13.
CN202210443669.2A 2022-04-26 2022-04-26 Data integration system and method for fixed pollution source Active CN114860875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210443669.2A CN114860875B (en) 2022-04-26 2022-04-26 Data integration system and method for fixed pollution source

Publications (2)

Publication Number Publication Date
CN114860875A 2022-08-05
CN114860875B 2023-06-20

Family

ID=82634224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210443669.2A Active CN114860875B (en) 2022-04-26 2022-04-26 Data integration system and method for fixed pollution source

Country Status (1)

Country Link
CN (1) CN114860875B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050165752A1 (en) * 2004-01-28 2005-07-28 Sun Microsystems, Inc. Synchronizing and consolidating information from multiple source systems of a distributed enterprise information system
CN107247787A (en) * 2017-06-15 2017-10-13 山东浪潮云服务信息科技有限公司 A kind of sorting technique based on multisource data fusion
CN108664480A (en) * 2017-03-27 2018-10-16 北京国双科技有限公司 A kind of multi-data source user information integration method and device
CN109376210A (en) * 2018-10-24 2019-02-22 海口金政信息科技有限公司 A kind of intelligence pollution sources dynamic management system and method
CN110852601A (en) * 2019-11-07 2020-02-28 佛山市南海区环境技术中心 Big data application method and system for environmental monitoring law enforcement decision
CN110851667A (en) * 2019-09-25 2020-02-28 中国移动通信集团河南有限公司 Integrated analysis method and tool for multi-source large data
CN111708773A (en) * 2020-08-13 2020-09-25 江苏宝和数据股份有限公司 Multi-source scientific and creative resource data fusion method
CN112732713A (en) * 2020-12-29 2021-04-30 郑州信大捷安信息技术股份有限公司 Data integration method and system based on same user in multiple data sources
CN113297448A (en) * 2021-05-13 2021-08-24 中国电波传播研究所(中国电子科技集团公司第二十二研究所) Open-source electric wave environment data acquisition method based on web crawler and computer readable storage medium
CN113312342A (en) * 2021-05-26 2021-08-27 北京航空航天大学 Scientific and technological resource integration system based on multi-source database
CN113360599A (en) * 2021-05-18 2021-09-07 苏州海赛人工智能有限公司 Multi-source heterogeneous information convergence cooperative processing platform based on content identification
CN113792160A (en) * 2021-09-17 2021-12-14 南京大创师智能科技有限公司 Knowledge graph expansion and fusion method for multi-source data

Non-Patent Citations (2)

Title
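何鸿凌 (He Hongling); 方亮 (Fang Liang): "Methods and Approaches for Big Data Integration Between Enterprises" *
吉杰 (Ji Jie) et al.: "Interactive Buffer Analysis Query Based on WebGIS", Computer Applications and Software *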

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN115238658A (en) * 2022-09-22 2022-10-25 中科三清科技有限公司 Data processing method and device, storage medium and electronic equipment
CN115238658B (en) * 2022-09-22 2023-01-31 中科三清科技有限公司 Data processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant