CN113868318A - Atmospheric environment comprehensive data acquisition and sharing system - Google Patents

Atmospheric environment comprehensive data acquisition and sharing system

Publication number
CN113868318A
Authority
CN
China
Prior art keywords
data
module
sharing
target
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111139916.1A
Other languages
Chinese (zh)
Other versions
CN113868318B (en)
Inventor
李海生
孙彩萍
王维
王永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Research Academy of Environmental Sciences
Original Assignee
Chinese Research Academy of Environmental Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese Research Academy of Environmental Sciences
Priority to CN202111139916.1A
Publication of CN113868318A
Application granted
Publication of CN113868318B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/10 File systems; File servers
              • G06F16/11 File system administration, e.g. details of archiving or snapshots
                • G06F16/113 Details of archiving
              • G06F16/17 Details of further file system functions
                • G06F16/176 Support for shared access to files; File sharing support
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/21 Design, administration or maintenance of databases
                • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
              • G06F16/25 Integrating or interfacing systems involving database management systems
                • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
                • G06F16/258 Data format conversion from or to a database
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/35 Clustering; Classification
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
                  • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
          • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
            • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
              • G06F21/31 User authentication
            • G06F21/60 Protecting data
              • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
                • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q10/00 Administration; Management
            • G06Q10/10 Office automation; Time management
              • G06Q10/103 Workflow collaboration or project management
          • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
            • G06Q50/10 Services
              • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
          • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
            • Y02P90/80 Management or planning
              • Y02P90/84 Greenhouse gas [GHG] management systems

Abstract

The invention discloses an atmospheric environment comprehensive data acquisition and sharing system comprising a user management module, a data acquisition module, a data quality control module, a filing and warehousing module, a data management module and a sharing management module. The system makes the acquisition of atmospheric environment comprehensive data more comprehensive, accurate and timely, and shares the acquired data so that they are used effectively and resources are not wasted. Because analysis is based on more comprehensive data, the identified causes of atmospheric pollution and the measures for bringing air quality up to standard are more accurate. The system manages the various data by category, keeps data acquisition, collection and storage secure and controllable, and improves the security and traceability of important scientific data. Standardized management and long-term storage through the system promote data accumulation and open sharing and keep the data sharing process secure and stable. A data integration and sharing mechanism is established, which strengthens the quality control and comparability of the various monitoring data.

Description

Atmospheric environment comprehensive data acquisition and sharing system
Technical Field
The invention relates to the technical field of computers, in particular to an atmospheric environment comprehensive data acquisition and sharing system.
Background
Atmospheric environment comprehensive data are voluminous, and each type of data is difficult to obtain, so their acquisition tends to be incomplete, inaccurate and untimely. If the acquired data cannot be shared, resources are wasted, which runs counter to the trend in environmental protection governance. To meet the requirement of 'data operation, national participation and service society', an environmental protection data resource management system needs to be established, an environmental protection data resource catalogue compiled, data resource products released, and a data integration and sharing mechanism set up to share data resources.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems described above. To this end, the invention provides an atmospheric environment comprehensive data acquisition and sharing system that makes the acquisition of atmospheric environment comprehensive data more comprehensive, accurate and timely and shares the acquired data, so that the data are used effectively and resources are not wasted. Because analysis is based on more comprehensive data, the resulting meteorological results, meteorological causes and the like are more accurate. Data management follows general principles and defines the main responsibilities of each unit. The various data are managed by category; data acquisition, collection and storage are kept secure and controllable; and the security and traceability of important scientific data are improved. Standardized management and long-term storage through the system promote data accumulation and open sharing and keep the data sharing process secure and stable. A data integration and sharing mechanism is established, which strengthens the quality control and comparability of the various monitoring data and makes it easier to integrate, share and upload the monitoring data of each subject group and city group. In addition, a unified environmental monitoring information publishing and sharing system is established in accordance with the new environmental protection law.
In order to achieve the above object, an embodiment of the present invention provides an atmospheric environment comprehensive data acquisition and sharing system, including:
a user management module, used for managing user registration information and assigning roles according to the user registration information, wherein the roles comprise subject group, administrator and general user;
a data acquisition module, connected with the user management module and used for:
receiving the atmospheric environment comprehensive data acquired by the subject group, and taking the atmospheric environment comprehensive data and a pre-stored data source as upload data;
collecting metadata of the upload data, and dividing the metadata into structured data and unstructured data;
a data quality control module, connected with the data acquisition module and used for:
receiving the unstructured data sent by the data acquisition module, and performing file auditing on the unstructured data;
receiving the structured data sent by the data acquisition module, dividing the structured data into time-series data and non-time-series data, and performing system cleaning on the time-series data;
performing service quality control on the non-time-series data and on the system-cleaned time-series data, followed by manual review;
a filing and warehousing module, connected with the data quality control module and used for receiving, from the data quality control module, the unstructured data that have passed file auditing and the structured data that have passed manual review, and filing and warehousing them;
a data management module, connected with the filing and warehousing module and used for managing the archived data in the database;
a sharing management module, connected with the data management module and used for approving a data sharing application initiated by an applicant; when the application is approved, the sharing management module sends an approval notice and the corresponding target data retrieved from the data management module to the applicant; when the application is not approved, it sends an approval failure notice to the applicant.
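The routing performed by the data quality control module can be sketched as follows. This is a minimal illustration only: the `Upload` class, its field names and the step labels are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Upload:
    """One uploaded item; all field names here are hypothetical."""
    name: str
    structured: bool            # True for tabular (row) data
    time_series: bool = False   # only meaningful when structured is True
    steps: list = field(default_factory=list)


def quality_control_route(item: Upload) -> list:
    """Return the ordered processing steps the text assigns to an upload."""
    if not item.structured:
        # Unstructured data only go through file auditing before archiving.
        item.steps = ["file_audit", "archive"]
        return item.steps
    steps = []
    if item.time_series:
        # Only time-series data are cleaned by the system.
        steps.append("system_cleaning")
    # Both kinds of structured data then pass service QC and manual review.
    steps += ["service_quality_control", "manual_review", "archive"]
    item.steps = steps
    return steps
```

For example, `quality_control_route(Upload("report.pdf", structured=False))` yields only the file-audit path, while a structured time-series upload passes through all four steps.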
According to some embodiments of the invention, the data management module comprises:
the data set management submodule is used for establishing a data set and indexes for the archived data;
the resource directory management submodule is used for establishing a user-defined resource directory and a user-defined menu sequence for the archived data so as to facilitate query navigation;
the map service management submodule is used for determining map service for the archived data;
and the thematic map management submodule is used for confirming the thematic of the filing data and determining the thematic map.
According to some embodiments of the invention, the resource directory management submodule is further configured to:
creating a directory: in the create-directory window, entering a directory name and a URL (uniform resource locator), setting the sort order of the archived data and selecting a type;
creating a view: in the create-view window, entering a view name, selecting a view code, setting the sort order of the archived data and selecting a type;
creating a document: in the create-view window, entering a file name, selecting a file and entering the sort order of the archived data.
According to some embodiments of the present invention, the time-series data are used to show the time-series trends of a plurality of parameters for each city, and the display settings include a city, a time period, a city of interest, a comparison area and selected factors.
In one embodiment, the system further comprises:
a data source processing module, used for:
performing a first data extraction on a pre-stored data source before it is uploaded as upload data, wherein the first data extraction is extraction at the data source end and generates a plurality of independent instance libraries according to the structural features of the source data instances; establishing data format conversion rules for the different instance libraries according to a preset data standard format requirement, and converting the format of the source data instance libraries to generate a series of new data tables that meet the preset data standard format requirement;
performing a second data extraction on the new data tables, classifying them according to the structural features of the new data instances to generate a plurality of service instances, and constructing the data detail layer of a data warehouse; realizing multi-table, cross-database and remote storage of massive heterogeneous data instances through data mapping, thereby optimizing data storage, access and analysis;
performing data clustering on the data detail layer of the data warehouse to generate theme libraries for sharing and analysis by the subject groups; the data clustering comprises theme clustering and label management; the theme libraries comprise an air quality library, a particulate component library, a pollution source library, a weather library and a health library.
According to some embodiments of the invention, the data quality control module is further configured to:
extracting features from the unstructured data and the structured data sent by the data acquisition module to obtain particle component data, air quality data, meteorological data and source analysis data, and performing threshold management on each;
performing quality control on the unstructured data and the structured data sent by the data acquisition module, wherein the quality control covers integrity anomalies, completeness anomalies, uniqueness anomalies, conformity anomalies, inversion anomalies, the carbon component relationship, anion-cation charge balance, the ratio of summed concentration to measured concentration, outlier judgment, and the total number and proportion of anomalous data.
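A few of the listed checks (completeness, uniqueness and threshold management, plus the count and proportion of anomalous records) can be sketched as below. The function name, record layout and threshold values are assumptions made for illustration; the patent does not specify them.

```python
def check_records(records, key_fields, thresholds):
    """Flag missing values, duplicate keys and out-of-threshold values.

    records: list of dicts; key_fields: fields that must be unique together;
    thresholds: {field: (low, high)}. All names are hypothetical.
    """
    issues = []
    seen = set()
    for i, rec in enumerate(records):
        key = tuple(rec.get(f) for f in key_fields)
        if key in seen:
            issues.append((i, "uniqueness", key_fields))
        seen.add(key)
        for name, (low, high) in thresholds.items():
            value = rec.get(name)
            if value is None:
                issues.append((i, "completeness", name))
            elif not (low <= value <= high):
                issues.append((i, "threshold", name))
    # Summarise how many distinct records were flagged, and their share.
    flagged = len({i for i, _, _ in issues})
    ratio = flagged / len(records) if records else 0.0
    return issues, flagged, ratio
```

The function returns every individual finding together with the total number of flagged records and their proportion, which corresponds to the "total number and proportion of anomalous data" item.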
According to some embodiments of the invention, the data sets and indices include stereography, enterprise emissions, meteorological data, source analytics, municipal source emissions inventory, research reports.
According to some embodiments of the invention, the data management module comprises:
the first determining submodule is used for performing data extraction, conversion and loading on the archived data and determining corresponding metadata;
the obtaining submodule is used for setting a service type, classifying the metadata according to the service type and obtaining a plurality of classification sets;
an establishing submodule, used for defining data rules under each classification set and establishing an index identification library;
a second determining submodule, used for receiving a data call request when the sharing management module approves an application, and determining, according to the data call request, a corresponding target service type and a target identifier under the target service type;
a third determining submodule, configured to query the index identifier library according to the target identifier to determine corresponding target data;
the fourth determining submodule is used for determining first access information according to the acquisition time of the target identifier and the effective time of the target data and calling the target data based on the first access information;
a fifth determining sub-module, configured to determine second access information according to the archiving time of the target data and the valid time of the target data, and enable the archiving and warehousing module to store the target data based on the second access information;
and the judgment sub-module is used for judging whether the first access information is consistent with the second access information or not, and acquiring the target data and returning the target data to the applicant when the first access information is determined to be consistent with the second access information.
According to some embodiments of the invention, the data quality control module performs quality control on the carbon component relationship anion-cation charge balance, including:
setting a starting condition: the data subject to quality control comprise NO3-, SO42- and NH4+ data;
calculating an anion equivalent based on formula (1);
[Formula (1): equation image not recovered]
wherein AE is the anion equivalent (μmol/m³), and the remaining symbols in formula (1) denote, in order, the chloride ion concentration (μg/m³), the nitrate concentration (μg/m³), the sulfate concentration (μg/m³) and the fluoride ion concentration (μg/m³);
calculating a cation equivalent based on formula (2);
[Formula (2): equation image not recovered]
wherein CE is the cation equivalent (μmol/m³), and the remaining symbols in formula (2) denote, in order, the ammonium ion concentration (μg/m³), the magnesium ion concentration (μg/m³), the potassium ion concentration (μg/m³), the calcium ion concentration (μg/m³) and the sodium ion concentration (μg/m³);
calculating the ratio of the anion equivalent to the cation equivalent, and judging whether the ratio is in a preset ratio range; carrying out classification marking on the calculation result according to the judgment result to obtain a quality control result;
and recording the quality control result in a quality control field group.
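The charge balance check can be sketched as follows. Because the images of formulas (1) and (2) are not available here, this sketch assumes the commonly used form in which each mass concentration is divided by the ion's equivalent weight (molar mass divided by charge); the rounded molar masses, the acceptance range of 0.8 to 1.2 and all names are assumptions, not values from the patent.

```python
# Equivalent weights (molar mass / charge, g per mol of charge), rounded.
ANION_EQ_WEIGHT = {"Cl-": 35.45, "NO3-": 62.00, "SO42-": 96.06 / 2, "F-": 19.00}
CATION_EQ_WEIGHT = {"NH4+": 18.04, "Mg2+": 24.31 / 2, "K+": 39.10,
                    "Ca2+": 40.08 / 2, "Na+": 22.99}
REQUIRED = ("NO3-", "SO42-", "NH4+")  # starting condition from the text


def ion_balance(conc, low=0.8, high=1.2):
    """conc maps ion -> mass concentration in ug/m3; low/high are assumed bounds."""
    if any(conc.get(ion) is None for ion in REQUIRED):
        # The starting condition is not met, so the check is skipped.
        return {"status": "not_checked", "ratio": None}
    # Ions that were not measured contribute zero to the sums.
    ae = sum((conc.get(ion) or 0.0) / w for ion, w in ANION_EQ_WEIGHT.items())
    ce = sum((conc.get(ion) or 0.0) / w for ion, w in CATION_EQ_WEIGHT.items())
    if ce == 0:
        return {"status": "abnormal", "ratio": None}
    ratio = ae / ce
    status = "normal" if low <= ratio <= high else "abnormal"
    return {"status": status, "ratio": ratio, "AE": ae, "CE": ce}
```

The returned dictionary plays the role of the classification mark that the text says is recorded in the quality control field group.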
According to some embodiments of the invention, the data sources obtained by the data acquisition module include structured data sources and unstructured data sources;
obtaining a structured data source comprising:
establishing a connection between the data acquisition module and a data provider; receiving an HttpClient request sent by the data provider and verifying the identity of the data provider based on the request; when verification passes, scheduling the data interface on a timer based on Spring Quartz and periodically acquiring first data from the data provider; judging whether the acquired first data have gaps and, if so, running a historical-data backfill function in multiple threads to obtain second data; selecting the data in JSON format from the second data, parsing them, and storing them once parsing is complete;
obtaining an unstructured data source, comprising:
acquiring, by the data acquisition module, a first target file obtained through file upload in the system, FTP upload, or copying by an administrator; scanning the first target file with antivirus software, checking after the scan whether any file contains a virus, and deleting the infected files to obtain a second target file;
transferring the second target file between file directories, wherein a file monitoring thread runs during the transfer, reads the directory to obtain directory information, transfers the directory information and the second target file to the production server, and writes the file path into the database.
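The gap detection and multithreaded backfill described for structured data sources can be sketched as below. The patent names Java-side technologies (HttpClient, Spring Quartz); this Python sketch only illustrates the logic, and the function names, the fixed sampling step and the `fetch_one` callback are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def missing_slots(received, start, end, step):
    """Return expected timestamps in [start, end] that are absent from received."""
    have = set(received)
    slots = []
    t = start
    while t <= end:
        if t not in have:
            slots.append(t)
        t += step
    return slots


def backfill(slots, fetch_one, workers=4):
    """Fetch each missing slot in a worker thread; fetch_one is caller-supplied."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch_one, slots))
    # Keep only slots for which the provider returned something.
    return {slot: rec for slot, rec in zip(slots, results) if rec is not None}
```

In use, `missing_slots` would run after each scheduled acquisition, and `backfill` would be called with a function that requests one historical record from the data provider.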
According to some embodiments of the invention, the sharing management module comprises:
a sixth determining module, configured to determine a data sharing permission requirement, where the data sharing permission requirement includes at least one of a definition of a data owner, a permission claim of the data owner, a permission type, and a permission granularity;
a seventh determining module, configured to determine tools and manners of data sharing and distribution;
the eighth determining module is used for establishing a data sharing safety mechanism, which comprises an application system, an approval system, a responsibility statement system and network trace;
a receiving module to:
receiving a data request application initiated by an applicant and forwarding it to an auditing terminal;
when the auditing terminal agrees to the agreement, receiving the application information filled in by the applicant and the shared data applied for; the shared data comprise a data interface and an application document;
a calling module, used for retrieving the corresponding target data from the data management module according to the application information and the shared data applied for, and dividing the target data into structured target data and unstructured target data;
a sending module, used for sending the unstructured target data to the applicant by e-mail and the structured target data to the applicant through the data interface.
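The approval outcome and the choice of delivery channel can be sketched as follows. The function name, the notice strings and the `(name, is_structured)` pair layout are hypothetical.

```python
def handle_application(approved, target_items):
    """Return the notice and per-channel delivery plan for one application.

    target_items: list of (name, is_structured) pairs; names are hypothetical.
    """
    plan = {"notice": "approval_failed", "data_interface": [], "mail": []}
    if not approved:
        # A rejected application only produces the failure notice.
        return plan
    plan["notice"] = "approval_passed"
    for name, is_structured in target_items:
        # Structured data go through the data interface, the rest by e-mail.
        channel = "data_interface" if is_structured else "mail"
        plan[channel].append(name)
    return plan
```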
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a block diagram of an atmospheric environment comprehensive data acquisition and sharing system according to an embodiment of the present invention;
FIG. 2 is a flow chart of the operation of an atmospheric environment comprehensive data acquisition and sharing system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data set and indicators according to one embodiment of the invention;
FIG. 4 is a schematic diagram of the system's data aggregation and sharing flow according to one embodiment of the invention;
FIG. 5 is a schematic diagram of data quality control according to one embodiment of the present invention;
FIG. 6 is a diagram illustrating data quality control exception results according to one embodiment of the present invention;
FIG. 7 is a schematic illustration of water-soluble ion data according to one embodiment of the present invention;
FIG. 8 is a schematic diagram of collecting structured data according to one embodiment of the present invention;
FIG. 9 is a schematic diagram of acquiring unstructured data according to one embodiment of the invention;
FIG. 10 is a flow diagram of a data sharing application in accordance with one embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Before the atmospheric environment comprehensive data acquisition and sharing system is introduced, the required operating environment is described. Users should access the system with the Chrome browser to avoid compatibility problems. The hardware comprises a web server and a database server, both running the Linux CentOS 7 operating system with a 2.80 GHz CPU and 32 GB of memory. On the software side, the web server uses Java and the database server uses MySQL.
As shown in FIGS. 1-2, an embodiment of the present invention provides an atmospheric environment comprehensive data acquisition and sharing system, including:
a user management module, used for managing user registration information and assigning roles according to the user registration information, wherein the roles comprise subject group, administrator and general user;
a data acquisition module, connected with the user management module and used for:
receiving the atmospheric environment comprehensive data acquired by the subject group, and taking the atmospheric environment comprehensive data and a pre-stored data source as upload data;
collecting metadata of the upload data, and dividing the metadata into structured data and unstructured data;
a data quality control module, connected with the data acquisition module and used for:
receiving the unstructured data sent by the data acquisition module, and performing file auditing on the unstructured data;
receiving the structured data sent by the data acquisition module, dividing the structured data into time-series data and non-time-series data, and performing system cleaning on the time-series data;
performing service quality control on the non-time-series data and on the system-cleaned time-series data, followed by manual review;
a filing and warehousing module, connected with the data quality control module and used for receiving, from the data quality control module, the unstructured data that have passed file auditing and the structured data that have passed manual review, and filing and warehousing them;
a data management module, connected with the filing and warehousing module and used for managing the archived data in the database;
a sharing management module, connected with the data management module and used for approving a data sharing application initiated by an applicant; when the application is approved, the sharing management module sends an approval notice and the corresponding target data retrieved from the data management module to the applicant; when the application is not approved, it sends an approval failure notice to the applicant.
The working principle of the technical scheme is as follows. The user management module manages user registration information and assigns roles according to it; the roles comprise subject group, administrator and general user. The data acquisition module receives the atmospheric environment comprehensive data acquired by the subject group, takes these data and a pre-stored data source as upload data, collects metadata of the upload data, and divides the metadata into structured data and unstructured data. Structured data are row data stored in a database that can be represented logically as a two-dimensional table. Unstructured data include office documents in all formats, text, pictures, XML, HTML, reports of various types, images, audio and video information, and so on. The data quality control module receives the unstructured data sent by the data acquisition module and audits the files; unstructured data can be audited directly. It divides the structured data into time-series data and non-time-series data and performs system cleaning on the time-series data, which ensures the integrity and accuracy of the data and removes redundant data. Service quality control is then performed on the non-time-series data and the system-cleaned time-series data, followed by manual review; service quality control applies quality control according to the service type, and the review is carried out manually.
The filing and warehousing module receives, from the data quality control module, the unstructured data that have passed file auditing and the structured data that have passed manual review, and files and warehouses them. The data management module manages the archived data in the database. The sharing management module is connected with the data management module and approves data sharing applications initiated by applicants: when an application is approved, it sends an approval notice and the corresponding target data retrieved from the data management module to the applicant; when an application is not approved, it sends an approval failure notice. Applicants include subject groups and general users. Data aggregation includes metadata collection, which mainly concerns unstructured data, and quality control rules are a required item. Data quality control is performed on structured, time-series data sets, where time-series means that the data records are arranged in increasing time order and, within one data set, each time point has one and only one record. The atmospheric environment comprehensive data include atmospheric environment monitoring data and the like. The data sources include the atmospheric key research project groups, the Ministry of Ecology and Environment, the China Meteorological Administration, Chinese Academy of Sciences observation data, university super station data, observation and decision data for the "2+26" cities, web-crawled data, data from social public welfare organizations, and so on.
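The time-series condition stated above (records in increasing time order, with one and only one record per time point) can be checked with a short function. This is a sketch; the function name is hypothetical.

```python
from datetime import datetime


def is_time_series(timestamps):
    """True if timestamps strictly increase, so each time point occurs once."""
    return all(a < b for a, b in zip(timestamps, timestamps[1:]))
```

A repeated or out-of-order timestamp makes the function return `False`, which would route the data set away from time-series quality control.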
The beneficial effects of the above technical scheme are as follows: the collected atmospheric environment comprehensive data is more comprehensive, accurate and timely; data sharing is achieved, so the collected data is used effectively and resources are not wasted; and because analysis is based on more comprehensive data, the analysis of atmospheric pollution causes and of air quality attainment and improvement is more accurate. Data management follows the general principle and defines the main responsibilities of each unit; it classifies and manages the various data, ensures that data acquisition, collection and storage are secure and controllable, and strengthens the security and traceability of important scientific data. Standardized management and long-term storage through the atmospheric environment comprehensive data acquisition and sharing system promote data accumulation and open sharing and allow data sharing to proceed securely and stably. A data integration and sharing mechanism is established, which strengthens the quality control and comparability of the various monitoring data and facilitates the integration, sharing and uploading of the monitoring data of each subject group and city group; meanwhile, a unified system for publishing and sharing environmental monitoring information is established in accordance with the new Environmental Protection Law.
In a method of using the atmospheric environment comprehensive data acquisition and sharing system, the user, in a networked environment, opens a browser, enters the target website address, reaches the login interface, enters a user name, a password and a verification code, and clicks the login button to access the system. If the password of the same account is entered incorrectly 5 times in a row within a short time, the system automatically locks both the account and the user's IP address; to unlock them the user must contact the administrator, who unlocks the account and the IP address together. After the user logs in, the page automatically jumps to the atmospheric situation analysis page. The wind field map button toggles between drawing and clearing the wind field map. The base map of the page can also be changed: one option is a drawn two-dimensional base map and the other is a satellite map, and clicking the satellite map button switches between the two. The query region switch allows data to be queried freely for regions such as the whole country, Beijing-Tianjin-Hebei, the Fenwei Plain, the Yangtze River Delta and the Pearl River Delta; the default after the page opens is the whole country. The province query switch allows the data of provinces nationwide to be queried, and the city query switch allows the data of cities nationwide to be queried. After switching to city query, individual sites can be selected in the global search on the right. The daily data / winter protection data switch selects whether data is shown by day or as winter protection data. The image type switch selects among the difference rendering graph, the city brush-up graph, the point bitmap and the point rendering graph.
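The lockout rule described above (5 consecutive wrong passwords lock the account and the IP address, and the administrator unlocks both together) can be sketched in Python as follows. This is only an illustrative sketch: the class name LoginGuard, its method names and the 300-second window are assumptions, since the description says only "a short time" and does not specify the implementation.

```python
class LoginGuard:
    """Tracks consecutive failed logins per account (illustrative only)."""

    def __init__(self, max_failures=5, window_seconds=300):
        # window_seconds is an assumed value for "a short time".
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = {}          # account -> timestamps of recent failures
        self.locked_accounts = set()
        self.locked_ips = set()
        self.ip_of_account = {}     # account -> IP locked together with it

    def attempt(self, account, ip, password_ok, now):
        """Return "ok", "failed" or "locked" for one login attempt."""
        if account in self.locked_accounts or ip in self.locked_ips:
            return "locked"
        if password_ok:
            # A successful login breaks the run of consecutive failures.
            self.failures.pop(account, None)
            return "ok"
        recent = [t for t in self.failures.get(account, [])
                  if now - t <= self.window_seconds]
        recent.append(now)
        self.failures[account] = recent
        if len(recent) >= self.max_failures:
            self.locked_accounts.add(account)
            self.locked_ips.add(ip)
            self.ip_of_account[account] = ip
            return "locked"
        return "failed"

    def admin_unlock(self, account):
        """Administrator action: unlock the account and its IP together."""
        self.locked_accounts.discard(account)
        ip = self.ip_of_account.pop(account, None)
        if ip is not None:
            self.locked_ips.discard(ip)
        self.failures.pop(account, None)


guard = LoginGuard()
results = [guard.attempt("alice", "10.0.0.1", False, now=t) for t in range(5)]
still_locked = guard.attempt("alice", "10.0.0.1", True, now=10)
guard.admin_unlock("alice")
after_unlock = guard.attempt("alice", "10.0.0.1", True, now=11)
```

In the usage lines, the fifth failure triggers the lock, a correct password is still refused while locked, and login succeeds again after the administrator unlocks the account.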
The user can also view the air quality evaluation, which specifically shows the 2+26 city air quality analysis for each year, the 2+26 city air quality attainment map for each year, the same-period comparative analysis of Beijing air quality for each year, the heating-period comparative analysis of Beijing air quality for 2019, and the trend of Beijing air quality over the last 12 months. The user can perform air quality spatial analysis with the following controls. Factor: left-click the selection box to expand the drop-down list, then left-click the required factor (AQI or the six air quality parameters) to apply it. City: left-click the selection box to expand the drop-down list, then left-click the required city category (2+26 cities, Fenwei Plain) to apply it. Screenshot: left-click the selection box to expand the drop-down list and left-click 'capture for nearly 24 hours' to capture the pictures of the last 24 hours; or left-click 'intercepting a certain moment' to pop up the time selection control, select the required time, click 'confirm', and then left-click 'screenshot' to capture the picture. Layer: left-click the selection box to expand the drop-down list, which is divided into national control stations, non-national control stations, regional stations and background stations, then left-click the required layer to apply it.
The user can also view multi-city time-series change, which mainly shows the time-series trends of several parameters for each city. Cities, time periods, cities of interest, comparison areas, factors and the like can be selected, the chart can be downloaded, and the analysis box on the right briefly notes some key information. Specifically: City: left-click the selection box to expand the drop-down list, left-click the required cities and then click 'execute' (at most 8 cities can be selected). Time switching: click day or hour and then click 'execute' to switch. Time selection: click the selection box to expand the time control; for example, to view the period 2021/5/1-2021/5/5, click 2021/5/1 and 2021/5/5 (in either order) to select the period, then click 'execute' to apply it. City of interest: left-click the selection box to expand the drop-down list, left-click the required city and then click 'execute'. Comparison area: left-click the selection box to expand the drop-down list, tick the required region and then click 'execute'. Factor selection: left-click the selection box to expand the drop-down list, left-click the required factor and then click 'execute'. Execute: once the query conditions are set, clicking 'execute' applies them. The shaded area of the chart is the comparison area data, and it is hidden when the maximum/minimum values are hidden. Clicking the 'download' button downloads the chart.
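Two of the selection rules above, that the two dates of a period may be clicked in either order and that at most 8 cities can be selected, can be sketched in Python. The function names and the constant MAX_CITIES are hypothetical; the sketch shows only the validation logic, not the page behaviour of the described system.

```python
from datetime import date

MAX_CITIES = 8  # the description allows at most 8 cities per query


def normalize_period(first_click, second_click):
    """Return (start, end) regardless of the order in which dates were clicked."""
    return min(first_click, second_click), max(first_click, second_click)


def validate_cities(cities):
    """Reject selections of more than MAX_CITIES cities."""
    if len(cities) > MAX_CITIES:
        raise ValueError(f"at most {MAX_CITIES} cities can be selected")
    return list(cities)


# The end date is clicked first here; the period still comes out ordered.
start, end = normalize_period(date(2021, 5, 5), date(2021, 5, 1))
```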
The functions also include multi-city contemporaneous change, which mainly shows contemporaneous comparisons of multi-city air quality grades, multi-city primary pollutants and multi-city pollutant mean values; cities, time, factors and the like can be selected, the chart can be downloaded, and the analysis box on the right briefly notes some key information. Multi-factor time-series change mainly shows the trends of several factors over time; cities, time, factors and the like can be selected. Single-city similarity variation mainly shows the AQI monthly similarity analysis and the PM2.5 monthly mean similarity analysis of a given city, with the default years being 2019-2021. The factors here are mainly the six air quality parameters.
The user can also perform atmospheric pollution cause analysis. Combined comparative analysis uses urban particulate matter component data, including online carbon component data and online water-soluble ion data, to analyze trends over time, displayed as a stacked histogram; the component data is associated with PM2.5 data for multi-city comparative analysis. It is specifically divided into multi-city same-period comparison, single-city same-period comparison and single-particle mass spectrometry analysis. Combined trend analysis mainly shows the trends of various factors of a given city over a selected time period and is divided into two types: time-series change of the main particulate matter components and time-series evolution of characteristic ratios. The city, the date, the data type (hourly data or daily data) and the factors are selectable, and the bar-shaped box on the right also briefly shows some analysis information. For pollution source analysis, the pollution source data analysis mainly comprises overhead source pollution enterprise statistics and pollution discharge permit enterprise statistics. The overhead source pollution enterprise statistics comprises two functional pages: statistics by industrial source and statistics by industry. The pollution discharge permit enterprise statistics comprises three functional pages: statistics by fuel, statistics by industrial source and statistics by industry.
The user can also analyze meteorological influence. The weather analysis charts mainly display the various pictures supported by the data, including weather analysis charts, satellite cloud images, radar charts, precipitation charts, wind field charts, visibility charts and so on. The cloud images are photographs taken by the Fengyun-2 meteorological satellite, and dynamic playback is supported at the bottom of the page. Daily weather analysis mainly provides daily weather reminders and weather bulletins and supports export and report generation. Weather change trend analysis uses urban meteorological data to draw a meteorological condition trend analysis chart for the selected time. Meteorological condition statistics uses urban meteorological data to draw a meteorological condition trend analysis chart, a wind rose chart, a humidity rose chart and a PM2.5 pollution rose chart for the selected time.
When data is to be shared, a data sharing application is made; this function is divided into two pages, 'my application' and 'initiate application'. My application: records the history of applications initiated by the currently logged-in account. Initiate application: clicking it pops up the data use statement page of the data use agreement. After the user reads the statement, agrees and continues, the main application page is displayed; the user fills in the corresponding information according to the prompts (for the data acquisition mode, interface is selected for structured data and document for unstructured data), then clicks to download the data sharing application form at the bottom of the page, uploads the completed form as an attachment, and finally clicks the submit button to complete the application. The shared management module approves the data sharing applications initiated by general users and stores the approval records.
According to some embodiments of the invention, the data management module comprises:
the data set management submodule is used for establishing a data set and indexes for the archived data;
the resource directory management submodule is used for establishing a user-defined resource directory and a user-defined menu sequence for the archived data so as to facilitate query navigation;
the map service management submodule is used for determining map service for the archived data;
and the thematic map management submodule is used for determining the themes of the archived data and determining the thematic maps.
The working principle of the technical scheme is as follows: the data set management submodule establishes data sets and indexes for the archived data; the resource directory management submodule establishes a user-defined resource directory and a user-defined menu order for the archived data to facilitate query navigation; the map service management submodule determines the map services for the archived data; and the thematic map management submodule determines the themes of the archived data and determines the thematic maps.
The beneficial effects of the above technical scheme are that: the method and the device realize effective data management of the archived data and are convenient for accurately and quickly calling the corresponding data to be shared when data are shared.
According to some embodiments of the invention, the resource directory management submodule is further configured to:
creating a directory: based on a create-directory window, inputting a directory name and a URL (uniform resource locator), setting the sort order of the archived data and selecting its type;
creating a view: based on a create-view window, inputting a view name, selecting a view code, setting the sort order of the archived data and selecting its type;
creating a document: based on a create-view window, inputting a file name, selecting a file and inputting the sort order of the archived data.
The working principle of the technical scheme is as follows: a Uniform Resource Locator (URL) is a compact representation of the location and access method of a Resource available from the internet, and is the address of a standard Resource on the internet. Each file on the internet has a unique URL that contains information indicating the location of the file and how the browser should handle it.
The beneficial effects of the above technical scheme are that: based on three forms of creating catalog, creating view and creating document, the resource catalog is comprehensively and effectively managed.
According to some embodiments of the present invention, the time-series data is used to show a plurality of parameter time-series variation trends of each city, including a city, a time period, a city of interest, a comparison area, and a selection factor. Selection factors include AQI, PM2.5, PM10, SO2, NO2, CO, O3, and the like.
According to some embodiments of the invention, the data quality control module is further configured to:
extracting the characteristics of the unstructured data and the structured data sent by the data acquisition module to obtain particle component data, air quality data, meteorological data and source analysis data, and respectively performing threshold management;
and performing quality control on the unstructured data and the structured data sent by the data acquisition module, wherein the quality control covers integrity anomalies, completeness anomalies, uniqueness anomalies, conformity anomalies, inversion anomalies, carbon component relationships, anion and cation charge balance, the ratio of summed concentration to measured value, outlier judgment, and the total number and proportion of anomalous data.
The working principle of the technical scheme is as follows: threshold management is divided into four threshold management interfaces, for particle component data, air quality data, meteorological data and source analysis data. Double-clicking the instrument upper limit or the instrument lower limit of a factor allows the value to be modified or filled in, and clicking the save button afterwards saves the instrument upper and lower limits.
The beneficial effects of the above technical scheme are as follows: particulate matter component data, air quality data, meteorological data and source analysis data are each managed effectively, and quality control of the data ensures its accuracy, which in turn makes it easier to obtain accurate analysis results.
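The threshold management described above stores an instrument lower limit and upper limit per factor. A minimal Python sketch of how such limits might be applied to incoming values, and how the number and proportion of anomalous values might be reported, is given below. The function name check_thresholds, the data layout and the numeric limits in the example are all assumptions for illustration; the description does not give concrete limit values.

```python
def check_thresholds(values, limits):
    """Flag factors whose value lies outside the instrument limits.

    values: dict mapping factor name -> measured value
    limits: dict mapping factor name -> (lower_limit, upper_limit)
    Returns (flagged_factors, proportion_of_checked_values_flagged).
    """
    flagged = []
    checked = 0
    for factor, value in values.items():
        if factor not in limits:
            continue  # no threshold configured for this factor
        checked += 1
        lower, upper = limits[factor]
        if value < lower or value > upper:
            flagged.append(factor)
    proportion = len(flagged) / checked if checked else 0.0
    return flagged, proportion


# Hypothetical limits and one hypothetical hourly record.
limits = {"PM2.5": (0, 1000), "SO2": (0, 500)}
record = {"PM2.5": 1200, "SO2": 10, "CO": 0.8}
flagged, proportion = check_thresholds(record, limits)
```

Factors without a configured threshold (CO in the example) are skipped rather than flagged, so the proportion reflects only the values that were actually checked.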
As shown in fig. 3, the data sets and indicators include stereo observation, enterprise emissions, meteorological data, source analytics, urban source emissions inventory, and research reports.
When air quality monitoring data is shared, it is provided as a standardized data set of the various air quality monitoring data, and the page supports querying point location daily air quality data, urban hourly air quality data, urban daily air quality data and point location hourly air quality data. Data can be queried by an input keyword (only the name of a city or a site may be entered; fuzzy query is supported), by the name of a selected city or site, and by a selected time; after a query is executed, clicking the export button exports the queried data, and clicking the "introduction to dataset" button shows the content of the pop-up box.
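The keyword and time query described above can be sketched in Python. Here "fuzzy query" is interpreted as a simple substring match on the city or site name, which is an assumption; the record fields (city, site, day, aqi), the sample values and the function name query_records are hypothetical.

```python
from datetime import date


def query_records(records, keyword="", start=None, end=None):
    """Filter records by a city/site keyword and an optional date range."""
    hits = []
    for rec in records:
        # Fuzzy match: the keyword may appear anywhere in either name.
        if keyword and keyword not in rec["city"] and keyword not in rec["site"]:
            continue
        if start is not None and rec["day"] < start:
            continue
        if end is not None and rec["day"] > end:
            continue
        hits.append(rec)
    return hits


# Hypothetical daily records.
records = [
    {"city": "Beijing", "site": "Site A", "day": date(2021, 5, 1), "aqi": 80},
    {"city": "Beijing", "site": "Site B", "day": date(2021, 5, 9), "aqi": 95},
    {"city": "Tianjin", "site": "Site C", "day": date(2021, 5, 2), "aqi": 70},
]
hits = query_records(records, keyword="jing",
                     start=date(2021, 5, 1), end=date(2021, 5, 5))
```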
The stereoscopic observation includes:
monitoring the air quality: AQI, PM2.5, PM10, SO2, NO2, CO, O3, primary pollutants;
component monitoring: EC/OC, water-soluble ions, inorganic elements, single particle mass spectrum;
radar monitoring: echo signal, signal-to-noise ratio, extinction coefficient, depolarization ratio, atmospheric pollution boundary layer height, wavelength index, visibility, aerosol optical thickness and cloud base height.
When radar monitoring is carried out, radar point location information is obtained and comprises component network aerosol radar point location information, regional station aerosol radar point location information and regional station wind-temperature and humidity radar point location information, and a radar file is obtained and comprises component network aerosol radar data, regional station wind-temperature and humidity radar data and navigation monitoring radar data.
In enterprise emissions, the pollution source information includes enterprise information, pollution discharge approval enterprise information, overhead source enterprise information, self-monitoring enterprise and exhaust outlet information. And acquiring online monitoring data, including discharge port monitoring data, overhead source monitoring data and enterprise self-monitoring data. And obtaining a pollution source file, and displaying the pollution source list files of all cities in a list.
When acquiring meteorological data, the meteorological data comprise ground observation data and refined meteorological forecast data which are all from the national meteorological bureau. The meteorological station measuring point information comprises ground observation point location information and refined forecast point location information. Specifically, the city, the site name, the longitude and latitude, the elevation of an observation field, the site information and the like are provided. The weather profile data includes weather bulletins, daily weather reminders, and weather profiles.
When analyzing the pollution source, the city source prediction comprises regional source prediction and industry source prediction, and pollution source data according to regions and industries are respectively provided. The method comprises the steps of obtaining source analysis file classes, displaying pollutant source analysis files developed in various cities in a file form list, and displaying basic information such as a subject source, a unit to which the subject belongs, a sharing range, stage achievements, data size, updating frequency, creating time, data quality statements, characteristic data elements, starting time and ending time.
In an embodiment, the access count, the total access count and the data application and download volume of a given user in a selected time period can be queried, and the query result can be exported by clicking an export button. The data volume query is mainly a query and statistical analysis page for the number of records, the data size and the proportion of each data type among structured data and unstructured data. Usage statistics count user access, each user's downloads of the various data and the number of shared data applications, where user access is divided into daily user access and cumulative user access. A data cockpit (chart) visually displays the data volume statistics, the acquisition and sharing status, the data source status, the aggregation of subject results and the like; clicking 'incremental data viewing' in the upper right corner displays the various structured data and unstructured data in list form.
In one embodiment, the system is also log-managed, including a platform operation log (operation records of each user on the platform can be queried here), a platform docking log (update records of platform data can be queried here), and a data operation log (data records downloaded by a user can be queried here).
In one embodiment, the system further stores basic data. Industry data mainly presents basic information on the pollution source enterprises of each industry in each region, displaying enterprise name, province, city, district and county, industry category name, industry source classification, longitude and latitude and other related information. Pollutant data mainly presents the treatment information of pollution source enterprises, including enterprise name, province, city, production facility number, production facility name, name of the corresponding pollution-producing link, pollutant type, emission form, pollution treatment facility number, pollution treatment facility name, whether a feasible technology is used, organized discharge outlet number, discharge outlet type, other information and the like. Process data mainly presents the main products and capacity information of an enterprise, displaying enterprise name, province, city, main process name, main production unit name, product name, production capacity, unit of measurement, designed annual production time, production facility, data time and the like. The economic industry classification provides industry classification code, industry classification name, parent industry classification code, industry description, industry level and other related information. Fuel information mainly provides province, city, unit name, fuel name, ash content, sulfur content, volatile matter, heat value, maximum annual usage, other information and the like.
The policy and regulation standards are mainly displayed as a file list and cover environmental protection laws, environmental laws, local environmental laws, department regulations, standard regulations, management policies and other policy and regulation standard documents. The environmental laws comprise environmental administration laws and local environmental laws; the department regulations comprise environmental department regulations and environment-related administrative regulation documents; the standard regulations comprise pollutant emission standards, monitoring regulations/method standards, environmental quality standards and environmental protection management standards; the management policies comprise national environmental policies, local environmental policies and other national environmental policies. Files can be downloaded from the file display page. The model simulation mainly comprises a simulation method page and a simulation data page. The research results are mainly divided into pages for technical innovation, research reports, technical data, business systems, publications, patents and other research results, wherein technical innovation comprises forecast technology, supervision technology, treatment technology and other technologies. Multimedia data mainly comprises three types: graphics, video and other multimedia data. Result display presents in one place the results uploaded by each subject and allows them to be queried and searched.
As shown in fig. 4, processing the acquired data source specifically includes performing data extraction twice on the operational data source. In the first data extraction, the first instance features in the metadata liquid are converted into a data stream, and the data stream is converted based on the conversion rules. In the second data extraction, the second instance features in the metadata liquid are converted into a data stream, converted based on the mapping relation, clustered based on the filtering and clustering rules, and finally stored in the data warehouse. This realizes the data collection and sharing process.
In an embodiment, the system further comprises data interface management: in this module the user can manage the platform data interfaces, entering the related page by clicking 'data interface management' and adding, viewing, editing or deleting data interface services. Data sharing can be achieved through the data interfaces.
According to some embodiments of the invention, the data management module comprises:
the first determining submodule is used for performing data extraction, conversion and loading on the archived data and determining corresponding metadata;
the obtaining submodule is used for setting a service type, classifying the metadata according to the service type and obtaining a plurality of classification sets;
the establishing submodule is used for defining data rules under each classification set and establishing an index identification library;
the second determining submodule is used for receiving a calling data request when the shared management module passes the approval, and determining a corresponding target service type and a target identifier under the target service type according to the calling data request;
a third determining submodule, configured to query the index identifier library according to the target identifier to determine corresponding target data;
the fourth determining submodule is used for determining first access information according to the acquisition time of the target identifier and the effective time of the target data and calling the target data based on the first access information;
a fifth determining sub-module, configured to determine second access information according to the archiving time of the target data and the valid time of the target data, and enable the archiving and warehousing module to store the target data based on the second access information;
and the judgment sub-module is used for judging whether the first access information is consistent with the second access information or not, and acquiring the target data and returning the target data to the applicant when the first access information is determined to be consistent with the second access information.
The working principle and the beneficial effects of the technical scheme are as follows: the first determining submodule performs data extraction, conversion and loading on the archived data and determines the corresponding metadata; the obtaining submodule sets service types and classifies the metadata according to service type to obtain a plurality of classification sets; the establishing submodule defines data rules under each classification set and establishes an index identification library. Defining the data rules makes it easy to determine which classification set a piece of data belongs to, and the index identification library allows addressing by index identifier so that the corresponding data is located accurately; together they achieve effective data management and fast retrieval when data is called. The second determining submodule receives a calling data request when the shared management module passes the approval, and determines from the request the corresponding target service type and the target identifier under that service type; the third determining submodule queries the index identifier library with the target identifier to determine the corresponding target data; the fourth determining submodule determines first access information from the acquisition time of the target identifier and the valid time of the target data, and calls the target data based on the first access information; the fifth determining submodule determines second access information from the archiving time of the target data and the valid time of the target data, and causes the archiving and warehousing module to store the target data based on the second access information; the judgment submodule judges whether the first access information is consistent with the second access information and, when they are consistent, acquires the target data and returns it to the applicant. This makes it convenient to manage the valid time of the target data: target data within its valid time is read accurately, while target data outside its valid time cannot be acquired in this way and must be obtained by calling other servers. The scheme shortens the calling time of the target data, ensures quick system response, accommodates the storage of very large amounts of data and meets the upgrade requirements of the database.
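The lookup path described above (service type and target identifier, then index identification library, then target data, gated by the valid time) can be illustrated with a simplified Python sketch. It deliberately reduces the comparison of first and second access information to a single check that the request time falls inside the valid time of the data; that reduction, the class name IndexLibrary, its methods and the sample values are all assumptions, because the description does not define how the access information is computed.

```python
from datetime import datetime


class IndexLibrary:
    """Maps (service type, identifier) to data with a valid-time window."""

    def __init__(self):
        self._entries = {}

    def register(self, service_type, identifier, data, valid_from, valid_until):
        """Index one archived data item under its service type."""
        self._entries[(service_type, identifier)] = (data, valid_from, valid_until)

    def fetch(self, service_type, identifier, request_time):
        """Return the data if it is indexed and valid at request_time.

        None means the caller has to obtain the data elsewhere, which the
        description refers to as calling other servers.
        """
        entry = self._entries.get((service_type, identifier))
        if entry is None:
            return None
        data, valid_from, valid_until = entry
        if valid_from <= request_time <= valid_until:
            return data
        return None


# Hypothetical entry for an air quality data set.
library = IndexLibrary()
library.register("air_quality", "AQ-001", {"city": "Beijing", "aqi": 80},
                 datetime(2021, 1, 1), datetime(2021, 12, 31))
in_window = library.fetch("air_quality", "AQ-001", datetime(2021, 6, 1))
out_of_window = library.fetch("air_quality", "AQ-001", datetime(2022, 1, 1))
```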
In one embodiment, the method further comprises:
a data source processing module to:
before uploading a pre-stored data source as uploading data, performing first data extraction on the data source, wherein the first data extraction is data source end extraction, and generating a plurality of independent instance libraries according to the structural characteristics of source data instances; establishing data format conversion rules of different instance libraries according to the preset data standard format requirement, performing format conversion operation on the instance library of the source data, and generating a series of new data tables meeting the preset data standard format requirement;
performing second data extraction on the new data tables, classifying them according to the structural features of the new data instances, generating a plurality of service instances, and constructing a data detail layer of a data warehouse; multi-table storage, cross-database storage and remote storage of massive heterogeneous data instances are realized through data mapping, which optimizes data storage, access and analysis processing;
performing data clustering on the data detail layer of the data warehouse to generate theme libraries for sharing and analysis by the project groups; the data clustering comprises theme clustering processing and label management; the theme libraries comprise an air quality library, a particulate component library, a pollution source library, a weather library and a health library.
The working principle and the beneficial effects of the technical scheme are as follows: as shown in fig. 4, the data source processing module processes the data source to ensure its quality and to make it convenient to call later. Once the platform data format requirements are met, service clustering and data warehouse loading are carried out. The first extraction is data-source-side extraction: a plurality of independent instance libraries are generated according to the structural characteristics of the source data instances; data format conversion rules are established for the different instance libraries according to the platform's standard data format requirements, the format conversion is performed on the instance libraries of the source data, and a series of new data tables meeting the platform data format requirements are generated. It should be noted that, given the complexity of the data formats, this step may be repeated several times until the platform data requirements are met. The second extraction classifies the new data tables generated in the previous step according to the structural characteristics of the new data instances, generates a plurality of service instances and constructs the data detail layer of the warehouse; multi-table storage, cross-database storage and remote storage of massive heterogeneous data instances are realized through data mapping, which optimizes data storage, access and analysis. In the data clustering stage, theme clustering and label management are applied to the detail layer of the warehouse, generating theme libraries such as air quality, particulate matter components, pollution sources, weather and health for sharing and analysis by the project groups.
Throughout the ETL process, metadata serves as the standardization management tool, ensuring that every step and every stage of the process runs in an orderly, standardized and traceable way. This guarantees the orderliness and accuracy of the data source.
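For illustration only, the two extraction passes and the clustering step described above can be sketched in Python as follows. The record layout, the field names (station, time, pm25, temp), the conversion rule and the field-to-theme mapping are hypothetical and are not taken from the described platform.

```python
from collections import defaultdict

# Hypothetical key fields and field-to-theme mapping (assumptions for this sketch).
KEY_FIELDS = ("station", "time")
THEME_BY_FIELD = {"pm25": "air_quality", "temp": "weather"}


def standardize(row):
    """Hypothetical format rule: lower-case field names, numeric strings to float."""
    out = {}
    for key, value in row.items():
        key = key.lower()
        out[key] = value if key in KEY_FIELDS else float(value)
    return out


def first_extraction(source_rows):
    """Group raw rows into instance libraries by structure, then convert each
    library into a new table in the standard format."""
    libraries = defaultdict(list)
    for row in source_rows:
        libraries[frozenset(row)].append(row)
    return [[standardize(r) for r in rows] for rows in libraries.values()]


def second_extraction(new_tables):
    """Classify standardized rows by their measured fields to form the
    service instances that make up the detail layer."""
    detail = defaultdict(list)
    for table in new_tables:
        for row in table:
            fields = tuple(sorted(k for k in row if k not in KEY_FIELDS))
            detail[fields].append(row)
    return dict(detail)


def cluster_by_theme(detail_layer):
    """Assign each service instance to the theme libraries its fields map to."""
    themes = defaultdict(list)
    for fields, rows in detail_layer.items():
        for theme in {THEME_BY_FIELD[f] for f in fields if f in THEME_BY_FIELD}:
            themes[theme].extend(rows)
    return dict(themes)
```

In this sketch the structure of a row is simply its set of field names; a real platform would presumably rely on the metadata mentioned above to drive each rule.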
According to some embodiments of the invention, the data quality control module performs quality control on the carbon component relationship anion-cation charge balance, including:
setting a starting condition: the data for quality control comprises NO₃⁻, SO₄²⁻ and NH₄⁺ data;
calculating an anion equivalent based on formula (1);
[Formula (1): equation image not reproduced in this text]
wherein AE is the anion equivalent (μmol/m³), and the symbols defined for formula (1) denote the chloride ion concentration (μg/m³), the nitrate concentration (μg/m³), the sulfate concentration (μg/m³) and the fluoride ion concentration (μg/m³);
calculating a cation equivalent based on formula (2);
[Formula (2): equation image not reproduced in this text]
wherein CE is the cation equivalent (μmol/m³), and the symbols defined for formula (2) denote the ammonium ion concentration (μg/m³), the magnesium ion concentration (μg/m³), the potassium ion concentration (μg/m³), the calcium ion concentration (μg/m³) and the sodium ion concentration (μg/m³);
calculating the ratio of the anion equivalent to the cation equivalent, and judging whether the ratio is in a preset ratio range; carrying out classification marking on the calculation result according to the judgment result to obtain a quality control result;
and recording the quality control result in a quality control field group.
The working principle and the beneficial effects of the technical scheme are as follows: the starting condition is that the NO₃⁻, SO₄²⁻ and NH₄⁺ values are all greater than 0. The preset ratio range is [0.8, 2.8]. The calculation result is marked by class: "-" (the starting condition is not met), normal (the ratio is within the preset ratio range) or abnormal (the ratio is outside the preset ratio range). The quality control result is recorded in a quality control field group. This manages data quality effectively and improves the efficiency of data quality control. Abnormal data records are marked in red; the quality control results for data of the same period are counted and the percentage of abnormal records is calculated; all abnormal records under the same quality control rule are collected into a set for online query, display and manual review; and the set of abnormal records is exported to an Excel file and sent by e-mail, as an attachment, to the resource provider for data quality correction. As shown in fig. 7, accurate data is obtained after data quality control.
In one embodiment, the quality control performed by the quality control module on the carbon component relationship anion-cation charge balance further includes setting a processing rule: when a factor is negative at calculation time, it is assigned the value 0 before it takes part in the calculation.
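A minimal Python sketch of this charge-balance check is given below for illustration. Because formulas (1) and (2) are not reproduced in this text, the sketch assumes the conventional form in which each mass concentration is divided by the ion's equivalent weight (molar mass divided by charge); the equivalent weights and the dictionary keys are assumptions, while the starting condition, the [0.8, 2.8] range, the three labels and the negative-value rule follow the description above.

```python
# Equivalent weights (molar mass / charge). Assumed conventional values,
# not read from the formula images of the patent.
ANION_EQ_WEIGHT = {"Cl": 35.45, "NO3": 62.0, "SO4": 48.03, "F": 19.0}
CATION_EQ_WEIGHT = {"NH4": 18.04, "Mg": 12.15, "K": 39.1, "Ca": 20.04, "Na": 22.99}

RATIO_RANGE = (0.8, 2.8)  # preset ratio range stated in the description


def equivalent(conc, weights):
    """Sum concentration / equivalent weight over the listed ions.
    Negative (or missing) factors are treated as 0 before the calculation."""
    return sum(max(conc.get(ion, 0.0), 0.0) / w for ion, w in weights.items())


def charge_balance_flag(conc):
    """Return "-", "normal" or "abnormal" for one record of ion concentrations."""
    # Starting condition: NO3-, SO4(2-) and NH4+ must all be greater than 0.
    if not all(conc.get(ion, 0.0) > 0 for ion in ("NO3", "SO4", "NH4")):
        return "-"
    # NH4 > 0 guarantees a non-zero cation equivalent, so the division is safe.
    ratio = equivalent(conc, ANION_EQ_WEIGHT) / equivalent(conc, CATION_EQ_WEIGHT)
    low, high = RATIO_RANGE
    return "normal" if low <= ratio <= high else "abnormal"
```

The returned label is what would be written to the quality control field group; abnormal records would then go on to the red marking and manual review described above.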
As shown in figs. 8-9, according to some embodiments of the invention, the data sources acquired by the data acquisition module include structured data sources and unstructured data sources;
obtaining a structured data source comprising:
establishing a connection between the data acquisition module and a data provider, receiving an httpclient request sent by the data provider, verifying the identity of the data provider based on the httpclient request and, when the verification passes, scheduling the data interface on a timer based on Spring Quartz and periodically acquiring first data from the data provider; judging whether the acquired first data has gaps and, when it does, running a historical-data backfill function in multiple threads to obtain second data; selecting the data in JSON format from the second data, parsing it, and storing it once parsing is complete;
obtaining an unstructured data source, comprising:
the data acquisition module acquires first target files obtained through system upload, FTP upload and administrator copying; the first target files are scanned with antivirus software and, once the scan is complete, any files found to contain viruses are deleted to obtain second target files;
and transferring the second target files between file directories, the transfer running as a file-monitoring thread that reads the directory to obtain directory information, transfers the directory information and the second target files to the formal server, and writes the file paths into a database.
The working principle and the beneficial effects of the technical scheme are as follows: there are two main ways to exchange structured data, namely a data interface and ETL synchronization. For transfer through a data interface, the data provider must supply the basic requirements of the accessible service, including the URL address, request parameters and request time, together with the basic requirements of the response data returned after a successful request, including fields, field descriptions and data types. ETL synchronization is database-to-database transfer at the database level. For transfer through ETL, the data provider must supply the address and port of the accessible database server, the database name, the tables to be synchronized and the corresponding table structures. Because networks fail and data can be delayed, the acquisition program generally includes a function for backfilling historical data. Structured data aggregation uses technologies such as webservice, httpclient, JSON, Spring Quartz and Java threads. In line with the data provider's technical security requirements, identity authentication is carried out before any data request, and data is acquired only after authentication passes. All interface requests to the data provider use the HTTP protocol and are issued through httpclient. The program schedules the data interface on a timer using Spring Quartz and, after requesting the data, parses the JSON-format data so that it can be stored conveniently.
Since special circumstances can interrupt the data stream, a historical-data backfill function is added; it starts when missing data is detected and runs in multiple threads to improve backfill efficiency. Unstructured data arrives mainly in three ways: platform upload, FTP tool upload and manual copying. All files first enter a temporary server, where virus scanning is started; if a virus is found during scanning, the antivirus software deletes the file directly, otherwise the file proceeds to the next step prescribed by the system. After a file on the temporary server has passed the antivirus scan, a separate file-monitoring program starts automatically, copies the file to the formal server according to preset directory rules and stores the path in the relevant file table of the database. File transfer uses technologies such as FTP, file monitoring and virus scanning. The data acquisition module acquires structured and unstructured data sources in different ways and ensures the accuracy of the acquired data.
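The gap detection and multithreaded backfill of historical data can be sketched as follows. The description names Java threads and Spring Quartz; this illustration uses Python's standard thread pool instead, assumes hourly records keyed by timestamp, and takes a hypothetical fetch_hour callable standing in for the authenticated request to the data provider's interface.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def missing_hours(received, start, end):
    """Return the hourly timestamps in [start, end] that are absent from `received`."""
    have = set(received)
    gaps = []
    t = start
    while t <= end:
        if t not in have:
            gaps.append(t)
        t += timedelta(hours=1)
    return gaps


def backfill(received, start, end, fetch_hour, workers=4):
    """Fetch every missing hour in parallel and merge it with the received data.

    `fetch_hour` is a hypothetical callable (timestamp -> record) that would
    wrap the request to the provider's data interface.
    """
    gaps = missing_hours(received, start, end)
    data = dict(received)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `gaps`.
        for ts, record in zip(gaps, pool.map(fetch_hour, gaps)):
            data[ts] = record
    return data
```

A scheduler would call the periodic acquisition first and invoke backfill only when missing_hours returns a non-empty list.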
As shown in fig. 10, according to some embodiments of the invention, the sharing management module includes:
a sixth determining module, configured to determine a data sharing permission requirement, where the data sharing permission requirement includes at least one of a definition of a data owner, a permission claim of the data owner, a permission type, and a permission granularity;
a seventh determining module, configured to determine tools and manners of data sharing and distribution;
an eighth determining module, configured to establish a data sharing security mechanism, which comprises an application system, an approval system, a responsibility statement system and network trace retention;
a receiving module to:
receiving a data request application initiated by an applicant and forwarding the data request application to an auditing terminal;
when the auditing terminal agrees to the agreement, receiving the application information filled in by the applicant and the shared data applied for; the shared data comprises a data interface and an application document;
the calling module is used for calling corresponding target data from the data management module according to the application information and the shared data to be applied, classifying the target data and dividing the target data into structured target data and unstructured target data;
and the sending module is used for sending the unstructured target data to the applicant based on the mail and sending the structured target data to the applicant based on the data interface.
The working principle and the beneficial effects of the technical scheme are as follows. Determining the data sharing permission requirements: these include the definition of the data owner, the data owner's permission claims, permission types and permission granularity. Specifically, to protect the intellectual property of researchers, and subject to national data management policies and the data management regulations of the atmospheric key research program, front-line researchers set the data sharing permissions, which cover the sharing scope, shared content, sharing time requirements and intellectual property attribution requirements. According to the project structure and the effective radius of research dissemination, the sharing scope is divided into three categories: sharing within the project, sharing within the topic, and no sharing. This safeguards researchers' willingness to share results. The shared content scope includes file-level, table-level and field-level restrictions; time-range permissions can also be designed. Researchers who produce results based on shared data must cite the data source in accordance with the intellectual property management measures, to protect the rights and interests of front-line personnel. Determining the tools and modes of data sharing and distribution: based on an analysis of researchers' data usage habits and operating experience and of the purposes of sharing (such as one-off manual analysis or continuous automated support for a service system), several sharing modes are provided, including online export, interface transfer, an intermediate database and distribution of documents by e-mail. Establishing a data sharing security mechanism: the mechanism mainly comprises an application system, an approval system, a responsibility statement system and network trace retention.
Under the relevant management policies, applications are made by subject group and require the signature of the recipient, the signature of the subject group leader and the signed consent of the subject's lead unit; the key research center approves applications according to the data confidentiality level, the data permissions, the data timeliness and the other requirements of the data owner; the responsibility statement consists mainly of a data use statement and a data use agreement, which make clear the applicant's handling rights, confidentiality obligations and intellectual property requirements for the shared data; the data sharing process consists mainly of three modules under the "data service" column, namely data sharing application, sharing approval and data use feedback; the generated data interfaces or documents are sent to the applicant from a single designated service mailbox, and the whole process leaves a trace. The data sharing application process comprises six steps: the applicant fills in online the application information and the shared data applied for (including interfaces and documents), downloads the data sharing application form, signs every data use statement and every signature (seal) field the form requires, and uploads the signed form as a PDF file; the key research center carries out online approval (or joint approval with the data owner) based on the original application form (PDF) and the data ownership; the approval result is either approved or not approved, and when an application is not approved the result and the reasons for refusal can be sent to the applicant through the designated official mailbox; once sharing is approved, the platform automatically generates a shared-file interface or a compressed document according to what the applicant requested online and sends it as an attachment, together with the approval e-mail agreeing to share, to the data applicant through the designated official mailbox, which completes the sharing application process. This improves the efficiency of data sharing management, records data sharing comprehensively and improves data traceability. It also realizes data sharing and improves data utilization.
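As a rough illustration of the approval and dispatch flow described for the receiving, calling and sending modules, the following Python sketch routes approved target data by type and keeps a trace of each step. The class names, the trace messages and the boolean approval input are hypothetical; the real approval is carried out by people at the auditing terminal.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    structured: bool  # True: table-like data; False: documents or files


@dataclass
class Outcome:
    approved: bool
    via_interface: list = field(default_factory=list)  # structured target data
    via_mail: list = field(default_factory=list)       # unstructured target data
    trace: list = field(default_factory=list)          # trace kept for the whole process


def process_application(applicant, items, approve, reason=""):
    """Record the application, then either reject it with a reason or
    dispatch each requested item through the channel matching its type."""
    outcome = Outcome(approved=bool(approve))
    outcome.trace.append(f"application received from {applicant}")
    if not outcome.approved:
        outcome.trace.append(f"rejected: {reason}")
        return outcome
    outcome.trace.append("approved")
    for item in items:
        channel = outcome.via_interface if item.structured else outcome.via_mail
        channel.append(item.name)
    outcome.trace.append(f"dispatched {len(items)} item(s)")
    return outcome
```

The split into two channels mirrors the sending module, which sends structured target data through the data interface and unstructured target data by mail.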
As shown in figs. 5-6, in an embodiment, the data quality control module performs data quality control under a composite framework of system quality control (data cleaning) + outlier quality control + service quality control. System quality control uses the data cleaning techniques common in ETL to apply uniqueness, non-null, threshold and integrity checks to data entering the warehouse. After warehousing, data enters different quality control flows according to its time characteristics: time-series data goes through system cleaning; component data, after system cleaning, goes on to service quality control for a second judgment; non-time-series data goes through name matching. Data that fails a quality control rule during this process is flagged as abnormal by the machine and is confirmed as abnormal once manual review finds the flag correct. The quality control results are passed on to users through sharing and are fed back regularly to the data source, which helps to supervise and secure data quality.
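The system quality control checks named above (uniqueness, non-null and threshold checks) can be sketched in Python as follows. The record layout, the label strings and the threshold values are assumptions made for this illustration, and the integrity check is omitted because the text does not define it.

```python
def system_quality_control(records, key_fields, required, thresholds):
    """Return one list of anomaly labels per record (an empty list means clean).

    key_fields: fields that together must be unique (uniqueness check)
    required:   fields that must not be None (non-null check)
    thresholds: {field: (low, high)} allowed range (threshold check)
    """
    seen = set()
    results = []
    for rec in records:
        flags = []
        key = tuple(rec.get(f) for f in key_fields)
        if key in seen:
            flags.append("duplicate")
        seen.add(key)
        for f in required:
            if rec.get(f) is None:
                flags.append(f"null:{f}")
        for f, (low, high) in thresholds.items():
            value = rec.get(f)
            if value is not None and not (low <= value <= high):
                flags.append(f"threshold:{f}")
        results.append(flags)
    return results
```

Records with a non-empty label list correspond to the machine-flagged anomalies that are then passed to manual review.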
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An atmospheric environment integrated data acquisition and sharing system, comprising:
the user management module is used for managing the user registration information and performing role distribution according to the user registration information, wherein the role distribution comprises a subject group, an administrator and a general user;
the data acquisition module is connected with the user management module and is used for:
receiving the atmospheric environment comprehensive data acquired by the subject group, and taking the atmospheric environment comprehensive data and a pre-stored data source as upload data;
collecting metadata of the uploaded data, and dividing the metadata into structured data and unstructured data;
the data quality control module is connected with the data acquisition module and is used for:
receiving unstructured data sent by the data acquisition module, and performing file auditing on the unstructured data;
receiving the structured data sent by the data acquisition module, dividing the structured data into time-series data and non-time-series data, and performing system cleaning on the time-series data;
carrying out service quality control on the non-time-sequence data and the time-sequence data cleaned by the system, and then carrying out manual examination;
the filing and warehousing module is connected with the data quality control module and is used for receiving the unstructured data which are subjected to file examination and the structured data which are subjected to manual examination and are sent by the data quality control module to carry out filing and warehousing;
the data management module is connected with the filing and warehousing module and used for carrying out data management on the filing data in the database;
the sharing management module is connected with the data management module and is used for approving the data sharing application initiated by the applicant; when the approval is passed, the sharing management module sends approval-passed prompt information and the corresponding target data called from the data management module to the applicant; and when the approval is not passed, it sends approval-failed prompt information to the applicant.
2. An atmospheric environment integrated data collection and sharing system as defined in claim 1, wherein the data management module includes:
the data set management submodule is used for establishing a data set and indexes for the archived data;
the resource directory management submodule is used for establishing a user-defined resource directory and a user-defined menu sequence for the archived data so as to facilitate query navigation;
the map service management submodule is used for determining map service for the archived data;
and the thematic map management submodule is used for confirming the thematic of the filing data and determining the thematic map.
3. An atmospheric environment integrated data collection and sharing system as defined in claim 2, wherein the resource directory management submodule is further configured to:
creating a directory: based on a create-directory window, entering a directory name and a URL (uniform resource locator), and entering the ordering and selecting the type of the archived data;
creating a view: based on a create-view window, entering a view name, selecting a view code, and entering the ordering and selecting the type of the archived data;
creating a document: based on a create-document window, entering a file name, selecting a file, and entering the ordering of the archived data.
4. The integrated atmospheric data collection and sharing system of claim 1, wherein the time-series data is used to show the time-series trends of a plurality of parameters for each city, the parameters including city, time period, city of interest, comparison area and selected factor.
5. An atmospheric environment integrated data collection and sharing system as defined in claim 1, further comprising:
a data source processing module to:
before uploading a pre-stored data source as uploading data, performing first data extraction on the data source, wherein the first data extraction is data source end extraction, and generating a plurality of independent instance libraries according to the structural characteristics of source data instances; establishing data format conversion rules of different instance libraries according to the preset data standard format requirement, performing format conversion operation on the instance library of the source data, and generating a series of new data tables meeting the preset data standard format requirement;
performing second data extraction on the new data table, classifying according to the structural features of the new data instances, generating a plurality of service instances, and constructing a data detail layer of a data warehouse; multi-table storage, cross-database storage and remote storage of massive heterogeneous data examples are realized through data mapping, and optimization of data storage, access and analysis processing is realized;
performing data clustering on the data detail layer of the data warehouse to generate a subject library for sharing and analyzing the project group; the data clustering comprises theme clustering processing and label management; the theme library comprises an air quality library, a particulate component library, a pollution source library, a weather library and a health library.
6. The atmospheric environment integrated data acquisition and sharing system of claim 1, wherein the data quality control module is further configured to:
extracting the characteristics of the unstructured data and the structured data sent by the data acquisition module to obtain particle component data, air quality data, meteorological data and source analysis data, and respectively performing threshold management;
and performing quality control on the unstructured data and the structured data sent by the data acquisition module, wherein the quality control comprises integrity anomalies, completeness anomalies, uniqueness anomalies, conformity anomalies, inversion anomalies, carbon component relationship anion and cation charge balance, the ratio of summed concentration to measured concentration, outlier judgment, and the total number and proportion of abnormal data.
7. An atmospheric environment integrated data collection and sharing system as defined in claim 1, wherein the data management module includes:
the first determining submodule is used for performing data extraction, conversion and loading on the archived data and determining corresponding metadata;
the obtaining submodule is used for setting a service type, classifying the metadata according to the service type and obtaining a plurality of classification sets;
establishing a submodule for defining data rules under each classification set and establishing an index identification library;
the second determining submodule is used for receiving a data call request when the sharing management module passes the approval, and determining a corresponding target service type and a target identifier under the target service type according to the data call request;
a third determining submodule, configured to query the index identifier library according to the target identifier to determine corresponding target data;
the fourth determining submodule is used for determining first access information according to the acquisition time of the target identifier and the effective time of the target data and calling the target data based on the first access information;
a fifth determining sub-module, configured to determine second access information according to the archiving time of the target data and the valid time of the target data, and enable the archiving and warehousing module to store the target data based on the second access information;
and the judgment sub-module is used for judging whether the first access information is consistent with the second access information or not, and acquiring the target data and returning the target data to the applicant when the first access information is determined to be consistent with the second access information.
8. The atmospheric environment integrated data acquisition and sharing system of claim 6, wherein the data quality control module performs quality control on carbon component relationship anion-cation charge balance, comprising:
setting a starting condition: the data for quality control comprises NO₃⁻, SO₄²⁻ and NH₄⁺ data;
calculating an anion equivalent based on formula (1);
[Formula (1): equation image not reproduced in this text]
wherein AE is the anion equivalent (μmol/m³), and the symbols defined for formula (1) denote the chloride ion concentration (μg/m³), the nitrate concentration (μg/m³), the sulfate concentration (μg/m³) and the fluoride ion concentration (μg/m³);
calculating a cation equivalent based on formula (2);
[Formula (2): equation image not reproduced in this text]
wherein CE is the cation equivalent (μmol/m³), and the symbols defined for formula (2) denote the ammonium ion concentration (μg/m³), the magnesium ion concentration (μg/m³), the potassium ion concentration (μg/m³), the calcium ion concentration (μg/m³) and the sodium ion concentration (μg/m³);
calculating the ratio of the anion equivalent to the cation equivalent, and judging whether the ratio is in a preset ratio range; carrying out classification marking on the calculation result according to the judgment result to obtain a quality control result;
and recording the quality control result in a quality control field group.
9. The integrated atmospheric-environment data collection and sharing system of claim 1, wherein the data sources acquired by the data collection module include structured data sources and unstructured data sources;
obtaining a structured data source comprising:
establishing a connection between the data acquisition module and a data provider, receiving an httpclient request sent by the data provider, verifying the identity of the data provider based on the httpclient request and, when the verification passes, scheduling the data interface on a timer based on Spring Quartz and periodically acquiring first data from the data provider; judging whether the acquired first data has gaps and, when it does, running a historical-data backfill function in multiple threads to obtain second data; selecting the data in JSON format from the second data, parsing it, and storing it once parsing is complete;
obtaining an unstructured data source, comprising:
the data acquisition module acquires first target files obtained through system upload, FTP upload and administrator copying; the first target files are scanned with antivirus software and, once the scan is complete, any files found to contain viruses are deleted to obtain second target files;
and transferring the second target files between file directories, the transfer running as a file-monitoring thread that reads the directory to obtain directory information, transfers the directory information and the second target files to the formal server, and writes the file paths into a database.
10. An atmospheric environment integrated data collection and sharing system as defined in claim 1, wherein the sharing management module includes:
a sixth determining module, configured to determine a data sharing permission requirement, where the data sharing permission requirement includes at least one of a definition of a data owner, a permission claim of the data owner, a permission type, and a permission granularity;
a seventh determining module, configured to determine tools and manners of data sharing and distribution;
an eighth determining module, configured to establish a data sharing security mechanism, which comprises an application system, an approval system, a responsibility statement system and network trace retention;
a receiving module to:
receiving a data request application initiated by an applicant and forwarding the data request application to an auditing terminal;
when the auditing terminal agrees to the agreement, receiving the application information filled in by the applicant and the shared data applied for; the shared data comprises a data interface and an application document;
the calling module is used for calling corresponding target data from the data management module according to the application information and the shared data to be applied, classifying the target data and dividing the target data into structured target data and unstructured target data;
and the sending module is used for sending the unstructured target data to the applicant based on the mail and sending the structured target data to the applicant based on the data interface.
CN202111139916.1A 2021-09-28 2021-09-28 Atmospheric environment comprehensive data acquisition and sharing system Active CN113868318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111139916.1A CN113868318B (en) 2021-09-28 2021-09-28 Atmospheric environment comprehensive data acquisition and sharing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111139916.1A CN113868318B (en) 2021-09-28 2021-09-28 Atmospheric environment comprehensive data acquisition and sharing system

Publications (2)

Publication Number Publication Date
CN113868318A true CN113868318A (en) 2021-12-31
CN113868318B CN113868318B (en) 2022-07-12

Family

ID=78991504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111139916.1A Active CN113868318B (en) 2021-09-28 2021-09-28 Atmospheric environment comprehensive data acquisition and sharing system

Country Status (1)

Country Link
CN (1) CN113868318B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116050869A (en) * 2023-04-03 2023-05-02 南京庞特软件科技有限公司 Store full life cycle management method and store management system
CN116069849A (en) * 2023-03-02 2023-05-05 安徽兴博远实信息科技有限公司 Artificial intelligent management system applied to cross-platform data exchange sharing
CN116450747A (en) * 2023-06-16 2023-07-18 长沙数智科技集团有限公司 Heterogeneous system collection processing system for office data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2911036A1 (en) * 2014-02-21 2015-08-26 Samsung Electronics Co., Ltd Method and apparatus for power sharing
US20190220796A1 (en) * 2015-08-22 2019-07-18 Salim B. KHALIL Automated, integrated and complete computer program/project management solutions standardizes and optimizes management processes and procedures utilizing customizable and flexible systems and methods
CN112231333A (en) * 2020-11-09 2021-01-15 南京莱斯网信技术研究院有限公司 Ecological environment data sharing and exchanging method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
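Sun Caiping et al.: "Construction and application of a comprehensive data acquisition and sharing platform for atmospheric environmental science" (大气环境科学综合数据采集共享平台建设及应用研究), Research of Environmental Sciences (《环境科学研究》) *
Sun Caiping et al.: "A business-driven classification system for atmospheric environmental data resources and its application" (面向业务驱动的大气环境数据资源分类体系及应用研究), Journal of Environmental Engineering Technology (《环境工程技术学报》) *
Zhu Baocheng et al.: "Liang Zhi: Management Practice for Intelligent Storage Upgrading under the Henan Province 'Grain Security Project'" (《粮智 河南省"粮安工程"仓储智能化升级管理实务》), 31 August 2017, Yellow River Conservancy Press (黄河水利出版社) *
Fan Chongjun et al.: "Big Data Analysis and Application" (《大数据分析与应用》), 31 January 2016, Lixin Accounting Publishing House (立信会计出版社) *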

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069849A (en) * 2023-03-02 2023-05-05 安徽兴博远实信息科技有限公司 Artificial intelligent management system applied to cross-platform data exchange sharing
CN116069849B (en) * 2023-03-02 2023-06-09 安徽兴博远实信息科技有限公司 Artificial intelligent management system applied to cross-platform data exchange sharing
CN116050869A (en) * 2023-04-03 2023-05-02 南京庞特软件科技有限公司 Store full life cycle management method and store management system
CN116450747A (en) * 2023-06-16 2023-07-18 长沙数智科技集团有限公司 Heterogeneous system collection processing system for office data
CN116450747B (en) * 2023-06-16 2023-08-29 长沙数智科技集团有限公司 Heterogeneous system collection processing system for office data

Also Published As

Publication number Publication date
CN113868318B (en) 2022-07-12

Similar Documents

Publication Publication Date Title
CN113868318B (en) Atmospheric environment comprehensive data acquisition and sharing system
Liu et al. CITIESData: a smart city data management framework
CN112685385A (en) Big data platform for smart city construction
CN111190881A (en) Data management method and system
Milojevic-Dupont et al. EUBUCCO v0.1: European building stock characteristics in a common and open database for 200+ million individual buildings
CN111680153A (en) Big data authentication method and system based on knowledge graph
Xu et al. Developing an IFC-based database for construction quality evaluation
CN112989156A (en) Big data based policy and enterprise matching method and system
Lane The UK environmental change network database: An integrated information resource for long-term monitoring and research
Turner Defining and measuring traffic data quality: White paper on recommended approaches
CN113722301A (en) Big data processing method, device and system based on education information and storage medium
Cuzzocrea et al. An innovative framework for supporting big atmospheric data analytics via clustering-based spatio-temporal analysis
US20170039235A1 (en) Air quality metrology system
US20130232158A1 (en) Data subscription
Janev et al. Modeling, fusion and exploration of regional statistics and indicators with linked data tools
Manase et al. A GIS analytical approach for exploring construction health and safety information
US20110258007A1 (en) Data subscription
CN115794839B (en) Data collection method based on PHP+MySQL system, computer equipment and storage medium
Reich et al. The Zoltar forecast archive, a tool to standardize and store interdisciplinary prediction research
Dietze et al. A community convention for ecological forecasting: output files and metadata
CN117455379A (en) Basic intelligent management system and method
KR20180131829A (en) All-round data management device and method supporting long-term ecological research
Dietze et al. A community convention for ecological forecasting: Output files and metadata version 1.0
Mudge et al. A comparison between three unmixing models for source apportionment of PM2.5 using alkanes in air from Southern Chile
Duda et al. Formation of hypercubes based on data obtained from systems of IoT devices of urban resource networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant