CN117290343A

CN117290343A - Intelligent forestry big data system, method, server and medium based on data lake

Info

Publication number: CN117290343A
Application number: CN202311577209.XA
Authority: CN
Inventors: 王晓亮; 王宇翔; 范磊; 张乐; 徐斌; 李丹彤; 张松; 孙景慕
Original assignee: Aerospace Hongtu Information Technology Co Ltd
Current assignee: Aerospace Hongtu Information Technology Co Ltd
Priority date: 2023-11-24
Filing date: 2023-11-24
Publication date: 2023-12-26

Abstract

The invention provides a data lake-based intelligent forestry big data system, method, server and medium. The system comprises: a data processing and quality inspection module, used for processing forestry data according to the data space type of the forestry data, performing at least one data check on the processed forestry data, and performing a data quality inspection on the forestry data that passes the data check; a data lake entering and statistical analysis module, used for storing the forestry data that passes the data quality inspection into a data lake and performing statistical analysis on the forestry data in the data lake to obtain a statistical analysis result; and a data warehousing and service sharing module, used for storing the forestry data that passes the data quality inspection into a master-slave database and providing a service sharing function based on the forestry data stored in the master-slave database. The invention offers unified management, flexible sharing, intelligence, high efficiency and advanced technology, thereby providing strong support for operations and decision-making.

Description

Intelligent forestry big data system, method, server and medium based on data lake

Technical Field

The invention relates to the technical field of data storage, processing and application in the forestry industry, in particular to an intelligent forestry big data system, method, server and medium based on a data lake.

Background

With the continuing development of technologies such as big data, the Internet of Things, cloud computing and artificial intelligence, the transition from informatization to digitalization has become a required step for every industry, and forestry is no exception. Building intelligent forestry has become an essential means of promoting the full implementation of green development. The digital construction of the domestic forestry industry is still at an early stage. According to related data, China's forestry informatization level ranks only mid-range globally, and there is considerable room for improvement in digital transformation.

First, the government affairs extranet is logically isolated from the Internet, some forestry application systems run on the Internet, and the government affairs intranet has not been fully put into use, so the encrypted transmission of some forestry data over Internet channels carries hidden risks. Second, as forestry informatization application systems have been built, forestry data has been generated and accumulated into valuable data assets; however, because there is no unified data center, part of the forestry data is still managed mainly by the individual forestry departments, and the data is scattered and fragmented, so that a large amount of management data cannot be shared and used. Finally, forestry departments have built a number of information systems according to their business needs, but these systems were built separately, their data standards and classification methods vary widely, and their business processes are relatively isolated, so that neither a unified entry point nor single sign-on can be achieved. In summary, because system application and management are not unified, cross-departmental business coordination is not smooth, and strong support cannot be provided for operations and decision-making.

Disclosure of Invention

In view of the above, the invention aims to provide an intelligent forestry big data system, method, server and medium based on data lakes, which have the characteristics of unified management, flexible sharing, intelligent high efficiency, advanced technology and the like, thereby providing powerful support in operation and decision.

In a first aspect, an embodiment of the present invention provides a data lake-based intelligent forestry big data system, which includes a data processing and quality inspection module, and further includes a data lake entering and statistical analysis module and/or a data warehousing and service sharing module, which are communicatively connected with the data processing and quality inspection module; wherein,

the data processing and quality inspection module is used for: processing forestry data according to the data space type of the forestry data, performing at least one data check on the processed forestry data, and performing a data quality inspection on the forestry data that passes the data check;

the data lake entering and statistical analysis module is used for: storing the forestry data that passes the data quality inspection into a data lake, and performing statistical analysis on the forestry data in the data lake to obtain a statistical analysis result;

the data warehousing and service sharing module is used for: storing the forestry data that passes the data quality inspection into a master-slave database, and providing a service sharing function based on the forestry data stored in the master-slave database.

In one embodiment, the data processing and quality inspection module comprises a forestry data processing unit and a forestry data quality inspection unit; wherein,

the forestry data processing unit is used for: performing data processing on the forestry data of the space class by using the first function set, and/or performing data processing on the forestry data of the non-space class by using the second function set; performing checking before entering the lake of the forestry data and checking after entering the lake of the forestry data on the forestry data processed by the data processing;

the forestry data quality inspection unit is used for: performing data quality inspection on forestry data passing the data inspection; returning the forestry data under the condition that the forestry data does not pass the data quality inspection until new forestry data passes the pre-lake-entering inspection of the forestry data, the post-lake-entering inspection of the forestry data and the data quality inspection; and transmitting the forestry data passing through the data quality inspection to the data lake entering and statistical analysis module.

In one embodiment, the forestry data processing unit is further configured to:

carrying out classified treatment and standardized treatment on forestry data of space class; carrying out data processing on the forestry data of the space class by utilizing a first function set; and carrying out superposition analysis processing on the forestry data of the space class after the data processing.

In one embodiment, the data lake entering and statistical analysis module comprises a forestry data lake entering unit and a forestry data statistical analysis unit; wherein,

the forestry data lake entering unit is used for: storing the forestry data that passes the data quality inspection into the data lake in a distributed manner, using the Parquet data file format or the original format, according to a hierarchical data classification structure of public basic data, forestry thematic data and forestry comprehensive data;

the forestry data statistical analysis unit is used for: and carrying out statistical analysis on the forestry data in the data lake by using a spatial analysis algorithm or an attribute statistical algorithm to obtain a data statistical analysis result.

In one embodiment, the forestry data lake entering unit is further configured to:

for offline forestry data, collect the offline forestry data from a service system through modified postgresqlwriter and hdfswriter plug-ins, and transmit the collected forestry data to the data processing and quality inspection module;

for real-time forestry data, connect the intelligent forestry big data system to the service system through a Kafka cluster, so as to collect the real-time forestry data from the service system and transmit the collected forestry data to the data processing and quality inspection module;

and store the offset data of the real-time forestry data in a relational database, and consume from the corresponding offset according to a database configuration table of the relational database, thereby preventing data backpressure and repeated consumption.

In one embodiment, the data warehousing and service sharing module comprises a forestry data warehousing unit and a forestry data service sharing unit; wherein,

the forestry data warehousing unit is used for: storing the forestry data that passes the data quality inspection into a master-slave database; the master-slave database comprises one master database and a plurality of slave databases, wherein the master database serves as the write database and the slave databases serve as read databases;

the forestry data service sharing unit is configured to: providing a service sharing function based on the forestry data stored in the master-slave database;

The forestry data service sharing unit is further configured to: when a data call request for the service sharing function is received, if the data volume corresponding to the data call request is greater than a preset data volume threshold, read the forestry data from the slave database, slice the read forestry data, persist the sliced forestry data in a designated database, and read the sliced forestry data from the designated database and return it.
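The threshold-and-slice behaviour described above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the threshold and slice size values, the `slices` table, and the use of an in-memory SQLite database as a stand-in for the designated database; `rows` stands for forestry data already read from a slave database.

```python
import json
import sqlite3

DATA_VOLUME_THRESHOLD = 1000  # hypothetical preset threshold, in rows
SLICE_SIZE = 500              # hypothetical number of rows per slice


def make_slice_store():
    """Create an in-memory stand-in for the designated database."""
    store = sqlite3.connect(":memory:")
    store.execute(
        "CREATE TABLE slices ("
        "request_id TEXT, slice_no INTEGER, payload TEXT, "
        "PRIMARY KEY (request_id, slice_no))"
    )
    return store


def serve_request(rows, store, request_id,
                  threshold=DATA_VOLUME_THRESHOLD, slice_size=SLICE_SIZE):
    """Return the data as a list of slices.

    Small requests are returned directly as a single slice. Large requests
    are sliced, persisted, and then read back from the store in order.
    """
    if len(rows) <= threshold:
        return [rows]
    for slice_no, start in enumerate(range(0, len(rows), slice_size)):
        chunk = rows[start:start + slice_size]
        store.execute(
            "INSERT INTO slices (request_id, slice_no, payload) VALUES (?, ?, ?)",
            (request_id, slice_no, json.dumps(chunk)),
        )
    store.commit()
    cursor = store.execute(
        "SELECT payload FROM slices WHERE request_id = ? ORDER BY slice_no",
        (request_id,),
    )
    return [json.loads(payload) for (payload,) in cursor]
```

Persisting the slices before returning them means a large response can be fed back piece by piece without holding the full result in memory on the serving side.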

In one embodiment, the intelligent forestry big data system further comprises a forestry data presentation module for:

carrying out linkage inquiry display on the forestry data through one or more conditions of a time interval, administrative area codes and space coordinates;

or, performing real-time roll-up and drill-down calculation and display of the forestry data along one or more dimensions among administrative area, time, land type, forest category and land tenure;

or, performing dynamic perception and display of the forestry data by one or more topics among forest growth, resource management, biodiversity, natural protected areas, ecological protection, ecological restoration, ecological industry and others.
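Aggregation along one or more dimensions can be sketched as follows. This is only an illustration under assumed names: the record fields (`region`, `year`, `land_type`, `area_ha`) and the function `roll_up` are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict


def roll_up(records, dimensions, measure="area_ha"):
    """Sum `measure` over all records that share the same values for `dimensions`.

    Passing fewer dimensions gives a coarser (rolled-up) view; passing more
    dimensions gives a finer (drilled-down) view of the same records.
    """
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[dimension] for dimension in dimensions)
        totals[key] += record[measure]
    return dict(totals)


# Hypothetical sample records.
RECORDS = [
    {"region": "110101", "year": 2022, "land_type": "forest", "area_ha": 12.5},
    {"region": "110101", "year": 2022, "land_type": "grassland", "area_ha": 3.0},
    {"region": "110102", "year": 2022, "land_type": "forest", "area_ha": 7.5},
]

by_region = roll_up(RECORDS, ["region"])                        # rolled up
by_region_and_type = roll_up(RECORDS, ["region", "land_type"])  # drilled down
```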

In a second aspect, an embodiment of the present invention further provides a data lake-based intelligent forestry big data management method, including:

performing data processing on the forestry data according to the data space type of the forestry data through a data processing and quality checking module, performing at least one data check on the forestry data after the data processing, and performing data quality checking on the forestry data processed through the data check;

storing the forestry data passing through the data quality inspection into a data lake through a data lake and statistical analysis module, and carrying out statistical analysis on the forestry data in the data lake to obtain a data statistical analysis result;

and storing the forestry data passing through the data quality inspection into a master database and a slave database through a data storage and service sharing module, and providing a service sharing function based on the forestry data stored in the master database and the slave database.

In a third aspect, embodiments of the present invention also provide a server comprising a processor and a memory storing computer executable instructions executable by the processor, the processor executing the computer executable instructions to implement the method provided in the first aspect.

In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method provided in the first aspect.

The embodiment of the invention provides a data lake-based intelligent forestry big data system, a data lake-based intelligent forestry big data method, a data lake-based intelligent forestry big data server and a data lake-based intelligent forestry big data medium, wherein the data lake-based intelligent forestry big data system comprises a data processing and quality testing module, a data lake-entering and statistical analysis module and/or a data warehouse-in and service sharing module, wherein the data lake-entering and statistical analysis module and the data warehouse-in and service sharing module are in communication connection with the data processing and quality testing module; the data processing and quality inspection module is used for: carrying out data processing on the forestry data according to the data space type of the forestry data, carrying out at least one data check on the forestry data after the data processing, and carrying out data quality check on the forestry data which are processed through the data check; the data entering lake and statistical analysis module is used for: storing the forestry data passing the data quality inspection into a data lake, and carrying out statistical analysis on the forestry data in the data lake to obtain a data statistical analysis result; the data warehouse entry and service sharing module is used for: and storing the forestry data passing the data quality inspection into a master database and a slave database, and providing a service sharing function based on the forestry data stored in the master database and the slave database. The intelligent forestry big data system provided by the embodiment of the invention has the characteristics of unified management, flexible sharing, intelligent and efficient performance, advanced technology and the like, so that powerful support is provided in operation and decision.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a data lake-based intelligent forestry big data system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of another intelligent forestry big data system based on data lakes according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a data lake construction module according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart of a forestry data service sharing method according to an embodiment of the present invention;

FIG. 5 is a flow chart of a method for managing big data of intelligent forestry based on data lakes according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described in conjunction with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

At present, because system application and management are not unified, cross-departmental business coordination is not smooth, and strong support cannot be provided for operations and decision-making. On this basis, the invention provides an intelligent forestry big data system, method, server and medium based on a data lake, which offer unified management, flexible sharing, intelligence and efficiency, and advanced technology, thereby providing strong support for operations and decision-making.

For the sake of understanding the present embodiment, first, a detailed description will be given of a data lake-based intelligent forestry big data system disclosed in the present embodiment, referring to a schematic structure diagram of the data lake-based intelligent forestry big data system shown in fig. 1, which includes a data processing and quality inspection module 102, and further includes a data lake entering and statistical analysis module 104 and/or a data warehousing and service sharing module 106 communicatively connected to the data processing and quality inspection module 102.

In one example, the data processing and quality inspection module 102 is configured to: process the forestry data according to the data space type of the forestry data, perform at least one data check on the processed forestry data, and perform a data quality inspection on the forestry data that passes the data check. The data processing and quality inspection module is divided into a forestry data processing unit and a forestry data quality inspection unit. The forestry data processing unit mainly covers four processes: forestry data processing, pre-lake-entry checking, lake entry, and post-lake-entry checking. The rules of the forestry data quality inspection unit mainly include topology detection, attribute field detection, abnormal element detection, spatial relationship detection, element difference comparison detection, custom SQL, multi-table accuracy, two-table value comparison, field length verification, uniqueness verification, regular expression, timeliness verification, enumeration value verification, table row count verification, null value detection and the like.

In one example, the data lake entering and statistical analysis module 104 is configured to: store the forestry data that passes the data quality inspection into a data lake, and perform statistical analysis on the forestry data in the data lake to obtain a statistical analysis result. The data lake entering and statistical analysis module is divided into a forestry data lake entering unit and a forestry data statistical analysis unit. After quality inspection, the forestry data lake entering unit uses a customized program or an ETL tool to store forestry data of various formats, such as shp, gdb, csv, mdb, tiff, pictures, videos and documents, on Hadoop HDFS in a distributed manner, according to a hierarchical data classification structure of public basic data, forestry thematic data and forestry comprehensive data, using either the Parquet data file format with the snappy compression algorithm or the original format. The forestry data statistical analysis unit mainly performs a layered design based on the forestry data in the data lake and implements an integrated lake and warehouse (lakehouse) architecture. The statistical analysis is performed by a compute engine such as Spark, Hive or StarRocks.

In one example, the data warehousing and service sharing module 106 is configured to: store the forestry data that passes the data quality inspection into a master-slave database, and provide a service sharing function based on the forestry data stored in the master-slave database. The data warehousing and service sharing module comprises a forestry data warehousing unit and a forestry data service sharing unit. The forestry data warehousing unit mainly implements classified management of various types of forestry data, such as shp, gdb and tables, through steps such as data arrangement, data catalog management, metadata table management, warehousing scheme configuration, path configuration, data identification configuration, data upload and warehousing start. The forestry data service sharing unit is mainly based on the stock and incremental public basic data, forestry thematic data and forestry comprehensive data generated by the service systems, and shares the forestry data among the service systems by means of shared application services, thereby achieving interconnection and interoperation of forestry data between different service systems.
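The read/write separation of a master-slave database can be illustrated with a small routing sketch. The class name `MasterSlaveRouter`, the round-robin choice among slaves, and the rule that only `SELECT` statements count as reads are assumptions made for illustration; the embodiment does not specify a routing policy.

```python
import itertools


class MasterSlaveRouter:
    """Route write statements to the master and read statements to the slaves."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave database is required")
        self.master = master
        # Assumed policy: rotate through the slaves in round-robin order.
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        """Return the database that should execute `sql`."""
        words = sql.split(None, 1)
        if words and words[0].upper() == "SELECT":
            return next(self._slaves)
        return self.master


router = MasterSlaveRouter("master", ["slave-1", "slave-2"])
```

In a real deployment the strings would be replaced by connection objects or connection pools; plain names are used here so the routing decision is easy to inspect.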

The intelligent forestry big data system provided by the embodiment of the invention has the characteristics of unified management, flexible sharing, intelligent and efficient performance, advanced technology and the like, so that powerful support is provided in operation and decision.

For easy understanding, the embodiment of the invention provides a specific implementation manner of an intelligent forestry big data system based on a data lake, and referring to a structural schematic diagram of another intelligent forestry big data system based on the data lake shown in fig. 2, the intelligent forestry big data system comprises a data lake construction module, a forestry data processing unit, a forestry data quality inspection unit, a forestry data lake entering unit, a forestry data statistical analysis unit, a forestry data warehouse-in unit, a forestry data service sharing unit and a forestry data display module.

(I) For the aforementioned data lake construction module, see the schematic structural diagram of a data lake construction module shown in FIG. 3; the data lake construction module includes a data storage layer, a data management layer, a data calculation layer, and a data query layer.

In one example, the data storage layer: the data lake architecture may employ Hadoop HDFS, S3, OSS as the underlying data storage medium. The invention mainly adopts Hadoop HDFS to carry out distributed multi-copy storage, and compared with single-library storage, the invention improves the high availability and partition fault tolerance of the system.

In one example, the data management layer: the data lake architecture may be managed using Delta, Hudi, Iceberg, etc. The invention mainly manages data through Hudi.

In one example, the data calculation layer: distributed computing may be performed by a compute engine such as Hive, Flink, Spark or Presto.

In one example, the data query layer: query analysis may be performed by Hive, Spark, StarRocks, etc.

Building this module relies on open-source big data components, so the compatibility between the operating system and the different components needs to be studied and analyzed in detail, and underlying environment problems need to be solved, such as mismatched JDK and SDK versions, package dependency conflicts, unsynchronized server clocks and too few available file handles. For data volumes above ten million records, applying big data technology improves performance by roughly 10 times or more compared with a traditional relational database. The approach taken to the underlying environment problems is to download the source code from the official Apache site, then modify, compile, integrate and release it in-house, which resolves the compatibility between the operating system and the different software; to deploy JDK and SDK versions that are compatible with the operating system and the different software; and to configure a custom clock service, which resolves the clock differences between the cluster servers.

(II) The aforementioned forestry data processing unit is used for: processing spatial forestry data using the first function set, and/or processing non-spatial forestry data using the second function set; and performing the pre-lake-entry check and the post-lake-entry check on the processed forestry data. The first function set includes format conversion, image preprocessing, radiation correction, geometric correction, elevation correction, coordinate system conversion, image mosaic, image fusion, image editing, image framing, data format analysis and conversion, data structure normalization, code conversion, graphic processing, attribute processing, information synthesis, vector data rasterization, warehousing format conversion, data quality inspection and the like. The second function set includes: data cleaning, de-duplication, format verification, business rule verification, data conversion, metadata management, table standardization, standardized photograph naming, picture vectorization, table vectorization, document standardization, quality inspection and the like.

In practical application, the forestry data processing unit mainly covers four processes: forestry data processing, pre-lake-entry checking, lake entry, and post-lake-entry checking. Forestry data processing not only handles the data itself but can also perform secondary processing of the forestry data in response to the data requirements of a service system. After the forestry data has been processed, the next step is the pre-lake-entry check, which verifies whether the forestry data meets the lake-entry standard and has been processed according to the requirements of the service system; if so, the lake-entry step is carried out, otherwise the data is returned to the forestry data processing step. After the forestry data has been fully entered into the lake, a designated person compares the original forestry data content with the database content; if they are consistent, the service release step is carried out, otherwise the cause of the inconsistency must be traced back through the preceding steps.
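The four-step flow described above (processing, pre-lake-entry check, lake entry, post-lake-entry check) can be sketched as a small control loop. This is an illustrative sketch only: the function name, the status strings and the `max_rounds` limit are hypothetical, and the post-entry comparison, which the embodiment assigns to a designated person, is approximated here by an equality test.

```python
def run_lake_entry(raw, process, pre_check, store, max_rounds=3):
    """Drive one record through processing, pre-check, lake entry and post-check.

    `process`, `pre_check` and `store` are caller-supplied callables.
    """
    candidate = raw
    for _ in range(max_rounds):
        candidate = process(candidate)
        if not pre_check(candidate):
            # Pre-entry check failed: return the data to the processing step.
            continue
        stored = store(candidate)
        # Post-entry check: the stored content must match what was submitted.
        if stored == candidate:
            return "ready_for_service_release"
        return "trace_inconsistency_upstream"
    return "rejected"


# Hypothetical callables used to exercise the flow.
lake = {}


def normalize_keys(record):
    return {key.strip().lower(): value for key, value in record.items()}


def has_parcel_id(record):
    return "parcel_id" in record


def store_in_lake(record):
    lake[record["parcel_id"]] = dict(record)
    return lake[record["parcel_id"]]
```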

In the data processing link, the embodiment of the invention can process the data of the forestry data of the space class (forestry space data for short) and the forestry data of the non-space class (forestry non-space data for short) by adopting different function sets.
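Selecting a function set by data space type can be sketched as a simple dispatch. The mapping from file extension to spatial or non-spatial type is an assumption made for illustration (the embodiment does not state how the type is determined), and only a few of the listed functions are shown.

```python
from pathlib import Path

# Assumption: these extensions are treated as spatial data in this sketch.
SPATIAL_EXTENSIONS = {".shp", ".gdb", ".tif", ".tiff"}

FUNCTION_SETS = {
    # A few entries from the first function set (spatial data).
    "spatial": ["format conversion", "coordinate system conversion",
                "vector data rasterization"],
    # A few entries from the second function set (non-spatial data).
    "non_spatial": ["data cleaning", "de-duplication", "format verification"],
}


def data_space_type(file_name):
    """Classify a file as spatial or non-spatial by its extension."""
    suffix = Path(file_name).suffix.lower()
    return "spatial" if suffix in SPATIAL_EXTENSIONS else "non_spatial"


def plan_processing(file_name):
    """Return the list of processing functions to apply to the file."""
    return FUNCTION_SETS[data_space_type(file_name)]
```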

In one example, the spatial class forestry data may be classified and normalized; carrying out data processing on the forestry data of the space class by utilizing the first function set; and carrying out superposition analysis treatment on the forestry data of the space class after the data processing.

Specifically, the forestry data is classified according to its type and source, mainly into public basic data, forestry thematic data and forestry comprehensive data. According to the standardized workflow of the system, information such as the file name, file type, file code, corresponding table name, vectors and attributes of the forestry data is processed and converted. The underlying implementation uses a self-compiled GDAL tool to process the spatial data.

Forestry data overlay analysis mainly involves the following: forest chief grid data is overlaid with the 2021 and 2022 forest, grassland and wetland resource maps to produce result data such as the forest land, grassland and wetland areas within each grid; forest land approval project result data is overlaid with national forest land data to produce forest land approval project result data within the national forest land range; ancient and famous tree directory result data is overlaid with ancient tree park data to produce ancient and famous tree result data within each ancient tree park; forest chief grid data is overlaid with forest ranger data and with technical staff data to produce forest ranger responsibility grid result data and technical staff responsibility grid data; and forest chief system patrol data is overlaid with fire prevention event data to produce patrol event result data, and so on.

In one example, because non-spatial data and spatial data differ in storage and in data attributes, they are processed differently; for non-spatial data, refer to the second function set. Processing of forestry spatial data focuses on the spatial attributes and also includes standardization and processing of the attribute structure; forestry non-spatial data is mainly standardized and processed with respect to its attribute structure and is stored in an ordinary relational database.
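The idea of overlaying grid data with land resource data to obtain per-grid areas can be shown with a deliberately simplified sketch. It assumes that grids and land parcels are axis-aligned rectangles, which real forestry geometries are not; a production system would use a geometry library on true polygons. All names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle, a simplified stand-in for a polygon."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def overlap_area(a, b):
    """Area of the intersection of two boxes, or 0.0 if they do not overlap."""
    width = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    height = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return width * height if width > 0 and height > 0 else 0.0


def overlay(grids, parcels):
    """For each grid, total the overlapping area of each land type."""
    result = {}
    for grid_id, grid_box in grids.items():
        totals = {}
        for land_type, parcel_box in parcels:
            area = overlap_area(grid_box, parcel_box)
            if area > 0:
                totals[land_type] = totals.get(land_type, 0.0) + area
        result[grid_id] = totals
    return result


GRIDS = {"grid-A": Box(0, 0, 10, 10), "grid-B": Box(10, 0, 20, 10)}
PARCELS = [
    ("forest", Box(5, 0, 15, 4)),
    ("grassland", Box(0, 6, 3, 10)),
    ("wetland", Box(18, 8, 25, 12)),
]
areas = overlay(GRIDS, PARCELS)
```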

(III) for the forestry data quality inspection unit, the forestry data quality inspection unit is used for: performing data quality inspection on forestry data passing through the data inspection; returning the forestry data under the condition that the forestry data does not pass the data quality inspection until new forestry data passes the inspection before entering the lake, the inspection after entering the lake and the data quality inspection; and transmitting forestry data passing through the data quality inspection to a data lake entering and statistical analysis module.

In one embodiment, the forestry data quality inspection rules mainly include topology detection, attribute field detection, abnormal element detection, spatial relationship detection, element difference comparison detection, custom SQL, multi-table accuracy, two-table value comparison, field length verification, uniqueness verification, regular expression, timeliness verification, enumeration value verification, table row count verification, null value detection and the like.

After the forestry data is collected, a data quality inspection is carried out according to the quality inspection rules, and a data quality inspection report is generated for customers to review. For special fields involving pictures, videos and documents, a unified media information table is custom-designed to store the forestry data that contains media information, and the query and application logic is custom-encapsulated for use by application systems.
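Several of the attribute-level rules above (null value detection, uniqueness verification, field length verification, regular expression and enumeration value verification) can be sketched as small check functions that each return the indexes of failing rows. The function names, field names and rule parameters are hypothetical; the spatial rules such as topology detection are not covered by this sketch.

```python
import re


def null_rows(rows, field):
    """Rows where the field is missing, None or empty."""
    return [i for i, row in enumerate(rows) if row.get(field) in (None, "")]


def duplicate_rows(rows, field):
    """Rows whose field value already appeared in an earlier row."""
    seen, duplicates = set(), []
    for i, row in enumerate(rows):
        value = row.get(field)
        if value in seen:
            duplicates.append(i)
        seen.add(value)
    return duplicates


def too_long_rows(rows, field, max_length):
    """Rows where the field, as text, exceeds max_length characters."""
    return [i for i, row in enumerate(rows)
            if len(str(row.get(field, ""))) > max_length]


def pattern_mismatch_rows(rows, field, pattern):
    """Rows where the field does not fully match the regular expression."""
    compiled = re.compile(pattern)
    return [i for i, row in enumerate(rows)
            if not compiled.fullmatch(str(row.get(field, "")))]


def enum_violation_rows(rows, field, allowed):
    """Rows where the field is not one of the allowed values."""
    return [i for i, row in enumerate(rows) if row.get(field) not in allowed]


def quality_report(rows, rules):
    """Run every rule and map its name to the list of failing row indexes."""
    return {name: check(rows) for name, check in rules.items()}


# Hypothetical rule configuration.
RULES = {
    "tree_id is not null": lambda rows: null_rows(rows, "tree_id"),
    "tree_id is unique": lambda rows: duplicate_rows(rows, "tree_id"),
    "tree_id has at most 8 characters": lambda rows: too_long_rows(rows, "tree_id", 8),
    "region_code has 6 digits": lambda rows: pattern_mismatch_rows(rows, "region_code", r"\d{6}"),
    "species is an allowed value": lambda rows: enum_violation_rows(rows, "species", {"pine", "oak", "cypress"}),
}
```

A report in which every list is empty would correspond to data that passes the inspection; any non-empty list identifies the rows to return for reprocessing.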

(IV) The aforementioned forestry data lake entering unit is used for: storing the forestry data that passes the data quality inspection into the data lake in a distributed manner, using the Parquet data file format or the original format, according to a hierarchical data classification structure of public basic data, forestry thematic data and forestry comprehensive data. Specifically, after quality inspection, the forestry data lake entering unit uses a customized program or an ETL tool to store forestry data of various formats, such as shp, gdb, csv, mdb, tiff, pictures, videos and documents, on Hadoop HDFS in a distributed manner, according to the same hierarchical classification structure, using either the Parquet data file format with the snappy compression algorithm or the original format.
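One way to picture the hierarchical layout is a helper that derives the target location and storage format for an incoming file. The root path, the category identifiers, the file naming pattern and, in particular, the choice of which source formats are converted to Parquet are all assumptions made for this sketch; the embodiment only states that either Parquet with snappy compression or the original format is used.

```python
from pathlib import PurePosixPath

LAKE_ROOT = PurePosixPath("/forestry_lake")  # hypothetical HDFS root directory
CATEGORIES = {"public_basic", "forestry_thematic", "forestry_comprehensive"}
# Assumption: these source formats are converted to Parquet; others stay as-is.
CONVERTED_SUFFIXES = {".csv", ".shp", ".gdb", ".mdb"}


def lake_target(category, dataset, source_name):
    """Return (target_path, storage_format) for a file entering the lake."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    source = PurePosixPath(source_name)
    directory = LAKE_ROOT / category / dataset
    if source.suffix.lower() in CONVERTED_SUFFIXES:
        return str(directory / f"{source.stem}.snappy.parquet"), "parquet+snappy"
    # Media and documents keep their original format.
    return str(directory / source.name), "original"
```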

In addition, the forestry data lake entering unit may also collect forestry data from the service system and send it to the forestry data processing unit for the relevant processing.

In one example, offline forestry data is collected from the service system through modified postgresqlwriter and hdfswriter plugins, and the collected forestry data is transmitted to the data processing and quality inspection module. In practical applications, the offline data may be collected with the DataX ETL tool, whose postgresqlwriter and hdfswriter plugins are custom modified to support the collection of spatial data.

In one example, for real-time forestry data, the intelligent forestry big data system interfaces with the service system through a Kafka cluster, so that real-time forestry data is collected from the service system and transmitted to the data processing and quality inspection module. In practical applications, the Kafka cluster interfaces separately with service systems such as ancient and famous trees, natural protected areas, the forest chief system, public welfare forests, natural forests and forest harvesting, and different topics are established for the data interfaces.

In one example, the offset data of the real-time forestry data is stored in a relational database, and the data is consumed from the offset recorded in the corresponding database configuration table, which addresses the problems of data backpressure and repeated consumption. In practical applications, the offset is stored in a relational database, and the database configuration table is queried at consumption time so that consumption resumes from the corresponding offset, effectively preventing problems such as data backpressure and repeated consumption.
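The idea of keeping consumer offsets in a relational table can be sketched as follows. This is an illustrative sketch only: SQLite stands in for the relational database, a Python list stands in for one Kafka topic partition, and the table name `consumer_offset`, its columns and the helper functions are all hypothetical. The bounded batch size is one assumed way of letting the consumer pull at its own pace.

```python
import sqlite3

def open_offset_store(path=":memory:"):
    # Hypothetical configuration table: one row per (topic, partition).
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS consumer_offset ("
        " topic TEXT NOT NULL, part INTEGER NOT NULL,"
        " next_offset INTEGER NOT NULL,"
        " PRIMARY KEY (topic, part))"
    )
    return conn

def load_offset(conn, topic, part):
    # Query the table at consumption time; start from 0 when nothing is stored.
    row = conn.execute(
        "SELECT next_offset FROM consumer_offset WHERE topic = ? AND part = ?",
        (topic, part),
    ).fetchone()
    return row[0] if row else 0

def consume(conn, topic, part, log, handler, batch_size=2):
    """Handle at most batch_size messages starting at the stored offset."""
    start = load_offset(conn, topic, part)
    batch = log[start:start + batch_size]
    with conn:  # the new offset is committed only if the whole batch was handled
        for message in batch:
            handler(message)
        conn.execute(
            "INSERT OR REPLACE INTO consumer_offset (topic, part, next_offset)"
            " VALUES (?, ?, ?)",
            (topic, part, start + len(batch)),
        )
    return len(batch)
```

Because each call resumes from the stored `next_offset`, a restarted consumer does not reprocess messages whose offset was already committed, and if the handler raises, the offset is left unchanged so the batch can be retried.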

By adopting this technical route, the embodiment of the invention improves data acquisition efficiency by about 2 times, reduces data transmission time and traffic consumption by 1/3, reduces disk storage space by 1/3, and improves data lake-entry efficiency by 1/4.

(V) The forestry data statistical analysis unit is configured to: perform statistical analysis on the forestry data in the data lake by using a spatial analysis algorithm or an attribute statistics algorithm to obtain a data statistical analysis result.

In practical applications, a layered design is applied to the forestry data in the data lake, so that an integrated lake-and-warehouse architecture is realized. The statistical analysis is performed by a computing engine such as Spark, Hive or StarRocks. The analysis methods fall into two major categories: spatial analysis and attribute statistics. Spatial analysis mainly comprises overlay analysis, buffer analysis and the like. Attribute statistics comprise a custom report function and a magic cube function, and cover statistics of resource indexes such as forest, grassland and wetland, desert, national forest farms, and ancient and famous trees; distribution statistics of various wild animals and plants; statistics of natural resources such as national parks, natural protected areas and natural parks; and the like.

For the integrated lake-and-warehouse architecture, a CDH 6.3.1 big data cluster integrates Sedona and the StarRocks database. To integrate Sedona into the CDH big data cluster, the hive-exec source code is modified, support for the Thrift service protocol is added to the Hive client, and the package is recompiled and deployed, so that spatial computation over massive data is supported. High-frequency, high-volume data accessed by services, such as the 2021 and 2022 "one map" of forest, grassland and wetland resources and the 2021 administrative divisions at all levels of Hunan Province, is synchronized to the StarRocks cluster, and a real-time statistical analysis interface is custom developed to achieve second-level statistics over tens of millions of records. By adopting this technical route, computing performance on massive data is improved by more than 10 times compared with the conventional technology.
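To make the two kinds of spatial analysis named above concrete, the following sketch shows a heavily simplified buffer analysis and overlay analysis in plain Python. It assumes planar coordinates, point features and axis-aligned rectangles, and the function names are hypothetical. In the described system such computations would run on engines such as Spark with Sedona over real geometries, which this sketch does not attempt to reproduce.

```python
import math

def buffer_select(center, radius, points):
    """Buffer analysis (simplified): names of point features within radius of center."""
    cx, cy = center
    return [name for name, (x, y) in points.items()
            if math.hypot(x - cx, y - cy) <= radius]

def overlay_area(a, b):
    """Overlay analysis (simplified): intersection area of two rectangles.

    Each rectangle is given as (min_x, min_y, max_x, max_y).
    """
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0

# Hypothetical sample features in an arbitrary planar coordinate system.
TREES = {"T1": (0.0, 0.0), "T2": (3.0, 4.0), "T3": (6.0, 8.0)}
FOREST_PLOT = (0.0, 0.0, 4.0, 4.0)
PROTECTED_AREA = (2.0, 2.0, 6.0, 6.0)
```

With these samples, `buffer_select((0.0, 0.0), 5.0, TREES)` selects the trees within 5 units of the origin, and `overlay_area(FOREST_PLOT, PROTECTED_AREA)` gives the area the two rectangles share.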

(VI) The forestry data warehousing unit is configured to: store the forestry data that passes the data quality inspection in the master and slave databases, which comprise one master database and a plurality of slave databases, wherein the master database serves as the write database and the slave databases serve as read databases.

In specific implementation, forestry data warehousing mainly realizes the classified management of forestry data such as shp, gdb and tables, and of other types of forestry data, through the steps of data arrangement, data catalog management, metadata table management, warehousing scheme configuration, path configuration, data identification configuration, data upload, warehousing start and the like. By configuring the master and slave databases, read-write separation is realized at the service layer, which provides high availability of the data and improves the availability of the applications. The postgresqlwriter plugin of the DataX ETL tool is custom modified to support the collection of spatial data. As a result, forestry data acquisition efficiency is improved by about 2 times, forestry data transmission time and traffic consumption are reduced by 1/3, and forestry data warehousing efficiency is improved by 1/4.
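Read-write separation at the service layer can be illustrated with a small router that sends writes to the master database and spreads reads over the slave databases. This is a sketch under assumptions: the class name `ReadWriteRouter` is hypothetical, databases are represented only by labels, statements are classified by their first keyword, and round-robin selection among slaves is a choice made here rather than something the text specifies.

```python
import itertools

class ReadWriteRouter:
    """Routes SELECT statements to slaves (round-robin) and everything else to the master."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave database is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        words = sql.split()
        keyword = words[0].upper() if words else ""
        # Reads go to the next slave; writes and anything unrecognized go to the master.
        return next(self._slaves) if keyword == "SELECT" else self.master

router = ReadWriteRouter("master", ["slave-1", "slave-2"])
```

Routing unknown statements to the master is the conservative default, since the master accepts both reads and writes.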

(VII) The forestry data service sharing unit is configured to: provide a service sharing function based on the forestry data stored in the master and slave databases. In addition, when a data calling request for the service sharing function is received and the data amount corresponding to the request is larger than a preset data amount threshold, the forestry data is read from the database and sliced, the sliced forestry data is persisted to a designated database, and the sliced forestry data is then read from the designated database and fed back. The designated database is MongoDB.
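The threshold-based branch described above can be sketched as follows. The sketch is illustrative only: a plain dictionary stands in for the designated MongoDB database, fixed-size chunks stand in for whatever slicing the real service performs, and the function name `serve_request` and its parameters are hypothetical.

```python
def serve_request(rows, threshold, slice_size, slice_store, request_id):
    """Return rows directly when small; otherwise slice, persist and read back.

    slice_store is a dict standing in for the designated database.
    """
    if len(rows) <= threshold:
        return rows
    # Slice the large result and persist every slice under (request_id, slice index).
    for start in range(0, len(rows), slice_size):
        slice_store[(request_id, start // slice_size)] = rows[start:start + slice_size]
    # Feed the result back by reading the persisted slices in order.
    keys = sorted(key for key in slice_store if key[0] == request_id)
    return [row for key in keys for row in slice_store[key]]
```

Persisting the slices means that a later request for the same data could be answered from the slice store without querying the source database again, which is presumably the motivation for the designated database.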

For example, referring to the flow chart of a forestry data service sharing method shown in fig. 4, it is first determined whether the service system performs data fusion. If not, the incremental data of the service system is used directly as the sharing-service application data and the data sharing service is configured, thereby realizing the service application data sharing service. If so, it is further determined whether stock data exists. If no stock data exists, the incremental data of the service system is likewise used directly as the sharing-service application data and the data sharing service is configured. Otherwise, the stock data in the database together with the incremental data of the service system is used as the sharing-service application data to configure the data sharing service, thereby realizing the service application data sharing service.
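The decision flow of fig. 4 reduces to a single condition, as the following sketch shows. The function name and the list-based representation of stock and incremental data are hypothetical and chosen only for illustration.

```python
def select_sharing_data(fusion_required, stock, incremental):
    """Choose the data behind the sharing service, following the fig. 4 decisions."""
    # Stock data is merged in only when fusion is requested and stock data exists.
    if fusion_required and stock:
        return stock + incremental
    # In every other branch the incremental data is used on its own.
    return incremental
```

Both "no" branches of the flow chart lead to the same outcome, which is why they collapse into one return statement here.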

In practical applications, the forestry data service sharing unit is mainly based on the stock and incremental public basic data, forestry thematic data and forestry comprehensive data generated by the service systems, and shares forestry data among the service systems in the form of shared application services, thereby achieving interconnection and intercommunication of forestry data among different service systems. The data range covers all forestry data and is therefore more comprehensive. In addition, a custom slicing service is built with MongoDB and Flink: it connects to the tables of the PostgreSQL database, queries data by primary key, synchronizes data through Kafka, and integrates a custom real-time spatial data tool by means of the Flink computing engine. In this way, more than 100 forestry maps are sliced and persisted to MongoDB, such as the 2021 and 2022 "one map" of forest, grassland and wetland resources, the 2023 "one map" of forest harvesting, the ancient and famous tree catalogue, the "one map" of natural protected area supervision, voluntary tree-planting base data and forest chief grid data. Then, according to service types such as WMS, WFS, WCS, TMS, WMTS and three-dimensional services, information such as the spatial table name, service alias, initial layer, style name, coordinate system, data type, administrative area, service catalogue, service time phase, scale, geometric feature, spatial resolution, source system and proxy address is associated and the service is registered. Services are then published by type, such as vector data, vector map and image map, so that 17 service systems, including ancient and famous trees, natural protected areas, fire prevention, resource management, resource monitoring and public welfare forests, can call them in real time, improving the timeliness of data service sharing.

(VIII) The forestry data display module is configured to: perform linked query and display of forestry data by one or more conditions among time interval, administrative area code and spatial coordinates; or perform real-time calculation and display of forestry data along one or more dimensions among administrative area, time, land type, forest species and land rights; or perform dynamic perception and display of forestry data by one or more topics among forest growth, resource management, biodiversity, natural protected areas, ecological protection, ecological restoration, ecological industry and others.

In specific implementation, the forestry data display module mainly uses computing engines such as Spark, Hive and StarRocks, combined with the statistical rules of each topic, to process and analyze the atomic indexes and derived indexes of each topic based on the various forestry business data, and finally uses large-screen visualization technology to provide dynamic perception and decision-support analysis of forestry data, thereby safeguarding smart forestry. Linked query and display are performed by inputting a time interval, an administrative area code and spatial coordinates, which improves the flexibility of data display. Based on forestry data on the order of tens of millions of records, roll-up and drill-down results are calculated and displayed in real time along dimensions such as administrative area, time, land type, forest species and land rights. Meanwhile, the calculation rules, mainly sum, count, avg, max and min, are dynamically configured, and rule matching is performed on custom fields, so that the display can be extended quickly for new business needs. Dynamic perception is provided for eight topics: forest growth, resource management, biodiversity, natural protected areas, ecological protection, ecological restoration, ecological industry and others. The display system supports operations such as two-/three-dimensional switching, hiding of the side panels, annotation display, turning i-query on and off, transparency adjustment and display of cloud remote sensing images.
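The combination of configurable calculation rules (sum, count, avg, max, min) with roll-up and drill-down along dimensions can be sketched as a grouped aggregation. This is a minimal in-memory illustration: the `aggregate` function, the field names and the sample rows are hypothetical and the area values are made up. The described system performs such calculations on Spark, Hive or StarRocks rather than in Python.

```python
from collections import defaultdict

# Calculation rules selectable by name, mirroring sum, count, avg, max and min.
AGGREGATORS = {
    "sum": sum,
    "count": len,
    "avg": lambda values: sum(values) / len(values),
    "max": max,
    "min": min,
}

def aggregate(rows, dimensions, measure, rule):
    """Group rows by the given dimensions and apply the named rule to the measure."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row[measure])
    func = AGGREGATORS[rule]
    return {key: func(values) for key, values in groups.items()}

# Hypothetical sample data; the area values are invented for illustration.
ROWS = [
    {"city": "Changsha", "county": "Yuelu", "land_type": "forest", "area": 10.0},
    {"city": "Changsha", "county": "Yuelu", "land_type": "wetland", "area": 2.0},
    {"city": "Changsha", "county": "Kaifu", "land_type": "forest", "area": 4.0},
    {"city": "Zhuzhou", "county": "Liling", "land_type": "forest", "area": 6.0},
]

rolled_up = aggregate(ROWS, ["city"], "area", "sum")
drilled_down = aggregate(ROWS, ["city", "county"], "area", "sum")
```

Rolling up uses fewer dimensions (city only) and drilling down adds one (city and county). Switching the rule name changes the statistic without touching the grouping logic, which is one way to read the "dynamically configured" calculation rules.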

In summary, the intelligent forestry big data system based on the data lake provided by the embodiment of the invention has at least the following characteristics:

(1) The system stores forestry data as it is, without requiring the data to be structured in advance. The system is based on a centralized repository that allows all structured, semi-structured and unstructured forestry data to be stored at any scale. Stored structured forestry data includes, for example, tables in relational databases and forestry vector data in shp or gdb format; stored semi-structured forestry data includes, for example, data in CSV, JSON, GeoJSON, XML and log formats generated by forestry systems; stored unstructured forestry data includes, for example, forestry user emails, documents, PDFs and TIFF-format rasters; and stored binary forestry data includes, for example, the images, audio and video of forestry-related services.

(2) Data processing in the system mainly comprises: spatial data operations such as spatial data standardization, compliance detection, geometric correction, vector data rasterization, warehousing format conversion, vector data slicing, code conversion and topology inspection, which are custom developed using ArcGIS and PostGIS and on the basis of toolkits such as GeoTools and JTS; non-spatial data operations implemented on big data computing engines such as Hive and Spark, including standardized processing of non-spatial data, cleaning and de-duplication of structured data, non-null checks, type conversion, metadata processing and standardized naming of unstructured picture and video data; and, according to the processing rules of each forestry business, operations such as statistical analysis, overlay analysis, buffer analysis and attribute i-query.

(3) The application scenarios of the system mainly include: 1. data sharing; 2. data insight; 3. intelligent decision-making. Data sharing publishes the spatial data that has undergone standardized processing and data quality inspection as web services, realizing cross-platform sharing of different forestry business data, breaking down data silos and enabling efficient circulation of forestry data. Data insight is based on big data analysis technologies such as Hive, Spark and StarRocks and visualization technologies such as Cesium, ECharts, DataV and 3D, providing managers with comprehensive data analysis and improving the timeliness and effectiveness of information query and interaction. Intelligent decision-making combines big data analysis technologies such as Hive, Spark and StarRocks with multi-dimensional forestry data, calling on various forestry data resources, models and analysis tools, and helps decision makers improve the level and quality of their decisions through intelligent analysis and fingertip decision-making.

(4) The technology adopted by the system is built on massive forestry business data and combines a data lake architecture that integrates Hudi, storage-compute separation and lake-warehouse integration with a front-end/back-end separated web project architecture based on frameworks such as Maven, Nginx, the Spring family + MyBatis-Plus, Vue 2 + Cesium and PostgreSQL + PostGIS + StarRocks. The development language is mainly Java, and these are currently mainstream big data and web application technologies.

On this basis, the embodiment of the invention provides an intelligent forestry big data system based on the data lake and a method for constructing it. The constructed system manages province-city-county three-level forestry data centrally under unified standards and realizes efficient sharing and exchange. Data enters the lake from the government network in direct, indirect or offline modes, breaking down the system boundaries of forestry data, driving the IT systems of forestry-related departments to change from cost centers into innovation centers, and accelerating the transformation of those departments from informatization to digitalization. The construction of the system helps forestry-related departments apply digital intelligence at the operation and decision levels, and boosts the efficiency upgrading of the forestry industry and cross-boundary integration among multiple industries. At present, the system constructed according to the embodiment of the invention supports data analysis over about 230 million records (about 19.01 TB) together with a large volume of forestry data statistics requirements, and assists intelligent decision-making and analysis services. The system provides one-key login, unified specifications, dynamic quality inspection, multiple-input multiple-output, hierarchical management, hierarchical statistics, cross-platform seamless sharing and multi-screen linked display, which greatly reduces the complexity of forestry data storage and application; compared with other solutions, it reduces average cost by 30% and improves the efficiency of data management and service by 35%, achieving cost reduction and efficiency improvement.

On the basis of the foregoing embodiments, an embodiment of the present invention provides a data lake-based intelligent forestry big data management method. Referring to the flow chart of the method shown in fig. 5, the method mainly includes the following steps S502 to S506:

Step S502: through the data processing and quality inspection module, performing data processing on the forestry data according to the data space type of the forestry data, performing at least one data check on the processed forestry data, and performing data quality inspection on the forestry data that has passed the data check;

Step S504: through the data lake entering and statistical analysis module, storing the forestry data that passes the data quality inspection in the data lake, and performing statistical analysis on the forestry data in the data lake to obtain a data statistical analysis result;

Step S506: through the data warehousing and service sharing module, storing the forestry data that passes the data quality inspection in the master and slave databases, and providing a service sharing function based on the forestry data stored in the master and slave databases.

The method provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing system embodiment. For brevity, where the method embodiment does not mention something, reference may be made to the corresponding content of the system embodiment.

The embodiment of the invention provides a server, which specifically comprises a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the embodiments described above.

Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention, where the server 100 includes: a processor 60, a memory 61, a bus 62 and a communication interface 63, the processor 60, the communication interface 63 and the memory 61 being connected by the bus 62; the processor 60 is arranged to execute executable modules, such as computer programs, stored in the memory 61.

The memory 61 may include a high-speed random access memory (RAM, Random Access Memory), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is achieved via at least one communication interface 63 (which may be wired or wireless), and may use the internet, a wide area network, a local network, a metropolitan area network, etc.

Bus 62 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 6, but this does not mean that there is only one bus or one type of bus.

The memory 61 is configured to store a program, and the processor 60 executes the program after receiving an execution instruction. The method performed by the flow-defined apparatus disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 60 or implemented by the processor 60.

The processor 60 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 60 or by instructions in the form of software. The processor 60 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory or a register. The storage medium is located in the memory 61, and the processor 60 reads the information in the memory 61 and, in combination with its hardware, performs the steps of the method described above.

The computer program product of the readable storage medium provided by the embodiment of the present invention includes a computer readable storage medium storing a program code, where the program code includes instructions for executing the method described in the foregoing method embodiment, and the specific implementation may refer to the foregoing method embodiment and will not be described herein.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that any person skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. The intelligent forestry big data system based on the data lake is characterized by comprising a data processing and quality inspection module, and further comprising a data lake entering and statistical analysis module and/or a data warehousing and service sharing module which are in communication connection with the data processing and quality inspection module; wherein,

2. The intelligent forestry big data system based on data lakes of claim 1, wherein the data processing and quality inspection module comprises a forestry data processing unit and a forestry data quality inspection unit; wherein,

3. A data lake-based intelligent forestry big data system of claim 2, wherein the forestry data processing unit is further configured to:

4. The intelligent forestry big data system based on data lake according to claim 1, wherein the data lake entering and statistical analysis module comprises a forestry data lake entering unit and a forestry data statistical analysis unit; wherein,

5. A data lake-based intelligent forestry big data system of claim 4, wherein the forestry data lake entering unit is further configured to:

6. The intelligent forestry big data system based on data lakes of claim 1, wherein the data warehousing and service sharing module comprises a forestry data warehousing unit and a forestry data service sharing unit; wherein,

7. A data lake-based intelligent forestry big data system of claim 1, further comprising a forestry data presentation module for:

8. The intelligent forestry big data management method based on the data lake is characterized by comprising the following steps of:

9. A server comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the method of claim 8.

10. A computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of claim 8.