WO2023004881A1 - Smart agriculture AIOT distributed big data storage platform

Publication number
WO2023004881A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
agricultural
resource library
smart
basic
Application number
PCT/CN2021/111626
Other languages
French (fr)
Chinese (zh)
Inventor
刘天琼
Original Assignee
深圳市爱云信息科技有限公司
Application filed by 深圳市爱云信息科技有限公司
Publication of WO2023004881A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mining & Mineral Resources (AREA)
  • General Health & Medical Sciences (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Animal Husbandry (AREA)
  • Agronomy & Crop Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A smart agriculture AIOT distributed big data storage platform comprises a smart agriculture big data basic support platform and a smart agriculture data middle platform. The basic support platform manages and supports massive data throughout its full life cycle. The data middle platform organizes the basic business data stored in the basic support platform so that the storage platform functions as a distributed architecture. By adopting a distributed architecture and providing full-life-cycle management and support for massive data, the invention provides a data model foundation for a smart agriculture big data service platform built on artificial intelligence and Internet information sharing.

Description

Smart Agriculture AIOT Distributed Big Data Storage Platform
Technical Field
The invention relates to the technical field of smart agriculture, and in particular to a smart agriculture AIOT distributed big data storage platform.
Background
Agriculture is the foundation of the national economy. As agricultural industrialization and scale have increased, and as Internet of Things, cloud computing, big data, geographic information system, remote sensing, and global positioning system technologies have become ever more widely used in agriculture, the traditional farming model has gradually revealed several shortcomings. First, agricultural information silos are severe: agricultural departments are managed in separate lines, their application systems are mostly vertical, isolated systems, and little information is shared. Second, the overall utilization of data is low: agricultural data involves many data types, inconsistent data structures, and uneven data quality, so analyzing and organizing it is laborious. Third, market supply and sales information is asymmetric: agricultural products fluctuate strongly with the market, but farmers have limited access to market information and struggle to keep up with the latest developments, which disconnects agricultural producers from consumers. Fourth, management is extensive rather than precise: agricultural production in China is relatively dispersed, so agriculture-related data is hard to collect and analyze and hard to process accurately, efficiently, and promptly, which leads to imprecise agricultural decisions.
These shortcomings leave traditional agricultural information systems unable to solve the following problems, which can be solved only by integrating modern information technologies such as the Internet of Things, cloud computing, big data, and "3S" technology with agriculture. First, how to share and exchange data among agricultural departments and other government departments so that information is interconnected across departments. Second, how to use agricultural product market information and climate and weather information to make scientific decisions and guide farmers to plant crops rationally, avoiding situations where there is "a price but no market" or "a market but no price", and balancing the supply and sale of agricultural products. Third, how to monitor the crop environment effectively with IoT sensors so that crops grow in the best possible environment and are fertilized scientifically, giving relatively high yields and thereby raising farmers' incomes. Fourth, how to label agricultural products and inputs with QR codes, barcodes, and the like so that products and inputs can be managed effectively and traced for quality and safety. Fifth, how to use the Internet to develop agricultural e-commerce and agricultural leisure tourism, helping farmers broaden their sales channels, deal with unsalable produce, and find more ways to increase income.
In summary, using artificial intelligence technology and Internet information sharing to build a smart agriculture big data service platform has become the trend of agricultural modernization. There is therefore an urgent need for a smart agriculture big data storage platform that provides a data model foundation for a smart agriculture big data service platform built on artificial intelligence and Internet information sharing.
Summary of the Invention
The purpose of the invention is to provide a smart agriculture AIOT distributed big data storage platform that adopts a distributed architecture and, by managing and supporting massive data throughout its full life cycle, provides a data model foundation for a smart agriculture big data service platform built on artificial intelligence and Internet information sharing.
To achieve this purpose, the invention adopts the following technical solution. The invention provides a smart agriculture AIOT distributed big data storage platform comprising a smart agriculture big data basic support platform and a smart agriculture data middle platform. The basic support platform manages and supports massive data throughout its full life cycle. The data middle platform organizes the basic business data stored in the basic support platform to ensure that the storage platform functions as a distributed architecture. The basic support platform and the data middle platform exchange data through computer application programming interfaces and a network.
The smart agriculture big data basic support platform comprises a data acquisition system, a data governance system, and a data storage system. The data acquisition system collects data; the data governance system fuses and governs the data collected by the data acquisition system; and the data storage system stores the data after it has been analyzed and processed by the data governance system.
The data acquisition system comprises a structured data acquisition module, an unstructured data acquisition module, and a real-time data acquisition module, which collect structured data, unstructured data, and real-time data respectively.
The data governance system comprises a data extraction module, a data cleaning module, a data conversion module, and a data loading module. The data extraction module obtains business data from the data collected by the data acquisition system. The data cleaning module corrects and normalizes defective data obtained by the data extraction module so that it meets the required data quality standard. The data conversion module converts the data collected by the data acquisition system and the data processed by the data extraction module so that it meets the requirements of the data warehouse model. The data loading module stores the data converted by the data conversion module in the target database.
The data storage system comprises a business database and a distributed massive spatial database. The business database stores business data related to agricultural resources and agricultural decision-making, and the distributed massive spatial database stores remote sensing image data, video data, and IoT sensor device data.
The smart agriculture data middle platform comprises a shared resource library, a smart agriculture theme resource library, and a basic resource library. The shared resource library meets the agricultural data sharing needs of the public and of institutions through a sharing and exchange platform. The smart agriculture theme resource library is a set of resource libraries for different agricultural themes, customized according to applications and needs. The basic resource library stores data shared by multiple systems.
The shared resource library comprises a pollution-free product full-cycle shared resource library, a bulk agricultural product trading shared resource library, a seed demand shared resource library, a specialty agricultural product supply shared resource library, and a digital agriculture government affairs shared resource library. The pollution-free product full-cycle shared resource library lets the public query the full cycle of pollution-free products. The seed demand shared resource library lets individuals and manufacturers engaged in agricultural production publish and query seed demand. The specialty agricultural product supply shared resource library lets individuals and manufacturers who trade specialty agricultural products query supply volumes. The digital agriculture government affairs shared resource library lets government agencies publish and query agriculture-related government information.
The smart agriculture theme resource library comprises an agricultural output theme resource library, an industrial layout theme resource library, an environmental monitoring theme resource library, an agricultural product safety theme resource library, an agricultural product logistics theme resource library, a fishery and aquatic product theme resource library, an animal husbandry theme resource library, a pest and disease control theme resource library, and a soil fertility theme resource library. Each of these stores the data related to its theme and provides a query function; the environmental monitoring theme resource library stores environmental monitoring data related to agricultural production.
The basic resource library comprises an administrative unit basic resource library, a basic terrain basic resource library, an agricultural enterprise basic resource library, an agricultural resource basic resource library, and an image resource basic resource library. These store, respectively, agriculture-related administrative unit data, agriculture-related basic terrain data, agricultural enterprise data, agricultural resources, and agriculture-related image resources. Each performs preliminary common processing on its data so that other systems can call it.
Further, the smart agriculture data middle platform also comprises a near-source acquisition database, which is modeled on the source systems so as to preserve, as far as possible, the original form of the business data obtained from the data storage system.
Further, the shared resource library is an externally shared resource library built, with information security ensured, through a sharing platform in a data-service mode, relying on relevant data obtained from the near-source acquisition database. The smart agriculture theme resource library uses ETL tools to perform common processing on relevant data obtained from the near-source acquisition database; it is application-oriented and customized on demand. The basic resource library performs preliminary common processing on relevant data obtained from the near-source acquisition database and extracts common attributes.
Further, the structured data includes data obtained from the smart agriculture resource management system, the smart agriculture production management system, the smart agriculture supply chain management system, and the smart agriculture Party-building management system, as well as government affairs information originating from smart cities. Structured data is stored in structured databases and distributed databases and can be transferred in real time or offline through data interface protocols. The unstructured data includes satellite remote sensing image data, spatial geographic data, intelligent IoT sensor data, and video data, and is stored as distributed massive data on a Hadoop cluster. The real-time data includes sensor data, remote sensing image data, and massive concurrent data.
Further, the business data comprises land, water, climate, population, agricultural economy, and agricultural resource data organized by county and township administrative units, together with basic farmland zoning and demarcation, standard farmland, second-round land contracting, the agricultural decision-making expert knowledge base, and the soil nutrient, heavy metal, and pesticide residue data from cultivated land fertility surveys and quality evaluation studies. When the metadata database, data dictionary, and data table structures are defined for the business database, matching attribute data collection standards and specifications are defined at the same time. The distributed massive spatial database is built on the distributed file management system of the Hadoop ecosystem and on an MPP database designed with MPP and shared-nothing technology.
Further, data extraction by the data extraction module covers the following cases. If the operational business database and the data warehouse run on the same database management system, it is enough to establish a connection between them; the data can then be accessed directly with an ETL tool, or through the corresponding SQL statements or stored procedures. If the data warehouse and the operational business database run on different database management systems, the data is first exported with an ETL tool to text files or Excel files and then extracted in a uniform way. If the volume of data to be extracted is very large, incremental extraction is used, based on a flag or a timestamp: before each run, the extraction flag or the most recent extraction time is checked first, and only then is data extracted from the data source.
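The timestamp-based incremental case can be illustrated with a minimal sketch. The patent does not specify a schema or database engine, so the table `source_records`, its columns, and the use of SQLite are assumptions made purely for illustration.

```python
import sqlite3


def extract_incremental(conn, last_extracted_at):
    """Return rows changed after the watermark, plus the new watermark.

    Assumes a hypothetical table source_records(id, crop, updated_at)
    whose updated_at column holds ISO-8601 text, which sorts correctly
    as a plain string.
    """
    rows = conn.execute(
        "SELECT id, crop, updated_at FROM source_records "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_extracted_at,),
    ).fetchall()
    # Advance the watermark only when something was actually extracted.
    new_watermark = rows[-1][2] if rows else last_extracted_at
    return rows, new_watermark


def demo_extract():
    """Build a small in-memory source table and run two extraction passes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source_records "
        "(id INTEGER PRIMARY KEY, crop TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO source_records VALUES (?, ?, ?)",
        [
            (1, "rice", "2021-08-01T08:00:00"),
            (2, "wheat", "2021-08-02T09:30:00"),
            (3, "maize", "2021-08-03T10:15:00"),
        ],
    )
    first, watermark = extract_incremental(conn, "2021-08-01T12:00:00")
    second, watermark_after = extract_incremental(conn, watermark)
    conn.close()
    return first, watermark, second, watermark_after
```

The second pass returns nothing because the watermark already covers every row, which is the point of checking the most recent extraction time before each run.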
Further, the defective data selected by the data cleaning module falls into the following categories: duplicated values, missing data, erroneous data, confused data ranges, dirty data, and inconsistent data. Duplicated values arise when a standard is not unique, so that many different values carry the same meaning. Confused data ranges arise when the same value is used in different contexts and carries different meanings.
Further, the data cleaning process of the data cleaning module comprises the following steps:
S01: Define the business data sources: identify the data sources that meet the requirements and decide when data cleaning takes place.
S02: Analyze the business data sources: check whether the data conforms to business rules and definitions and whether any abnormal data structures exist.
S03: Standardize the data: define a standard data format and convert the data to it.
S04: Correct erroneous data using business rules: define the criteria for correct data and determine how erroneous data is handled.
S05: Merge the data: combine multiple records that belong to the same entity, with deduplication during the merge.
S06: Summarize the types of data errors: use the summary to improve the completeness and correctness of the cleaning program and so reduce the likelihood of serious data problems.
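Steps S03 to S06 can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the record fields (`farm_id`, `yield`, `unit`), the alias table, and the rule that a later record replaces an earlier one for the same entity are all hypothetical.

```python
from collections import Counter

# Hypothetical alias table: several spellings that mean the same unit (S03).
UNIT_ALIASES = {"kg": "kg", "kilogram": "kg", "kilograms": "kg"}


def clean_records(records):
    """Standardize, validate, merge, and summarize errors for raw records."""
    cleaned = {}
    errors = Counter()
    for rec in records:
        farm_id = rec.get("farm_id")
        if farm_id is None:
            errors["missing_farm_id"] += 1
            continue
        # S03: map every known spelling of the unit to one standard form.
        unit = UNIT_ALIASES.get(str(rec.get("unit", "")).strip().lower())
        if unit is None:
            errors["unknown_unit"] += 1
            continue
        # S04: business rule, the yield must be a non-negative number.
        try:
            yield_value = float(rec.get("yield"))
        except (TypeError, ValueError):
            errors["invalid_yield"] += 1
            continue
        if yield_value < 0:
            errors["negative_yield"] += 1
            continue
        # S05: merge records of the same entity; the later record wins.
        if farm_id in cleaned:
            errors["duplicate_merged"] += 1
        cleaned[farm_id] = {"farm_id": farm_id, "yield": yield_value, "unit": unit}
    # S06: the error summary can feed back into the cleaning rules.
    return list(cleaned.values()), dict(errors)
```

Returning the error counts alongside the cleaned rows keeps the S06 feedback loop explicit: a category that grows over time signals a rule or a source that needs attention.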
Further, the conversion process of the data conversion module comprises the following steps:
S11: Handle null values: if null values are found in some fields during conversion, either replace them with a specified value at load time or load them as-is without any conversion.
S12: Normalize data formats: standardize and unify data formats according to the data type of each field in the business data source, for example by uniformly converting numeric types to strings.
S13: Split or merge fields according to business requirements.
S14: Replace missing data according to business requirements.
S15: Filter data according to business rules.
S16: Convert data for uniqueness according to the code table: convert data according to the business conventions defined in the code table so that data inside the data warehouse system is consistent.
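A minimal sketch of how steps S11 to S16 might apply to a single record follows. The field names (`crop`, `region`, `area_ha`), the `CROP_CODES` table, and the "county/township" region format are assumptions introduced for the example; the patent does not define them.

```python
# Hypothetical code table: different names that map to one crop code (S16).
CROP_CODES = {"rice": "C01", "paddy rice": "C01", "wheat": "C02"}


def transform_record(rec, default_region="UNKNOWN"):
    """Convert one source record to the warehouse shape, or return None."""
    # S15: filter out records that have no crop name.
    crop = rec.get("crop")
    if not crop:
        return None
    # S16: map the crop name to its unique code; drop unknown crops.
    code = CROP_CODES.get(crop.strip().lower())
    if code is None:
        return None
    # S11 / S14: replace a null or missing region with a default value.
    region = rec.get("region") or default_region
    # S13: split a combined "county/township" field into two fields.
    county, _, township = region.partition("/")
    # S12: normalize the numeric area to a fixed-format string.
    area = rec.get("area_ha")
    area_text = "" if area is None else f"{float(area):.2f}"
    return {
        "crop_code": code,
        "county": county,
        "township": township,
        "area_ha": area_text,
    }
```

For example, `transform_record({"crop": "wheat", "region": None, "area_ha": None})` yields a record with the default region and an empty area string, while a crop that is absent from the code table is filtered out.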
Further, the data loading strategies of the data loading module comprise timestamp-based loading, full-table comparison loading, loading by reading a log table, and loading after deleting the whole table.
Timestamp-based loading adds a timestamp field to the tables of the source system and compares the current system time with the timestamp value to decide which business data needs to be extracted. It supports incremental loading and is one of the more common approaches.
Full-table comparison loading compares each record with all records in the target table before loading and decides, by whether the primary key values match, whether the record is an update or an insert. With large data volumes it is slow and inefficient, so it is usually improved by extracting increments in the manner of slowly changing dimensions, using version numbers, flag fields, and the like.
Log-table loading keeps a log table updated whenever the source data table changes and uses the log table as the basis for loading. The log table is relatively troublesome to maintain and carries some risk.
Delete-and-reload loading deletes all data in the target table before loading and then loads the full data set. It cannot load incrementally and is inefficient, but it is relatively simple to implement.
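Two of these strategies, full-table comparison and delete-and-reload, can be contrasted in a short sketch. The `target(id, value)` table and the use of SQLite are assumptions for illustration, and the comparison is done here by a primary-key lookup per row, which is one simple way to decide between update and insert.

```python
import sqlite3


def load_by_comparison(conn, rows):
    """Insert rows whose key is new and update rows whose value changed."""
    inserted = updated = 0
    for pk, value in rows:
        existing = conn.execute(
            "SELECT value FROM target WHERE id = ?", (pk,)
        ).fetchone()
        if existing is None:
            conn.execute("INSERT INTO target (id, value) VALUES (?, ?)", (pk, value))
            inserted += 1
        elif existing[0] != value:
            conn.execute("UPDATE target SET value = ? WHERE id = ?", (value, pk))
            updated += 1
    conn.commit()
    return inserted, updated


def load_by_truncate(conn, rows):
    """Delete everything in the target table, then load the full data set."""
    conn.execute("DELETE FROM target")
    conn.executemany("INSERT INTO target (id, value) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def demo_load():
    """Run both strategies against a small in-memory target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "rice"), (2, "wheat")])
    counts = load_by_comparison(conn, [(2, "winter wheat"), (3, "maize"), (1, "rice")])
    after_compare = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
    reloaded = load_by_truncate(conn, [(9, "soybean")])
    after_reload = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
    conn.close()
    return counts, after_compare, reloaded, after_reload
```

The comparison load touches only the rows that differ, at the cost of one lookup per incoming row; the delete-and-reload variant is shorter but rewrites the whole table every time, which matches the trade-off described above.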
Compared with the prior art, the invention has the following beneficial effects. The smart agriculture AIOT distributed big data storage platform provided by the invention comprises a smart agriculture big data basic support platform and a smart agriculture data middle platform. The basic support platform manages and supports massive data throughout its full life cycle, and the data middle platform organizes the basic business data stored in the basic support platform to ensure that the storage platform functions as a distributed architecture. The invention adopts a distributed architecture and, by managing and supporting massive data throughout its full life cycle, provides a data model foundation for a smart agriculture big data service platform built on artificial intelligence and Internet information sharing.
附图说明Description of drawings
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the following will briefly introduce the accompanying drawings that need to be used in the embodiments. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention. For Those of ordinary skill in the art can also obtain other drawings based on these drawings without making creative efforts.
Fig. 1 is a system structure diagram of the smart agriculture AIOT distributed big data storage platform provided by an embodiment of the present invention.
Detailed description
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.
In the drawings of this embodiment, the same or similar reference numerals denote the same or similar components. In the description of the present invention, it should be understood that terms such as "upper", "lower", "left", and "right" indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. The terms describing positional relationships in the drawings are therefore illustrative only and shall not be construed as limiting this patent; those of ordinary skill in the art can understand their specific meanings according to the circumstances.
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, the smart agriculture AIOT distributed big data storage platform provided by the present invention includes a smart agriculture big data basic support platform and a smart agriculture data middle platform. The smart agriculture big data basic support platform is used to manage and support massive big data throughout its full life cycle. The smart agriculture data middle platform is used to plan the basic business data stored on the smart agriculture big data basic support platform, so as to ensure that the smart agriculture AIOT distributed big data storage platform functions as a distributed architecture. The smart agriculture big data basic support platform and the smart agriculture data middle platform exchange data through computer application program interfaces and a network.
The smart agriculture big data basic support platform includes a data acquisition system, a data governance system, and a data storage system. The data acquisition system is used for data collection; the data governance system is used to fuse and govern the data collected by the data acquisition system; the data storage system is used to store the data after it has been analyzed and processed by the data governance system. The data acquisition system includes a structured data acquisition module, an unstructured data acquisition module, and a real-time data acquisition module, which collect structured data, unstructured data, and real-time data respectively. The data governance system includes a data extraction module, a data cleaning module, a data conversion module, and a data loading module. The data extraction module obtains business data from the data collected by the data acquisition system; the data cleaning module corrects and normalizes the defective data obtained by the data extraction module so that it meets the required data quality standard; the data conversion module converts the data collected by the data acquisition system and the data processed by the data extraction module so that it conforms to the data warehouse model; the data loading module stores the data converted by the data conversion module in the target database. The data storage system includes a business database and a distributed massive spatial database. The business database stores business data related to agricultural resources and agricultural decision-making; the distributed massive spatial database stores remote sensing image data, video data, and Internet of Things sensor device data.
The smart agriculture data middle platform includes a shared resource library, a smart agriculture theme resource library, and a basic resource library. The shared resource library meets the needs of the public and of institutions for sharing agricultural data through a sharing and exchange platform; the smart agriculture theme resource library is a set of resource libraries on different agricultural themes, customized according to applications and needs; the basic resource library stores data shared by multiple systems.
The shared resource library includes a pollution-free product full-cycle shared resource library, a bulk agricultural product trading shared resource library, a seed demand shared resource library, a characteristic agricultural product supply shared resource library, and a digital agriculture government affairs shared resource library. The pollution-free product full-cycle shared resource library provides the public with a full-cycle query function for pollution-free products; the seed demand shared resource library provides individuals and manufacturers engaged in agricultural production with seed demand publishing and query functions; the characteristic agricultural product supply shared resource library provides individuals and manufacturers engaged in trading characteristic agricultural products with a supply query function; the digital agriculture government affairs shared resource library provides government agencies with functions for publishing and querying agriculture-related government affairs information.
The smart agriculture theme resource library includes an agricultural output theme resource library, an industrial layout theme resource library, an environmental monitoring theme resource library, an agricultural product safety theme resource library, an agricultural product logistics theme resource library, a fishery and aquatic products theme resource library, an animal husbandry theme resource library, a pest and disease control theme resource library, and a soil fertility theme resource library. Each theme resource library stores the data related to its theme and provides a query function: agricultural output, agricultural industrial layout, environmental monitoring data related to agricultural production, agricultural product safety, agricultural product logistics, fishery and aquatic products, animal husbandry, pest and disease control, and soil fertility, respectively.
The basic resource library includes an administrative unit basic resource library, a basic terrain basic resource library, an agricultural enterprise basic resource library, an agricultural resource basic resource library, and an image resource basic resource library. These store, respectively, agriculture-related administrative unit data, agriculture-related basic terrain data, agricultural enterprise data, agricultural resource data, and agriculture-related image resources, and each performs primary common processing on its data so that other systems can call it.
The smart agriculture AIOT distributed big data storage platform provided by the above technical solution includes a smart agriculture big data basic support platform and a smart agriculture data middle platform. The smart agriculture big data basic support platform is used to manage and support massive big data throughout its full life cycle, and the smart agriculture data middle platform is used to plan the basic business data stored on the basic support platform, so as to ensure that the storage platform functions as a distributed architecture. By adopting a distributed architecture and providing full-life-cycle management and support for massive big data, the present invention provides a data model foundation for a smart agriculture big data service platform built on artificial intelligence and Internet information sharing.
As an embodiment of the present invention, the smart agriculture data middle platform further includes a near-source acquisition database. The near-source acquisition database is modeled on the source systems so as to preserve, as far as possible, the original form of the business data obtained from the data storage system.
As an embodiment of the present invention, the shared resource library is an externally shared resource library that, on the basis of ensuring information security, is built through a sharing platform in a data service mode from the relevant data obtained from the near-source acquisition database. The smart agriculture theme resource library uses ETL tools to perform common processing on the relevant data obtained from the near-source acquisition database; it is application-oriented and customized on demand. The basic resource library performs primary common processing on the relevant data obtained from the near-source acquisition database and extracts common attributes.
As an embodiment of the present invention, the structured data includes data obtained from a smart agriculture resource management system, a smart agriculture production management system, a smart agriculture supply chain management system, and a smart agriculture party-building management system, as well as government affairs information originating from smart cities. The structured data is stored in structured databases and distributed databases and can be transmitted in real time or offline through a data interface protocol. The unstructured data includes satellite remote sensing image data, spatial geographic data, smart Internet of Things sensor data, and video data, and is stored as distributed massive data on a Hadoop cluster. The real-time data includes sensor data, remote sensing image data, and massive concurrent data.
As an embodiment of the present invention, the business data comprises land, water, climate, population, agricultural economy, and agricultural resource data organized by county and township administrative units, together with data on basic farmland zoning and demarcation, standard farmland, second-round land contracting, the agricultural decision-making expert knowledge base, and the soil nutrients, heavy metals, and pesticide residues covered by cultivated land fertility surveys and quality evaluation studies. When the metadata database, data dictionary, and data table structures of the business database are defined, matching standards and specifications for attribute data collection are defined at the same time. The architecture of the distributed massive spatial database uses the distributed file management system of the Hadoop ecosystem together with an MPP database designed on MPP + shared-nothing technology.
As an embodiment of the present invention, data extraction by the data extraction module covers the following cases. If the business operational database and the data warehouse use the same database management system, it is only necessary to establish the corresponding connection, after which the data can be accessed directly with an ETL tool or by calling the corresponding SQL statements or stored procedures. If the data warehouse system and the business operational database use different database management systems, an ETL tool is used to export the data to text files or Excel files, and unified data extraction is then performed. If the volume of data to be extracted is very large, incremental extraction is used, based on a flag or a timestamp: before each extraction, the module first checks the extraction flag or the most recent extraction time, and then extracts the data from the data source.
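The timestamp-based incremental extraction described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the table `soil_samples`, the column `updated_at`, and the function names are hypothetical, and SQLite stands in for whatever source database is actually used.

```python
import sqlite3


def extract_incremental(conn, table, ts_column, last_watermark):
    """Return the rows changed after last_watermark, plus the new watermark."""
    # Table and column names are interpolated into the SQL text, so they must
    # come from trusted configuration, never from user input.
    query = f"SELECT * FROM {table} WHERE {ts_column} > ? ORDER BY {ts_column}"
    cur = conn.execute(query, (last_watermark,))
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    # The newest timestamp seen becomes the starting point of the next run.
    new_watermark = rows[-1][ts_column] if rows else last_watermark
    return rows, new_watermark


def demo_extract():
    # Hypothetical source table with ISO-8601 timestamps stored as text.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE soil_samples (id INTEGER PRIMARY KEY, ph REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO soil_samples VALUES (?, ?, ?)",
        [
            (1, 6.5, "2021-08-01T00:00:00"),
            (2, 7.1, "2021-08-02T00:00:00"),
            (3, 5.9, "2021-08-03T00:00:00"),
        ],
    )
    return extract_incremental(conn, "soil_samples", "updated_at", "2021-08-01T12:00:00")
```

Storing the returned watermark between runs is what keeps each extraction limited to new or changed rows; ISO-8601 strings are used here because they compare correctly as text.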
As an embodiment of the present invention, the defective data selected by the data cleaning module includes duplicate values, missing data, erroneous data, confused data ranges, dirty data, and inconsistent data. Duplicate values arise when the standard is not unique, so that many different values carry the same meaning. Confused data ranges arise when the same value is used in different contexts and carries different meanings.
Specifically, the data cleaning process of the data cleaning module includes the following steps:
S01: Define the business data sources: identify the data sources that meet the requirements and decide when data cleaning is to be performed;
S02: Analyze the business data sources: check whether the data in each source conforms to the business rules and definitions, and whether abnormal data structures exist;
S03: Standardize the data: define the standardized data format and convert the data to it;
S04: Correct erroneous data through business rules: define the criteria for correct data and determine how erroneous data is to be handled;
S05: Merge the data: merge multiple records belonging to the same entity, with deduplication during merging;
S06: Summarize the types of data errors: use the summary to improve the completeness and correctness of the cleaning program, thereby reducing the likelihood of major data problems.
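Steps S03 to S06 can be illustrated with a small sketch. The field names `farm_id` and `area_ha`, the rule that an area must be positive, and the "later record wins" merge policy are all assumptions made for the example; the source does not specify them.

```python
from collections import Counter


def clean_records(records, key="farm_id"):
    """Standardize, validate, and merge records; return (clean_rows, error_counts)."""
    errors = Counter()
    merged = {}
    for rec in records:
        # S03: standardize the format (trim and upper-case the key, parse numbers).
        entity = str(rec.get(key) or "").strip().upper()
        if not entity:
            errors["missing_key"] += 1
            continue
        try:
            area = float(rec.get("area_ha"))
        except (TypeError, ValueError):
            errors["bad_area"] += 1
            continue
        # S04: apply a business rule (assumed here: the area must be positive).
        if area <= 0:
            errors["non_positive_area"] += 1
            continue
        # S05: merge records of the same entity; a later record replaces an
        # earlier one, which also removes duplicates.
        merged[entity] = {key: entity, "area_ha": area}
    # S06: the counter summarizes which error types occurred and how often.
    return list(merged.values()), dict(errors)
```

The error summary returned alongside the cleaned rows is the kind of feedback S06 describes: recurring error types point to rules that the cleaning program should add or tighten.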
Specifically, the conversion process of the data conversion module includes the following steps:
S11: Handling null values: if null values are found in certain fields during conversion, either replace them with a specified value at load time or load them directly without any conversion;
S12: Normalizing data formats: standardize and unify data formats according to the data type of each field in the business data source, for example by uniformly converting numeric types to string types;
S13: Split or merge fields according to business requirements;
S14: Replacing missing data: replace missing data according to business requirements;
S15: Filter the data according to business rules;
S16: Converting data for uniqueness according to the code table: convert the data according to the business specification defined by the code table, so that data inside the data warehouse system is consistent.
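The conversion steps S11 to S16 can be sketched for a single record as below. Everything specific in the example is hypothetical: the code table `CROP_CODES`, the field names, the "county/township" combined field, and the filter rule are illustrative assumptions, not details taken from the source.

```python
# Hypothetical code table: several source spellings map to one canonical code.
CROP_CODES = {"rice": "C01", "paddy": "C01", "wheat": "C02"}


def transform_record(rec, default_yield=0.0):
    """Apply S11-S16 to one record; return None if the record is filtered out."""
    out = {}
    # S11/S14: replace a null or missing value with a default.
    value = rec.get("yield_t")
    out["yield_t"] = default_yield if value is None else float(value)
    # S12: normalize the format (numeric identifier to string).
    out["plot_id"] = str(rec["plot_id"])
    # S13: split a combined "county/township" field into two fields.
    county, _, township = (rec.get("region") or "").partition("/")
    out["county"], out["township"] = county.strip(), township.strip()
    # S15: filter by a business rule (assumed here: a county is required).
    if not out["county"]:
        return None
    # S16: map synonyms to a single code through the code table.
    crop = str(rec.get("crop") or "").strip().lower()
    out["crop_code"] = CROP_CODES.get(crop, "UNKNOWN")
    return out
```

Mapping both "rice" and "paddy" to the same code shows how a code table resolves the "duplicate values" defect, where several values carry one meaning.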
Specifically, the data loading strategies of the data loading module include timestamp-based loading, full-table comparison loading, loading by reading a log table, and loading after deleting the whole table. Timestamp-based loading adds a timestamp field to the source system tables and compares the current system time with the timestamp value to decide which business data needs to be extracted; it supports incremental loading and is a relatively common approach. Full-table comparison loading compares each record with all records in the target table before loading and decides, according to whether the primary key values match, whether the record is an update or an insert; with large data volumes it is slow and inefficient, so it is usually improved by extracting increments using slowly changing dimension techniques such as version numbers and flag fields. Loading by reading a log table continuously updates the log table whenever the source data table changes and uses the log table as the basis for loading; the log table is relatively troublesome to maintain and carries some risk. Loading after deleting the whole table first deletes all data in the target table and then loads all the data; it cannot load incrementally and is inefficient, but it is relatively simple to implement.
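Two of the loading strategies, comparison by primary key and deletion followed by a full reload, can be contrasted in a short sketch. The table `target` and its columns are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3


def load_upsert(conn, rows):
    """Compare by primary key: update rows that exist, insert rows that do not."""
    inserted = updated = 0
    for row in rows:
        exists = conn.execute(
            "SELECT 1 FROM target WHERE id = ?", (row["id"],)
        ).fetchone()
        if exists:
            conn.execute(
                "UPDATE target SET value = ? WHERE id = ?", (row["value"], row["id"])
            )
            updated += 1
        else:
            conn.execute(
                "INSERT INTO target (id, value) VALUES (?, ?)", (row["id"], row["value"])
            )
            inserted += 1
    conn.commit()
    return inserted, updated


def load_full_refresh(conn, rows):
    """Delete everything in the target table, then reload all rows."""
    conn.execute("DELETE FROM target")
    conn.executemany("INSERT INTO target (id, value) VALUES (:id, :value)", rows)
    conn.commit()


def demo_load():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute("INSERT INTO target VALUES (1, 'old')")
    counts = load_upsert(conn, [{"id": 1, "value": "new"}, {"id": 2, "value": "added"}])
    after_upsert = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
    load_full_refresh(conn, [{"id": 9, "value": "only"}])
    after_refresh = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
    return counts, after_upsert, after_refresh
```

The per-row lookup in `load_upsert` is the cost the text attributes to full-table comparison, while `load_full_refresh` is simpler but discards rows that are absent from the new batch.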
Preferably, all modules involved in the technical solution of the present invention are implemented with publicly available, mature, open-source program architectures and program code. The business processes, business terms, and functions involved in the technical solution are common knowledge in the art, and those skilled in the art can readily implement them from this description using existing, publicly available program architectures and program code.
The embodiments of the present invention have been described in detail above, but the invention is not limited to these embodiments. Those skilled in the art can make many equivalent modifications or substitutions without departing from the spirit of the present invention, and all such equivalent modifications or substitutions fall within the scope of protection defined by the claims of the present application.

Claims (10)

  1. A smart agriculture AIOT distributed big data storage platform, characterized in that it comprises a smart agriculture big data basic support platform and a smart agriculture data middle platform, wherein the smart agriculture big data basic support platform is used to manage and support massive big data throughout its full life cycle, and the smart agriculture data middle platform is used to plan the basic business data stored on the smart agriculture big data basic support platform, so as to ensure that the smart agriculture AIOT distributed big data storage platform functions as a distributed architecture; the smart agriculture big data basic support platform and the smart agriculture data middle platform exchange data through computer application program interfaces and a network;
    the smart agriculture big data basic support platform comprises a data acquisition system, a data governance system, and a data storage system, wherein the data acquisition system is used for data collection, the data governance system is used to fuse and govern the data collected by the data acquisition system, and the data storage system is used to store the data after it has been analyzed and processed by the data governance system; the data acquisition system comprises a structured data acquisition module, an unstructured data acquisition module, and a real-time data acquisition module, wherein the structured data acquisition module is used to collect structured data, the unstructured data acquisition module is used to collect unstructured data, and the real-time data acquisition module is used to collect real-time data; the data governance system comprises a data extraction module, a data cleaning module, a data conversion module, and a data loading module, wherein the data extraction module is used to obtain business data from the data collected by the data acquisition system, the data cleaning module is used to correct and normalize the defective data obtained by the data extraction module so that it meets the required data quality standard, the data conversion module is used to convert the data collected by the data acquisition system and the data processed by the data extraction module so that it conforms to the data warehouse model, and the data loading module is used to store the data converted by the data conversion module in the target database; the data storage system comprises a business database and a distributed massive spatial database, wherein the business database is used to store business data related to agricultural resources and agricultural decision-making, and the distributed massive spatial database is used to store remote sensing image data, video data, and Internet of Things sensor device data;
    the smart agriculture data middle platform comprises a shared resource library, a smart agriculture theme resource library, and a basic resource library, wherein the shared resource library meets the needs of the public and of institutions for sharing agricultural data through a sharing and exchange platform, the smart agriculture theme resource library is a set of resource libraries on different agricultural themes customized according to applications and needs, and the basic resource library is used to store data shared by multiple systems; the shared resource library comprises a pollution-free product full-cycle shared resource library, a bulk agricultural product trading shared resource library, a seed demand shared resource library, a characteristic agricultural product supply shared resource library, and a digital agriculture government affairs shared resource library, wherein the pollution-free product full-cycle shared resource library is used to provide the public with a full-cycle query function for pollution-free products, the seed demand shared resource library is used to provide individuals and manufacturers engaged in agricultural production with seed demand publishing and query functions, the characteristic agricultural product supply shared resource library is used to provide individuals and manufacturers engaged in trading characteristic agricultural products with a supply query function, and the digital agriculture government affairs shared resource library is used to provide government agencies with functions for publishing and querying agriculture-related government affairs information; the smart agriculture theme resource library comprises an agricultural output theme resource library, an industrial layout theme resource library, an environmental monitoring theme resource library, an agricultural product safety theme resource library, an agricultural product logistics theme resource library, a fishery and aquatic products theme resource library, an animal husbandry theme resource library, a pest and disease control theme resource library, and a soil fertility theme resource library, wherein the agricultural output theme resource library is used to store data related to agricultural output and provide a query function, the industrial layout theme resource library is used to store data related to agricultural industrial layout and provide a query function, the environmental monitoring theme resource library is used to store environmental monitoring data related to agricultural production and provide a query function, the agricultural product safety theme resource library is used to store data related to agricultural product safety and provide a query function, the agricultural product logistics theme resource library is used to store data related to agricultural product logistics and provide a query function, the fishery and aquatic products theme resource library is used to store data related to fishery and aquatic products and provide a query function, the animal husbandry theme resource library is used to store data related to animal husbandry and provide a query function, the pest and disease control theme resource library is used to store data related to pest and disease control and provide a query function, and the soil fertility theme resource library is used to store data related to soil fertility and provide a query function; the basic resource library comprises an administrative unit basic resource library, a basic terrain basic resource library, an agricultural enterprise basic resource library, an agricultural resource basic resource library, and an image resource basic resource library, wherein the administrative unit basic resource library is used to store agriculture-related administrative unit data and perform primary common processing on it for other systems to call, the basic terrain basic resource library is used to store agriculture-related basic terrain data and perform primary common processing on it for other systems to call, the agricultural enterprise basic resource library is used to store agricultural enterprise data and perform primary common processing on it for other systems to call, the agricultural resource basic resource library is used to store agricultural resource data and perform primary common processing on it for other systems to call, and the image resource basic resource library is used to store agriculture-related image resources and perform primary common processing on them for other systems to call.
  2. The smart agriculture AIOT distributed big data storage platform according to claim 1, characterized in that the smart agriculture data middle platform further comprises a near-source acquisition database, and the near-source acquisition database is modeled on the source systems so as to preserve, as far as possible, the original form of the business data obtained from the data storage system.
  3. The smart agriculture AIOT distributed big data storage platform according to claim 2, characterized in that the shared resource library is an externally shared resource library that, on the basis of ensuring information security, is built through a sharing platform in a data service mode from the relevant data obtained from the near-source acquisition database; the smart agriculture theme resource library uses ETL tools to perform common processing on the relevant data obtained from the near-source acquisition database, and is application-oriented and customized on demand; the basic resource library performs primary common processing on the relevant data obtained from the near-source acquisition database and extracts common attributes.
  4. The smart agriculture AIOT distributed big data storage platform according to claim 1, wherein the structured data includes data obtained from a smart agriculture resource management system, a smart agriculture production management system, a smart agriculture supply chain management system, and a smart agriculture Party-building management system, as well as government affairs information originating from smart cities; the structured data is stored in structured databases and distributed databases and can be transmitted in real time or offline through data interface protocols; the unstructured data includes satellite remote sensing image data, spatial geographic data, intelligent Internet of Things sensor data, and frequency data, and is stored as distributed massive data using a Hadoop cluster; and the real-time data includes sensor data, remote sensing image data, and massive concurrent data.
  5. The smart agriculture AIOT distributed big data storage platform according to claim 1, wherein the business data comprises land, water, climate, population, agricultural economy, and agricultural resource data with counties and townships as administrative units, as well as basic farmland zoning and delimitation, standard farmland, second-round land contracting, the agricultural decision-making expert knowledge base, and soil nutrients, heavy metals, and pesticide residues from cultivated land fertility surveys and quality evaluation research; for the business database, supporting attribute data collection standards and specifications are formulated at the same time as the metadata database, the data dictionary, and the data table structure series; and the architecture of the distributed massive spatial database adopts the distributed file management system of the Hadoop system together with an MPP database designed on MPP+Share-nothing technology.
  6. The smart agriculture AIOT distributed big data storage platform according to claim 1, wherein data extraction by the data extraction module covers the following cases: if the business operational database and the data warehouse use the same database management system, it is only necessary to establish the corresponding connection, after which the data can be accessed directly with an ETL tool or by calling the corresponding SQL statements or stored procedures; if the data warehouse system and the business operational database use different database management systems, the data is exported with an ETL tool to text files or Excel files and then extracted in a unified manner; and if the volume of data to be extracted is very large, incremental extraction is used, based on a flag bit or a timestamp, in which each extraction first checks the extraction flag bit or the most recent time and then extracts the data from the data source.
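The timestamp variant of incremental extraction described in claim 6 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the table `source_table`, the column `updated_at`, and the function `extract_incremental` are hypothetical names, and an in-memory SQLite database stands in for the business operational database.

```python
import sqlite3

def extract_incremental(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    Assumes (hypothetically) that every source row carries an ISO-8601
    'updated_at' timestamp, so string comparison matches time order.
    """
    rows = conn.execute(
        "SELECT id, value, updated_at FROM source_table "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only when something was actually extracted.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# Demo data standing in for the source system.
src_conn = sqlite3.connect(":memory:")
src_conn.execute(
    "CREATE TABLE source_table (id INTEGER PRIMARY KEY, value TEXT, updated_at TEXT)"
)
src_conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [
        (1, "wheat", "2021-07-01T00:00:00"),
        (2, "rice", "2021-07-15T00:00:00"),
        (3, "maize", "2021-07-28T00:00:00"),
    ],
)

rows, watermark = extract_incremental(src_conn, "2021-07-10T00:00:00")
```

Persisting the returned watermark between runs is what keeps each extraction limited to new or changed rows; a flag-bit variant would filter on a boolean column instead and reset it after extraction.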
  7. The smart agriculture AIOT distributed big data storage platform according to claim 1, wherein the defective data selected by the data cleaning module includes repeated values, missing data, erroneous data, confused data ranges, dirty data, and inconsistent data; repeated values mean that the standard is not unique, so that many values carry the same meaning; and confused data ranges mean that the same value is used in different contexts and carries different meanings.
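A minimal Python sketch of how some of the defect categories named in claim 7 might be flagged is given here for illustration. Everything in it is an assumption: the synonym table `CROP_SYNONYMS`, the field names `crop` and `soil_moisture`, and the 0 to 100 percent range check do not come from the application.

```python
# Hypothetical synonym table: several raw values that carry the same meaning
# ("repeated values" in the sense of claim 7) map to one canonical value.
CROP_SYNONYMS = {"corn": "maize", "maize": "maize", "paddy": "rice", "rice": "rice"}

def find_defects(records):
    """Return (record index, defect type) pairs for a list of dict records."""
    defects = []
    for i, rec in enumerate(records):
        crop = rec.get("crop")
        moisture = rec.get("soil_moisture")
        if crop is None or moisture is None:
            defects.append((i, "missing data"))
            continue
        if crop not in CROP_SYNONYMS:
            defects.append((i, "data error"))       # unknown crop name
        elif CROP_SYNONYMS[crop] != crop:
            defects.append((i, "repeated value"))   # non-canonical synonym
        if not 0.0 <= moisture <= 100.0:
            defects.append((i, "data error"))       # percentage out of range
    return defects

sample_records = [
    {"crop": "maize", "soil_moisture": 31.5},
    {"crop": "corn", "soil_moisture": 28.0},
    {"crop": "rice", "soil_moisture": None},
    {"crop": "rice", "soil_moisture": 140.0},
]
found = find_defects(sample_records)
```

Confused data ranges and inconsistent data generally need context from more than one record or table, so they are not covered by this per-record check.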
  8. The smart agriculture AIOT distributed big data storage platform according to claim 6, wherein the data cleaning process of the data cleaning module comprises the following steps:
    S01: define the business data sources: identify the data sources that meet the requirements and decide when data cleaning is to be performed;
    S02: analyze the business data sources: analyze whether the data in each source conforms to the business rules and definitions and whether any abnormal data structures exist;
    S03: standardize the data: define a standardized data format and convert the data to it;
    S04: correct erroneous data through business rules: define the criteria for correct data and determine how erroneous data is to be handled;
    S05: merge data: merge multiple records belonging to the same entity;
    S06: summarize data error types: by summarizing the types of data errors, improve the completeness and correctness of the cleaning program, thereby reducing the likelihood of major data problems.
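Steps S03 to S06 of claim 8 can be sketched as a small Python pipeline. This is only an illustrative sketch: the record fields (`farm_id`, `name`, `area_ha`), the validity rule, and the "first record wins, later records fill gaps" merge policy are assumptions; steps S01 and S02 are organizational and are not modeled.

```python
def standardize(rec):
    # S03: convert to one format (trimmed lowercase name, area as float or None).
    area = rec["area_ha"]
    return {
        "farm_id": rec["farm_id"],
        "name": rec["name"].strip().lower(),
        "area_ha": float(area) if area not in (None, "") else None,
    }

def is_valid(rec):
    # S04: hypothetical business rule, a known area must be positive.
    return rec["area_ha"] is None or rec["area_ha"] > 0

def merge_by_entity(records):
    # S05: merge records sharing a farm_id; the first record wins and
    # later records only fill fields that are still None.
    merged = {}
    for rec in records:
        target = merged.setdefault(rec["farm_id"], dict(rec))
        for key, value in rec.items():
            if target.get(key) is None and value is not None:
                target[key] = value
    return list(merged.values())

def clean(records):
    error_counts = {}  # S06: tally of error types seen during cleaning
    kept = []
    for raw in records:
        rec = standardize(raw)
        if not is_valid(rec):
            error_counts["invalid_area"] = error_counts.get("invalid_area", 0) + 1
            continue
        kept.append(rec)
    return merge_by_entity(kept), error_counts

raw_records = [
    {"farm_id": "F1", "name": " Green Acre ", "area_ha": "12.5"},
    {"farm_id": "F1", "name": "green acre", "area_ha": None},
    {"farm_id": "F2", "name": "Hilltop", "area_ha": "-3"},
]
cleaned, errors = clean(raw_records)
```

Here invalid records are dropped and counted; a real deployment might instead route them to a quarantine table for manual correction, which the claim leaves open ("determine how erroneous data is to be handled").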
  9. The smart agriculture AIOT distributed big data storage platform according to claim 1, wherein the conversion process of the data conversion module comprises the following steps:
    S11: handle null values: if null values are found in some fields during conversion, then at load time either replace the null values with a specified value or load them as they are, without any conversion;
    S12: normalize data formats: standardize and unify the data formats according to the data type of each field in the business data source, for example by uniformly converting numeric types to string types;
    S13: split or merge fields according to business requirements;
    S14: replace missing data: replace missing data according to business requirements;
    S15: filter data according to business rules;
    S16: convert data for uniqueness according to the coding table: convert the data according to the business specification defined by the coding table, so as to achieve consistency of the data within the data warehouse system.
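The conversion steps of claim 9 can be sketched in Python as follows. The coding table `CROP_CODES`, the field names, the `"UNKNOWN"` placeholder, and the `county/township` field layout are all hypothetical and chosen only to make each step visible; S14 is represented by the same replacement mechanism as S11.

```python
# Hypothetical coding table (S16): one canonical code per crop name.
CROP_CODES = {"maize": "C01", "rice": "C02"}

def transform(rec):
    out = {}
    # S11 / S14: replace a null region with a placeholder value.
    out["region"] = rec["region"] if rec["region"] is not None else "UNKNOWN"
    # S12: unify the yield as a string with two decimal places.
    out["yield_t"] = f"{float(rec['yield_t']):.2f}"
    # S13: split a combined "county/township" field into two fields.
    county, township = rec["admin_unit"].split("/", 1)
    out["county"], out["township"] = county, township
    # S16: map the crop name to its code from the coding table.
    out["crop_code"] = CROP_CODES[rec["crop"]]
    return out

def transform_all(records):
    # S15: filter out records whose crop is not in the coding table.
    return [transform(r) for r in records if r["crop"] in CROP_CODES]

source_records = [
    {"region": None, "yield_t": 7, "admin_unit": "CountyA/TownB", "crop": "maize"},
    {"region": "North", "yield_t": "3.456", "admin_unit": "CountyC/TownD", "crop": "unknown"},
]
converted = transform_all(source_records)
```

Filtering before conversion keeps `transform` free of error handling for unknown codes; the order of the steps in practice would follow the business rules of the warehouse.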
  10. The smart agriculture AIOT distributed big data storage platform according to claim 1, wherein the data loading strategies of the data loading module include timestamp-based loading, full-table comparison loading, loading by reading a log table, and loading after deleting the entire table; timestamp-based loading adds a timestamp field to the tables of the source system and compares the current system time with the timestamp value to decide which business data needs to be extracted, which enables incremental loading of data and is a relatively common loading method; full-table comparison loading compares each record, before loading, against all records of the target table and determines from whether the primary key values match whether the record is an update or an insert; loading by reading a log table continuously updates the log table whenever the source data table changes and uses the information in the log table as the basis for data loading; and loading after deleting the entire table first deletes all data in the target table before loading and then loads all of the data.
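Two of the loading strategies in claim 10, full-table comparison and loading after deleting the entire table, can be sketched with Python's standard `sqlite3` module. The table name `target`, its two columns, and both function names are assumptions made for illustration, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

def load_full_compare(conn, rows):
    """Full-table comparison: update when the primary key exists, else insert."""
    inserted = updated = 0
    for pk, value in rows:
        exists = conn.execute("SELECT 1 FROM target WHERE id = ?", (pk,)).fetchone()
        if exists:
            conn.execute("UPDATE target SET value = ? WHERE id = ?", (value, pk))
            updated += 1
        else:
            conn.execute("INSERT INTO target (id, value) VALUES (?, ?)", (pk, value))
            inserted += 1
    conn.commit()
    return inserted, updated

def load_truncate_reload(conn, rows):
    """Delete everything in the target table, then load the full data set."""
    conn.execute("DELETE FROM target")
    conn.executemany("INSERT INTO target (id, value) VALUES (?, ?)", rows)
    conn.commit()

dw_conn = sqlite3.connect(":memory:")
dw_conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
dw_conn.execute("INSERT INTO target VALUES (1, 'old')")

counts = load_full_compare(dw_conn, [(1, "new"), (2, "added")])
after_compare = dw_conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()

load_truncate_reload(dw_conn, [(9, "only")])
after_reload = dw_conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
```

Full-table comparison preserves rows that are absent from the incoming batch, whereas delete-and-reload leaves only the incoming batch in the target table; which is appropriate depends on whether the source delivers full snapshots or partial changes.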
PCT/CN2021/111626 2021-07-28 2021-08-09 Smart agriculture aiot distributed big data storage platform WO2023004881A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110853901.5 2021-07-28
CN202110853901.5A CN113297196A (en) 2021-07-28 2021-07-28 Intelligent agricultural AIOT distributed big data storage platform

Publications (1)

Publication Number Publication Date
WO2023004881A1 true WO2023004881A1 (en) 2023-02-02

Family

ID=77331273

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/111626 WO2023004881A1 (en) 2021-07-28 2021-08-09 Smart agriculture aiot distributed big data storage platform

Country Status (2)

Country Link
CN (1) CN113297196A (en)
WO (1) WO2023004881A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308293A (en) * 2023-03-27 2023-06-23 上海华维可控农业科技集团股份有限公司 Intelligent agricultural equipment management system and method based on digital platform
CN117891812A (en) * 2024-03-18 2024-04-16 北京数字一百信息技术有限公司 Big data cleaning method and system based on artificial intelligence

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116777087B (en) * 2023-08-24 2023-12-15 夏露 Intelligent agriculture layout method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104766153A (en) * 2015-02-03 2015-07-08 中国科学院合肥物质科学研究院 Agricultural things-internet platform architecture
CN105389766A (en) * 2015-12-17 2016-03-09 北京中科云集科技有限公司 Smart city management method and system based on cloud platform
CN106022948A (en) * 2016-07-20 2016-10-12 安徽朗坤物联网有限公司 Comprehensive service system of agricultural internet of things
CN109726848A (en) * 2018-11-20 2019-05-07 江苏智途科技股份有限公司 A kind of wisdom agricultural big data service platform
CN111986042A (en) * 2020-08-24 2020-11-24 绵阳上策网络科技有限公司 Agricultural big data service system constructed based on internet technology
CN112783897A (en) * 2021-01-14 2021-05-11 江西省农业科学院农业经济与信息研究所 Modern agriculture science and technology service cloud platform

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6995675B2 (en) * 1998-03-09 2006-02-07 Curkendall Leland D Method and system for agricultural data collection and management
CN106709017B (en) * 2016-12-27 2018-01-26 山东麦港数据系统有限公司 A kind of aid decision-making method based on big data
CN107506393B (en) * 2017-07-28 2023-11-24 农业农村部农药检定所(国际食品法典农药残留委员会秘书处) Agricultural big data model and application thereof in agriculture

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308293A (en) * 2023-03-27 2023-06-23 上海华维可控农业科技集团股份有限公司 Intelligent agricultural equipment management system and method based on digital platform
CN116308293B (en) * 2023-03-27 2023-12-15 上海华维可控农业科技集团股份有限公司 Intelligent agricultural equipment management system and method based on digital platform
CN117891812A (en) * 2024-03-18 2024-04-16 北京数字一百信息技术有限公司 Big data cleaning method and system based on artificial intelligence

Also Published As

Publication number Publication date
CN113297196A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
WO2023004881A1 (en) Smart agriculture aiot distributed big data storage platform
CN113778967B (en) Yangtze river basin data acquisition processing and resource sharing system
Yan-e Design of intelligent agriculture management information system based on IoT
CN111680025A (en) Method and system for intelligently assimilating space-time information of multi-source heterogeneous data oriented to natural resources
CN109542967B (en) Smart city data sharing system and method based on XBRL standard
LeBauer et al. BETYdb: A yield, trait, and ecosystem service database applied to second‐generation bioenergy feedstock production
CN104205039A (en) Interest-driven business intelligence systems and methods of data analysis using interest-driven data pipelines
CN108133006A (en) A kind of satellite remote sensing product systems of facing agricultural application
CN112328577A (en) Agricultural big data management system and method based on county area
CN109190984B (en) Data processing system and method based on data cube model
CN112699100A (en) Management and analysis system based on metadata
Ngo et al. Data warehouse and decision support on integrated crop big data
CN109977125A (en) A kind of big data safety analysis plateform system based on network security
CN112817958A (en) Electric power planning data acquisition method and device and intelligent terminal
CN112883001A (en) Data processing method, device and medium based on marketing and distribution through data visualization platform
CN105404637A (en) Data mining method and device
CN113506098A (en) Power plant metadata management system and method based on multi-source data
Yan et al. Research on precision management of farming season based on big data
Chen et al. A study of big data application in agriculture
CN116561114A (en) Metadata-based management method
Hodinka et al. Business intelligence in Environmental reporting powered by XBRL
Vaidya et al. Exploring performance and predictive analytics of agriculture data
CN114387119A (en) Agricultural big data platform based on high in clouds
Balti et al. Enhancing big data warehousing and analytics for spatio-temporal massive data
Sayed et al. A conceptual framework for using big data in Egyptian agriculture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21951460

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE