CN111105083A - Crop growth early warning method and device based on data mining - Google Patents

Info

Publication number
CN111105083A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911263879.8A
Other languages
Chinese (zh)
Inventor
高燕
唐孟轩
岳希
曾琼
刘敦龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN201911263879.8A
Publication of CN111105083A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Mining

Abstract

The invention discloses a crop growth early warning method and device based on data mining, wherein the method comprises the following steps: the crop data acquisition module acquires various data of the experimental crop during the growth period to obtain growth attribute data information of the experimental crop; the crop data analysis module analyzes and counts the growth attribute data information to obtain growth classification data information of the experimental crop and growth attribute information corresponding to each growth classification; and the mining analysis module performs mining analysis on the growth classification data information and the growth attribute information by using an improved ID3 decision tree algorithm to obtain a growth early warning result of the crops.

Description

Crop growth early warning method and device based on data mining
Technical Field
The invention relates to the technical field of data processing, in particular to a crop growth early warning method and device based on data mining.
Background
Agriculture is a large and complex system. China has vast land, many soil types, a wide range of crop varieties, frequent plant diseases and insect pests with constantly changing symptoms, and soil and climatic conditions that interact with one another. A great deal of data has accumulated, but much of it is poorly understood. It is a body of data unlike that of any other field, characterized by large volume, many dimensions, dynamic change, incompleteness and uncertainty. Data mining is an important step in knowledge discovery in databases: its main purpose is to find hidden information with particular relationships in large amounts of data, while data analysis examines the mined information in detail and summarizes it to form conclusions. Mining and analyzing agricultural data reveals the relationships among the data and lays a solid foundation for subsequent research on increasing yields. As a major agricultural country, China currently faces "information explosion but lack of knowledge". As agricultural scientific data continues to accumulate, extracting potentially useful knowledge from it becomes increasingly important, and data mining technology provides a new method and approach for the informatization and intelligent development of agriculture.
Disclosure of Invention
In order to solve the problems, the invention provides a crop growth early warning method and device based on data mining.
The crop growth early warning method based on data mining provided by the embodiment of the invention comprises the following steps:
the crop data acquisition module acquires various data of the experimental crop during the growth period to obtain growth attribute data information of the experimental crop;
the crop data analysis module analyzes and counts the growth attribute data information to obtain growth classification data information of the experimental crop and growth attribute information corresponding to each growth classification;
and the mining analysis module performs mining analysis on the growth classification data information and the growth attribute information by using an improved ID3 decision tree algorithm to obtain a growth early warning result of the crops.
Preferably, the improved ID3 decision tree algorithm uses a Taylor-series simplified logarithm equation to compute the classification information entropy and the attribute information entropy;
wherein the Taylor-series simplified logarithm equation is:
Figure BDA0002312331090000021
Preferably, the Taylor-series simplified logarithm equation is obtained from the Taylor series.
Preferably, obtaining the simplified logarithm equation from the Taylor series comprises:
substituting f(x) = ln(1 + x) and a = 0 into the Taylor series
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n$$
to obtain
$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots \tag{1}$$
When x is infinitely small, formula (1) simplifies to:
$$\ln(1+x) \approx x \tag{2}$$
applying the logarithm rule to formula (2) to obtain the Taylor-series simplified logarithm equation;
wherein the logarithm rule is
Figure BDA0002312331090000025
Preferably, the growth attribute data information includes: growth environment information and growth fertilization information; the growth environment information comprises growth temperature information, growth soil information and growth humidity information; the growth fertilization information comprises growth fertilizer information, growth pesticide information and growth running water information.
Preferably, the crop data analysis module performs analysis statistics on the growth attribute data information to obtain the growth classification data information of the experimental crop and the growth attribute information corresponding to each growth classification, and the obtaining includes:
the crop data analysis module obtains growth classification data information of the experimental crops according to growth early warning requirements;
and the crop data analysis module respectively counts the growth classification number information and the number information of each growth attribute according to the growth classification data information to obtain the growth classification data information of the experimental crop and the growth attribute information corresponding to each growth classification.
Preferably, the mining and analyzing module performs mining and analyzing on the growth classification data information and the growth attribute information by using an improved ID3 decision tree algorithm, and obtaining the growth early warning result of the crop includes:
respectively calculating the classification information entropy and the attribute information entropies of the crops by using a Taylor series logarithm operation simplified equation;
respectively calculating the information gain of each attribute information of the crops according to the classification information entropy and the attribute information entropies;
and mining and analyzing by using the information gain of each attribute information to obtain a growth early warning result of the crops.
Preferably, the mining and analyzing by using the information gain of each attribute information to obtain the growth early warning result of the crop includes:
selecting the maximum information gain from the information gains of all the attribute information, and taking the attribute information corresponding to the maximum information gain as a split node of a root node of a decision tree for splitting;
judging whether the rest other information gains have the same information gain;
when the same information gain is judged in the rest other information gains, combining the attribute information corresponding to the same gain in a Cartesian product mode to obtain split nodes of the sub-tree nodes and the sub-leaf nodes of the decision tree;
and splitting according to the obtained splitting nodes of the sub-tree nodes and the sub-leaf nodes of the decision tree to generate the decision tree, and mining and analyzing according to the decision tree to obtain the growth early warning result of the crops.
According to the embodiment of the invention, the device for crop growth early warning based on data mining comprises:
the crop data acquisition module is used for acquiring various data of the experimental crop during the growth period to obtain growth attribute data information of the experimental crop;
the crop data analysis module is used for analyzing and counting the growth attribute data information to obtain growth classification data information of the experimental crop and growth attribute information corresponding to each growth classification;
and the mining analysis module is used for mining and analyzing the growth classification data information and the growth attribute information by utilizing an improved ID3 decision tree algorithm to obtain a growth early warning result of the crops.
Preferably, the improved ID3 decision tree algorithm uses a Taylor-series simplified logarithm equation to compute the classification information entropy and the attribute information entropy;
wherein the Taylor-series simplified logarithm equation is:
Figure BDA0002312331090000041
wherein obtaining the simplified logarithm equation from the Taylor series comprises:
substituting f(x) = ln(1 + x) and a = 0 into the Taylor series
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n$$
to obtain
$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots \tag{1}$$
When x is infinitely small, formula (1) simplifies to:
$$\ln(1+x) \approx x \tag{2}$$
applying the logarithm rule to formula (2) to obtain the Taylor-series simplified logarithm equation;
wherein the logarithm rule is
Figure BDA0002312331090000045
According to the scheme provided by the embodiment of the invention, crop growth data is analyzed on the Hadoop big data platform, which offers high mining efficiency, good performance and reliable results, and enables early warning of crop growth.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention without limiting the invention.
In the drawings:
Fig. 1 is a flowchart of a method for crop growth early warning based on data mining according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an apparatus for crop growth early warning based on data mining according to an embodiment of the present invention;
Fig. 3 is a block diagram of a crop growth early warning system based on data mining according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a decision tree provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a decision node with combined attributes {A, B} according to an embodiment of the present invention;
Fig. 6 is a flow chart of the improved ID3 algorithm provided by an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, and it should be understood that the preferred embodiments described below are only for the purpose of illustrating and explaining the present invention, and are not to be construed as limiting the present invention.
Fig. 1 is a flowchart of a method for crop growth early warning based on data mining according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S100: the crop data acquisition module acquires various data of the experimental crop during the growth period to obtain growth attribute data information of the experimental crop;
Step S110: the crop data analysis module analyzes and counts the growth attribute data information to obtain growth classification data information of the experimental crop and growth attribute information corresponding to each growth classification;
Step S120: the mining analysis module performs mining analysis on the growth classification data information and the growth attribute information by using an improved ID3 decision tree algorithm to obtain a growth early warning result of the crops.
The improved ID3 decision tree algorithm uses a Taylor-series simplified logarithm equation to compute the classification information entropy and the attribute information entropy;
wherein the Taylor-series simplified logarithm equation is:
Figure BDA0002312331090000061
The simplified logarithm equation is obtained from the Taylor series. Specifically, obtaining it comprises:
substituting f(x) = ln(1 + x) and a = 0 into the Taylor series
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n$$
to obtain
$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots \tag{1}$$
When x is infinitely small, formula (1) simplifies to:
$$\ln(1+x) \approx x \tag{2}$$
applying the logarithm rule to formula (2) to obtain the Taylor-series simplified logarithm equation;
wherein the logarithm rule is
Figure BDA0002312331090000065
Wherein the growth attribute data information comprises: growth environment information and growth fertilization information; the growth environment information comprises growth temperature information, growth soil information and growth humidity information; the growth fertilization information comprises growth fertilizer information, growth pesticide information and growth running water information.
Wherein the step S110 includes: the crop data analysis module obtains growth classification data information of the experimental crops according to growth early warning requirements; and the crop data analysis module respectively counts the growth classification number information and the number information of each growth attribute according to the growth classification data information to obtain the growth classification data information of the experimental crop and the growth attribute information corresponding to each growth classification.
Wherein the step S120 includes: respectively calculating the classification information entropy and the attribute information entropies of the crops by using a Taylor series logarithm operation simplified equation; respectively calculating the information gain of each attribute information of the crops according to the classification information entropy and the attribute information entropies; and mining and analyzing by using the information gain of each attribute information to obtain a growth early warning result of the crops. Specifically, the mining and analyzing by using the information gain of each attribute information to obtain the growth warning result of the crop includes: selecting the maximum information gain from the information gains of all the attribute information, and taking the attribute information corresponding to the maximum information gain as a split node of a root node of a decision tree for splitting; judging whether the rest other information gains have the same information gain; when the same information gain is judged in the rest other information gains, combining the attribute information corresponding to the same gain in a Cartesian product mode to obtain split nodes of the sub-tree nodes and the sub-leaf nodes of the decision tree; and splitting according to the obtained splitting nodes of the sub-tree nodes and the sub-leaf nodes of the decision tree to generate the decision tree, and mining and analyzing according to the decision tree to obtain the growth early warning result of the crops.
Fig. 2 is a schematic diagram of a crop growth early warning device based on data mining according to an embodiment of the present invention. As shown in Fig. 2, the device comprises a crop data acquisition module, a crop data analysis module and a mining analysis module.
The crop data acquisition module is used for acquiring various data of the experimental crop during the growth period to obtain growth attribute data information of the experimental crop; the crop data analysis module is used for analyzing and counting the growth attribute data information to obtain growth classification data information of the experimental crop and growth attribute information corresponding to each growth classification; and the mining analysis module is used for mining and analyzing the growth classification data information and the growth attribute information by using an improved ID3 decision tree algorithm to obtain a growth early warning result of the crops.
The improved ID3 decision tree algorithm uses a Taylor-series simplified logarithm equation to compute the classification information entropy and the attribute information entropy; wherein the Taylor-series simplified logarithm equation is:
Figure BDA0002312331090000071
wherein obtaining the simplified logarithm equation from the Taylor series comprises:
substituting f(x) = ln(1 + x) and a = 0 into the Taylor series
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n$$
to obtain
$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots \tag{1}$$
When x is infinitely small, formula (1) simplifies to:
$$\ln(1+x) \approx x \tag{2}$$
applying the logarithm rule to formula (2) to obtain the Taylor-series simplified logarithm equation;
wherein the logarithm rule is
Figure BDA0002312331090000075
The invention addresses the influence of growth factors on crop growth in agricultural production: the growth factors of the crops are mined with data mining technology, and the resulting analysis helps agricultural practitioners increase their income.
Fig. 3 is a block diagram of a crop growth early warning system based on data mining according to an embodiment of the present invention, and as shown in fig. 3, the system includes a data acquisition module, a data import module, a data preprocessing module, a data analysis module, a data verification module, and a data export module.
The data acquisition module acquires data such as soil humidity, air temperature and soil composition during production through equipment such as sensors, and stores the data in an Oracle database.
The data import module imports the raw data from the data acquisition module into HDFS and Hive through Kettle (PDI) as the data source for data preprocessing, and builds the database for subsequent data mining. Incremental import or full import can be selected according to user requirements.
The data preprocessing module first classifies the data as categorical variables, dividing the acquired data into ordered and unordered variables. For unordered variables, a binary classification divides plants into, for example, female and male, and a multi-class classification divides soil types into sandy soil, clay, loam and so on. For ordered variables, the culture environment is graded into four levels: excellent, good, pass and poor. Preprocessing mainly comprises missing data processing, error value detection, data detection and cleaning, noise smoothing, inconsistent data cleaning, continuous data discretization and data word segmentation.
In this embodiment, the classified data is mainly preprocessed as follows:
1) Missing data processing. Class-mean imputation is used so that the data remain consistent after cleaning.
2) Error value detection. Possible error values or outliers of the relevant attributes are identified by statistical analysis, and the data is checked and cleaned using constraints between different attributes and external data.
3) Noise smoothing. Using methods such as binning, regression and clustering, records are compared attribute by attribute, and records whose attribute values are all equal are merged into one record.
4) Inconsistent data cleaning. Continuous data is quantized by methods such as binning, regression and clustering and converted into discrete attributes.
5) Data word segmentation. Character strings are matched against dictionary words by methods such as forward maximum matching, reverse maximum matching and minimum segmentation to identify the required data.
Meanwhile, according to the characteristics of the data source and the goal of data mining, the data preprocessing module cleans the data source using HQL and MapReduce: HQL is used to delete or fill in missing values, and MapReduce is used for data deduplication.
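The preprocessing steps above can be illustrated with a small in-memory sketch. It is not the HQL or MapReduce implementation the embodiment refers to; the function names, the field names (`soil`, `humidity`) and the bin edges are hypothetical and chosen only to show class-mean imputation, merging of identical records and discretization by binning.

```python
from collections import defaultdict
from statistics import mean


def impute_class_mean(records, value_key, class_key):
    """Fill missing (None) values with the mean of the records in the same class.

    Assumes every class has at least one non-missing value.
    """
    by_class = defaultdict(list)
    for r in records:
        if r[value_key] is not None:
            by_class[r[class_key]].append(r[value_key])
    filled = []
    for r in records:
        r = dict(r)  # copy so the input is left untouched
        if r[value_key] is None:
            r[value_key] = mean(by_class[r[class_key]])
        filled.append(r)
    return filled


def deduplicate(records):
    """Merge records whose attribute values are all equal, keeping the first one."""
    seen = set()
    unique = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def discretize(value, edges, labels):
    """Binning: return the label of the first bin whose upper edge covers value.

    labels must have one more entry than edges; the last label is the overflow bin.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


# Hypothetical sensor records (not taken from the patent).
records = [
    {"soil": "sandy", "humidity": 30.0},
    {"soil": "sandy", "humidity": None},
    {"soil": "sandy", "humidity": 50.0},
    {"soil": "clay", "humidity": 70.0},
    {"soil": "clay", "humidity": 70.0},
]
filled = impute_class_mean(records, "humidity", "soil")
cleaned = deduplicate(filled)
levels = [discretize(r["humidity"], [35.0, 60.0], ["low", "medium", "high"]) for r in cleaned]
```

In a Hadoop deployment the same three operations would be expressed over Hive tables and MapReduce jobs; the sketch only fixes the intended semantics of each step.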
The data mining module mines and analyzes the preprocessed data, using the improved ID3 decision tree algorithm, the naive Bayes classification algorithm and the K-Means algorithm to mine the relevant data from the crop production process.
The improved ID3 decision tree algorithm works as follows: the classification information entropy, the attribute information entropy and the information gain are calculated, and the attribute with the maximum information gain is selected as the split node; this is iterated until the decision tree is complete. Computing the information entropy in ID3 requires many logarithm operations, which increases the running time of the algorithm. Taylor's mean value theorem and the Maclaurin expansion are therefore introduced. For a function f(x) that is infinitely differentiable in a neighborhood of a, the Taylor series is the power series of formula (11):
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n \tag{11}$$
Substituting f(x) = ln(1 + x) and a = 0 into the above formula gives the Maclaurin series of f(x):
$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots \tag{12}$$
So when x is infinitely small, the formula can be simplified to:
$$\ln(1+x) \approx x \tag{13}$$
From the logarithm rule:
Figure BDA0002312331090000094
Substituting equation (13) into equation (14) gives the Taylor-series simplified logarithm formula:
Figure BDA0002312331090000095
Optimizing the entropy calculation with the Taylor formula in this way reduces the time complexity of the algorithm and improves mining efficiency.
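Equations (14) and (15) appear only as image references in this text. As a hedged reading, assuming the logarithm rule in (14) is the change-of-base formula (an assumption; the source does not show it), substituting the small-x approximation ln(1 + x) ≈ x would give:

```latex
% Assumption: equation (14) is the change-of-base rule \log_2 y = \ln y / \ln 2.
\log_2(1+x) \;=\; \frac{\ln(1+x)}{\ln 2} \;\approx\; \frac{x}{\ln 2},
\qquad |x| \ll 1 .
```

The approximation holds only for small |x|: at x = 0.1, ln(1.1) ≈ 0.0953 against the approximation 0.1, and the error grows as x moves away from 0.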
The data mining process of the data mining module is as follows:
Step 1: mine and analyze the data set with the improved ID3 decision tree algorithm to obtain data mining result I;
The flow chart of the improved ID3 algorithm is shown in Fig. 6, and the steps are as follows.
Step one: calculate the information entropy I(X) of the data set to be classified:
$$I(X) = -\sum_{x \in X} p(x) \log_2 p(x) \tag{21}$$
Step two: calculate the attribute information entropy E(A):
$$E(A) = \sum_{v \in V} p(v) \left( -\sum_{x \in X} p(x \mid v) \log_2 p(x \mid v) \right) \tag{22}$$
Step three: calculate the information gain:
$$IG(A) = I(X) - E(A) \tag{23}$$
Step four: select the attribute with the maximum information gain as the split node and iterate. When two attributes have the same information gain, combine them by Cartesian product to obtain new nodes and leaves. Stop iterating when all attributes of the training set have been traversed or the classification result no longer changes; the result is the decision tree.
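Steps one to three can be sketched in a few lines of Python. This is a minimal illustration that uses the exact `math.log2` rather than the Taylor-series simplification, and the function names and the four-row toy data set are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Step one: I(X) = -sum over classes of p(x) * log2 p(x)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def attribute_entropy(rows, labels, attr):
    """Step two: E(A) = sum over values v of p(v) * entropy of the labels with A = v."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(rows, labels, attr):
    """Step three: IG(A) = I(X) - E(A)."""
    return entropy(labels) - attribute_entropy(rows, labels, attr)


def rank_attributes(rows, labels, attrs):
    """Order candidate attributes by information gain, largest first."""
    gains = {a: information_gain(rows, labels, a) for a in attrs}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): the class depends only on "water".
rows = [
    {"water": "Y", "greenhouse": "Y"},
    {"water": "Y", "greenhouse": "N"},
    {"water": "N", "greenhouse": "Y"},
    {"water": "N", "greenhouse": "N"},
]
labels = ["agree", "agree", "reject", "reject"]
ranked = rank_attributes(rows, labels, ["water", "greenhouse"])
```

The first entry of `ranked` is the attribute a plain ID3 step would split on; equal gains among the remaining entries are the case that step four handles by combining attributes.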
For the improved ID3 decision tree algorithm of Step 1, a specific application in this embodiment is as follows:
The experimental data is divided into two classes according to whether planting is suitable: agree to plant and refuse to plant. There are five attributes: soil type, whether a greenhouse is used, whether pesticide is applied, whether running water is used, and culture conditions. Detailed data is shown in Table 1.
Table 1: experimental data sheet
Type of soil Whether or not to use the greenhouse Pesticide or not Whether or not to activate water Culture conditions Classification result
Sandy soil N N N Passing and lattice Rejection of
Sandy soil N N N Good taste Rejection of
Sandy soil Y Y N Good taste Agree to
Sandy soil Y N Y Passing and lattice Agree to
Sandy soil N Y N Passing and lattice Rejection of
Clay soil N N N Passing and lattice Rejection of
Clay soil N N N Good taste Rejection of
Clay soil Y Y Y Good taste Agree to
Clay soil N N Y Is very good Agree to
Clay soil N Y Y Is very good Agree to
Loam soil N N Y Is very good Agree to
Loam soil N Y Y Good taste Agree to
Loam soil Y Y N Good taste Agree to
Loam soil Y N N Is very good Agree to
Loam soil N N N Passing and lattice Rejection of
(1) First, the class counts are obtained from the data set: 9 records agree to planting and 6 refuse. The classification information entropy is then calculated using the Taylor-series simplified logarithm formula:
Figure BDA0002312331090000111
Figure BDA0002312331090000112
(2) Computing attribute information entropy using Taylor series logarithm simplified formula
Figure BDA0002312331090000113
Figure BDA0002312331090000114
Figure BDA0002312331090000115
Similarly, E(S, greenhouse) = 0.647, E(S, pesticide) = 0.647, E(S, running water) = 0.550 and E(S, culture conditions) = 0.607.
(3) Then the information gain is calculated: Gain(soil type) = E(S) - E(S, soil type) = 0.083. Similarly, Gain(greenhouse) = 0.323, Gain(pesticide) = 0.323, Gain(running water) = 0.420 and Gain(culture conditions) = 0.363.
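As a cross-check, the gains can be recomputed from the rows of Table 1. The sketch below uses the exact `math.log2` (not the Taylor-series simplification) and normalized value names of my own choosing ("pass", "good", "very good", "agree", "reject"). With the table as transcribed here it reproduces the stated figures for soil type, greenhouse, running water and culture conditions to about three decimals; the pesticide column as transcribed gives a noticeably lower gain than the 0.323 stated in the text, so the sketch makes no claim about that attribute.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ("soil", "greenhouse", "pesticide", "running_water", "culture")

# Rows of Table 1; the last field is the classification result.
ROWS = [
    ("sandy", "N", "N", "N", "pass", "reject"),
    ("sandy", "N", "N", "N", "good", "reject"),
    ("sandy", "Y", "Y", "N", "good", "agree"),
    ("sandy", "Y", "N", "Y", "pass", "agree"),
    ("sandy", "N", "Y", "N", "pass", "reject"),
    ("clay", "N", "N", "N", "pass", "reject"),
    ("clay", "N", "N", "N", "good", "reject"),
    ("clay", "Y", "Y", "Y", "good", "agree"),
    ("clay", "N", "N", "Y", "very good", "agree"),
    ("clay", "N", "Y", "Y", "very good", "agree"),
    ("loam", "N", "N", "Y", "very good", "agree"),
    ("loam", "N", "Y", "Y", "good", "agree"),
    ("loam", "Y", "Y", "N", "good", "agree"),
    ("loam", "Y", "N", "N", "very good", "agree"),
    ("loam", "N", "N", "N", "pass", "reject"),
]


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(column):
    """Weighted entropy of the class labels after grouping rows by one column."""
    idx = COLUMNS.index(column)
    groups = defaultdict(list)
    for row in ROWS:
        groups[row[idx]].append(row[-1])
    return sum(len(g) / len(ROWS) * entropy(g) for g in groups.values())


class_entropy = entropy([row[-1] for row in ROWS])
gains = {c: class_entropy - conditional_entropy(c) for c in COLUMNS}
best = max(gains, key=gains.get)  # attribute chosen for the root split
```

Printing `gains` after running the block shows all five values side by side, which makes it easy to compare them with the figures quoted in step (3).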
(4) Typically, the number of branches of a decision node in the ID3 algorithm equals the number of values of the attribute. In the improved ID3 algorithm of the invention, however, the number of branches of a decision node equals the number of tuples in the Cartesian product of the candidate attributes' values. For example, assume attributes A and B are the candidate attributes for the current node, and each has two values. Merging the values of the two attributes gives 2 x 2 = 4 branches. Fig. 5 shows a decision node with the combined attribute {A, B}, where t1 and t2 are two subtrees and L1 and L2 are leaf nodes. The improved decision tree can greatly reduce the number of tree traversals and improve classification efficiency.
According to the invention, the split node with the largest information gain is selected by the ID3 algorithm. As shown in Fig. 4, "running water" is selected as the root node, and iteration continues to find the next split node. Because Gain(greenhouse) = Gain(pesticide) = 0.323, "greenhouse" and "pesticide" are combined by Cartesian product to obtain four new combinations: "greenhouse, pesticide", "greenhouse, no pesticide", "no greenhouse, pesticide" and "no greenhouse, no pesticide". When all attributes of the training set have been traversed or the classification result no longer changes, iteration stops and the decision tree is obtained.
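The Cartesian-product combination of two tied attributes can be sketched as follows. The helper names are hypothetical, and the sample rows are the Table 1 records whose running-water value is N, reduced to the greenhouse, pesticide and class fields; the sketch only shows how a combined {greenhouse, pesticide} node partitions records, not the full tree construction.

```python
from collections import defaultdict
from itertools import product


def combined_branches(values_a, values_b):
    """All value pairs of two attributes: one branch per pair."""
    return list(product(values_a, values_b))


def split_on_combined(rows, attr_a, attr_b):
    """Partition rows by the pair (row[attr_a], row[attr_b])."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[(row[attr_a], row[attr_b])].append(row)
    return dict(partitions)


# Table 1 records with running water = N (greenhouse, pesticide, class).
subset = [
    {"greenhouse": "N", "pesticide": "N", "result": "reject"},
    {"greenhouse": "N", "pesticide": "N", "result": "reject"},
    {"greenhouse": "Y", "pesticide": "Y", "result": "agree"},
    {"greenhouse": "N", "pesticide": "Y", "result": "reject"},
    {"greenhouse": "N", "pesticide": "N", "result": "reject"},
    {"greenhouse": "N", "pesticide": "N", "result": "reject"},
    {"greenhouse": "Y", "pesticide": "Y", "result": "agree"},
    {"greenhouse": "Y", "pesticide": "N", "result": "agree"},
    {"greenhouse": "N", "pesticide": "N", "result": "reject"},
]

branches = combined_branches(["Y", "N"], ["Y", "N"])  # 2 x 2 = 4 branches
parts = split_on_combined(subset, "greenhouse", "pesticide")
# A branch is a leaf when every record in it has the same class.
pure = {key: len({r["result"] for r in rows}) == 1 for key, rows in parts.items()}
```

One combined node replaces two levels of single-attribute nodes, which is the reduction in traversal depth the description attributes to the improved tree.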
Step 2: mine and analyze the data set with the naive Bayes classification algorithm to obtain data mining result II;
Step 3: mine and analyze the data set with the K-Means clustering algorithm to obtain data mining result III;
Step 4: send data mining results I, II and III together to the data verification module for accuracy verification.
The data verification module verifies the accuracy of the output of the data mining module to obtain the final output of the early warning system.
The data export module exports the mining results of the data mining module through Kettle (PDI) to the Oracle database of the crop growth early warning system for subsequent analysis.
The system further comprises a module scheduling module and a log module. The module scheduling module schedules and integrates all modules of the system to ensure that the system runs efficiently and accurately. The log module records log data throughout the data mining process so that staff can monitor and manage the process.
According to the scheme provided by the embodiment of the invention, the beneficial effects include the following:
(1) The data preprocessing module checks the integrity and consistency of the data set according to user requirements, filters the data set, and removes wrong or inconsistent data, ensuring the validity of the data set.
(2) The log module records log data throughout the operation of the system, making management and maintenance easier for users.
(3) The result verification module verifies the accuracy of the data mining result, and data mining is carried out again when the accuracy is low, so that the accuracy of the early warning is greatly improved.
(4) The scheme makes full use of the high reliability, high scalability, high efficiency and low cost of the Hadoop platform.
(5) The modules used in the scheme are independent of one another; any module other than the scheduling module and the log module can serve as the execution module, so the scheme is highly extensible.
Although the present invention has been described in detail hereinabove, the present invention is not limited thereto, and various modifications can be made by those skilled in the art in light of the principle of the present invention. Thus, modifications made in accordance with the principles of the present invention should be understood to fall within the scope of the present invention.

Claims (10)

1. A crop growth early warning method based on data mining is characterized by comprising the following steps:
the crop data acquisition module acquires various data of the experimental crop during the growth period to obtain growth attribute data information of the experimental crop;
the crop data analysis module analyzes and counts the growth attribute data information to obtain growth classification data information of the experimental crop and growth attribute information corresponding to each growth classification;
and the mining analysis module performs mining analysis on the growth classification data information and the growth attribute information by using an improved ID3 decision tree algorithm to obtain a growth early warning result of the crops.
2. The method of claim 1, wherein the improved ID3 decision tree algorithm uses a Taylor-series simplified logarithm equation to compute the classification information entropy and the attribute information entropy;
wherein the Taylor-series simplified logarithm equation is:
Figure FDA0002312331080000011
3. The method of claim 2, wherein the Taylor-series simplified logarithm equation is derived using a Taylor series.
4. The method of claim 3, wherein deriving the Taylor-series simplified logarithm equation using the Taylor series comprises:
substituting f(x) = ln(1 + x) and a = 0 into the Taylor series
Figure FDA0002312331080000012
to obtain formula (1):
Figure FDA0002312331080000013
When x is infinitesimally small, formula (1) simplifies to formula (2):
Figure FDA0002312331080000014
simplifying formula (2) according to the rules of logarithms to obtain the Taylor-series simplified logarithm equation;
wherein the logarithm rule is expressed as
Figure FDA0002312331080000015
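The equations of claim 4 appear only as figure references in this text, so they are not reproduced here. As a hedged numerical illustration of the idea, the sketch below checks the standard Maclaurin expansion ln(1 + x) = x - x^2/2 + x^3/3 - ... and its first-order truncation for small x. It also assumes, as one plausible reading, that the logarithm rule referred to is the change-of-base identity log2(y) = ln(y) / ln(2); the function names are illustrative.

```python
import math

def ln1p_taylor(x, terms=1):
    """Partial sum of the Maclaurin series of ln(1 + x), valid for |x| < 1."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))

def log2_change_of_base(y):
    """log2(y) via natural logarithms (assumed form of the logarithm rule)."""
    return math.log(y) / math.log(2)

# The first-order truncation ln(1 + x) ~ x improves as x shrinks.
for x in (0.1, 0.01, 0.001):
    exact = math.log1p(x)
    approx = ln1p_taylor(x, terms=1)
    print(f"x={x}: ln(1+x)={exact:.6f}, first-order={approx:.6f}, "
          f"error={abs(exact - approx):.2e}")
```

The error of the first-order truncation is roughly x^2/2, so the approximation is only reasonable when x is close to zero.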
5. The method of claim 1, wherein the growth attribute data information comprises growth environment information and growth fertilization information; the growth environment information comprises growth temperature information, growth soil information and growth humidity information; the growth fertilization information comprises growth fertilizer information, growth pesticide information and growth watering information.
6. The method of claim 5, wherein the crop data analysis module analyzes and counts the growth attribute data information to obtain the growth classification data information of the experimental crop and the growth attribute information corresponding to each growth classification, and comprises:
the crop data analysis module obtains growth classification data information of the experimental crops according to growth early warning requirements;
and the crop data analysis module counts, according to the growth classification data information, the number of samples in each growth classification and the number of samples for each growth attribute, to obtain the growth classification data information of the experimental crop and the growth attribute information corresponding to each growth classification.
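The counting step of claim 6 can be sketched as follows. The record layout, the attribute names and the class labels are hypothetical examples, and `count_classes_and_attributes` is an illustrative name rather than anything defined by the patent.

```python
from collections import Counter, defaultdict

def count_classes_and_attributes(records, class_key):
    """Count samples per class, and per (attribute value, class) pair."""
    class_counts = Counter(r[class_key] for r in records)
    attr_counts = defaultdict(Counter)
    for r in records:
        for attr, value in r.items():
            if attr != class_key:
                attr_counts[attr][(value, r[class_key])] += 1
    return class_counts, dict(attr_counts)

# Hypothetical observations of an experimental crop.
records = [
    {"watering": "yes", "greenhouse": "yes", "growth": "good"},
    {"watering": "yes", "greenhouse": "no", "growth": "good"},
    {"watering": "no", "greenhouse": "yes", "growth": "bad"},
    {"watering": "no", "greenhouse": "no", "growth": "bad"},
    {"watering": "yes", "greenhouse": "no", "growth": "bad"},
]
class_counts, attr_counts = count_classes_and_attributes(records, "growth")
print(class_counts)
print(attr_counts["watering"])
```

These counts are the inputs needed to compute the classification entropy and the per-attribute entropies.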
7. The method of claim 6, wherein the mining analysis module performing mining analysis on the growth classification data information and the growth attribute information by using the improved ID3 decision tree algorithm to obtain the growth early warning result of the crop comprises:
calculating the classification information entropy and each attribute information entropy of the crop using the Taylor-series simplified logarithm equation;
calculating the information gain of each attribute information of the crop according to the classification information entropy and the attribute information entropies;
and performing mining analysis using the information gain of each attribute information to obtain the growth early warning result of the crop.
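The entropy and information gain steps of claim 7 can be illustrated with the standard ID3 definitions. Note the assumption: this sketch uses the exact `math.log2` rather than the patent's Taylor-series simplified equation, which is given only as a figure and is not reproduced here. The records and attribute names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(records, attribute, class_key):
    """Classification entropy minus the weighted entropy after splitting."""
    labels = [r[class_key] for r in records]
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[class_key] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical data: watering fully determines growth, greenhouse does not.
records = [
    {"watering": "yes", "greenhouse": "yes", "growth": "good"},
    {"watering": "yes", "greenhouse": "no", "growth": "good"},
    {"watering": "no", "greenhouse": "yes", "growth": "bad"},
    {"watering": "no", "greenhouse": "no", "growth": "bad"},
]
gain_watering = information_gain(records, "watering", "growth")
gain_greenhouse = information_gain(records, "greenhouse", "growth")
print(gain_watering, gain_greenhouse)
```

In this toy data set the attribute with the larger gain, watering, would be chosen as the root split.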
8. The method of claim 7, wherein performing mining analysis using the information gain of each attribute information to obtain the growth early warning result of the crop comprises:
selecting the maximum information gain from the information gains of all the attribute information, and using the attribute information corresponding to the maximum information gain as the split node at the root of the decision tree;
determining whether any of the remaining information gains are equal to one another;
when equal information gains are found among the remaining information gains, combining the attribute information corresponding to the equal gains by Cartesian product to obtain the split nodes of the sub-tree nodes and sub-leaf nodes of the decision tree;
and splitting according to the obtained split nodes of the sub-tree nodes and sub-leaf nodes to generate the decision tree, and performing mining analysis according to the decision tree to obtain the growth early warning result of the crop.
9. A crop growth early warning device based on data mining is characterized by comprising:
the crop data acquisition module is used for acquiring various data of the experimental crop during the growth period to obtain growth attribute data information of the experimental crop;
the crop data analysis module is used for analyzing and counting the growth attribute data information to obtain growth classification data information of the experimental crop and growth attribute information corresponding to each growth classification;
and the mining analysis module is used for mining and analyzing the growth classification data information and the growth attribute information by utilizing an improved ID3 decision tree algorithm to obtain a growth early warning result of the crops.
10. The apparatus of claim 9, wherein the improved ID3 decision tree algorithm uses a Taylor-series simplified logarithm equation to compute the classification information entropy and the attribute information entropy;
wherein the Taylor-series simplified logarithm equation is:
Figure FDA0002312331080000031
wherein deriving the Taylor-series simplified logarithm equation using a Taylor series comprises:
substituting f(x) = ln(1 + x) and a = 0 into the Taylor series
Figure FDA0002312331080000032
to obtain formula (1):
Figure FDA0002312331080000033
When x is infinitesimally small, formula (1) simplifies to formula (2):
Figure FDA0002312331080000034
simplifying formula (2) according to the rules of logarithms to obtain the Taylor-series simplified logarithm equation;
wherein the logarithm rule is expressed as
Figure FDA0002312331080000035
CN201911263879.8A 2019-12-11 2019-12-11 Crop growth early warning method and device based on data mining Pending CN111105083A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911263879.8A CN111105083A (en) 2019-12-11 2019-12-11 Crop growth early warning method and device based on data mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911263879.8A CN111105083A (en) 2019-12-11 2019-12-11 Crop growth early warning method and device based on data mining

Publications (1)

Publication Number Publication Date
CN111105083A true CN111105083A (en) 2020-05-05

Family

ID=70423362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911263879.8A Pending CN111105083A (en) 2019-12-11 2019-12-11 Crop growth early warning method and device based on data mining

Country Status (1)

Country Link
CN (1) CN111105083A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697167A (en) * 2009-10-30 2010-04-21 邱建林 Clustering-decision tree based selection method of fine corn seeds
CN101953287A (en) * 2010-08-25 2011-01-26 中国农业大学 Multi-data based crop water demand detection system and method
CN102780783A (en) * 2012-08-11 2012-11-14 王喆 Crop growing environment information real-time sensing and dynamic presentation system and method
CN106779452A (en) * 2016-12-28 2017-05-31 贵州马科技有限公司 The method for setting up crops data growth model
US20190228477A1 (en) * 2018-01-22 2019-07-25 Weather Analytics Llc Method and System For Forecasting Crop Yield

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
张维东等: "利用决策树进行数据挖掘中的信息熵计算", 《计算机工程》 *
权维俊等: "专家分类器在京白梨气候区划中的应用", 《气象科技》 *
赵立安等: "基于农业物联网的火龙果生长环境大数据分析", 《节水灌溉》 *
黄秀霞: "C4.5决策树算法优化及其应用", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Similar Documents

Publication Publication Date Title
CN102364498B (en) Multi-label-based image recognition method
CN109345137A (en) A kind of rejecting outliers method based on agriculture big data
Charoen-Ung et al. Sugarcane yield grade prediction using random forest with forward feature selection and hyper-parameter tuning
CN102306176B (en) On-line analytical processing (OLAP) keyword query method based on intrinsic characteristic of data warehouse
CN112906298B (en) Blueberry yield prediction method based on machine learning
Charoen-Ung et al. Sugarcane yield grade prediction using random forest and gradient boosting tree techniques
CN109063660B (en) Crop identification method based on multispectral satellite image
Yadav et al. An analysis of data mining techniques to analyze the effect of weather on agriculture
CN109408578A (en) One kind being directed to isomerous environment monitoring data fusion method
Tangwannawit et al. An optimization clustering and classification based on artificial intelligence approach for internet of things in agriculture
CN113127464B (en) Agricultural big data environment feature processing method and device and electronic equipment
CN112434662B (en) Tea leaf scab automatic identification algorithm based on multi-scale convolutional neural network
Romani et al. Mining relevant and extreme patterns on climate time series with CLIPSMiner
Yan et al. Research on precision management of farming season based on big data
CN111105083A (en) Crop growth early warning method and device based on data mining
CN114720665B (en) Method and device for detecting total nitrogen abnormal value of soil testing formulated fertilization soil
Romani et al. Clearminer: a new algorithm for mining association patterns on heterogeneous time series from climate data
Maurya et al. Estimation of major agricultural crop with effective yield prediction using data mining
Bhattacharyya et al. Long term prediction of rainfall in Andhra Pradesh with Deep learning
Aishwarya et al. Data mining analysis for precision agriculture: A comprehensive survey
Rao et al. Crop yield prediction by using machine learning techniques
CN114780826A (en) Disease and pest data analyzing and mining system based on plants
Wedashwara et al. Parallel evolutionary association rule mining for efficient summarization of wireless sensor network data pattern
Saleh et al. Determination of corn quality using the decision tree of c 4.5 algorithm
SP et al. A seed yield estimation modelling using classification and regression trees (CART) in the biofuel supply chain.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505
