CN111488520A

CN111488520A - Crop planting species recommendation information processing device and method and storage medium

Info

Publication number: CN111488520A
Application number: CN202010198233.2A
Authority: CN
Inventors: 刘奥琦; 卢涛; 王布凡; 陈润斌; 陈冲; 许若波; 周强; 郝晓慧; 王宇; 魏博识; 郎秀娟; 吴志豪; 王彬
Original assignee: Wuhan Institute of Technology
Current assignee: Wuhan Institute of Technology
Priority date: 2020-03-19
Filing date: 2020-03-19
Publication date: 2020-08-04
Anticipated expiration: 2040-03-19
Also published as: CN111488520B

Abstract

The invention provides a crop planting species recommendation information processing device, a method and a storage medium. The device comprises a soil original data acquisition module, a synthesis processing module, a soil synthesis data set calculation module, a target parameter calculation module and a recommendation information acquisition module. The soil original data acquisition module is used for acquiring a plurality of soil original data from soil to be detected and obtaining a soil original data set according to the plurality of soil original data; the synthesis processing module is used for synthesizing the soil original data set to obtain a soil synthesis data set; the soil synthesis data set calculation module is used for calculating the soil synthesis data set to obtain target parameters, and the target parameters are used for calculating a target plane Z; the target parameter calculation module is used for calculating the target parameters to obtain the target plane Z. The invention addresses the problems of unbalanced soil data and small-sample multi-classification, and reduces the influence of the planting decision maker's subjective intention and of other objective factors.

Description

Crop planting species recommendation information processing device and method and storage medium

Technical Field

The invention mainly relates to the technical field of agricultural planting, in particular to a crop planting type recommendation information processing device and method and a storage medium.

Background

The national agriculture industry is currently encouraged to apply new technologies to agricultural production. Domestic research on this problem has mainly followed these directions. To recommend a suitable crop scheme to farmers, one group of researchers used a combined model of random tree, CHAID, K-nearest neighbor and naive Bayes, which mainly checks whether the response of each type of label falls within a standard range and produces no output if it does not. This requires collecting a large amount of standard data, yet under different environments the standards differ even for the same type of plant. Another group of researchers recommended schemes by using an interest model of the user: features are extracted from the content to form a feature vector, and its similarity with the model feature vector is then calculated. That method depends on the personal intention and interest of the decision maker and aims to provide personalized agricultural informatization services.

Scholars have done a large amount of work on the problem of unbalanced soil data. Divya et al. propose an optimal classification voting integration technique based on the results of three classification techniques (logistic regression, classification trees and discriminant analysis). Nayal et al. propose a method of adding new dimensions to the kernel extension of KerMinSVM to reduce the root frequency ratio. Bhagat et al. propose a SMOTE-based class-specific extreme learning machine, a variant of the class-specific extreme learning machine that takes advantage of minority oversampling and class-specific regularization to increase the importance of minority class samples in determining the classifier decision regions.

Disclosure of Invention

The invention aims to solve the technical problem of the prior art and provides a crop planting species recommendation information processing device, a method and a storage medium.

The technical scheme for solving the technical problems is as follows: a crop planting species recommendation information processing device comprising:

the soil original data acquisition module is used for acquiring a plurality of soil original data from soil to be detected and obtaining a soil original data set according to the plurality of soil original data;

the synthesis processing module is used for synthesizing the soil original data set to obtain a plurality of soil synthesis data, and obtaining a soil synthesis data set according to the plurality of soil synthesis data;

the soil synthesis data set calculation module is used for calculating the soil synthesis data set to obtain target parameters;

the target parameter calculation module is used for calculating the target parameters to obtain a target plane Z;

and the recommendation information obtaining module is used for carrying out classification processing on the target plane Z according to a plurality of preset soil comparison data sets carrying crop species labels to obtain crop species label information, and taking the crop species label information as recommendation information.

Another technical solution of the present invention for solving the above technical problems is as follows: a crop planting species recommendation information processing method comprises the following steps:

collecting a plurality of soil original data from soil to be detected, and obtaining a soil original data set according to the plurality of soil original data;

synthesizing the soil original data set to obtain a plurality of soil synthetic data, and obtaining a soil synthetic data set according to the plurality of soil synthetic data;

calculating the soil synthesis data set to obtain target parameters, wherein the target parameters are used for calculating a target plane Z;

calculating the target parameters to obtain a target plane Z;

and classifying the target plane Z according to a plurality of preset soil comparison data sets carrying crop species labels to obtain crop species label information, and using the crop species label information as recommendation information.

Another technical solution of the present invention for solving the above technical problems is as follows: a crop planting species recommendation information processing apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program, when executed by the processor, implementing the crop planting species recommendation information processing method as described above.

The invention has the following beneficial effects: a soil synthesis data set is obtained by synthesizing the soil original data set; target parameters are obtained by calculating the soil synthesis data set; a target plane Z is obtained by calculating the target parameters; crop species label information is obtained by classifying the target plane Z against a plurality of preset soil comparison data sets carrying crop species labels; and finally the crop species label information is used as recommendation information.

Drawings

Fig. 1 is a block diagram of a crop planting species recommendation information processing apparatus according to an embodiment of the present invention;

fig. 2 is a flowchart illustrating a processing method of crop planting category recommendation information according to an embodiment of the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

Fig. 1 is a block diagram of a crop planting species recommendation information processing apparatus according to an embodiment of the present invention.

As shown in fig. 1, a crop planting species recommendation information processing device includes:

It is to be understood that the soil original data set, the soil synthesis data set and the soil comparison data set all include ammonium nitrogen, available potassium, available phosphorus and organic matter.

Specifically, soil to be detected is collected from land that has long been planted with a single crop and from land under crop rotation and relay intercropping. The collected soil is air-dried, soaked and chemically extracted, and then analyzed by an instrument to obtain the soil original data set.

In the embodiment, a soil synthetic data set is obtained through synthetic processing of a soil original data set, target parameters are obtained through calculation of the soil synthetic data set, a target plane Z is obtained through calculation of the target parameters, crop type label information is obtained through classification processing of the target plane Z through a plurality of preset soil comparison data sets carrying crop type labels, and finally the crop type label information is used as recommendation information, so that the problems of soil data imbalance and small sample multi-classification are solved, standard data do not need to be collected, the influences of subjective intention and other objective factors of a planting decision maker are reduced, and the optimal planting type of soil is scientifically analyzed.

Optionally, as an embodiment of the present invention, the synthesis processing module is specifically configured to:

dividing the soil original data set into a minority soil original data set a and a majority soil data set by using an SMOTE algorithm;

searching the minority soil original data set a by using a K nearest neighbor algorithm to obtain a plurality of minority soil search data;

calculating the plurality of minority soil search data by using an SMOTE algorithm to obtain a plurality of minority soil processing data a [ p ];

constructing the plurality of minority soil processing data a [ p ] and the minority soil original data set a by utilizing a SMOTE algorithm to obtain a plurality of minority soil construction data, and obtaining a minority soil construction data set according to the plurality of minority soil construction data;

and synthesizing the minority soil construction data set and the majority soil data set by utilizing the SMOTE algorithm to obtain a soil synthesis data set.

It should be understood that the minority class refers to the classes with few samples among the plurality of soil original data, and the majority class refers to the classes with many samples. "Constructing" means placing the plurality of minority soil processing data a[p] into the minority soil original data set a, and "synthesizing" means placing the minority soil construction data set into the majority soil data set.

It should be understood that the basic idea of the SMOTE algorithm is to analyze and simulate the minority class samples and to add the artificially simulated new samples to the data set, so that the classes in the original data are no longer seriously unbalanced. The simulation process of the SMOTE algorithm adopts the KNN technique, and the steps for generating new samples are: using a nearest neighbor algorithm to calculate the K neighbors of each minority sample; randomly selecting N samples from the K neighbors and performing random linear interpolation; constructing new minority samples; and combining the new samples with the original data to generate a new training set.

Specifically, a certain point a [ n ] is randomly selected from the minority soil original data set a, a point a [ m ] adjacent to the point a [ n ] is selected, and then a point is randomly selected from the connecting line of a [ n ] and a [ m ] to serve as newly synthesized minority soil processing data a [ p ].

In the above embodiment, the soil original data set is divided into the minority soil original data set a and the majority soil data set by using the SMOTE algorithm, the minority soil original data set a is searched by using the K neighbor algorithm to obtain a plurality of minority soil search data, the minority soil processing data a [ p ] is obtained by using the SMOTE algorithm to calculate the minority soil search data, the minority soil structure data set is obtained by using the SMOTE algorithm to construct the minority soil processing data a [ p ] and the minority soil original data set a, and finally the soil synthetic data set is obtained by using the SMOTE algorithm to synthesize the minority soil structure data set and the majority soil data set, so that the collinearity of the minority samples is not affected while the soil data information is not lost, the weight proportion is not required to be continuously adjusted, the algorithm convergence is fast and stable, the method has the advantages of strong regularity, low implementation cost, wide application range and high effective data rate, solves the problems of unbalanced soil data and multiple classifications of small samples, does not need to collect standard data, and reduces the influence of subjective intention and other objective factors of planting decision makers.

according to the Euclidean distance and the K nearest neighbor algorithm, searching for the minority soil original data a[i] in the minority soil original data set a to obtain a plurality of minority soil search data, wherein the plurality of minority soil search data comprise k neighbor points a[m_1], a[m_2], …, a[m_k], and k is 5 by default;

randomly selecting e (e ≤ k) of the plurality of minority soil search data, and calculating a plurality of minority soil processing data a[p] according to a first formula, wherein the first formula is as follows:

a[p] = a[i] + rand(0,1) * (a[m_j] - a[i]),

wherein p = 1, 2, …, e; j = 1, 2, …, k; and rand(0,1) is a random number between 0 and 1.
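The first formula can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `k_nearest` and `smote`, the tuple representation of a soil sample and the optional `rng` argument are assumptions made for illustration only. Each synthetic point a[p] is a random point on the line segment between a[i] and one of its k nearest minority neighbors.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k points closest (Euclidean distance) to points[i], excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority, k=5, e=2, rng=None):
    """Synthesize e points per minority sample: a[p] = a[i] + rand(0,1) * (a[m_j] - a[i])."""
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    e = min(e, k)                  # the text requires e <= k
    synthetic = []
    for i, a_i in enumerate(minority):
        neighbours = k_nearest(minority, i, k)
        for j in rng.sample(neighbours, e):  # pick e of the k neighbors at random
            gap = rng.random()               # rand(0,1)
            a_m = minority[j]
            synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a_i, a_m)))
    return synthetic
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies inside the bounding box of the minority set, which is one way to sanity-check the output.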

In the embodiment, a plurality of minority soil search data are obtained by searching for the minority soil original data a[i] in the minority soil original data set a according to the Euclidean distance and the K nearest neighbor algorithm, e (e ≤ k) of the plurality of minority soil search data are randomly selected, and a plurality of minority soil processing data a[p] are calculated according to the first formula. In this way no soil data information is lost, the collinearity of the minority samples is not affected, and the weight proportion does not need to be adjusted continuously; the algorithm converges quickly and stably, has strong regularity, low implementation cost and a wide application range, and yields a high rate of effective data.

Optionally, as an embodiment of the present invention, the target parameter includes a penalty parameter C and a function parameter ω, and the data analysis module is specifically configured to:

calculating the soil projection data set by utilizing a grid search algorithm to obtain the penalty parameter C and the projection parameter gamma;

calculating the projection parameter gamma by using a Gaussian kernel function to obtain a space projection set X;

and calculating the space projection set X to obtain the function parameter omega.

It should be understood that grid search is an exhaustive search: every candidate parameter combination is tried through loop traversal, and the best-performing combination is selected as the final result. Grid search is suited to selecting hyper-parameters when there are only a few of them: the user lists a small set of candidate values for each hyper-parameter, and the Cartesian product of these sets gives the group of hyper-parameter combinations.

Preferably, the grid search sequence of the penalty parameter C is {100,300,500,800,1000,800,900,1000,1100,1300,1400,1500,1700} and the grid search sequence of the projection parameter gamma is {0.01,0.05,0.10,0.50,0.80,1.0,5.0,10.0}; in comparison tests, the precision is highest when C is 1050 and gamma is 0.01.
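The exhaustive traversal described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the patented procedure: `grid_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for the cross-validated accuracy of an SVM, which in practice would be the quantity being maximized. The candidate values are copied from the text, with duplicates removed.

```python
from itertools import product


def grid_search(param_grid, score):
    """Try every combination in the Cartesian product and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Candidate values copied from the text (duplicates removed, sorted).
PARAM_GRID = {
    "C": sorted({100, 300, 500, 800, 1000, 800, 900, 1000, 1100, 1300, 1400, 1500, 1700}),
    "gamma": [0.01, 0.05, 0.10, 0.50, 0.80, 1.0, 5.0, 10.0],
}


def toy_score(C, gamma):
    """Hypothetical stand-in for validation accuracy; it peaks at C=1000, gamma=0.05."""
    return -abs(C - 1000) / 1000 - abs(gamma - 0.05)


best_params, best_score = grid_search(PARAM_GRID, toy_score)
```

The cost grows with the product of the list lengths (here 11 × 8 = 88 evaluations), which is why the approach is only practical for a small number of hyper-parameters.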

In the embodiment, the penalty parameter C and the projection parameter gamma are obtained by calculating the soil projection data set by using a grid search algorithm, the space projection set X is obtained by calculating the projection parameter gamma by using a gaussian kernel function, and the function parameter omega is obtained by calculating the space projection set X, so that key data is obtained, a basis is provided for the subsequent calculation, and the subjective intention of a planting decision maker and the influence of other objective factors are reduced.

Optionally, as an embodiment of the present invention, the data analysis module is specifically configured to:

obtaining the spatial projection set X by a second formula, wherein the second formula is as follows:

$$X = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right) = \exp\left(-\gamma \|x - z\|^2\right),$$

wherein x is a soil synthesis data set, z is a kernel function center, sigma is a width parameter of a function, and gamma is a projection parameter;

obtaining the function parameter ω by a third equation:

ωX+b＝0；

wherein b is a preset constant.

In particular, a Gaussian kernel function is a radially symmetric scalar function that can map finite-dimensional data to a high-dimensional space. Given a training set sample, it is typically defined as a monotonic function of the Euclidean distance from any point X_i in space to a certain center X, and can be written as k(||X_i - X||). Its action is usually local, i.e. when X_i is far away from X the function value is very small. Its form is a fifth formula:

$$k(\|X_i - X\|) = \exp\left(-\frac{\|X_i - X\|^2}{2\sigma^2}\right),$$

wherein X is the center of the kernel function and sigma is the width parameter of the function, which controls its radial range of action. If X_i is close to X, the kernel function value approaches 1; if X_i is very different from X, the kernel function value decays exponentially toward 0. The kernel can map the original features to infinite dimensions, and linear regression can be performed in this feature space. gamma is a function parameter when the RBF kernel is selected; in the SVM algorithm, the larger the gamma value, the fewer the support vectors, and the number of support vectors affects the speed of training and prediction. The relation between sigma and gamma is a sixth formula:

$$\gamma = \frac{1}{2\sigma^2},$$

from which a seventh formula is derived:

$$k(\|X_i - X\|) = \exp\left(-\gamma \|X_i - X\|^2\right).$$

In the embodiment, the spatial projection set X and the function parameter ω are obtained through the second formula and the third formula, so that key data are obtained, a basis is provided for subsequent calculation, and the subjective intention of a planting decision maker and the influence of other objective factors are reduced.
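As a small numerical illustration of the Gaussian (RBF) kernel discussed above, the following Python sketch assumes the standard form exp(-gamma * ||x - z||^2) with gamma = 1 / (2 * sigma^2). The function names `gamma_from_sigma` and `rbf_kernel` are hypothetical and chosen only for this example.

```python
import math


def gamma_from_sigma(sigma):
    """Convert the width parameter sigma into gamma, assuming gamma = 1 / (2 * sigma**2)."""
    return 1.0 / (2.0 * sigma ** 2)


def rbf_kernel(x, z, gamma):
    """Gaussian kernel value exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Four soil indicators per sample; the numbers are made up for illustration.
center = (12.0, 85.0, 20.0, 1.8)
near = (12.1, 85.2, 20.1, 1.8)
far = (40.0, 10.0, 60.0, 4.0)

gamma = gamma_from_sigma(sigma=1.0)
near_value = rbf_kernel(near, center, gamma)  # close to 1
far_value = rbf_kernel(far, center, gamma)    # close to 0
```

The kernel value depends only on the distance to the center, so a sample identical to the center scores exactly 1 and the score falls off rapidly with distance; a larger gamma (smaller sigma) makes that fall-off steeper.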

Optionally, as an embodiment of the present invention, the target parameter calculating module is specifically configured to:

calculating the penalty parameter C and the function parameter omega by a fourth formula to obtain a target plane Z, wherein the fourth formula is as follows:

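$$Z = \min_{\omega, b, \xi} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \xi_i,$$

wherein $\sum_{i=1}^{n} \xi_i$ is a preset interval sum.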

It should be understood that, after the soil synthesis data set is mapped to a high-dimensional space, a hyperplane that separates the vectors of different classes is sought in the feature space. Many such hyperplanes may exist, but only one is optimal: the optimal hyperplane is the one with the maximum classification interval, i.e. the sum of the distances from all the sample points of the same class to the hyperplane should be the largest. Slack variables ξ_i > 0 (i = 1, 2, 3, …, n) record the classification error, C is a penalty parameter, and the parameter C weights the interval sum so as to make a trade-off between training error and classification interval.

It should be understood that the principle of the SVM model is to find the optimal separation hyperplane on the feature space to maximize the interval between the positive and negative samples on the training set, classify the data by using the separation hyperplane, and introduce the superposition of the kernel function and the binary classification algorithm to solve the nonlinear and multi-classification problem.

Specifically, in the soil projection data set, the soil data points are classified by the optimal hyperplane, which may be expressed as an eighth equation:

$$\omega^T X + b = 0,$$

where ω and X are both spatial column vectors. Given a sample set (X_1, Y_1), (X_2, Y_2), …, (X_m, Y_m), X_i is the i-th input, Y_i is the corresponding i-th output, and m is the number of training soil data. ω is the normal vector of the plane and determines the direction of the hyperplane; the normal vectors of a plurality of hyperplanes can be obtained through classification. b is a real number representing the distance from the hyperplane to the origin. X and ω are represented by a ninth expression and a tenth expression, respectively:

$$X = (X_1, X_2, \ldots, X_m)^T,$$

$$\omega = (\omega_1, \omega_2, \ldots, \omega_m)^T.$$

To prevent overfitting in the linearly inseparable case, n slack variables ξ_i > 0 (i = 1, 2, 3, …, n) are introduced, and the relaxed separation constraint is an eleventh expression:

$$Y_i(\omega^T X_i + b) \ge 1 - \xi_i \quad (i = 1, 2, 3, \ldots, n).$$

The optimal hyperplane may be obtained by a twelfth expression:

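$$\min_{\omega, b, \xi} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad Y_i(\omega^T X_i + b) \ge 1 - \xi_i,\ \xi_i > 0,$$

and the optimal hyperplane is the target plane Z.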

In the embodiment, the target plane Z is obtained by calculating the penalty parameter C and the function parameter ω according to the fourth formula, so that data support is provided for later analysis, the problem of soil data imbalance and the problem of multiple small samples and classifications are solved, standard data do not need to be collected, and the subjective intention of a planting decision maker and the influence of other objective factors are reduced.
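To make the trade-off controlled by the penalty parameter C concrete, the sketch below evaluates a soft-margin objective for a given hyperplane. It assumes the standard formulation, in which the slack of a sample is max(0, 1 - Y_i(ω·X_i + b)) and the objective is ½||ω||² + C Σ ξ_i; the function names and the toy data are hypothetical, and the code only scores a candidate hyperplane rather than solving for the optimal one.

```python
def decision_value(omega, b, x):
    """Signed score omega . x + b of a sample relative to the hyperplane."""
    return sum(w * xi for w, xi in zip(omega, x)) + b


def soft_margin_objective(omega, b, samples, labels, C):
    """Return (objective, slacks), with objective = 0.5 * ||omega||^2 + C * sum(slacks)."""
    slacks = [
        max(0.0, 1.0 - y * decision_value(omega, b, x))  # 0 when the margin is respected
        for x, y in zip(samples, labels)
    ]
    objective = 0.5 * sum(w * w for w in omega) + C * sum(slacks)
    return objective, slacks


# Toy two-dimensional data: two well-separated points and one inside the margin.
samples = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, -1, 1]
objective, slacks = soft_margin_objective((1.0, 0.0), 0.0, samples, labels, C=10.0)
```

With these numbers the first two samples satisfy the margin and contribute no slack, while the third falls inside the margin with slack 0.5, so the objective is 0.5 + 10 × 0.5 = 5.5. A larger C penalizes that violation more heavily, and a smaller C favors a wider margin.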

Optionally, as an embodiment of the present invention, the recommendation information obtaining module is specifically configured to:

and classifying and judging the target plane Z according to a plurality of preset soil comparison data sets with crop species labels and an OVR classification algorithm to obtain a classifier, and obtaining the crop species labels corresponding to the classifier through the classifier.

It should be understood that the OVR (one-vs-rest) classification algorithm sets the data of each label class in the training set against the data of all the remaining labels to form one classifier per class, and outputs the result of the classifier with the highest classification precision.

Specifically, if there are 13 soil comparison data sets with crop species labels, one of them is selected and set against the other 12 soil comparison data sets, together with the target plane Z, to form a classifier, and the other 12 classifiers are formed by analogy. Classification is performed in each classifier, and each classifier gives a classification result and a precision. If, for example, the precision of the first classifier is the highest, the classification label of the first classifier is the final classification result, and the crop species label corresponding to its soil comparison data set is output.

In the embodiment, the classifier is obtained according to the soil comparison data set with the crop type labels and the OVR classification algorithm, and the crop type labels corresponding to the classifier are obtained through the classifier, so that the problems of soil data imbalance and small sample multi-classification are solved, standard data do not need to be collected, the subjective intention of a planting decision maker and the influence of other objective factors are reduced, and the optimal recommended planting type of the soil is scientifically analyzed.
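The one-vs-rest scheme can be sketched generically in Python. Several assumptions are made here and should not be read as the patented method: the binary learner is a simple nearest-centroid scorer standing in for the SVM, the final label is chosen by the highest decision score (the usual OVR rule) rather than by per-classifier precision, and the crop names and sample values are invented for illustration.

```python
import math


def fit_centroid_scorer(samples, binary_labels):
    """Stand-in binary learner: distance to the 'rest' centroid minus distance to the 'one' centroid."""
    def centroid(rows):
        return tuple(sum(column) / len(rows) for column in zip(*rows))

    positive = centroid([x for x, y in zip(samples, binary_labels) if y == 1])
    negative = centroid([x for x, y in zip(samples, binary_labels) if y == -1])
    return lambda x: math.dist(x, negative) - math.dist(x, positive)


def train_ovr(samples, labels, fit_binary=fit_centroid_scorer):
    """Train one binary scorer per label: that label (+1) against all the others (-1)."""
    classifiers = {}
    for label in sorted(set(labels)):
        binary = [1 if y == label else -1 for y in labels]
        classifiers[label] = fit_binary(samples, binary)
    return classifiers


def predict_ovr(classifiers, x):
    """Return the label whose one-vs-rest scorer gives the highest score for x."""
    return max(classifiers, key=lambda label: classifiers[label](x))


# Hypothetical two-feature soil samples for three crops.
samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (21.0, 0.0)]
labels = ["rice", "rice", "wheat", "wheat", "corn", "corn"]
classifiers = train_ovr(samples, labels)
recommended = predict_ovr(classifiers, (0.2, 0.5))
```

With 13 crop labels the same loop would build 13 binary classifiers, and any binary learner that returns a comparable score can be passed in through the `fit_binary` argument.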

Optionally, as an embodiment of the present invention, the apparatus further includes a data display processing module, where the data display processing module is configured to:

and reducing the soil synthesis data set from a four-dimensional space to a two-dimensional space by utilizing the t-distributed stochastic neighbor embedding (t-SNE) method to obtain a plurality of visual soil data, obtaining a visual soil data set from the plurality of visual soil data, and calling a display screen to display the visual soil data set.

The following describes how to go from the four-dimensional space to the two-dimensional space. Specifically, if the data are separable in the low-dimensional space, the data are considered separable. The TSNE algorithm converts the similarity between data points into probabilities by calculating spatial distances: the similarity in the original space is represented by a Gaussian joint probability, and the similarity in the embedded space is represented by a t-distribution. The four-dimensional data are thereby converted into two-dimensional data, and data visualization is performed.

Preferably, n_components = 2, perplexity = 40, learning_rate = 500 and n_iter = 1000 are selected in the TSNE algorithm.

In the embodiment, the soil synthesis data set is reduced to the two-dimensional space by using the t-distributed stochastic neighbor embedding method to obtain the visual soil data set, and the display screen is called to display the visual soil data set, so that the visualization of the data is realized and the user can observe the soil data conveniently.

Optionally, as an embodiment of the present invention, the apparatus further includes a cluster analysis module, where the cluster analysis module is configured to:

and carrying out cluster analysis on the soil synthesis data set to obtain a soil cluster data set.

Specifically, a density clustering algorithm is used to perform cluster analysis on the soil synthesis data set in the sample space; the clustering model is adjusted, high-density data areas are divided, and data characteristics are recorded. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) defines a cluster as the largest set of density-connected points, can divide areas of sufficiently high density into clusters, and can find clusters of arbitrary shape in a noisy spatial database. DBSCAN traverses all sample points: if the number of similar points within the radius of a point is larger than min_samples, the point is determined to be a core point; otherwise it is determined to be a noise point. The density clustering algorithm DBSCAN involves two key preset parameters, the radius eps and the minimum number min_samples, and suitable values of eps and min_samples are selected by continuous iteration. Cluster refers to the number of cluster types, Outliner refers to outliers, and ratio refers to the proportion of noise points; the experiment is iterated until the ratio is about 10%.

In the embodiment, the soil clustering data set is obtained by clustering analysis of the soil synthetic data set, so that the validity of the synthetic data is verified, and a basis is provided for better subsequent classification.
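The density clustering step can be illustrated with a compact pure-Python DBSCAN. This is a sketch of the standard algorithm, not the code used in the invention: a point counts as a core point when at least min_samples points (itself included) lie within eps, and points reachable from a core point join its cluster as border points, which is slightly finer than the core/noise split summarized above. The names `dbscan`, `noise_ratio` and `NOISE` and the sample coordinates are assumptions for illustration.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def region(i):
        # Indices of all points within eps of points[i], including i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


def noise_ratio(labels):
    """Proportion of points labelled as noise (the 'ratio' mentioned in the text)."""
    return labels.count(NOISE) / len(labels)


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
ratio = noise_ratio(labels)
```

In this toy data set the two dense groups become clusters 0 and 1 and the isolated point is noise, giving a ratio of 0.1. Tuning eps and min_samples until the ratio reaches a target value mirrors the iteration described in the text.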

Alternatively, as another embodiment of the present invention, as shown in fig. 2, a method for processing crop planting species recommendation information includes the following steps:

calculating the target parameters to obtain a target plane Z;

Alternatively, another embodiment of the present invention provides a crop planting kind recommendation information processing apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, which when executed by the processor, implements the crop planting kind recommendation information processing method as described above. The device may be a computer or the like.

Alternatively, another embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for processing recommendation information for crop planting species as described above.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that essentially contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A crop planting species recommendation information processing device, comprising:

2. The crop planting species recommendation information processing apparatus of claim 1, wherein the synthesis processing module is specifically configured to:

3. The crop planting species recommendation information processing apparatus of claim 2, wherein the synthesis processing module is specifically configured to:

a[p]＝a[i]+rand(0,1)*(a[m_j]-a[i])，

4. The crop planting species recommendation information processing apparatus of claim 1, wherein the target parameters include a penalty parameter C and a function parameter ω, and the data analysis module is specifically configured to:

5. The crop planting species recommendation information processing apparatus of claim 4, wherein the data analysis module is specifically configured to:

obtaining the function parameter ω by a third equation:

ωX+b＝0；

wherein b is a preset constant.

6. The crop planting species recommendation information processing apparatus according to claim 1 or 4, wherein the target parameter calculation module is specifically configured to:

wherein ,

is a preset interval sum.

7. The crop planting species recommendation information processing apparatus of claim 1, wherein the recommendation information obtaining module is specifically configured to:

8. The crop planting species recommendation information processing apparatus according to any one of claims 1 to 7, further comprising a data display processing module for:

9. The crop planting species recommendation information processing apparatus of claim 8, further comprising a cluster analysis module configured to:

10. The crop planting species recommendation information processing method is characterized by comprising the following steps:

calculating the target parameters to obtain a target plane Z;