CN114282726A

CN114282726A - Seed field prediction processing method and device, storage medium and electronic device

Info

Publication number: CN114282726A
Application number: CN202111603686.XA
Authority: CN
Inventors: 吴春子; 张芬芬; 白云东; 张鑫; 赵宇
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2021-12-24
Filing date: 2021-12-24
Publication date: 2022-04-05

Abstract

The embodiments of the application provide a seed production field prediction processing method and device, a storage medium and an electronic device. The method comprises: determining the feature weights of the signaling position data of a field to be identified; acquiring, according to those feature weights, target feature variables related to seed production field identification; and inputting the target feature variables into a pre-trained target recognition model to perform seed production field prediction and obtain a prediction result for the field to be identified. This addresses the problems of the related art, in which seed production fields are identified from remote sensing images: identification relies on people recognizing the stripe pattern that distinguishes seed production corn fields from field corn, a large amount of manual image recognition is required, and fields sown without stripes cannot be identified. The method can be applied to large-scale, accurate seed production field identification, requires no manual participation, and reduces labor cost.

Description

Seed field prediction processing method and device, storage medium and electronic device

Technical Field

The embodiment of the application relates to the field of communication, in particular to a seed field prediction processing method and device, a storage medium and an electronic device.

Background

Management of the seed industry is a focus of the Central Document No. 1 of 2021 and one of the important tasks of national and local agricultural and rural authorities. In large-scale corn planting, corn is currently divided into field corn and seed production corn: field corn can be sold on the market once mature, whereas seed production corn is kept as seed and is one of the foundations of agricultural activity in the following year.

Corn seed production includes legal and illegal seed production. Legal seed production is carried out with the authorization of a seed company or the relevant authorities, whereas illegal seed production is carried out without such authorization. Seeds flowing out of illegal seed production not only infringe the intellectual property of seed companies but also affect the next year's harvest, disturb market order and harm national interests.

At present, the industry generally identifies illegal seed production fields by excluding the legal seed production fields from all seed production fields; what remains is illegal. Data on legal seed production fields can be obtained from seed companies, while the full set of seed production fields is currently determined in two ways: on-site field inspection and remote sensing image identification.

On-site field inspection relies solely on manpower and experience and has the following problems: for long periods there is no obvious difference between seed production corn fields and field corn fields, and farmland is often remote with poor traffic conditions. A large amount of manpower is therefore usually required for field investigation, which cannot meet the need for large-scale identification of seed production fields.

The related art also proposes remote sensing image judgment. Remote sensing can photograph corn fields over a large area, but identification still relies on people recognizing the stripe pattern that distinguishes seed production corn fields from field corn, so a large amount of manual image recognition is still required. In addition, to evade monitoring, some illegal seed production corn fields deliberately avoid stripe-pattern sowing, in which case they cannot be identified this way.

Disclosure of Invention

The embodiments of the application provide a seed production field prediction processing method and device, a storage medium and an electronic device, which aim to solve the problems of the related art, in which seed production fields are identified from remote sensing images: identification relies on people recognizing the stripe pattern that distinguishes seed production corn fields from field corn, a large amount of manual image recognition is required, and fields sown without stripes cannot be identified.

According to an embodiment of the present application, there is provided a seed field prediction processing method including:

determining the characteristic weight of the signaling position data of the field to be identified;

acquiring a target characteristic variable related to the identification of the seed field according to the characteristic weight of the signaling position data to be identified;

and inputting the target characteristic variables into a pre-trained target recognition model, and predicting the seed production field to obtain a prediction result of the field to be recognized.

Optionally, the method further comprises:

determining the characteristic weight of the signaling position training data;

acquiring training characteristic variables related to the identification of the seed field according to the characteristic weight of the signaling position training data;

and training a breeding field prediction model according to the training characteristic variables of the signaling position training data to obtain the trained target recognition model.

Optionally, training a breeding field prediction model according to the training feature variables of the signaling position training data, and obtaining the trained target recognition model includes:

determining a characteristic threshold;

carrying out binarization processing on the training characteristic variable according to the characteristic threshold value to obtain binarization characteristics of the training characteristic variable;

and training the seed production field prediction model according to the binarization characteristics and the corresponding labels to obtain the trained target recognition model.

Optionally, the determining the feature threshold comprises:

performing binning on the training feature variables, and determining the WOE (weight of evidence) values and IV (information value) values of the training feature variables;

determining the feature threshold based on the WOE value, the IV value, and the sample proportion distribution.

Optionally, binning the training feature variables, and determining the WOE value and the IV value of the training feature variables includes:

performing binning on the training feature variables to obtain a plurality of grouped samples;

determining the WOE value corresponding to each grouped sample i by:

$$WOE_i = \ln\left(\frac{py_i}{pn_i}\right) = \ln\left(\frac{y_i / y_T}{n_i / n_T}\right)$$

determining the IV value corresponding to each grouped sample i by:

$$IV_i = (py_i - pn_i) \cdot WOE_i$$

determining the sum of the IV values of the plurality of grouped samples as the IV value of the target feature variable:

$$IV = \sum_{i=1}^{m} IV_i$$

where py_i represents the ratio of the number of seed production fields in grouped sample i to the number of seed production fields in the total sample, pn_i represents the ratio of the number of non-seed-production fields in grouped sample i to the number of non-seed-production fields in the total sample, y_i and n_i respectively represent the numbers of seed production fields and non-seed-production fields in grouped sample i, y_T and n_T respectively represent the numbers of seed production fields and non-seed-production fields in the total sample, and m represents the number of bins.
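The WOE and IV quantities described here can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the conventional definitions WOE_i = ln(py_i / pn_i) and IV_i = (py_i - pn_i) * WOE_i, assumes every bin contains at least one seed production field and one non-seed-production field (otherwise the logarithm is undefined), and uses a hypothetical function name and made-up counts.

```python
import math

def woe_iv(bins):
    """Compute per-bin WOE and IV from (y_i, n_i) counts.

    bins: list of (y_i, n_i) tuples, where y_i is the number of seed
    production fields and n_i the number of non-seed-production fields
    in bin i. Returns (woe_list, iv_list, total_iv).
    """
    y_total = sum(y for y, _ in bins)  # y_T
    n_total = sum(n for _, n in bins)  # n_T
    woe_list, iv_list = [], []
    for y_i, n_i in bins:
        if y_i == 0 or n_i == 0:
            # Assumption: empty classes are not smoothed in this sketch.
            raise ValueError("each bin needs both classes for WOE")
        py_i = y_i / y_total
        pn_i = n_i / n_total
        woe_i = math.log(py_i / pn_i)
        woe_list.append(woe_i)
        iv_list.append((py_i - pn_i) * woe_i)
    return woe_list, iv_list, sum(iv_list)

# Made-up counts for three bins of one feature.
woe, iv, total_iv = woe_iv([(10, 40), (30, 30), (60, 30)])
```

A bin in which the two classes appear in the same proportion has a WOE of zero and contributes nothing to the IV; bins that skew strongly toward either class contribute the most.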

Optionally, determining the feature threshold according to the WOE value, the IV value, and the sample proportion distribution comprises:

detecting, according to the WOE value distribution, the IV value and the positive/negative sample ratio of the plurality of grouped samples, whether the positive/negative sample ratio is monotonically distributed over some continuous interval;

and if so, determining the upper limit of the continuous interval as the characteristic threshold.
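One possible reading of this threshold rule is sketched below in Python. The application does not spell out the detection procedure, so everything here is an assumption: the function name, the use of the per-bin positive rate alone (WOE and IV are not consulted), the requirement of at least three consecutive bins, and the choice of the longest strictly monotone run. Each bin is assumed to be non-empty.

```python
def monotone_threshold(bins, min_bins=3):
    """Return the upper edge of the longest strictly monotone run of bins.

    bins: list of (upper_edge, y_i, n_i) tuples ordered by upper_edge,
    where y_i and n_i are positive and negative sample counts.
    Returns None when no run of at least min_bins bins is monotone.
    """
    rates = [y / (y + n) for _, y, n in bins]  # positive rate per bin
    best = None                # (run_length, index_of_last_bin)
    start, direction = 0, 0    # current run start and its direction
    for k in range(1, len(rates)):
        step = (rates[k] > rates[k - 1]) - (rates[k] < rates[k - 1])
        if step == 0:
            start, direction = k, 0          # equal rates break the run
        elif direction != 0 and step != direction:
            start, direction = k - 1, step   # direction reversed
        else:
            direction = step                 # run continues
        length = k - start + 1
        if length >= min_bins and (best is None or length > best[0]):
            best = (length, k)
    return None if best is None else bins[best[1]][0]

# Made-up bins: the positive rate rises over the first three bins only.
example_bins = [(0.5, 5, 95), (1.0, 10, 90), (1.5, 30, 70),
                (2.0, 20, 80), (2.5, 25, 75)]
threshold = monotone_threshold(example_bins)
flat = monotone_threshold([(1, 1, 1), (2, 1, 1), (3, 1, 1)])
```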

Optionally, determining the feature weight of the signaling location data of the field to be identified comprises:

dividing a decision target, a decision criterion and a decision object into a highest layer, a middle layer and a lowest layer according to the mutual relation, and establishing a hierarchical structure model;

comparing every two factors in each layer in the hierarchical structure model, and constructing a discrimination matrix according to a comparison result;

checking the consistency of the discrimination matrix;

and sequentially determining the characteristic weight of the relative importance of all the factors in each layer to the highest layer from the highest layer to the lowest layer.

Optionally, checking the consistency of the discrimination matrix includes:

determining a consistency index CI of the discrimination matrix;

comparing the consistency index CI with a random consistency index RI to determine a check coefficient CR;

if the check coefficient is smaller than a preset value, determining that the discrimination matrix passes the consistency check;

and if the check coefficient is larger than or equal to the preset value, determining that the discrimination matrix does not pass the consistency check.
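The consistency check can be illustrated with a short Python sketch. It assumes the conventional AHP definitions CI = (λ - n) / (n - 1) and CR = CI / RI, where λ is the largest eigenvalue of an n-th order discrimination matrix. The RI values below are the commonly cited Saaty figures rather than values taken from this application, and the preset value of 0.1 is the customary choice, used here only as an assumption.

```python
# Commonly cited random consistency index values by matrix order
# (assumption: not copied from this application's own table).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def consistency_check(lambda_max, n, preset_value=0.1):
    """Return (CI, CR, passed) for an n-th order discrimination matrix."""
    if n <= 2:
        # 1x1 and 2x2 positive reciprocal matrices are always consistent.
        return 0.0, 0.0, True
    ci = (lambda_max - n) / (n - 1)
    cr = ci / RANDOM_INDEX[n]
    return ci, cr, cr < preset_value

# Made-up eigenvalues for a third-order matrix.
ci_ok, cr_ok, passed_ok = consistency_check(3.0385, 3)
ci_bad, cr_bad, passed_bad = consistency_check(3.5, 3)
```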

Optionally, the obtaining of the target characteristic variable related to the identification of the breeding farm according to the characteristic weight of the signaling location data to be identified includes:

and selecting the characteristic variable with the characteristic weight larger than a preset threshold value in the discrimination matrix as the target characteristic variable.
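As an illustration of this selection step, the following minimal Python sketch keeps the feature variables whose weights exceed a preset threshold. The feature names, weight values and threshold are hypothetical and are not taken from the application.

```python
def select_target_features(feature_weights, preset_threshold):
    """Return the names of features whose weight exceeds the threshold."""
    return [name for name, weight in feature_weights.items()
            if weight > preset_threshold]

# Hypothetical AHP weights, for illustration only.
example_weights = {
    "detasseling_to_sowing_headcount_ratio": 0.45,
    "detasseling_to_growth_dwell_time_ratio": 0.35,
    "night_time_headcount": 0.20,
}
selected = select_target_features(example_weights, preset_threshold=0.30)
```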

According to another embodiment of the present application, there is provided a seed field prediction processing apparatus including:

the first determining module is used for determining the characteristic weight of the signaling position data of the field to be identified;

the first acquisition module is used for acquiring a target characteristic variable related to the identification of the seed field according to the characteristic weight of the signaling position data to be identified;

and the prediction module is used for inputting the target characteristic variables into a pre-trained target recognition model to carry out seed field preparation prediction so as to obtain a prediction result of the field block to be recognized.

Optionally, the apparatus further comprises:

the second determining module is used for determining the characteristic weight of the signaling position training data;

the second acquisition module is used for acquiring training characteristic variables related to the identification of the seed field according to the characteristic weight of the signaling position training data;

and the training module is used for training the seed field prediction model according to the training characteristic variables of the signaling position training data to obtain the trained target recognition model.

Optionally, the training module comprises:

a first determining submodule for determining a feature threshold;

a binarization processing submodule, configured to perform binarization processing on the training feature variable according to the feature threshold value to obtain a binarization feature of the training feature variable;

and the training submodule is used for training the seed field prediction model according to the binarization characteristics and the corresponding labels to obtain the trained target recognition model.

Optionally, the first determining sub-module includes:

the box separation processing unit is used for carrying out box separation processing on the training characteristic variables and determining WOE values and IV values of the training characteristic variables;

a determining unit, configured to determine the feature threshold according to the WOE value, the IV value, and the sample proportion distribution.

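Optionally, the binning processing unit is further configured to:

determining the WOE value corresponding to each grouped sample i by:

$$WOE_i = \ln\left(\frac{py_i}{pn_i}\right) = \ln\left(\frac{y_i / y_T}{n_i / n_T}\right)$$

determining the IV value corresponding to each grouped sample i by:

$$IV_i = (py_i - pn_i) \cdot WOE_i$$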

Optionally, the determining unit is further configured to:

Optionally, the first determining module includes:

the establishing submodule is used for dividing the decision target, the decision criterion and the decision object into a highest layer, a middle layer and a lowest layer according to the mutual relation and establishing a hierarchical structure model;

the comparison submodule is used for comparing every two factors in each layer in the hierarchical structure model and constructing a discrimination matrix according to a comparison result;

the checking submodule is used for checking the consistency of the discrimination matrix;

and the second determining submodule is used for sequentially determining the characteristic weight of the relative importance of all the factors in each layer to the highest layer from the highest layer to the lowest layer.

Optionally, the check submodule is further configured to:

determining a consistency index CI value of the discrimination matrix;

Optionally, the first obtaining module is further configured to:

According to a further embodiment of the application, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.

According to yet another embodiment of the present application, there is also provided an electronic device, comprising a memory in which a computer program is stored and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.

In the embodiments of the application, the feature weights of the signaling position data of a field to be identified are determined; target feature variables related to seed production field identification are acquired according to those feature weights; and the target feature variables are input into a pre-trained target recognition model to perform seed production field prediction and obtain a prediction result for the field to be identified. This addresses the problems of the related art, in which seed production fields are identified from remote sensing images: identification relies on people recognizing the stripe pattern that distinguishes seed production corn fields from field corn, a large amount of manual image recognition is required, and fields sown without stripes cannot be identified. The method can be applied to large-scale, accurate seed production field identification, requires no manual participation, and reduces labor cost.

Drawings

Fig. 1 is a block diagram of a hardware configuration of a mobile terminal of a planting field prediction processing method according to an embodiment of the present application;

FIG. 2 is a flow chart of a method of breeding farm prediction processing according to an embodiment of the present application;

FIG. 3 is a flow diagram of a method of breeding farm prediction processing according to an alternative embodiment of the present application;

FIG. 4 is a flow chart of corn breeding field identification based on telecom operator mobile handset signaling according to an embodiment of the present application;

FIG. 5 is a diagram of the WOE values of the ratio of the number of people in the detasseling period to the number of people in the sowing period, and the percentage of marginal seed production fields, according to an embodiment of the present application;

FIG. 6 is a diagram of the WOE values of the ratio of the number of people in the detasseling period to the number of people in the sowing period, and the percentage of marginal seed production fields, according to an embodiment of the present application;

FIG. 7 is a diagram of the WOE values of the ratio of the number of people in the detasseling period to the number of people in the sowing period, and the percentage of marginal seed production fields, according to an embodiment of the present application;

FIG. 8 is a diagram of the WOE values of the ratio of the average residence time in the detasseling period to that in the growing period, and the percentage of marginal seed production fields, according to an embodiment of the present application;

FIG. 9 is a schematic illustration of a ROC curve according to an embodiment of the present application;

fig. 10 is a block diagram of a planting field prediction processing device according to an embodiment of the present application;

fig. 11 is a block diagram of a breeding farm prediction processing apparatus according to an alternative embodiment of the present application.

Detailed Description

Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking a mobile terminal as an example, fig. 1 is a hardware structure block diagram of a mobile terminal of a breeding farm prediction processing method according to an embodiment of the present disclosure, and as shown in fig. 1, the mobile terminal may include one or more processors 102 (only one is shown in fig. 1) (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a memory 104 for storing data, where the mobile terminal may further include a transmission device 106 for communication function and an input/output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

The memory 104 may be used to store a computer program, for example, a software program and a module of an application software, such as a computer program corresponding to the breeding field prediction processing method in the embodiment of the present application, and the processor 102 executes various functional applications and the service chain address pool slicing process by running the computer program stored in the memory 104, thereby implementing the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.

In this embodiment, a seed production field prediction processing method operating in the above mobile terminal or network architecture is provided. Fig. 2 is a flowchart of the seed production field prediction processing method according to an embodiment of the present application. As shown in fig. 2, the flow includes the following steps:

step S202, determining the characteristic weight of the signaling position data of the field to be identified;

In this embodiment of the application, the feature weights in step S202 may be determined as follows. The decision target, decision criteria and decision objects are divided into a highest layer, a middle layer and a lowest layer according to their mutual relations, and a hierarchical structure model is established. The factors in each layer of the hierarchical structure model are compared pairwise, and a discrimination matrix is constructed from the comparison results. The consistency of the discrimination matrix is then checked: a consistency index CI of the discrimination matrix is determined and compared with a random consistency index RI to determine a check coefficient CR; if the check coefficient is smaller than a preset value, the discrimination matrix passes the consistency check, and if it is larger than or equal to the preset value, the discrimination matrix does not pass. Finally, the feature weights expressing the relative importance of all factors in each layer to the highest layer are determined in sequence from the highest layer to the lowest layer.

Step S204, acquiring a target characteristic variable related to the identification of the seed field according to the characteristic weight of the signaling position data to be identified;

Correspondingly, in step S204, the feature variables whose feature weights in the discrimination matrix are larger than a preset threshold are selected as the target feature variables.

Step S206, inputting the target feature variables into a pre-trained target recognition model to perform seed production field prediction, and obtaining a prediction result for the field to be identified.

Steps S202 to S206 solve the problems of the related art, in which identifying seed production fields from remote sensing images relies on people recognizing the stripe pattern that distinguishes seed production corn fields from field corn, requires a large amount of manual image recognition, and cannot identify fields sown without stripes. The method can be applied to large-scale, accurate seed production field identification, completes the identification without manual participation, and reduces labor cost.

The embodiment of the invention provides an accurate digital management tool that, based on real-time, full-volume telecommunication signaling data, judges suspected corn seed production fields in real time according to the identification service model. It thereby provides more efficient and accurate management of illegal seed production fields while preserving interpretability and greatly reducing the manpower required.

In the related art, seed production fields can be identified only during the heading period in July and August; by then the sowing season has passed, and legal crops can no longer be replanted after an illegal field is discovered. The embodiment of the invention makes its judgment based on differences in cultivation. In practice, differences in cultivation characteristics between field corn and seed production corn were found not only in the heading period (July to August) but also in the sowing period (April to May), so the sowing period can be explored further. This moves the time window for discovering illegal seed production earlier and minimizes the losses to the country and to farmers.

In the present application, corn seed production fields are identified based on telecommunication mobile phone signaling. Feature indices that are relatively strongly related to seed production field identification are obtained by combining actual service conditions with AHP (analytic hierarchy process) feature screening. For the binary classification scenario, the WOE and IV values of each feature are calculated to judge the amount of information contained in a single category, the threshold of the feature index is obtained, and the index is then binarized. Finally, the data are divided into a training set and a validation set in a certain proportion, and a supervised learning model is constructed and applied to the identification of corn seed production fields.

The embodiment of the present application further provides a training method for the target recognition model. Once the model has been trained, it can be used to predict whether a field is a seed production field. Fig. 3 is a flowchart of a seed production field prediction processing method according to an alternative embodiment of the present application. As shown in fig. 3, the method includes:

step S302, determining the characteristic weight of the signaling position training data;

step S304, acquiring training characteristic variables related to the identification of the seed field according to the characteristic weight of the signaling position training data;

step S306, training the seed production field prediction model according to the training feature variables of the signaling position training data to obtain the trained target recognition model.

The target recognition model obtained through the training in the steps S302 to S306 is more stable, and the prediction result is more accurate when the target recognition model is used for recognizing whether the field is a seed production field.

In step S306, the training may be performed in the following manner:

S3061, determining a feature threshold;

S3062, binarizing the training feature variables according to the feature threshold to obtain binarized features of the training feature variables;

S3063, training the seed production field prediction model according to the binarized features and the corresponding labels to obtain the trained target recognition model.
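Steps S3062 and S3063 can be prepared for with a sketch like the following, which binarizes a feature against its threshold and splits labeled samples into training and validation sets. This is an illustration under stated assumptions: the function names, the "greater than threshold maps to 1" convention, the 80/20 split and the sample values are all hypothetical, and the supervised model itself is left out because the text above does not name one.

```python
import random

def binarize(values, feature_threshold):
    """Map each value to 1 if it exceeds the threshold, else 0.

    Assumption: the direction of the comparison is not specified in
    the text; 'greater than' is chosen here for illustration.
    """
    return [1 if v > feature_threshold else 0 for v in values]

def split_train_validation(features, labels, train_fraction=0.8, seed=0):
    """Shuffle (feature, label) pairs and split them by train_fraction."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    cut = int(len(indices) * train_fraction)
    train = [(features[i], labels[i]) for i in indices[:cut]]
    validation = [(features[i], labels[i]) for i in indices[cut:]]
    return train, validation

# Made-up feature values and labels (1 = seed production field).
raw = [0.2, 1.7, 1.5, 2.4, 0.9, 3.1, 1.1, 0.4, 2.0, 1.6]
labels = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]
binary = binarize(raw, feature_threshold=1.5)
train_set, validation_set = split_train_validation(binary, labels)
```

The resulting pairs can then be fed to whichever supervised classifier is chosen for the seed production field prediction model.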

In this embodiment, S3061 may specifically include: performing binning on the training feature variables and determining the WOE values and IV values of the training feature variables. Further, binning is performed on the training feature variables to obtain a plurality of grouped samples, and the WOE value corresponding to each grouped sample i is determined by:

$$WOE_i = \ln\left(\frac{py_i}{pn_i}\right) = \ln\left(\frac{y_i / y_T}{n_i / n_T}\right)$$

The IV value corresponding to each grouped sample i is determined by:

$$IV_i = (py_i - pn_i) \cdot WOE_i$$

The sum of the IV values of the plurality of grouped samples is determined as the IV value of the target feature variable:

$$IV = \sum_{i=1}^{m} IV_i$$

where py_i represents the ratio of the number of seed production fields in grouped sample i to the number of seed production fields in the total sample, pn_i represents the ratio of the number of non-seed-production fields in grouped sample i to the number of non-seed-production fields in the total sample, y_i and n_i respectively represent the numbers of seed production fields and non-seed-production fields in grouped sample i, y_T and n_T respectively represent the numbers of seed production fields and non-seed-production fields in the total sample, and m represents the number of bins. The feature threshold is then determined according to the WOE value, the IV value and the sample proportion distribution. Specifically, according to the WOE value distribution, the IV value and the positive/negative sample ratio of the plurality of grouped samples, it is detected whether the positive/negative sample ratio is monotonically distributed over some continuous interval; if so, the upper limit of that continuous interval is determined as the feature threshold.

The seed production field identification method based on telecom operator signaling data solves the problem of the short time window in traditional seed production field identification. In traditional methods, identification is concentrated in the detasseling stage, whereas this method can advance identification to the sowing period and help the relevant departments determine suspected and illegal seed production fields as early as possible. Fig. 4 is a flowchart of corn seed production field identification based on telecom operator mobile phone signaling according to an embodiment of the present application. As shown in fig. 4, the flow includes:

S401, calculating feature weights using the analytic hierarchy process (AHP) to obtain the feature indices strongly related to seed production field identification.

The analytic hierarchy process decomposes a problem into its constituent factors according to the nature of the problem and the overall goal to be achieved, and groups the factors at different levels according to their mutual influence and membership relations, forming a multi-level analytical structure model. The problem is thereby ultimately reduced to determining the relative importance weights of the lowest level (the schemes, measures and so on available for decision making) with respect to the highest level (the overall goal), or to ranking their relative merits. A complete analytic hierarchy process typically comprises four steps:

Establishing a hierarchical structure model: the decision target, the factors considered (decision criteria) and the decision objects are divided into a highest layer, a middle layer and a lowest layer according to their mutual relations, and a hierarchical structure diagram is drawn. The highest layer is the purpose of the decision, that is, the problem to be solved. The lowest layer is the set of alternatives at decision time. The middle layer is the factors considered, that is, the decision criteria. For two adjacent layers, the upper layer is called the target layer and the lower layer is called the factor layer.

When determining the weights among the factors of each layer, a purely qualitative result is often not readily accepted by others. Saaty et al. therefore proposed the consistent matrix method: rather than comparing all factors together, the factors are compared pairwise on a relative scale, which reduces as far as possible the difficulty of comparing factors of different natures and thereby improves accuracy. For a given criterion, the schemes under it are compared pairwise and graded by importance. a_ij is the result of comparing the importance of element i with that of element j; Table 1 lists the 9 importance levels given by Saaty and their assigned values. The matrix formed by the pairwise comparison results is called the discrimination matrix. The discrimination matrix has the properties shown in Table 1.

TABLE 1

Factor i compared with factor j	Quantized value
Of equal importance	1
Of slight importance	3
Of greater importance	5
Of strong importance	7
Of extreme importance	9
Intermediate values of two adjacent judgments	2, 4, 6, 8

Hierarchical single ordering and its consistency check. The eigenvector corresponding to the largest eigenvalue λ of the discrimination matrix A is normalized (so that its elements sum to 1) and denoted W. The elements of W are the ranking weights of the relative importance of the factors at one level with respect to a given factor at the level above; this process is called hierarchical single ordering. Whether the hierarchical single ordering can be accepted requires a consistency check, which determines the allowable range of inconsistency for A. The only nonzero eigenvalue of an n-th order consistent matrix is n; the largest eigenvalue λ of an n-th order positive reciprocal matrix A satisfies λ ≥ n, and A is a consistent matrix if and only if λ = n.

Because λ depends continuously on $a_{ij}$, the more λ exceeds n, the more serious the inconsistency of A. Consistency is measured by the consistency index CI: the smaller the CI, the higher the consistency. The eigenvector corresponding to the largest eigenvalue is used as the weight vector of the influence of the compared factors on a given factor at the level above; the greater the inconsistency, the larger the resulting judgment error. The value of λ − n can therefore be used to measure the degree of inconsistency of A. The consistency index is defined as:

$$CI = \frac{\lambda - n}{n - 1}$$

When CI = 0, there is complete consistency; when CI is close to 0, the consistency is satisfactory; the larger the CI, the more severe the inconsistency. To measure the magnitude of CI, a random consistency index RI is introduced:

The random consistency index RI depends on the order of the discrimination matrix. In general, the larger the order of the matrix, the higher the probability of random deviation from consistency. The correspondence is shown in Table 2.

TABLE 2

Order of matrix	3	4	5	6	7	8	9	10
RI	0.58	0.90	1.12	1.24	1.32	1.41	1.45	1.49

Considering that deviation from consistency may have random causes, when checking whether the discrimination matrix has satisfactory consistency, CI is compared with the random consistency index RI to obtain the check coefficient CR:

$$CR = \frac{CI}{RI}$$

In general, the discrimination matrix is considered to pass the consistency check if CR < 0.1; otherwise it does not have satisfactory consistency.

Total hierarchical ordering and its consistency check. The weights of the relative importance of all factors at a given level with respect to the highest level (overall goal) are calculated; this is called total hierarchical ordering. This process is carried out level by level from the highest level to the lowest level.
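The single-ordering and consistency-check steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the embodiment: the 3x3 matrix is a hypothetical example, the function names are made up for illustration, and power iteration is used here as one common way to approximate the principal eigenvector. The RI values are copied from Table 2.

```python
# Illustrative sketch: AHP weights and consistency check for one
# pairwise comparison (discrimination) matrix. Names are hypothetical.

# Random consistency index RI by matrix order, copied from Table 2.
RI_TABLE = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix, iterations=200):
    """Approximate the principal eigenvector by power iteration.

    Returns (weights, lambda_max); weights are normalized to sum to 1.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        # Multiply the matrix by the current weight vector.
        aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        # Because sum(w) == 1, sum(A w) is the current estimate of lambda_max.
        lam = sum(aw)
        w = [x / lam for x in aw]
    return w, lam


def consistency(matrix):
    """Return (weights, lambda_max, CI, CR) for a matrix of order 3 to 10."""
    n = len(matrix)
    weights, lam = ahp_weights(matrix)
    ci = (lam - n) / (n - 1)   # consistency index
    cr = ci / RI_TABLE[n]      # check coefficient
    return weights, lam, ci, cr


# Hypothetical judgments on the Saaty 1-9 scale (positive reciprocal matrix).
A = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights, lam, ci, cr = consistency(A)
print([round(w, 3) for w in weights], round(lam, 4), round(ci, 4), round(cr, 4))
print("passes consistency check:", cr < 0.1)
```

A matrix that fails the check (CR of 0.1 or more) would normally have its pairwise judgments revised before the weights are used.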

In this embodiment, according to feedback from the agricultural authorities and what the research and development staff learned from the fields and farmers on site, the biggest difference between a corn seed production field and an ordinary corn field is that seed production requires labor to be arranged for detasseling and emasculation, whereas an ordinary corn field does not. The population heat data therefore differ markedly during the detasseling and emasculation period. By analyzing the population heat data in the different cultivation periods and the differences in cultivation practice between seed production fields and non-seed production fields, the characteristic variables shown in Table 3 are preliminarily obtained.

TABLE 3

The original characteristic data of the signaling position data and the label data are screened by day for the sowing period (April 10 to May 10), the growth period (June 1 to June 30) and the detasseling and emasculation period (July 1 to July 31).

The original features include information related to human activity (the ratio of the daily average number of people in the detasseling and emasculation period to that in the sowing period, the ratio of the daily average residence time in the detasseling and emasculation period to that in the sowing period, the ratio of the daily average number of people in the detasseling and emasculation period to that in the growth period, and the ratio of the daily average residence time in the detasseling and emasculation period to that in the growth period), the plot identification (plot area), and the attributes of users passing through or staying in the plot (average user age, average user network access time, users' top-1 network access preference, and the specific daily average of suspected working users). The characteristic factors are set and ranked.

A discrimination matrix is constructed from the Saaty scale scores, as shown in Table 4.

TABLE 4

The consistency check and the hierarchical ordering are performed. The consistency check passes, so the following weights can be used in the analytic hierarchy process calculation. The maximum eigenvalue is 9.4717, the CI value is 0.0590, and the CR value is 0.0404.

The results of the determination are shown in Table 5.

TABLE 5
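As a quick check of the reported figures, the consistency index can be recomputed from the stated maximum eigenvalue. The order n = 9 is an assumption inferred from the nine original features listed above (the text does not state the order explicitly), and RI = 1.45 is taken from Table 2:

$$CI = \frac{\lambda - n}{n - 1} = \frac{9.4717 - 9}{9 - 1} \approx 0.0590, \qquad CR = \frac{CI}{RI} \approx \frac{0.0590}{1.45} \approx 0.041 < 0.1$$

The recomputed CI matches the reported 0.0590. The recomputed CR is close to the reported 0.0404; the small gap would be explained if a slightly different RI value for order 9 (for example 1.46, which appears in some published RI tables) had been used, but that is only a guess.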

From the consistency check result, the feature indexes strongly related to seed production field identification are obtained. They comprise four items: the ratio of the daily average number of people in the detasseling and emasculation period to that in the sowing period, the ratio of the daily average residence time in the detasseling and emasculation period to that in the sowing period, the ratio of the daily average number of people in the detasseling and emasculation period to that in the growth period, and the ratio of the daily average residence time in the detasseling and emasculation period to that in the growth period.

S402, for the binary classification scenario, calculating the WOE and IV values of the features to judge the amount of information each feature carries within a single category.

Considering that the target variable in this business scenario is a binary variable, namely seed production field versus non-seed production field, the four human activity features of a plot are binned using the WOE method, and the predictive power of each feature is evaluated by its IV value. The WOE and IV values are calculated as follows.

For group i, the WOE value is:

$$WOE_i = \ln\frac{py_i}{pn_i} = \ln\frac{y_i / y_T}{n_i / n_T}$$

Likewise, the IV value corresponding to group i is:

$$IV_i = (py_i - pn_i) \cdot WOE_i$$

The IV value of the feature is the sum of the IV values of all groups:

$$IV = \sum_{i=1}^{m} IV_i$$

where $py_i$ is the proportion of the seed production fields in group i relative to all seed production fields in the total sample;

$pn_i$ is the proportion of the non-seed production fields in group i relative to all non-seed production fields in the total sample;

$y_i$ and $n_i$ are the numbers of seed production fields and non-seed production fields in group i, respectively;

$y_T$ and $n_T$ are the numbers of seed production fields and non-seed production fields in the total sample, respectively;

m is the number of bins.
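The WOE and IV definitions above translate directly into code. The following is a minimal sketch using the standard natural-logarithm form of WOE; the function name and the bin counts are hypothetical, and it assumes every bin contains at least one seed production field and one non-seed production field (otherwise the logarithm is undefined and some smoothing would be needed).

```python
import math


def woe_iv(bins):
    """Compute WOE and IV for one binned feature.

    bins: list of (y_i, n_i) pairs, the counts of seed production fields
    and non-seed production fields in each bin.
    Returns (per_bin, iv) where per_bin is a list of (woe_i, iv_i).
    """
    y_total = sum(y for y, _ in bins)  # y_T
    n_total = sum(n for _, n in bins)  # n_T
    per_bin = []
    for y_i, n_i in bins:
        py_i = y_i / y_total           # share of all seed production fields
        pn_i = n_i / n_total           # share of all non-seed production fields
        woe_i = math.log(py_i / pn_i)
        iv_i = (py_i - pn_i) * woe_i
        per_bin.append((woe_i, iv_i))
    return per_bin, sum(iv_i for _, iv_i in per_bin)


# Hypothetical counts for three bins of a single feature.
bins = [(10, 30), (40, 40), (50, 30)]
per_bin, iv = woe_iv(bins)
for (woe_i, iv_i), counts in zip(per_bin, bins):
    print(counts, round(woe_i, 4), round(iv_i, 4))
print("feature IV:", round(iv, 4))
```

Each bin's IV contribution is non-negative, because (py_i − pn_i) and WOE_i always share the same sign; a bin where the two proportions are equal contributes nothing.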

S403, obtaining a feature threshold according to the WOE and the sample proportion distribution.

Based on the WOE distribution over the binning intervals and the positive/negative sample ratio, it is checked whether the WOE and the positive/negative sample ratio are monotonic over a continuous interval. If so, that interval has a definite ability to separate the target variable, and its upper limit is taken as the threshold for feature preprocessing.

According to the field investigation, the time ranges of the local corn sowing period, growth period, and detasseling and emasculation period are determined, and it is established that working time and working population increase markedly in the corn sowing period and in the detasseling and emasculation period.

The variables, namely the ratio of the daily average number of people in the detasseling and emasculation period to that in the sowing period, the ratio of the daily average residence time in the detasseling and emasculation period to that in the sowing period, the ratio of the daily average number of people in the detasseling and emasculation period to that in the growth period, and the ratio of the daily average residence time in the detasseling and emasculation period to that in the growth period, are binned, and the WOE value and IV value of each feature are calculated.

When the characteristic variables "ratio of the daily average number of people in the detasseling and emasculation period to that in the sowing period" and "ratio of the daily average residence time in the detasseling and emasculation period to that in the sowing period" are greater than -10%, the response proportion for whether a plot is a corn seed production field differs markedly;

when the characteristic variables "ratio of the daily average number of people in the detasseling and emasculation period to that in the growth period" and "ratio of the daily average residence time in the detasseling and emasculation period to that in the growth period" are greater than 0%, the response proportion for whether a plot is a corn seed production field differs markedly.

From the WOE values, the relationship between the WOE of each feature and the marginal proportion of seed production fields is obtained. When the ratio of the daily average number of people in the detasseling and emasculation period to that in the sowing period and the ratio of the daily average residence time in the detasseling and emasculation period to that in the sowing period are greater than -10%, the response proportion for whether a plot is a seed production field differs markedly (only one interval has the lowest WOE value, and the distribution is monotonic overall). When the ratio of the daily average number of people in the detasseling and emasculation period to that in the growth period and the ratio of the daily average residence time in the detasseling and emasculation period to that in the growth period are greater than 0%, the response proportion for whether a plot is a corn seed production field differs markedly (more than one interval has the lowest WOE value; the distribution fluctuates overall, with a downward trend over the interval [0, 1]).

The WOE and the marginal proportion of seed production fields for the ratio of the daily average number of people in the detasseling and emasculation period to that in the sowing period are shown in FIG. 5.

The WOE and the marginal proportion of seed production fields for the ratio of the daily average number of people in the detasseling and emasculation period to that in the growth period are shown in FIG. 6.

The WOE and the marginal proportion of seed production fields for the ratio of the daily average residence time in the detasseling and emasculation period to that in the sowing period are shown in FIG. 7.

The WOE and the marginal proportion of seed production fields for the ratio of the daily average residence time in the detasseling and emasculation period to that in the growth period are shown in FIG. 8.

S404, binarizing the characteristic variables using the feature thresholds.

Once the feature thresholds are obtained, the features are transformed immediately to binarize the variables.

The ratio of the daily average number of people in the detasseling and emasculation period to that in the sowing period, and the ratio of the daily average residence time in the detasseling and emasculation period to that in the sowing period, are each marked 1 when greater than -10% and 0 otherwise; the ratio of the daily average number of people in the detasseling and emasculation period to that in the growth period, and the ratio of the daily average residence time in the detasseling and emasculation period to that in the growth period, are each marked 1 when greater than 0% and 0 otherwise. The result of the binarization is shown in Table 6.

TABLE 6
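The binarization rule of S404 can be sketched as a simple threshold comparison. The thresholds (-10% and 0%) are taken from the text; the feature names and the example plot are hypothetical, and the sketch assumes each ratio is stored as a signed fraction (so -10% is -0.10), which the text does not spell out.

```python
# Thresholds from the text: -10% for the two sowing-period ratios,
# 0% for the two growth-period ratios. Feature names are hypothetical.
THRESHOLDS = {
    "headcount_detassel_vs_sowing": -0.10,
    "dwell_detassel_vs_sowing": -0.10,
    "headcount_detassel_vs_growth": 0.0,
    "dwell_detassel_vs_growth": 0.0,
}


def binarize(plot):
    """Mark each ratio feature 1 if it is greater than its threshold, else 0."""
    return {name: int(plot[name] > limit) for name, limit in THRESHOLDS.items()}


# Hypothetical plot: values are signed fractions, e.g. 0.25 means +25%.
example_plot = {
    "headcount_detassel_vs_sowing": 0.25,
    "dwell_detassel_vs_sowing": -0.30,
    "headcount_detassel_vs_growth": 0.05,
    "dwell_detassel_vs_growth": 0.0,
}
print(binarize(example_plot))
```

The comparison is strict, matching the wording "greater than"; a value exactly equal to the threshold is marked 0.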

S405, predicting seed production fields using GBDT.

GBDT (Gradient Boosting Decision Tree) is an iterative decision tree algorithm composed of multiple decision trees (CART regression trees). The trees are built serially, and the outputs of all the trees are summed to give the final answer.

GBDT differs from conventional Boosting in that each iteration aims to reduce the residual left by the previous iteration; to do so, a new model is built in the gradient direction in which the residual decreases. In GBDT, therefore, each new decision tree is built so that the residual of the preceding model decreases along the gradient, which is very different from conventional Boosting, which reweights correctly and incorrectly classified samples. The key to the Gradient Boosting algorithm is to use the negative gradient of the loss function, evaluated at the current model, as an approximation of the residual, and then fit a regression tree to it.
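The residual-fitting idea can be illustrated with a deliberately small, pure-Python toy. It is a sketch under stated assumptions, not the model of the embodiment: it uses squared-error loss (for which the negative gradient equals the residual), depth-1 regression trees on 0/1 features instead of full CART trees, and a hypothetical six-plot data set; all function names are made up for illustration.

```python
def fit_stump(rows, residuals):
    """Fit a depth-1 regression tree on 0/1 features by least squares.

    Returns (feature_index, value_if_0, value_if_1).
    """
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, mean, mean)  # fallback: constant prediction
    for f in range(len(rows[0])):
        left = [r for x, r in zip(rows, residuals) if x[f] == 0]
        right = [r for x, r in zip(rows, residuals) if x[f] == 1]
        if not left or not right:
            continue  # this feature cannot split the samples
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if err < best[0]:
            best = (err, f, lm, rm)
    return best[1:]


def fit_gbdt(rows, labels, n_trees=20, learning_rate=0.3):
    """Serially add stumps, each fitted to the current residuals."""
    base = sum(labels) / len(labels)
    preds = [base] * len(rows)
    trees = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(labels, preds)]
        f, v0, v1 = fit_stump(rows, residuals)
        trees.append((f, v0, v1))
        preds = [p + learning_rate * (v1 if x[f] else v0) for x, p in zip(rows, preds)]
    return base, trees, learning_rate


def predict(model, x):
    """Sum the base value and the scaled outputs of all trees."""
    base, trees, lr = model
    return base + lr * sum(v1 if x[f] else v0 for f, v0, v1 in trees)


# Hypothetical binarized features for six plots; label 1 = seed production field.
rows = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
labels = [1, 1, 1, 0, 0, 0]
model = fit_gbdt(rows, labels)
print([round(predict(model, x), 3) for x in rows])
```

In practice a library implementation with a classification loss would normally be used; the toy only shows how each new tree is fitted to what the previous trees left unexplained.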

Hold-out validation of the model was carried out on the training samples (validation set 40%), with the results shown in Table 7.

TABLE 7

Type	Precision	Recall	F1
Non-seed production field	0.50	0.29	0.36
Seed production field	0.89	0.95	0.92

The AUC value is 0.81, and the ROC curve is shown in FIG. 9.

For the verification result data, the real seed production field plots are identified and the positive sample identification rate is counted.

Verification result: taking the verification data of the Ganzhou area as an example, 114 real, scattered seed production fields were recorded in batch, of which the model judged 98 to be suspected seed production fields, giving a positive sample identification rate of 85.96%, which is similar to the model's internal validation result.
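The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes F1 from the precision and recall in Table 7 and the positive sample identification rate from the counts given above; the function names are hypothetical.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def identification_rate(identified, total):
    """Share of real seed production fields that the model flagged."""
    return identified / total


# Precision and recall as reported in Table 7 (already rounded).
f1_seed = f1_score(0.89, 0.95)
f1_non_seed = f1_score(0.50, 0.29)
# Counts from the Ganzhou verification data: 98 of 114 fields identified.
rate = identification_rate(98, 114)

print(round(f1_seed, 2))
print(round(f1_non_seed, 2))
print(f"{rate:.2%}")
```

The seed production field row reproduces the tabulated F1 of 0.92 and the identification rate reproduces 85.96%. The non-seed production field row recomputes to about 0.37 rather than the tabulated 0.36, which is consistent with the table having been computed from unrounded precision and recall.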

According to the embodiment of the application, Python, machine learning algorithms and the Spark distributed computing framework are used to package the relevant technical details in a distributed manner, with containerized management to guarantee the isolation of user data and resources. A seed production field evaluation page is designed for front-end display, and plots are identified through front-end and back-end linkage. Based on the AHP (analytic hierarchy process) feature screening method, feature indexes closely related to seed production field identification are obtained; WOE (Weight of Evidence) and IV (Information Value) are calculated for the binary classification scenario to judge the amount of information each feature carries within a single category, yielding thresholds for the feature indexes, which are then binarized. After this processing, the data are divided into a training set and a validation set in a certain proportion, a supervised learning model is built, and the model is applied to identify corn seed production fields among a large number of plots.

According to another embodiment of the present application, there is provided a seed production field prediction processing device, and fig. 10 is a block diagram of the seed production field prediction processing device according to the embodiment of the present application, and as shown in fig. 10, the seed production field prediction processing device includes:

a first determining module 102, configured to determine a feature weight of signaling location data of a field to be identified;

a first obtaining module 104, configured to obtain a target feature variable related to seed production field identification according to the feature weight of the signaling location data to be identified;

and a prediction module 106, configured to input the target feature variable into a pre-trained target recognition model and perform seed production field prediction to obtain a prediction result for the field to be identified.

FIG. 11 is a block diagram of a seed production field prediction processing device according to an alternative embodiment of the present application. As shown in FIG. 11, the device further comprises:

a second determining module 112, configured to determine a feature weight of signaling location training data;

a second obtaining module 114, configured to obtain a training feature variable related to seed production field identification according to the feature weight of the signaling location training data;

and a training module 116, configured to train a seed production field prediction model according to the training feature variable of the signaling location training data, so as to obtain the trained target recognition model.

Optionally, the training module 116 includes:

a first determining submodule for determining a feature threshold;

Optionally, the first determining sub-module includes:

Optionally, the binning processing unit is further configured to:

determine the WOE value corresponding to each group i by:

$$WOE_i = \ln\frac{py_i}{pn_i}$$

determine the IV value corresponding to each group i by:

$$IV_i = (py_i - pn_i) \cdot WOE_i$$

Optionally, the determining unit is further configured to:

Optionally, the first determining module 102 includes:

Optionally, the check submodule is further configured to:

determining a consistency index CI value of the discrimination matrix;

Optionally, the first obtaining module 104 is further configured to:

Embodiments of the present application further provide a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to, when executed, perform at least the following steps in any of the above method embodiments:

S1, determining the feature weight of the signaling location data of the field to be identified;

S2, obtaining a target feature variable related to seed production field identification according to the feature weight of the signaling location data to be identified;

S3, inputting the target feature variable into a pre-trained target recognition model and performing seed production field prediction to obtain a prediction result for the field to be identified.

In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Embodiments of the present application further provide an electronic device, comprising a memory and a processor, the memory having a computer program stored therein, the processor being configured to execute the computer program to perform at least the following steps in any of the above method embodiments:

In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.

It will be apparent to those skilled in the art that the various modules or steps of the present application described above may be implemented using a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may be implemented using program code executable by the computing devices, such that they may be stored in a memory device and executed by the computing devices, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into separate integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method of seed field prediction processing, comprising:

2. The method of claim 1, further comprising:

determining the characteristic weight of the signaling position training data;

3. The method of claim 2, wherein training a seed production field prediction model according to the training feature variables of the signaling location training data to obtain the trained target recognition model comprises:

determining a characteristic threshold;

4. The method of claim 3, wherein the determining a feature threshold comprises:

5. The method of claim 4, wherein binning the training feature variables and determining the WOE and IV values of the training feature variables comprises:

determining the WOE value corresponding to each group i by:

$$WOE_i = \ln\frac{py_i}{pn_i}$$

determining the IV value corresponding to each group i by:

$$IV_i = (py_i - pn_i) \cdot WOE_i$$

6. The method of claim 5, wherein determining the feature threshold based on the WOE value, the IV value, and the sample proportion distribution comprises:

7. The method of claim 1, wherein determining the feature weight of the signaling location data for the field to be identified comprises:

checking the consistency of the discrimination matrix;

8. The method of claim 7, wherein checking the consistency of the discrimination matrix comprises:

determining a consistency index CI value of the discrimination matrix;

9. The method of claim 8, wherein obtaining the target feature variable related to seed production field identification according to the feature weight of the signaling location data to be identified comprises:

10. A seed field prediction processing apparatus, comprising:

the determining module is used for determining the characteristic weight of the signaling position data of the field to be identified;

the acquisition module is used for acquiring a target characteristic variable related to the identification of the seed field according to the characteristic weight of the signaling position data to be identified;

11. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 9 when executed.

12. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 9.