CN116957838A

CN116957838A - Crop growth environment monitoring method based on knowledge graph representation learning

Info

Publication number: CN116957838A
Application number: CN202310989345.3A
Authority: CN
Inventors: 杨强; 张桃; 李庆; 胡隆河; 乔少杰; 张楠
Original assignee: Chengdu University of Information Technology; Yibin University
Current assignee: Chengdu University of Information Technology; Yibin University
Priority date: 2023-08-04
Filing date: 2023-08-04
Publication date: 2023-10-27

Abstract

The invention discloses a crop growth environment monitoring method based on knowledge graph representation learning. The method comprises: collecting sensor data, transmitting it to a database, and preprocessing the data collected by the different sensors, wherein the preprocessing comprises data denoising and missing-value completion; standardizing the preprocessed sensor data to obtain feature vectors of the multi-sensor data; fusing, according to these feature vectors, the crop growth environment data acquired by different types of sensors; constructing a crop growth environment detection knowledge graph model using the fused feature vectors as training samples; and inputting the crop knowledge graph as initial features into a graph neural network, predicting the indexes of the crop growth environment, and constructing a comprehensive evaluation system for the crop growth environment.

Description

Crop growth environment monitoring method based on knowledge graph representation learning

Technical Field

The invention relates to the technical field of data information fusion, in particular to a crop growth environment monitoring method based on knowledge graph representation learning.

Background

With population growth and accelerating urbanization, agricultural production faces a number of challenges. Improving agricultural production efficiency, optimizing the agricultural ecological environment, and ensuring the quality and safety of agricultural products have become important issues in agricultural modernization. The development of greenhouse planting technology offers a new approach to these problems. Greenhouse cultivation controls factors such as air temperature, humidity, and illumination to provide optimal growth conditions for crops in indoor or semi-indoor environments. It can effectively improve crop yield and quality, reduce the use of pesticides and fertilizers, lower production costs, and support sustainable agricultural development. However, implementing facility cultivation requires scientific means of information acquisition and processing. In this respect, wireless sensor network technology provides strong support for the development of greenhouse cultivation.

A wireless sensor network consists of many low-power, low-cost, miniaturized sensor nodes and enables accurate monitoring and control of the crop growth environment. Through the wireless transmission network, information about the various environments or monitored objects in the facility can be acquired, transmitted, and processed in real time, and instructions can be sent over the network to target nodes to control the environment. In smart agriculture, multi-sensor data acquisition presents new challenges for data fusion. Comprehensive monitoring of the environmental elements within a facility requires data from multiple sensors simultaneously. However, the data collected by these sensors may be variable and inconsistent, so it must be fused reasonably to obtain more accurate and comprehensive information and thereby provide more scientific and accurate guidance for agricultural production. A method for multi-sensor data fusion therefore has important application value. Wireless sensor network technology plays an important role in facility cultivation by enabling accurate monitoring and control of the crop growth environment, and multi-sensor data fusion provides more accurate and comprehensive information support for agricultural production and for its modernization.

Disclosure of Invention

The invention aims to improve the efficiency and quality of traditional agricultural production through a crop growth environment monitoring method based on knowledge graph representation learning and multi-sensor data fusion. Traditional agricultural production is managed mainly by manual experience, which suffers from high management cost, low efficiency, and large errors. The invention uses informatization, digitalization, and intelligent technology to monitor and collect, in real time, the growth environment of the relevant agricultural products, including indexes such as soil moisture, temperature, humidity, and air pressure. By analyzing and fusing these data and constructing a knowledge graph, the growth state of the crops can be understood comprehensively, the crops can be managed precisely, and agricultural production benefit is improved. Compared with conventional smart agriculture, the invention uses more advanced technical means and a higher degree of refinement, monitors the growth state of crops more comprehensively, and provides more accurate and comprehensive information support for agricultural production. The invention can also effectively address the problems of traditional agricultural production management, improve the efficiency and quality of agricultural production, and support its sustainable development.

The object of the invention is achieved through the following technical solution:

the crop growth environment monitoring method based on knowledge graph representation learning comprises the following steps:

step S1: collecting sensor data and transmitting the sensor data to a database, and preprocessing the data collected by different sensors, wherein the preprocessing comprises data denoising and missing value complementation;

step S2: performing standardized processing on the preliminarily processed sensor data to obtain feature vectors of the multi-sensor data;

step S3: according to the characteristic vector of the multi-sensor data, carrying out data fusion on crop growth environment data acquired by different types of sensors;

step S4: constructing a crop growth environment detection knowledge graph model by taking the feature vectors after data fusion as training samples;

step S5: inputting the crop knowledge graph as initial features into a graph neural network, predicting the indexes of the crop growth environment, and constructing a comprehensive evaluation system for the crop growth environment.

Further, the step S1 specifically includes:

step S101: determining a distribution of crop sensors;

step S102: collecting different types of sensor data according to the set acquisition points, and transmitting and storing the sensor data into different databases;

step S103: preprocessing the mass data in the database, including denoising the data and completing missing values.

Further, the data collection method in step S101 is as follows: according to the planting condition and the topography characteristics of the crop base, the distribution range and the density of the sensor are determined so as to ensure that the crop base can be comprehensively covered and the data can be accurately collected; the sensors can be classified into different types, such as soil sensors, weather sensors, biological sensors, etc., according to the growth cycle of crops and the different environmental requirements of different growth stages, so as to collect different types of data.

Further, the data collection method in step S102 is as follows: different types of sensor data are collected at the set collection points and transmitted to and stored in different databases. Different data collection modes are used for different types of sensors. According to the actual conditions of the crop base, the base is divided into several collection points, several sensors are placed at each collection point, and the data from different collection points are transmitted to different databases to facilitate subsequent data processing and analysis;

step S103: preprocessing the mass data in the database, including data denoising, missing-value completion, and the like.
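The preprocessing in step S103 can be illustrated with a minimal sketch. The text does not name a specific denoising or completion technique, so the linear interpolation and sliding-median filter below are assumptions chosen for illustration, as are the function names and the sample readings.

```python
import math
from statistics import median

def fill_missing(values):
    """Fill None/NaN gaps by linear interpolation between valid neighbours."""
    xs = [None if v is None or math.isnan(v) else float(v) for v in values]
    valid = [i for i, v in enumerate(xs) if v is not None]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    out = []
    for i, v in enumerate(xs):
        if v is not None:
            out.append(v)
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:              # leading gap: hold the first valid value
            out.append(xs[right])
        elif right is None:           # trailing gap: hold the last valid value
            out.append(xs[left])
        else:                         # interior gap: interpolate linearly
            t = (i - left) / (right - left)
            out.append(xs[left] + t * (xs[right] - xs[left]))
    return out

def median_denoise(values, window=3):
    """Suppress isolated spikes with a sliding median (window shrinks at edges)."""
    half = window // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

# Hypothetical temperature readings with one missing sample and one spike.
raw = [25.2, float("nan"), 25.3, 80.0, 25.1, 25.2]
filled = fill_missing(raw)
clean = median_denoise(filled)
```

A median filter is used here because it removes single-sample spikes without smearing them into neighbouring readings; other filters could be substituted.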

Further, the step S2 of standardizing the preprocessed sensor data to obtain feature vectors of the multi-sensor data includes:

step S201: the crop growth environment data acquired from the multiple sensors are decomposed according to different time characteristics to obtain a plurality of feature vectors, and the feature vectors are combined into a feature matrix phi to be used as input of standardized transformation;

step S202: carrying out standardization processing on each characteristic vector of the matrix phi, and carrying out standardization transformation on each characteristic element to obtain a standardized characteristic matrix;

step S203: iterating steps S201 and S202, and using independent component analysis to reduce the data dimension and remove redundant information.

Further, in step S201 the data acquired in step S102 are represented as vectors: temperature sensor vector $O = \{o_1, o_2, \ldots, o_n\}$, air humidity sensor vector $C = \{c_1, c_2, \ldots, c_n\}$, wind sensor vector $W = \{w_1, w_2, \ldots, w_n\}$, atmospheric pressure sensor vector $P = \{p_1, p_2, \ldots, p_n\}$, carbon dioxide sensor vector $Z = \{z_1, z_2, \ldots, z_n\}$, light intensity sensor vector $L = \{l_1, l_2, \ldots, l_n\}$, soil temperature sensor vector $T = \{t_1, t_2, \ldots, t_n\}$, soil EC value sensor vector $G = \{g_1, g_2, \ldots, g_n\}$, soil pH sensor vector $Q = \{q_1, q_2, \ldots, q_n\}$, and soil moisture sensor vector $J = \{j_1, j_2, \ldots, j_n\}$. The feature vectors of the different sensor data are then combined into a feature matrix:

$$\Phi = \begin{bmatrix} \varphi_{11} & \cdots & \varphi_{1\kappa} \\ \vdots & \ddots & \vdots \\ \varphi_{n1} & \cdots & \varphi_{n\kappa} \end{bmatrix}$$

where $\varphi_{n\kappa}$ denotes the value of the n-th sensor of the κ-th type, κ indexes the sensor types, and n denotes the number of sensors;

The standardized feature matrix in step S202 is obtained by applying a standardization transformation to each feature element of the matrix Φ from step S201, as follows:

where $M = \mathrm{Median}(|\varphi_{ij} - m_j|)$; $\varphi_{ij}$ is a sensor variable; $m_j$ is the mean of the j-th column of sensor data within the same time period; M is the median of the absolute differences between each value and that mean; and η is an adjustable parameter determined by grid search.
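As an illustration of step S202, the sketch below standardizes each column of Φ using the quantities defined above. It assumes the transformation has the form $(\varphi_{ij} - m_j) / (\eta M_j)$ with M computed per column; this form, the function name, and the row/column orientation (rows are sampling periods, columns are sensor types) are assumptions, not a statement of the exact formula used by the method.

```python
from statistics import mean, median

def robust_standardize(phi, eta=1.0):
    """Column-wise (phi_ij - m_j) / (eta * M_j), with M_j = median(|phi_ij - m_j|)."""
    out_cols = []
    for col in zip(*phi):                        # iterate over sensor types
        m_j = mean(col)                          # column mean
        M_j = median([abs(v - m_j) for v in col])
        scale = eta * M_j if M_j > 0 else 1.0    # guard against constant columns
        out_cols.append([(v - m_j) / scale for v in col])
    return [list(row) for row in zip(*out_cols)]

# Readings from the embodiment: temperature, humidity, light, wind speed.
phi = [
    [25.2, 0.65, 450.0, 2.1],
    [25.1, 0.63, 460.0, 2.2],
    [25.3, 0.66, 455.0, 2.0],
]
z = robust_standardize(phi)
```

In practice η would be tuned, for example by the grid search mentioned above; it is fixed at 1.0 here only to keep the sketch short.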

Further, the step S3 of performing data fusion on the crop growth environment data from different types of sensors according to the feature vectors of the multi-sensor data includes:

step S301: applying evidence-theory combination to the reliability of each sensor in each sampling period of the crop growth environment, to obtain a reliability metric vector for each sensor in each sampling period;

Step S302: for the credibility metric vector of all sampling periods, calculating a weight factor of the credibility metric vector and calculating a basic probability distribution function of the credibility metric vector;

step S303: and calculating the credibility measure R and the uncertainty measure U of the credibility intervals of the different types of sensors, and carrying out feature fusion on the acquired data by taking the credibility measure R and the uncertainty measure U as weight factors.

Further, the reliability metric vector in each sampling period in step S301 is expressed as:

where i indexes the sensors (i = 1, 2, 3, …, m) and j indexes the sampling periods (j = 1, 2, 3, …, n). Each sensor continuously collects data points during each sampling period, and the reliability measure of the sensor data is a real number between 0 and 1, where 0 denotes completely unreliable and 1 denotes completely reliable.

Further, the formula for calculating the weight factor is as follows:

where $a_{ij}$ denotes each element of the reliability metric vector; $w(j)$ denotes the weight factor of the reliability metric vector $A_{ij}$; $m$ denotes the number of sensors; $\varepsilon_j$ is the entropy value of the j-th index; the formula further involves the information entropy coefficient and a term representing the contribution of $A_{ij}$ to the classification result; and $m-1$ is the normalization factor of the weight factors, so that all weight factors sum to 1;

The basic probability distribution of the confidence interval $y_i$ of the n sampled values at confidence level α is:

where $y_i = [y_i - \sigma,\ y_i + \sigma]$, σ is the measurement standard deviation, and $i = 1, 2, 3, \ldots, n$;

Further, in step S303, the reliability measure R and the uncertainty measure U of the sensor's confidence interval are calculated as follows:

$$R(A_{ij}) = \sum w(j)\,\zeta(y_i)$$

$$U(A_{ij}) = 1 - R(A_{ij})$$

where $w(j)$ is the weight factor of the reliability metric vector $A_{ij}$ and $\zeta(y_i)$ is the basic probability distribution of the confidence interval $y_i$.

After the reliability measure R and the uncertainty measure U of each sensor have been calculated, they are used as weight factors for fusion: the data collected by each sensor are multiplied by that sensor's reliability measure R, the products are summed, and the sum is divided by the sum of the reliability measures of all sensors to obtain the fused data. For each sensor i and sampling period j, the fused data $F^k(i, j)$ are calculated as follows:

where $R^k(i)$ denotes the reliability measure of the i-th sensor, $d^k(i, j)$ denotes the data acquired by the i-th sensor during the k-th sampling period, and $R^k(A_{ij})$ denotes the reliability metric vector of all sensors at the k-th sampling period.
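The calculation of R, U, and the reliability-weighted fusion described above can be sketched as follows. The function names are hypothetical; the numeric inputs are taken from the worked example in the embodiment, and the fusion is written as the weighted average the text describes (products summed, then divided by the sum of the reliability measures).

```python
def reliability(weights, bpa):
    """Return (R, U) with R = sum_j w(j) * zeta(y_j) and U = 1 - R."""
    r = sum(w * z for w, z in zip(weights, bpa))
    return r, 1.0 - r

def fuse(readings, reliabilities):
    """Reliability-weighted average of the readings."""
    total = sum(reliabilities)
    if total == 0:
        raise ValueError("all reliability measures are zero")
    return sum(r * d for r, d in zip(reliabilities, readings)) / total

w = [0.451, 0.333, 0.216]       # weight factors w(1), w(2), w(3)
zeta = [0.3, 0.3, 0.3]          # basic probability distribution, alpha / 3
R, U = reliability(w, zeta)

# Fuse three temperature readings with three reliability measures.
fused = fuse([25.2, 25.1, 25.3], [0.362, 0.375, 0.248])
```

Because the fused value is a convex combination of the readings, it always lies between the smallest and largest input, which is a convenient sanity check on sensor data.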

Further, the step S4 of constructing a knowledge graph of a crop growth environment using the data obtained by fusing the plurality of sensors as a training sample includes:

step S401: knowledge extraction is performed on the preprocessed crop growth environment data and crop pest information data. The text is preprocessed with modules including word segmentation, part-of-speech tagging, dependency parsing, and semantic dependency analysis; the processed information is stored in the form {entity, relationship, attribute}, and labels are set according to the information cleaning result and the knowledge graph. Finally, the original information is formatted into a CSV file and stored locally.

step S402: a labeling scheme is used to tag the crop data entities extracted in step S401 as Γ labels. The entities in the text are $E = \{e_1, e_2, \ldots, e_i\}$, an entity set of the knowledge base comprising |E| different entities, and $R = \{r_1, r_2, \ldots, r_j\}$ is the set of relationships with Γ, comprising |R| different relationships. The tags encode the position of each character within Γ and entity $e_i$: B marks the beginning of an entity, I marks the middle part of an entity, S marks the end of an entity, D marks a word that is an independent entity, and a further tag marks non-entities. For each piece of data, the tags of $e_i$ and the corresponding relationship $r_j$ are matched into a complete B, BIS, BS, or D set, the entities Γ and $e_i$ corresponding to the tag set are extracted, and the triple $(\Gamma, r_j, e_i)$ is formed through tag mapping and data parsing;

step S403: based on the crop information data tagged with the above labeling scheme, a named entity recognition model for crop growth environment information is constructed. The named entity recognition model includes three layers: a word-vector pre-trained language model, a gated recurrent unit layer, and a reinforcement learning layer;

step S404: the knowledge is stored in a knowledge graph database, and knowledge questions about the crop growth environment named entities identified in step S403 are classified. According to the type of question input by the user, the keywords of the input question replace the variables in a template of the knowledge graph database, and the resulting query is sent to the knowledge graph database to retrieve the answer.
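Steps S401 and S402 can be sketched as decoding a tag sequence into entities and writing the resulting triples to CSV. This is a minimal illustration: the `"O"` non-entity tag, the example sentence, the `affected_by` relation, and the CSV column names are all hypothetical placeholders rather than values given in the text.

```python
import csv
import io

def decode_entities(tokens, tags, sep=" "):
    """Collect entity strings from B/I/S/D tags; any other tag is non-entity."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "D":                      # independent single-token entity
            entities.append(token)
            current = []
        elif tag == "B":                    # entity begins
            current = [token]
        elif tag == "I" and current:        # entity continues
            current.append(token)
        elif tag == "S" and current:        # entity ends
            current.append(token)
            entities.append(sep.join(current))
            current = []
        else:                               # non-entity or malformed sequence
            current = []
    return entities

def write_triples(triples, handle):
    """Write (head, relation, tail) rows as CSV with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["head", "relation", "tail"])
    writer.writerows(triples)

tokens = ["tomato", "late", "blight", "needs", "humidity"]
tags = ["D", "B", "S", "O", "D"]            # "O" is a placeholder non-entity tag
entities = decode_entities(tokens, tags)

buffer = io.StringIO()                      # stands in for a local CSV file
write_triples([(entities[0], "affected_by", entities[1])], buffer)
```

When writing to a real file, the `csv` module expects the file to be opened with `newline=""` so that line endings are not doubled on some platforms.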

The word-vector pre-trained language model first performs an embedding operation on the input: the original text sequence $T = \{t_1, t_2, \ldots, t_n\}$ is joined with the special start token [CLS] and end token [SEP] to form a sequence, from which the word-vector, segment-vector, and position-vector inputs are obtained. The word vectors flow forward through the encoder stack; in each layer they pass through an attention layer, and the result is passed to the next encoder through a feedforward neural network. Preprocessing the text yields the text features $T \in \mathbb{R}^{l \times d}$, where $l$ is the length of the encoder output sequence and $d$ is the dimension of the encoder hidden state; these features are passed to the encoder of the Transformer. The encoder includes an attention layer and a residual network layer. For the τ-th layer text features $T^{(\tau)}$, with query, key, and value weights $W_Q$, $W_K$, and $W_V$, the query, key, and value are $Q = T^{(\tau)} W_Q$, $K = T^{(\tau)} W_K$, and $V = T^{(\tau)} W_V$. Each layer of the encoder is computed as follows:

where Layernorm(·) denotes the normalization function; $T^{(\tau)}$ denotes the τ-th layer word vectors of the text features; softmax(·) is the activation function that normalizes the attention weights into a valid probability distribution; Res(·) denotes the output of the residual network layer, i.e. the feature representation after linear transformation and ReLU activation; Linear(·) denotes a linear transformation; ReLU(·) denotes the rectified linear unit activation function; and the superscript T denotes matrix transposition. After all encoder layers have been computed, a linear transformation is applied to the final output.
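For orientation, a standard Transformer encoder layer built from the operations named above can be written as follows. This is the conventional textbook form, offered as an assumption about the layer structure rather than as the exact equations of the method; the $\sqrt{d}$ scaling and the intermediate symbol $\hat{T}^{(\tau)}$ are introduced here for illustration.

```latex
\begin{align}
\mathrm{Att}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d}}\right) V \\
\hat{T}^{(\tau)} &= \mathrm{Layernorm}\!\left(T^{(\tau)} + \mathrm{Att}(Q, K, V)\right) \\
\mathrm{Res}\!\left(\hat{T}^{(\tau)}\right) &= \mathrm{Linear}\!\left(\mathrm{ReLU}\!\left(\mathrm{Linear}\!\left(\hat{T}^{(\tau)}\right)\right)\right) \\
T^{(\tau+1)} &= \mathrm{Layernorm}\!\left(\hat{T}^{(\tau)} + \mathrm{Res}\!\left(\hat{T}^{(\tau)}\right)\right)
\end{align}
```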

The gated recurrent unit layer comprises an update gate, a reset gate, a candidate hidden layer, and a final output layer, calculated as follows:

$U_t = \sigma(W_{ux} \cdot X_t + H_{t-1} \cdot W_{uh} + B_u)$ (update gate)

$R_t = \sigma(W_{rx} \cdot X_t + H_{t-1} \cdot W_{rh} + B_r)$ (reset gate)

where $U_t$ is the update gate at time t; σ is the activation function, which maps the data to a value in the range 0 to 1 so that it acts as a gating signal; $X_t$ is the input vector at time t; $W_{ux}$ and $B_u$ are the weight and bias parameters of the input vector in the update gate; $H_{t-1}$ and $W_{uh}$ are the hidden-layer output at time t-1 and its weight parameter; $R_t$ is the reset gate at time t; $W_{rx}$ and $B_r$ are the weight and bias parameters of the input vector in the reset gate; and $W_{rh}$ is the weight matrix of the hidden state in the reset gate. The candidate hidden layer is the candidate state calculated from the current input and the hidden state of the previous time step, and has its own input weight matrix, previous-hidden-state weight matrix, and bias vector. When $R_t$ approaches zero, the model discards the past hidden information and retains only the current input; when $R_t$ approaches 1, the past information is considered useful and is added to the current information; · denotes the dot product. In the final output, one term selectively "forgets" the previous hidden state and the other selectively "remembers" the candidate hidden state of the current node.
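The candidate hidden layer and final output of a gated recurrent unit are conventionally written as below. This is the standard form, given as a sketch consistent with the gate equations above; the symbols $W_{hx}$, $W_{hh}$, and $B_h$ are hypothetical names for the candidate layer's parameters, and whether $U_t$ or $1 - U_t$ multiplies the previous state is a convention that is assumed here.

```latex
\begin{align}
\tilde{H}_t &= \tanh\!\left(W_{hx} \cdot X_t + (R_t \odot H_{t-1}) \cdot W_{hh} + B_h\right) \\
H_t &= U_t \odot H_{t-1} + (1 - U_t) \odot \tilde{H}_t
\end{align}
```

Here $\odot$ denotes element-wise multiplication: the first term of $H_t$ keeps part of the previous hidden state and the second admits part of the candidate state.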

The reinforcement learning layer is represented by a five-tuple ⟨ξ, λ, ρ, ψ, γ⟩. ξ denotes the state space and ξ_i is the i-th state; in named entity recognition for the crop growth environment, a state comprises the current text or voice input, the entities already recognized, the context information, and so on. λ denotes the action space and λ_i is the i-th action; in named entity recognition, an action labels an entity type in the text, such as geographic position, crop variety, or climate factor. ρ denotes the probability distribution of transitions from one state to another; in named entity recognition, the state transition probability determines the next state from the current state and the selected action, for example through transitions based on context information and rules. ψ denotes the reward or penalty function for the new state reached by an action; in named entity recognition, the reward function is defined to measure the accuracy of the recognition result or other indexes, such as the number of correctly and incorrectly recognized entities. γ is the discount factor used in computing the cumulative reward.

The goal of decision-making is to find, at each moment, the optimal policy in the environment so as to obtain the maximum discounted return. The agent selects its next action according to the current environment state and the positive and negative feedback generated while interacting with the environment; if the feedback caused by a policy is positive, the agent receives a positive reward. In the next interaction the agent tends to select actions using that policy, which changes the environment at the next moment; when the state changes, the reward is propagated through the state transition probability to calculate the discounted reward in the optimal state.
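The role of the discount factor γ can be illustrated with a short sketch that computes a discounted return from a sequence of rewards. The reward values (+1 for a correctly labeled entity, -1 for an incorrect one) and the function name are hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):     # accumulate from the last step backwards
        g = r + gamma * g
    return g

# Hypothetical episode: correct label, wrong label, correct label.
episode_rewards = [1.0, -1.0, 1.0]
g = discounted_return(episode_rewards, gamma=0.5)
```

A smaller γ makes the agent weigh immediate labeling decisions more heavily than later ones; γ close to 1 weighs all steps almost equally.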

Further, step S5, in which the embedding vectors of the entities of the constructed knowledge graph are input as initial features into the graph neural network, the indexes of the crop growth environment are predicted, and a comprehensive evaluation system for the crop growth environment is constructed, includes the following steps:

step S501: based on step S4, an n-layer neural network is trained using the knowledge graph representation vectors as the initial features of the nodes in the graph neural network. In each layer, neighboring nodes of the target node are first randomly sampled in a certain proportion and their information is aggregated, and the aggregated value is then integrated with the node's own information to update it. A dot product is then taken between any two nodes to obtain a prediction score for the relation between them; the relation with the highest score is the most likely edge. Specifically, node u represents an event node to be predicted and v represents a candidate prediction area node for event u; taking the dot product of u with all candidate prediction area nodes gives the probability that u corresponds to the various indexes of different crops. The calculation formula is as follows:

where $u \in N(v)$;

where the message term is the crop growth environment monitoring message received by node v from its neighboring node u in the n-th iteration; POLⁿ is the function used in the n-th iteration round to aggregate the neighbor information of node v; the previous-round term is the node state at iteration round n-1; N(v) is the set of events directly linked to node v in the crop knowledge graph; the updated state is a combination of the information from the previous layer and the aggregation result of the current layer; and COMⁿ is the function that combines the information of node v after the (n-1)-th round of iteration with the n-th round aggregation information, taking as inputs the previous-round state and the aggregated information of the neighboring nodes in the current round.
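One layer of the sample-aggregate-combine scheme in step S501, followed by dot-product scoring, can be sketched as below. The mean aggregator and the averaging combine function are simplifying assumptions standing in for the aggregation and combination functions (a trained network would use learned weights), and the node names and feature values are hypothetical.

```python
import random

def aggregate(features, neighbors, v, ratio=0.5, rng=random):
    """Mean of the features of a random sample of v's neighbours."""
    nbrs = neighbors.get(v, [])
    if not nbrs:                                   # isolated node: no message
        return list(features[v])
    k = max(1, int(len(nbrs) * ratio))             # sample a fixed proportion
    sample = rng.sample(nbrs, k)
    dim = len(features[v])
    return [sum(features[u][d] for u in sample) / k for d in range(dim)]

def combine(h_prev, h_agg):
    """Average a node's previous state with its aggregated message."""
    return [(a + b) / 2 for a, b in zip(h_prev, h_agg)]

def layer(features, neighbors, ratio=0.5, rng=random):
    """Apply one aggregate-and-combine round to every node."""
    return {v: combine(features[v], aggregate(features, neighbors, v, ratio, rng))
            for v in features}

def score(h_u, h_v):
    """Dot-product relation score between two node representations."""
    return sum(a * b for a, b in zip(h_u, h_v))

features = {"event": [1.0, 0.0], "area_a": [0.8, 0.2], "area_b": [0.0, 1.0]}
neighbors = {"event": ["area_a"], "area_a": ["event"], "area_b": ["area_a"]}
h = layer(features, neighbors, ratio=1.0)          # ratio=1.0 uses all neighbours
scores = {v: score(h["event"], h[v]) for v in ("area_a", "area_b")}
best = max(scores, key=scores.get)
```

Stacking several such rounds lets each node's representation absorb information from nodes more than one hop away.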

step S502: the changes in environmental factors such as air temperature, air humidity, wind speed, CO₂, solar radiation, and rainfall, acquired by different types of sensors in different time periods, are analyzed and the crop growth environment is comprehensively evaluated. Using grey relational analysis together with fuzzy comprehensive evaluation, grey relational analysis is performed on the crop growth environment index data to calculate the contribution weight of each index to crop growth carriers such as soil quality or fertilizer; the scores obtained for each index are then combined by fuzzy comprehensive evaluation to give the comprehensive evaluation result for those crop growth carriers. The evaluation result is returned to step S4 to update and iterate the crop growth environment knowledge graph.
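The grey relational analysis in step S502 can be sketched with the commonly used grey relational coefficient. The text does not give the formula, so the standard form with distinguishing coefficient ρ = 0.5 is assumed, along with inputs already scaled to comparable ranges; the series values and function names are hypothetical.

```python
def grey_relational_grades(reference, series, rho=0.5):
    """Grey relational grade of each comparison series against the reference."""
    deltas = [[abs(r - x) for r, x in zip(reference, s)] for s in series]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:                       # every series equals the reference
        return [1.0] * len(series)
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]

def to_weights(grades):
    """Normalize the grades so they sum to 1 and can serve as index weights."""
    total = sum(grades)
    return [g / total for g in grades]

reference = [1.0, 0.8, 0.6]              # hypothetical scaled crop-carrier score
indexes = [
    [1.0, 0.8, 0.6],                     # index that tracks the reference exactly
    [0.5, 0.9, 0.2],                     # index that tracks it poorly
]
grades = grey_relational_grades(reference, indexes)
weights = to_weights(grades)
```

The resulting weights could then feed a fuzzy comprehensive evaluation as the weight vector applied to each index's membership scores.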

The beneficial effects of the invention include:

(1) The crop growth environment monitoring method based on knowledge graph representation learning can integrate data of a plurality of sensors, and improve accuracy and reliability of crop growth environment monitoring. Different types of sensors can measure different environmental parameters, data fusion of a plurality of sensors can effectively synthesize various environmental parameter information, the comprehensiveness and the accuracy of the crop growth environment are improved, agricultural producers can better know the crop growth environment, and effective management measures are timely taken, so that the production efficiency and the quality are improved.

(2) By constructing a knowledge graph of crop growth information and predicting crop information with a graph neural network, targeted crop growth environment management decisions can be made rapidly from environmental parameters acquired in real time, improving agricultural production efficiency and quality. In addition, because the sensors acquire crop growth environment parameters in real time, environmental anomalies can be identified rapidly through data processing and analysis, early warnings can be issued, and appropriate management measures can be taken, so that agricultural producers better understand crop growth conditions and act in time, improving production efficiency.

(3) The method can comprehensively evaluate the crop growth environment and provide a scientific basis and data support for agricultural management decisions. Through the fusion of data from multiple sensors, the crop growth environment, including soil moisture, temperature, humidity, illumination intensity, and CO₂ concentration, can be evaluated comprehensively. Evaluating these aspects together gives a more complete understanding of the crop growth environment and provides a scientific basis and data support for agricultural management. Based on the comprehensive evaluation results, agricultural managers can quickly make targeted crop planting management decisions, such as pest management, water and fertilizer management, irrigation management, and greenhouse ventilation, thereby improving agricultural production efficiency and crop quality.

(4) The dependence on manual observation and manual operation can be effectively reduced, and the degree of automation of agricultural management is improved. Traditional crop growth environment monitoring requires agricultural managers to observe and operate manually, which consumes a large amount of manpower and material resources. The method based on knowledge graph representation learning and multi-sensor data fusion enables automatic acquisition, identification, analysis, monitoring, and management of crop growth environment information, which reduces the dependence on manual operation, improves monitoring efficiency and accuracy, and makes agricultural management more automated and more scientific.

Drawings

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, in which:

FIG. 1 is a schematic flow chart of a crop growth environment monitoring method based on knowledge graph representation learning;

FIG. 2 is a flowchart of a preprocessing process according to an embodiment of the present invention;

FIG. 3 is a flow chart of the multi-source data standardization process according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a crop information named entity recognition model according to an embodiment of the present invention;

FIG. 5 is a diagram of a crop growth environment prediction model based on a knowledge graph according to an embodiment of the present invention.

Detailed Description

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.

The invention relates to a crop growth environment monitoring method based on knowledge graph representation learning, which is shown in figure 1 and comprises the following steps:

In this embodiment, a preprocessing flow for data collected by different sensors is shown in fig. 2, and step S1 specifically includes the following steps:

step S101: determining a distribution of crop sensors;

Further, the data collection method in step S102 is as follows: different types of sensor data are collected at each of the set acquisition points and transmitted to and stored in different databases. The sensor types include, but are not limited to, temperature sensors, air humidity sensors, wind sensors, barometric pressure sensors, carbon dioxide sensors, illumination intensity sensors, and soil temperature sensors. Different data acquisition modes are used for different types of sensors. According to the actual conditions of the crop base, the base is divided into several acquisition points, several sensors are placed at each acquisition point, and the data from different acquisition points are transmitted to different databases to facilitate subsequent data processing and analysis;

In this embodiment, the fusion of the crop growth environment data collected by the different types of sensors is shown in FIG. 3. Steps S2 and S3 are described in detail below through a specific example:

assume that there are four sensor groups of data, each sensor group having three sampling periods of data: temperature sensor t= {25.2,25.1,25.3}, humidity sensor h= {0.65,0.63,0.66}, light sensor l= {450,460,455}, wind speed sensor w= {2.1,2.2,2.0}. First, according to step S201, different types of sensor data are combined into one matrix Φ:

Then, according to step S202, the matrix Φ is standardized; the resulting standardized matrix is as follows:

According to step S301, the reliability metric vector $A_{ij}$ of each sensor in each sampling period is calculated. With the reliability measure defined to lie between 0 and 1 (0 representing completely unreliable, 1 representing fully reliable), the randomly generated reliability metric vector $A_{ij}$ is as follows:

According to step S302, the weight factor and the basic probability distribution function are calculated. Knowing 3 sensor groups (m=3), the weight factor $w(j)$ is calculated as follows:

To calculate the weight factor $w(j)$, the entropy value $\varepsilon_j$ of each index must first be calculated:

$$\varepsilon_1 = -\frac{1}{\ln(3)}\left[a_{11}\log_2(a_{11}) + a_{21}\log_2(a_{21}) + a_{31}\log_2(a_{31}) + a_{41}\log_2(a_{41})\right]$$

$$\varepsilon_2 = -\frac{1}{\ln(3)}\left[a_{12}\log_2(a_{12}) + a_{22}\log_2(a_{22}) + a_{32}\log_2(a_{32}) + a_{42}\log_2(a_{42})\right]$$

$$\varepsilon_3 = -\frac{1}{\ln(3)}\left[a_{13}\log_2(a_{13}) + a_{23}\log_2(a_{23}) + a_{33}\log_2(a_{33}) + a_{43}\log_2(a_{43})\right]$$

The weight factor $w(j)$ is then calculated:

$$w(1) = (1-\varepsilon_1)/2*(\varepsilon_1+\varepsilon_2+\varepsilon_3) = 0.451$$

$$w(2) = (1-\varepsilon_2)/2*(\varepsilon_1+\varepsilon_2+\varepsilon_3) = 0.333$$

$$w(3) = (1-\varepsilon_3)/2*(\varepsilon_1+\varepsilon_2+\varepsilon_3) = 0.216$$
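For orientation, the sketch below shows the conventional entropy-weight method, which is swapped in here as an assumption and is not a transcription of the expressions above. It normalizes each column of the reliability matrix to proportions, uses the natural logarithm throughout, and normalizes the weights as $(1-\varepsilon_j)/\sum_k (1-\varepsilon_k)$ so that they sum to 1. The matrix `A` holds hypothetical reliability values, since the randomly generated matrix of the example is not reproduced in the text.

```python
import math


def entropy_weights(a):
    """Conventional entropy-weight method over the columns of `a`.

    `a` is a list of rows (ASSUMED: one row per sensor, one column per
    sampling period) holding positive reliability values.
    """
    rows, cols = len(a), len(a[0])
    one_minus_entropy = []
    for j in range(cols):
        column = [a[i][j] for i in range(rows)]
        total = sum(column)
        proportions = [x / total for x in column]
        # Entropy scaled to [0, 1] by dividing by ln(number of rows).
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(rows)
        one_minus_entropy.append(1.0 - entropy)
    norm = sum(one_minus_entropy)  # zero only if every column is perfectly uniform
    return [d / norm for d in one_minus_entropy]


# Hypothetical reliability values for 4 sensors over 3 sampling periods.
A = [
    [0.9, 0.8, 0.7],
    [0.6, 0.9, 0.8],
    [0.8, 0.5, 0.9],
    [0.7, 0.7, 0.6],
]
weights = entropy_weights(A)
```

A column whose values are more dispersed has lower entropy and therefore receives a larger weight; here the second column is the most dispersed.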

The basic probability distribution function $\zeta(y_i)$, calculated with confidence $\alpha = 0.9$, is as follows:

$$\zeta(y_i) = \alpha/3 = 0.3$$

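Then, according to step S303, the reliability measure R and the uncertainty measure U of the sensor group are calculated for the temperature sensor T:

$$\begin{aligned} R(T) &= w(1)\,\zeta(T_1) + w(2)\,\zeta(T_2) + w(3)\,\zeta(T_3) \\ &= (0.451 \times 0.3) + (0.333 \times 0.3) + (0.216 \times 0.3) \\ &= 0.300 \end{aligned}$$

$$\begin{aligned} U(T) &= 1 - R(T) \\ &= 1 - 0.300 \\ &= 0.700 \end{aligned}$$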

Similarly, the reliability measure R and the uncertainty measure U of the humidity sensor group, the light sensor group and the wind speed sensor group may be calculated.
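The weighted sum for R(T) and its complement U(T) can be checked with a few lines of Python. This is only a sketch using the example's weight factors and the value ζ = α/3; the variable names are hypothetical.

```python
# Weight factors w(1), w(2), w(3) from the worked example.
weights = [0.451, 0.333, 0.216]

alpha = 0.9        # confidence used in the example
zeta = alpha / 3   # basic probability value, 0.3 for every period

# R(T) = sum_j w(j) * zeta(T_j); U(T) = 1 - R(T)
R_T = sum(w * zeta for w in weights)
U_T = 1.0 - R_T
```

Because the three weight factors sum to 1 and ζ is the same for every period, R(T) evaluates to ζ itself, about 0.300, and U(T) to about 0.700.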

Finally, the acquired data are fused according to the formula in step S3. To calculate the fusion value F of the temperature sensor T data in the first sampling period, the formula gives:

$$\begin{aligned} F^{1}(1,1) &= \frac{R(1)\,d^{1}(1,1) + R(2)\,d^{1}(1,2) + R(3)\,d^{1}(1,3)}{R(1) + R(2) + R(3)} \\ &= \frac{0.362 \times 25.2 + 0.375 \times 25.1 + 0.248 \times 25.3}{0.362 + 0.375 + 0.248} \\ &= \frac{24.809}{0.985} \approx 25.187 \end{aligned}$$

Similarly, a fusion value of the humidity sensor group, the light sensor group, and the wind speed sensor group may be calculated.
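The reliability-weighted fusion can be sketched in Python as a weighted average: each reading is multiplied by its reliability measure, the products are summed, and the sum is divided by the total reliability. The function name `fuse` is hypothetical; the inputs are the three temperature readings and the reliability values 0.362, 0.375 and 0.248 used in the example.

```python
def fuse(readings, reliabilities):
    """Reliability-weighted average: sum(R * d) / sum(R)."""
    total = sum(reliabilities)
    if total == 0:
        raise ValueError("at least one reliability measure must be non-zero")
    return sum(r * d for r, d in zip(reliabilities, readings)) / total


temperature = [25.2, 25.1, 25.3]
reliability = [0.362, 0.375, 0.248]
fused = fuse(temperature, reliability)   # approximately 25.187
```

Because the result is a weighted average, it always lies between the smallest and largest reading, which is a quick sanity check on any fused value.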

Step S401: Knowledge extraction is performed on the preprocessed crop growth environment data and crop pest information data. The text is preprocessed with the NLTK (Natural Language Toolkit) toolkit, including modules for segmentation, part-of-speech tagging, dependency analysis and semantic dependency analysis, and the processed information is stored in the form {entity, relationship, attribute}. Combining the result of information cleaning with the requirements of the knowledge graph, labels are set for crop aliases, variety names, variety sources, characteristic features, yield performance, breeders, cultivation techniques, suitable climates, illumination intensity, temperature preference, humidity, soil nutrients, disease names, morphological features of diseased plants, main affected parts, damage symptoms, prevention methods and the like. Finally, the original information is formatted into a CSV file and stored locally.

For example, for the original text "the region has a mild climate, abundant rainfall and fertile soil, and is suitable for planting rice", the character-level labeling result is as follows: the leading characters ("the region") fall outside the three labeled categories; the characters of "climate", "mild", "rainfall", "abundant", "soil" and "fertile" are labeled b-environmental factor for the first character and i-environmental factor for the following character; the characters of "suitable for planting" are labeled b-related word followed by i-related word; and the characters of "rice" are labeled b-crop type followed by i-crop type. In this way the environmental factors, crop types and related words in the text are identified and their positions in the sentence are marked;
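The begin/inside labeling described above can be illustrated with a small Python sketch. It works on English word tokens rather than individual characters, uses the conventional "O" tag for tokens outside any entity, and the short label names `REL` and `CROP` are hypothetical stand-ins for "related word" and "crop type"; all of these are assumptions made for illustration.

```python
def bio_tags(tokens, spans):
    """Assign B-/I- tags to labeled spans and 'O' to every other token.

    `spans` is a list of (start, end_exclusive, label) tuples.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"           # first token of the entity
        for k in range(start + 1, end):
            tags[k] = f"I-{label}"           # remaining tokens of the entity
    return tags


tokens = ["this", "region", "is", "suitable", "for", "planting", "rice"]
spans = [(3, 6, "REL"), (6, 7, "CROP")]
tags = bio_tags(tokens, spans)
# tags == ["O", "O", "O", "B-REL", "I-REL", "I-REL", "B-CROP"]
```

Storing the span boundaries in the tags themselves is what lets the recognition model recover both the entity type and its position in the sentence.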

Step S403: Based on the various crop information data extracted by the above labeling mode, a named entity recognition model of crop growth environment information is constructed. As shown in fig. 4, the model comprises three layers: a BERT word vector pre-training language model, a gated recurrent unit (GRU) layer and a reinforcement learning layer;

Step S404: Knowledge is stored using Neo4j. Questions about the crop growth environment named entities identified in step S403 are classified; according to the type of the question input by the user, the variables in the matching template are replaced with the keywords of the input question, and the resulting query is sent to Neo4j to retrieve the answer to the question.
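A minimal sketch of the template step is given below. It only builds the query text and its parameters and does not call a database driver. The node labels, relationship types, question categories and the keyword-based classifier are all hypothetical; the `$name` placeholder is the standard Cypher parameter syntax.

```python
# Hypothetical question types mapped to hypothetical Cypher templates.
TEMPLATES = {
    "suitable_climate": (
        "MATCH (c:Crop {name: $name})-[:SUITABLE_CLIMATE]->(x) RETURN x.name"
    ),
    "disease_prevention": (
        "MATCH (d:Disease {name: $name})-[:PREVENTED_BY]->(x) RETURN x.name"
    ),
}


def classify_question(question):
    """Toy keyword classifier standing in for the question-classification step."""
    text = question.lower()
    if "prevent" in text:
        return "disease_prevention"
    if "climate" in text:
        return "suitable_climate"
    return None


def build_query(question, entity):
    """Return (cypher_text, parameters) for the question, or None if unclassified."""
    question_type = classify_question(question)
    if question_type is None:
        return None
    return TEMPLATES[question_type], {"name": entity}


query = build_query("What climate suits rice?", "rice")
```

Passing the entity as a query parameter, instead of pasting it into the query string, keeps user input from altering the structure of the query.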

The word vector pre-training language model first performs an embedding operation on the input word vectors: the original text sequence $T = \{t_1, t_2, \ldots, t_n\}$ is concatenated with the special start token [CLS] and end token [SEP] to form a sequence, from which the word vector, segment vector and position vector inputs are obtained. The word vectors flow forward through the encoder stack; in each layer they pass through an attention layer and then a feedforward neural network, which passes the result to the next encoder. Preprocessing the text yields the text feature $T \in R^{l \times d}$, where $l$ is the length of the encoder output sequence and $d$ is the dimension of the encoder hidden state; this feature is passed to the Transformer encoder. The encoder comprises an attention layer and a residual network layer. For the τ-th layer text feature $T^{(\tau)}$, with query, key and value weights $W_Q$, $W_K$ and $W_V$, the query, key and value are $Q = T^{(\tau)} W_Q$, $K = T^{(\tau)} W_K$ and $V = T^{(\tau)} W_V$. The specific formula for each layer representation of the encoder is as follows:

where LayerNorm(·) denotes the normalization function; the layer-indexed term denotes the τ-th layer word vector of the text feature; softmax(·) denotes the activation function that normalizes the attention weights into a valid probability distribution; Res(·) denotes the output of the residual network layer, that is, the feature representation after linear transformation and ReLU activation; Linear(·) denotes a linear transformation; ReLU(·) denotes the rectified linear unit activation function; and the superscript T denotes matrix transposition. After all encoder layers have been computed, a linear transformation is applied to the final output.
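The attention part of an encoder layer can be sketched in plain Python as follows. The sketch assumes the standard Transformer form softmax(QKᵀ/√d)·V, including the 1/√d scaling, and omits layer normalization and the residual branch; the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(T, W_Q, W_K, W_V):
    """Scaled dot-product attention (ASSUMED form): softmax(Q K^T / sqrt(d)) V."""
    Q, K, V = matmul(T, W_Q), matmul(T, W_K), matmul(T, W_V)
    d = len(K[0])
    K_transposed = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_transposed)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0]]      # two tokens, hidden size 2
output, attn = attention(features, identity, identity, identity)
```

Each row of `attn` is a probability distribution over the tokens, so every output row is a convex combination of the value vectors.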

In this embodiment, the gated recurrent unit (GRU) layer specifically comprises an update gate, a reset gate, a candidate hidden layer and a final output layer, calculated as follows:

$$U_t = \sigma(W_{ux} \cdot X_t + H_{t-1} \cdot W_{uh} + B_u) \quad \text{(update gate)}$$

$$R_t = \sigma(W_{rx} \cdot X_t + H_{t-1} \cdot W_{rh} + B_r) \quad \text{(reset gate)}$$
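To show how the two gates are typically used, the following scalar Python sketch completes one GRU step. The update and reset gates follow the formulas above. The candidate state and the final output use a common GRU convention, with a tanh candidate and the mix (1 − U_t)·H_{t−1} + U_t·H̃_t, which is an assumption here; the parameter names `w_hx`, `w_hh` and `b_h` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x_t, h_prev, p):
    """One scalar GRU step; `p` is a dict of weights and biases."""
    # Update and reset gates, as in the formulas above.
    u_t = sigmoid(p["w_ux"] * x_t + h_prev * p["w_uh"] + p["b_u"])
    r_t = sigmoid(p["w_rx"] * x_t + h_prev * p["w_rh"] + p["b_r"])
    # ASSUMED candidate state: tanh of the input and the reset-scaled past state.
    h_cand = math.tanh(p["w_hx"] * x_t + (r_t * h_prev) * p["w_hh"] + p["b_h"])
    # ASSUMED output convention: keep (1 - u_t) of the past, take u_t of the candidate.
    h_t = (1.0 - u_t) * h_prev + u_t * h_cand
    return h_t, u_t, r_t


zero_params = {key: 0.0 for key in
               ("w_ux", "w_uh", "b_u", "w_rx", "w_rh", "b_r", "w_hx", "w_hh", "b_h")}
h_t, u_t, r_t = gru_step(x_t=1.0, h_prev=0.8, p=zero_params)
```

With all parameters at zero, both gates equal 0.5 and the candidate is 0, so the new state is half of the previous one (0.4), which makes the gating behavior easy to verify by hand.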

The reinforcement learning layer is represented by a five-tuple <ξ, λ, ρ, ψ, γ>. Here ξ denotes the state space and ξ_i the i-th state; in named entity recognition for the crop growth environment, a state comprises the current text or voice input, the entities already identified, the context information and the like. λ denotes the action space and λ_i the i-th action; in named entity recognition, an action is the labeling of an entity type in the text, such as geographic position, crop variety or climate factor. ρ denotes the probability distribution of transitioning from one state to another; in named entity recognition, the state transition probability determines the next state from the current state and the selected action, for example by state transitions based on context information and rules. ψ denotes the reward or penalty function for the new state reached by an action; in named entity recognition, the reward function can be defined to measure the accuracy of the recognition result or other indexes, such as the number of correctly recognized entities and the number of wrongly recognized entities. γ is the discount factor used in calculating the cumulative reward.

In this embodiment, step S5 takes the entity embedding vectors of the constructed knowledge graph as initial features, inputs them into the graph neural network, predicts the various indexes of the crop growth environment and constructs a comprehensive evaluation system of the crop growth environment; as shown in fig. 5, it comprises the following steps:

where $u \in N(v)$

In this embodiment, soil quality is taken as an example to describe the steps of comprehensively evaluating the crop growth environment. First, the soil quality indexes to be evaluated, such as pH value, organic matter content, quick-acting nitrogen, soil humidity and soil load, are selected as evaluation indexes; each index is then standardized, converting the raw data into a value between 0 and 1, with the following formula:

where $x_i$ denotes the raw data of the i-th index, $x_i'$ denotes the normalized data, and min(·) and max(·) denote the minimum and maximum functions, respectively. The normalized results for each index are shown in the following table:

Soil quality index	Original value	Minimum value	Maximum value	Normalized value
pH value	6.5	5	8.5	0.5
Organic matter content	30	10	40	0.625
Quick-acting nitrogen	50	10	100	0.375
Soil moisture	25	10	40	0.416
Soil load	80	60	100	0.5
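A scaling of raw data into the range 0 to 1 using minimum and maximum functions is commonly implemented as min-max normalization. The sketch below assumes the form (x − min) / (max − min), which is an assumption about the formula, and applies it to the soil load row of the table; the function name `min_max` is hypothetical.

```python
def min_max(x, lower, upper):
    """Min-max normalization (ASSUMED form): (x - min) / (max - min)."""
    if upper == lower:
        raise ValueError("maximum and minimum must differ")
    return (x - lower) / (upper - lower)


# Soil load row of the table: original 80, minimum 60, maximum 100.
soil_load_normalized = min_max(80, 60, 100)   # 0.5
```

Values at the minimum map to 0 and values at the maximum map to 1, so indexes measured in different units become directly comparable.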

Then, the relevance of each index is calculated, with the following formula:

where $\Delta_{ik}$ denotes the absolute difference between the i-th index and the k-th index, $\Delta_{i0}$ denotes the absolute difference between the i-th index and the evaluation object, min(·) and max(·) denote the minimum and maximum functions, respectively, and β denotes the interval association degree parameter, which takes the value 0.6;

The association degrees of the indexes are then weighted and averaged to obtain the comprehensive association degree of the evaluation object, with the following formula:

where $w_i$ denotes the weight of the i-th index, obtained from expert experience; the weights of the indexes are 0.2, 0.3, 0.1, 0.2 and 0.2, respectively.
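The two steps above, relevance per index followed by a weighted average, can be sketched in Python. The coefficient uses the conventional grey relational form (Δmin + β·Δmax) / (Δ + β·Δmax) with β = 0.6, which is an assumption about the formula. The reference series of all ones (an ideal sample) is hypothetical, the compared series reuses the normalized values from the table, and the weights are the expert weights listed above.

```python
def grey_relational_coefficients(reference, series, beta=0.6):
    """Conventional grey relational coefficients for one compared series."""
    deltas = [abs(r - s) for r, s in zip(reference, series)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:
        return [1.0] * len(deltas)       # series identical to the reference
    return [(d_min + beta * d_max) / (d + beta * d_max) for d in deltas]


# Index order: pH, organic matter, quick-acting nitrogen, soil moisture, soil load.
normalized = [0.5, 0.625, 0.375, 0.416, 0.5]
ideal = [1.0] * len(normalized)          # HYPOTHETICAL reference sample
expert_weights = [0.2, 0.3, 0.1, 0.2, 0.2]

coefficients = grey_relational_coefficients(ideal, normalized)
composite = sum(w * c for w, c in zip(expert_weights, coefficients))
```

The index closest to the reference receives a coefficient of 1, and β controls how strongly the largest deviation compresses the remaining coefficients.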

Then, an evaluation set is established for each of the selected evaluation indexes, the membership function of each index is determined, and fuzzy comprehensive evaluation is carried out: the membership of each index is weighted by its weight to obtain the comprehensive evaluation result.

In the fuzzy comprehensive evaluation, the membership function of each evaluation index must be determined and a fuzzy relation matrix constructed from the relevance coefficients. In this embodiment, triangular membership functions are used for the evaluation indexes:

organic matter content: triangular membership functions (low, medium, high);

pH value: triangular membership functions (acidic, neutral, basic);

available nutrient content: triangular membership functions (low, medium, high);

soil humidity: triangular membership functions (low, medium, high);

soil load: triangular membership functions (low, medium, high).
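A triangular membership function rises linearly from 0 to 1 and falls back to 0. The sketch below shows one such function; the breakpoints chosen for the "medium" grade on a normalized 0 to 1 scale are hypothetical, since the embodiment does not list them.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, falling to 0 at c (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# HYPOTHETICAL breakpoints for the "medium" grade of a normalized index.
MEDIUM = (0.25, 0.5, 0.75)

at_peak = triangular(0.5, *MEDIUM)       # 1.0
halfway = triangular(0.375, *MEDIUM)     # 0.5
outside = triangular(0.1, *MEDIUM)       # 0.0
```

Defining one such triangle per grade (low, medium, high) gives, for any index value, a membership vector that forms one row of the fuzzy relation matrix.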

The following table shows the specific calculation results of each index:

Evaluation index	Correlation degree	Normalized correlation	Membership function	Weight	Weighted membership function
Organic matter content	0.8496	0.2124	High	0.3	0.0637
pH value	0.6524	0.1752	Medium	0.2	0.0349
Available nutrient content	0.5687	0.1358	Medium	0.1	0.0139
Soil moisture	0.4021	0.1254	Low	0.2	0.0211
Soil load	0.3124	0.0154	Low	0.2	0.0156

The soil quality is evaluated according to the magnitude of the comprehensive evaluation value, and the evaluation grades can be set as follows:

excellent: the comprehensive evaluation value is greater than or equal to 0.15;

good: the comprehensive evaluation value is greater than or equal to 0.1 and less than 0.15;

medium: the comprehensive evaluation value is greater than or equal to 0.06 and less than 0.1;

poor: the comprehensive evaluation value is less than 0.06.

Based on the above calculation results, the comprehensive evaluation value of the sample was 0.1492, and it was evaluated as "good". Similarly, the comprehensive evaluation value of other samples may be calculated and evaluated.
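The final step can be reproduced with a short Python sketch that sums the weighted membership values from the table and maps the total to a grade. The thresholds are those listed above; the English grade names and the function name `grade` are illustrative renderings.

```python
# Weighted membership values from the last column of the results table.
weighted_membership = {
    "organic matter content": 0.0637,
    "pH value": 0.0349,
    "available nutrient content": 0.0139,
    "soil moisture": 0.0211,
    "soil load": 0.0156,
}

score = round(sum(weighted_membership.values()), 4)   # 0.1492


def grade(value):
    """Map a comprehensive evaluation value to its grade."""
    if value >= 0.15:
        return "excellent"
    if value >= 0.10:
        return "good"
    if value >= 0.06:
        return "medium"
    return "poor"


result = grade(score)   # "good"
```

Checking the thresholds from the highest downward means each value falls into exactly one grade.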

Traditional agricultural production is managed mainly by manual experience, which has the drawbacks of high management cost, low efficiency and large error. The invention uses informatization, digitalization and intelligent technology to monitor and collect, in real time, the growth environment of the agricultural products concerned, covering a plurality of indexes such as soil moisture, temperature, humidity and air pressure. By analyzing and fusing these data and constructing a knowledge graph, the growth state of the crops can be understood comprehensively, the crops can be managed precisely, and the benefit of agricultural production is improved. Compared with conventional smart agriculture, the invention uses more advanced technical means and achieves a higher degree of refinement, can monitor the growth state of crops more comprehensively, and provides more accurate and comprehensive information support for agricultural production. At the same time, the invention can effectively solve the problems of traditional agricultural production management, improve the efficiency and quality of agricultural production, and support the sustainable development of agricultural production.

Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims

1. A crop growth environment monitoring method based on knowledge graph representation learning, characterized by comprising the following steps:

2. The crop growth environment monitoring method based on knowledge graph representation learning according to claim 1, wherein step S1 specifically comprises:

step S101: determining distribution of crop sensors, and determining distribution range and density of the sensors according to planting conditions and topography characteristics of a crop base so as to ensure that the crop base can be comprehensively covered and data can be accurately collected; the sensors can be divided into different types, such as soil sensors, weather sensors, biological sensors and the like, according to different environmental requirements of different growth periods and different growth stages of crops so as to collect different types of data;

step S102: according to the set collection points, collecting different types of sensor data, transmitting and storing the sensor data into different databases, adopting different data collection modes aiming at different types of sensors, dividing the base into a plurality of collection points according to the actual condition of the crop base, placing a plurality of sensors in each collection point, and transmitting the data of different collection points into different databases so as to facilitate subsequent data processing and analysis;

step S103: preprocessing the mass data in the database, including but not limited to data denoising and missing value completion.

3. The crop growth environment monitoring method based on knowledge graph representation learning according to claim 1, wherein step S2 specifically comprises:

step S201: the multi-source crop growth environment data acquired from the multiple sensors are decomposed according to different time sequences to obtain a plurality of feature vectors, and the feature vectors are combined into a feature matrix Φ, which serves as the input of the standardized transformation;

4. The crop growth environment monitoring method based on knowledge graph representation learning according to claim 1, wherein step S3 specifically comprises:

5. The crop growth environment monitoring method based on knowledge graph representation learning according to claim 1, wherein step S4 specifically comprises:

step S401: knowledge extraction is performed on the preprocessed crop growth environment data and crop pest information data; the text is preprocessed, including segmentation, part-of-speech tagging, dependency analysis, semantic dependency analysis and other modules; the processed information is stored in the form {entity, relationship, attribute}, and labels are set according to the information cleaning result and the requirements of the knowledge graph; finally, the original information is formatted into a CSV file and stored in local storage;

step S403: based on the various crop information data extracted by the above labeling mode, a named entity recognition model of crop growth environment information is constructed, the model comprising three layers: a word vector pre-training language model, a gated recurrent unit layer and a reinforcement learning layer;

6. The crop growth environment monitoring method based on knowledge graph representation learning according to claim 1, wherein step S5 specifically comprises:

where $u \in N(v)$

step S502: the environmental changes in air temperature, air humidity, wind speed, CO₂, solar radiation, rainfall and the like, acquired by different types of sensors in different time periods, are analyzed so as to comprehensively evaluate the crop growth environment; using a method that combines grey relational analysis with fuzzy comprehensive evaluation, grey relational analysis is performed on a plurality of crop growth environment index data, and the contribution weight of each index to crop growth carriers such as soil quality or fertilizer is calculated; the scores obtained for the indexes are integrated by the fuzzy comprehensive evaluation method to obtain the comprehensive evaluation result of crop growth carriers such as soil quality or fertilizer; the evaluation result is returned to step S4, and the crop growth environment knowledge graph is iteratively updated.

7. The crop growth environment monitoring method based on knowledge graph representation learning according to claim 1, wherein the data acquisition method in step S201 is as follows: the acquired data are expressed as vectors: temperature sensor vector $O = \{o_1, o_2, \ldots, o_n\}$, air humidity sensor vector $C = \{c_1, c_2, \ldots, c_n\}$, wind sensor vector $W = \{w_1, w_2, \ldots, w_n\}$, atmospheric pressure sensor vector $P = \{p_1, p_2, \ldots, p_n\}$, carbon dioxide sensor vector $Z = \{z_1, z_2, \ldots, z_n\}$, light intensity sensor vector $L = \{l_1, l_2, \ldots, l_n\}$, soil temperature sensor vector $T = \{t_1, t_2, \ldots, t_n\}$, soil EC value sensor vector $G = \{g_1, g_2, \ldots, g_n\}$, soil pH sensor vector $Q = \{q_1, q_2, \ldots, q_n\}$ and soil moisture sensor vector $J = \{j_1, j_2, \ldots, j_n\}$; the feature vectors of the different sensor data are then combined into a feature matrix:

where $\varphi_{n\kappa}$ denotes the value of the n-th sensor of the κ-th type, and κ denotes the different types of sensors;

the normalized feature matrix in step S202 is expressed as follows: each feature element of the matrix Φ from step S201 is subjected to normalized transformation to obtain the normalized feature matrix:

8. The crop growth environment monitoring method based on knowledge graph representation learning according to claim 7, wherein the reliability metric vector in each sampling period in step S301 is expressed as:

where i denotes the sensor index (i = 1, 2, 3, …, m) and j denotes the sampling period index (j = 1, 2, 3, …, n); each sensor continuously collects data points during each sampling period, and the reliability measure of the sensor data is represented by a real number between 0 and 1, where 0 denotes completely unreliable and 1 denotes completely reliable;

further, the formula for calculating the weight factor is as follows:

where $y_i = [y_i - \sigma, y_i + \sigma]$ is the confidence interval, σ is the measurement standard deviation, and i = 1, 2, 3, …, n;

further, in step S303 the reliability measure $R(A_{ij})$ and the uncertainty measure $U(A_{ij})$ of the sensor reliability metric vector $A_{ij}$ are calculated as follows:

$$R(A_{ij}) = \sum w(j)\,\zeta(y_i)$$

$$U(A_{ij}) = 1 - R(A_{ij})$$

where $w(j)$ is the weight factor of the reliability metric vector $A_{ij}$ and $\zeta(y_i)$ is the basic probability distribution of the confidence interval $y_i$;

after the reliability measure R and the uncertainty measure U of the sensors are calculated, they are used as weight factors to fuse the acquired data: the reliability measure R of each sensor is multiplied by the data acquired by that sensor, the products are summed, and the sum is divided by the sum of the reliability measures of all sensors to obtain the fused data; for each sensor i and sampling period j, the fused data $F^k(i,j)$ is calculated as follows:

where $R^k(i)$ denotes the reliability measure of the i-th sensor in the k-th sampling period, $d^k(i,j)$ denotes the data acquired by the i-th sensor in the k-th sampling period, and $R^k(A_{ij})$ denotes the reliability metric vector of all sensors in the k-th sampling period.

9. The crop growth environment monitoring method based on knowledge graph representation learning according to claim 8, wherein the named entity recognition model of step S403 comprises three layers, namely a word vector pre-training language model, a gated recurrent unit layer and a reinforcement learning layer, with the following specific structure:

the word vector pre-training language model first performs an embedding operation on the input word vectors: the original text sequence $T = \{t_1, t_2, \ldots, t_n\}$ is concatenated with the special start token [CLS] and end token [SEP] to form a sequence, from which the word vector, segment vector and position vector inputs are obtained; the word vectors flow forward through the encoder stack, and in each layer they pass through an attention layer and then a feedforward neural network, which passes the result to the next encoder; preprocessing the text yields the text feature $T \in R^{l \times d}$, where $l$ is the length of the encoder output sequence and $d$ is the dimension of the encoder hidden state, and this feature is passed to the Transformer encoder; the encoder comprises an attention layer and a residual network layer; for the τ-th layer text feature $T^{(\tau)}$, with query, key and value weights $W_Q$, $W_K$ and $W_V$, the query, key and value are $Q = T^{(\tau)} W_Q$, $K = T^{(\tau)} W_K$ and $V = T^{(\tau)} W_V$; the specific formula for each layer representation of the encoder is as follows:

where LayerNorm(·) denotes the normalization function; the layer-indexed term denotes the τ-th layer word vector of the text feature; softmax(·) denotes the activation function that normalizes the attention weights into a valid probability distribution; Res(·) denotes the output of the residual network layer, that is, the feature representation after linear transformation and ReLU activation; Linear(·) denotes a linear transformation; ReLU(·) denotes the rectified linear unit activation function; the superscript T denotes matrix transposition; and after all encoder layers have been computed, a linear transformation is applied to the final output;

$$U_t = \sigma(W_{ux} \cdot X_t + H_{t-1} \cdot W_{uh} + B_u) \quad \text{(update gate)}$$

$$R_t = \sigma(W_{rx} \cdot X_t + H_{t-1} \cdot W_{rh} + B_r) \quad \text{(reset gate)}$$

(candidate hidden layer)

(final output layer)

where $U_t$ is the update gate at time t; σ is the activation function, which maps the data to a value in the range 0 to 1 so that it acts as a gating signal; $X_t$ is the input vector at time t; $W_{ux}$ and $B_u$ are the weight parameter and bias parameter of the input vector in the update gate unit; $H_{t-1}$ and $W_{uh}$ are the hidden layer output at time t-1 and its weight parameter; $R_t$ is the reset gate at time t; $W_{rx}$ and $B_r$ are the weight and bias parameters of the input vector in the reset gate; $W_{rh}$ is the weight matrix of the hidden state for the reset gate; the candidate hidden layer is the candidate state calculated from the current input and the hidden state of the previous moment, with its own input weight matrix, weight matrix for the previous hidden state and bias vector; when $R_t$ approaches zero, the model discards the past hidden information and retains only the currently input information; when $R_t$ approaches 1, the past information is considered useful and is added to the current information; the product operator in the formulas denotes the dot product; one term of the final output represents selective "forgetting" of the previous hidden state, and the other represents selective "memorizing" of the candidate hidden state of the current node;

the reinforcement learning layer is represented by a five-tuple <ξ, λ, ρ, ψ, γ>, where ξ denotes the state space and ξ_i the i-th state; in named entity recognition for the crop growth environment, a state comprises the current text or voice input, the entities already identified, the context information and the like; λ denotes the action space and λ_i the i-th action; in named entity recognition, an action is the labeling of an entity type in the text, such as geographic position, crop variety or climate factor; ρ denotes the probability distribution of transitioning from one state to another; ψ denotes the reward or penalty function for the new state reached by an action; and γ is the discount factor used in calculating the cumulative reward.

10. The crop growth environment monitoring method based on knowledge graph representation learning according to claim 9, wherein the specific method for predicting parameters in regional meteorological data by using the graph neural network in step S501 is as follows:

first, a threshold is set according to the growth conditions and requirements of the crops, and the threshold can be adjusted according to actual conditions; next, future nodes are predicted, using the trained graph neural network model to predict numerical sequences over different future time ranges; finally, for each time step it is judged whether the predicted value exceeds the set threshold: if so, the current condition is not suitable for planting crops; if not, the current condition is suitable for planting crops;

in step S502, the environmental variables acquired by different types of sensors in different time periods are analyzed so as to comprehensively evaluate whether the environmental factors are suitable for crop growth; using a method that combines grey relational analysis with fuzzy comprehensive evaluation, grey relational analysis is performed on a plurality of index data, and the contribution weight of each index to crop growth carriers such as soil quality or fertilizer is calculated; the scores obtained for the indexes are integrated by the fuzzy comprehensive evaluation method, and the comprehensive evaluation result of crop growth carriers such as soil quality or fertilizer is finally obtained.