CN117575174A - Intelligent agricultural monitoring and management system

Intelligent agricultural monitoring and management system

Info

Publication number
CN117575174A
Authority
CN
China
Prior art keywords
value
agricultural
network
target
domain
Prior art date
Legal status
Granted
Application number
CN202410051024.3A
Other languages
Chinese (zh)
Other versions
CN117575174B (en)
Inventor
刘德永
陈艳章
卜彩霞
王波
张伦
侯艳海
Current Assignee
Shandong Universal Software Co., Ltd.
Original Assignee
Shandong Universal Software Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Shandong Universal Software Co., Ltd.
Priority to CN202410051024.3A
Publication of CN117575174A
Application granted
Publication of CN117575174B
Legal status: Active
Anticipated expiration

Classifications

    • G06Q 10/063 Operations research, analysis or management
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/096 Transfer learning
    • G06Q 10/103 Workflow collaboration or project management
    • G06Q 50/02 Agriculture; Fishing; Forestry; Mining


Abstract

The invention relates to the technical field of agriculture, in particular to an intelligent agricultural monitoring and management system, which comprises: a data acquisition unit for acquiring agricultural monitoring data of a plurality of different agricultural areas through sensors; a transfer learning unit for selecting the normalized agricultural monitoring data of one area from the normalized agricultural monitoring data of the different agricultural areas as the source domain, performing transfer learning with the normalized agricultural monitoring data of the other agricultural areas as the target domain, and establishing a transfer learning model; and a monitoring unit for taking newly acquired agricultural monitoring data of other agricultural areas as input data, inputting them into the transfer learning model, iteratively adjusting them against the overall optimization objective, and taking the adjusted agricultural monitoring data as the operation standard values of those new agricultural areas. The invention can improve agricultural production efficiency while providing intelligent decision support for agricultural management.

Description

Intelligent agricultural monitoring and management system
Technical Field
The invention belongs to the technical field of agriculture, and particularly relates to an intelligent agricultural monitoring and management system.
Background
Agriculture is one of the cornerstones of human society, playing a key role in global food supply, economic growth and social stability. As the world population grows and environmental pressure increases, modern agriculture faces great challenges and needs more efficient, sustainable and intelligent agricultural management methods.
Advances in sensor technology have enabled the agricultural field to monitor soil and weather conditions in real time, including parameters such as temperature, humidity, precipitation, soil pH and light intensity. The sensor data can be used to optimize agricultural operations and improve resource utilization efficiency. Modern agriculture has widely adopted automated equipment such as unmanned aerial vehicles, automated irrigation systems and intelligent harvesting robots. These devices can improve production efficiency, reduce labor costs, and reduce dependence on chemicals. The agricultural field has also employed data analysis and decision support systems to help farmers and farm managers better manage land, crops and resources. These systems can provide advice regarding the best planting season, fertilizer usage, pest control, and so on. Precision agriculture is a method for improving agricultural production efficiency through personalized agricultural management; it combines sensor technology, data analysis and automation equipment to perform agricultural operations according to the needs of different fields and crops.
Current agricultural monitoring systems typically use a variety of different sensors to collect data that may be stored in separate systems, resulting in data dispersion and difficulty in integration. This limits the acquisition and comprehensive analysis of overall agricultural information. There are also differences in soil, climate and crop variety between different agricultural areas, and the prior art is often inflexible and difficult to adapt to the requirements of different geographical areas. Although decision support systems have been applied to agriculture, problems remain, such as low accuracy and slow decision reaction time, which can lead to inaccurate decisions by farmers and affect the efficiency of agricultural production. Some agricultural operations still rely on traditional schedules and fixed resource usage plans regardless of actual demand, resulting in waste of resources, including water and fertilizers.
Disclosure of Invention
The invention mainly aims to provide an intelligent agricultural monitoring and management system, which can improve the agricultural production efficiency and provide intelligent decision support for agricultural management.
In order to solve the problems, the technical scheme of the invention is realized as follows:
an intelligent agricultural monitoring and management system, the system comprising: the data acquisition unit is used for acquiring agricultural monitoring data of a plurality of different agricultural areas through the sensor, and carrying out normalization processing on the acquired agricultural monitoring data to obtain normalized agricultural monitoring data;
the migration learning unit is used for selecting normalized agricultural monitoring data of one area from normalized agricultural monitoring data of different agricultural areas as a source area, performing migration learning by taking the normalized agricultural monitoring data of other agricultural areas as a target area, and establishing a migration learning model, and specifically comprises the following steps: the positive training process comprises the following steps: extracting features from the source field and the target field to obtain source field features and target field features respectively; constructing a policy network for generating policies to be executed on the target domain; constructing a value network for estimating the value of taking different actions on the target area; performing reinforcement learning training by using the source field and the strategy network to obtain a strategy network for training the source field; further training in combination with the target domain using a strategic network and a value network of source domain training; aligning the distribution of the source domain and the target domain using domain adaptation loss; training parameters of the first overall optimization objective function by using a gradient descent method and taking the minimized first overall optimization objective function as a target; the reverse training process comprises the following steps: extracting features from the source field and the target field to obtain source field features and target field features respectively; constructing a policy network for generating policies to be executed on the source domain; constructing a value network for estimating the value of taking different actions on the source domain; performing reinforcement learning training by using the target field and the strategy network to obtain a strategy network for training the target field; further training by combining a strategy network and a value network trained by using the target field with the source field; aligning the distribution of the source domain and the target domain using domain adaptation loss; training parameters of the second overall optimization objective function by using a gradient descent method and taking the minimization of the second overall optimization objective function as a target;
The monitoring unit is used for inputting newly acquired agricultural monitoring data of other agricultural areas into the transfer learning model, taking the overall optimization objective function formed by the first overall optimization objective function and the second overall optimization objective function as the objective function, and, with minimization of that objective function as the target, iteratively adjusting the values of the agricultural monitoring data while keeping the sum of the absolute differences between the adjusted values and the corresponding items of the original agricultural monitoring data smaller than a set threshold; the adjusted agricultural monitoring data are then taken as the operation standard values of the new agricultural areas.
Further, the agricultural monitoring data at least include: temperature, illumination intensity, precipitation, humidity, soil type value, soil pH, soil nitrogen content, soil phosphorus content, soil potassium content, average crop height and average crop leaf number.
Further, the number or variety of agricultural monitoring data for the different agricultural areas may be different; when the obtained agricultural monitoring data are subjected to normalization processing, a template item is constructed, the template item contains all kinds of agricultural monitoring data, and template values are set for each kind of agricultural monitoring data; and if the obtained agricultural monitoring data does not contain all types, the agricultural monitoring data of the missing types are supplemented according to the corresponding template values in the template items.
Further, the source domain includes state-action pairs $(s, a)$ and the corresponding reward signal $r$; the data distribution of the source domain is denoted $\mathcal{D}_S$, and the data distribution of the target domain is denoted $\mathcal{D}_T$; the state set is denoted $S$ and contains the states shared by the source domain and the target domain; the action set is denoted $A$ and contains the actions shared by the source domain and the target domain; two reward functions, $R_S$ and $R_T$, are defined for the source domain and the target domain, respectively; a reward function defines the reward signal for a given state and action; the state transition function is denoted $P(s_{t+1} \mid s_t, a_t)$ and defines the probability distribution of the next state $s_{t+1}$ given the state $s_t$ and action $a_t$; the policy is denoted $\pi(a \mid s)$ and defines the probability of taking action $a$ in a given state $s$. During the positive training process, the goal is to maximize the cumulative reward on the target domain, i.e. to maximize the expected return in the target domain, expressed as:

$$J_T(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} R_T(s_t, a_t)\right]$$

where $\tau$ denotes a trajectory, $s_t$ and $a_t$ denote the state and action at time step $t$, and $T$ denotes the length of the trajectory; the final goal is to learn a policy $\pi^*$ that maximizes the expected return on the target domain.

During the reverse training process, the goal is to maximize the cumulative reward on the source domain, i.e. to maximize the expected return in the source domain, expressed as:

$$J_S(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} R_S(s_t, a_t)\right]$$

where $\tau$ denotes a trajectory, $s_t$ and $a_t$ denote the state and action at time step $t$, and $T$ denotes the length of the trajectory; the final goal is to learn a policy $\pi^*$ that maximizes the expected return on the source domain.
Further, features are extracted from the source domain and the target domain using a recurrent neural network to obtain source domain features and target domain features, respectively;

the source domain is denoted:

$$D_S = \{x_1^S, x_2^S, \ldots, x_{n_S}^S\}$$

and the target domain is denoted:

$$D_T = \{x_1^T, x_2^T, \ldots, x_{n_T}^T\}$$

where $n_S$ and $n_T$ are the numbers of samples in the source domain and the target domain, respectively; a forward pass of the recurrent neural network is performed on each sequence to obtain a feature representation of the sequence; this feature representation is the last of the hidden states of the recurrent neural network, or a summary representation;

the source domain features are:

$$F_S = \{h_1^S, h_2^S, \ldots, h_{n_S}^S\}$$

and the target domain features are:

$$F_T = \{h_1^T, h_2^T, \ldots, h_{n_T}^T\}$$
further, in the positive training process, the constructed policy network is expressed by using the following formula:
wherein,is indicated in the state->Take action down->Probability distribution of->Mean value of the output representing the policy network, +.>A standard deviation representing the motion profile; the parameters of the policy network are denoted +.>Updating these parameters by means of an optimization algorithm to maximize the first objective function +. >
Further, a value network is defined for estimating the long-term cumulative reward of taking action $a$ in a given state $s$, i.e. the value of the state-action pair $(s, a)$; the value network takes the form of a deep Q-network, accepts the state $s$ and the action $a$ as inputs, and outputs an estimated value of the state-action pair, expressed as:

$$Q(s, a; \phi)$$

where $Q(s, a; \phi)$ denotes the estimated value of taking action $a$ in state $s$, and $\phi$ denotes the parameters of the value network, i.e. its weights and bias terms; these parameters are trained so that the value of the state-action pair is estimated more accurately; the training process uses the Huber loss of the second objective function to measure the gap between the value network's estimate and the actual cumulative reward, and the parameters $\phi$ of the value network are updated by minimizing the second objective function so as to better approximate the true value function; the cumulative reward with a discount factor is used to estimate the value function, as follows:

$$G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}$$

where $\gamma$ is the discount factor and $r_{t+k}$ is the reward obtained at time step $t+k$; the training objective of the value network is to minimize the error between the estimate and the cumulative reward, expressed using a mean squared error loss function:

$$L(\phi) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{B}}\left[\Big(r + \gamma \max_{a'} Q(s', a'; \phi) - Q(s, a; \phi)\Big)^{2}\right]$$

where $\mathcal{B}$ is the experience replay buffer; $L(\phi)$ is the loss function representing the target to be minimized, and $\phi$ are the parameters to be optimized, i.e. the parameters of the value function; $\mathbb{E}$ denotes the expectation over samples $(s, a, r, s')$ drawn from $\mathcal{B}$, where $s$, $a$, $r$ and $s'$ denote the current state, the action taken, the reward obtained and the next state, respectively; $Q(s, a; \phi)$ is the value function, representing the value of taking action $a$ in state $s$, controlled by the parameters $\phi$; the goal of this value function is to estimate the expected cumulative reward of taking each action in each state; $r$ is the reward obtained after taking action $a$ in the current state; $\gamma$ is the discount factor, a value between 0 and 1 representing the importance of future rewards, used to down-weight future rewards so that more attention is paid to immediate rewards; $\max_{a'} Q(s', a'; \phi)$ is the maximum value over all possible actions $a'$ taken in the next state $s'$, representing the optimal action value of the next step.
Further, when reinforcement learning training is performed using the source domain and the policy network, a third objective function is defined as follows:

$$J_3(\theta) = \mathbb{E}_{s \sim \mathcal{D}_S,\; a \sim \pi_\theta}\big[Q^{\pi_\theta}(s, a)\big]$$

where $J_3(\theta)$ denotes the objective function and $Q^{\pi_\theta}(s, a)$ denotes the expected cumulative reward of taking action $a$ in state $s$ under the policy $\pi_\theta$; the purpose of the third objective function is to maximize the cumulative reward on the source domain; the gradient of the third objective function is calculated by the following formula:

$$\nabla_\theta J_3(\theta) = \mathbb{E}_{s \sim \mathcal{D}_S,\; a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a)\big]$$

The parameters $\theta$ of the policy network are updated by gradient ascent using the following formula, to improve performance in the source domain:

$$\theta \leftarrow \theta + \alpha\, \nabla_\theta J_3(\theta)$$

where $\alpha$ is the learning rate.
Further, when the policy network and the value network trained on the source domain are further trained in combination with the target domain, a fourth objective function is defined as follows:

$$J_4(\theta, \phi) = \mathbb{E}_{s \sim \mathcal{D}_T,\; a \sim \pi_\theta}\big[Q^{\pi_\theta}(s, a) - V_\phi(s)\big]$$

where $J_4(\theta, \phi)$ denotes the objective function, $Q^{\pi_\theta}(s, a)$ denotes the expected cumulative reward of taking action $a$ in state $s$ under the policy $\pi_\theta$, and $V_\phi(s)$ denotes the value of state $s$ estimated by the value network; the purpose of the fourth objective function is to maximize the cumulative reward on the target domain while minimizing the gap between the state value estimated by the value network and the cumulative reward estimated by the policy; the gradient of the fourth objective function is then calculated by the following formula:

$$\nabla_{\theta, \phi} J_4(\theta, \phi) = \mathbb{E}_{s \sim \mathcal{D}_T,\; a \sim \pi_\theta}\Big[\nabla_\theta \log \pi_\theta(a \mid s)\,\big(Q^{\pi_\theta}(s, a) - V_\phi(s)\big) - \nabla_\phi V_\phi(s)\Big]$$

and the parameters $\theta$ of the policy network and the parameters $\phi$ of the value network are updated simultaneously by the following formulas, so that the objective function $J_4(\theta, \phi)$ increases:

$$\theta \leftarrow \theta + \alpha\, \nabla_\theta J_4(\theta, \phi), \qquad \phi \leftarrow \phi + \alpha\, \nabla_\phi J_4(\theta, \phi)$$
the intelligent agricultural monitoring and management system has the following beneficial effects: traditional agricultural management often depends on the experience and visual decision of farmers and is easily influenced by subjective factors. The invention utilizes the technologies of transfer learning, reinforcement learning and deep learning to construct an intelligent decision support system. The system can provide personalized agricultural advice according to different farmland characteristics, environmental factors and best practices. It can predict future weather conditions, optimize irrigation and fertilization plans, and suggest optimal pest management strategies. The method is favorable for farmers to make intelligent decisions, reduces agricultural risks and improves economic benefits. The sensor monitors agricultural monitoring data of a plurality of agricultural areas in real time, wherein the agricultural monitoring data comprise key parameters such as temperature, illumination intensity, precipitation amount, humidity, soil type value, soil PH value, soil nitrogen content, soil phosphorus content, soil potassium content, average crop height, average crop leaf number and the like. The real-time collection and analysis of these data allows farmers to better understand the status of their farms, including soil quality, plant health and environmental conditions. Based on these data, the system can provide personalized agricultural advice that helps farmers optimize agricultural operations such as irrigation, fertilization, and pest control. This helps to improve crop yield, quality and sustainability, thereby increasing agricultural productivity.
Drawings
Fig. 1 is a schematic system structure diagram of an intelligent agricultural monitoring and management system according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
Example 1: Referring to fig. 1, an intelligent agricultural monitoring and management system comprises: a data acquisition unit for acquiring agricultural monitoring data of a plurality of different agricultural areas through sensors, and normalizing the acquired agricultural monitoring data to obtain normalized agricultural monitoring data; a transfer learning unit for selecting the normalized agricultural monitoring data of one area from the normalized agricultural monitoring data of the different agricultural areas as the source domain, performing transfer learning with the normalized agricultural monitoring data of the other agricultural areas as the target domain, and establishing a transfer learning model, which specifically comprises the following steps: the positive training process comprises: extracting features from the source domain and the target domain to obtain source domain features and target domain features, respectively; constructing a policy network for generating the policy to be executed on the target domain; constructing a value network for estimating the value of taking different actions on the target domain; performing reinforcement learning training using the source domain and the policy network to obtain a policy network trained on the source domain; further training the source-domain-trained policy network and value network in combination with the target domain; aligning the distributions of the source domain and the target domain using a domain adaptation loss; and training the parameters of the first overall optimization objective function by gradient descent, with minimization of the first overall optimization objective function as the target; the reverse training process comprises: extracting features from the source domain and the target domain to obtain source domain features and target domain features, respectively; constructing a policy network for generating the policy to be executed on the source domain; constructing a value network for estimating the value of taking different actions on the source domain; performing reinforcement learning training using the target domain and the policy network to obtain a policy network trained on the target domain; further training the target-domain-trained policy network and value network in combination with the source domain; aligning the distributions of the source domain and the target domain using a domain adaptation loss; and training the parameters of the second overall optimization objective function by gradient descent, with minimization of the second overall optimization objective function as the target; and a monitoring unit for inputting newly acquired agricultural monitoring data of other agricultural areas into the transfer learning model, taking the overall optimization objective function formed by the first overall optimization objective function and the second overall optimization objective function as the objective function and, with its minimization as the target, iteratively adjusting the values of the agricultural monitoring data while keeping the sum of the absolute differences between the adjusted values and the corresponding items of the original agricultural monitoring data smaller than a set threshold; the adjusted agricultural monitoring data are then taken as the operation standard values of the new agricultural areas.
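As a minimal illustration of how the monitoring unit's iterative adjustment could be realized, the following sketch (an assumption for illustration, not the patented implementation) lowers a placeholder overall objective by gradient descent on the monitoring-data vector while keeping the sum of absolute deviations from the original data within the set threshold. The function overall_objective stands in for the combined first and second overall optimization objective functions.

```python
import torch

def adjust_monitoring_data(x_orig, overall_objective, threshold=1.0,
                           lr=0.01, steps=200):
    """Iteratively lower the overall objective by adjusting the monitoring
    data, keeping sum(|x - x_orig|) within the set threshold."""
    x = x_orig.clone().requires_grad_(True)
    for _ in range(steps):
        loss = overall_objective(x)            # stand-in for the overall objective
        loss.backward()
        with torch.no_grad():
            x -= lr * x.grad
            delta = x - x_orig
            l1 = delta.abs().sum()
            if l1 > threshold:                 # rescale so the L1 deviation
                x.copy_(x_orig + delta * (threshold / l1))  # stays at most threshold
        x.grad.zero_()
    return x.detach()

# Toy usage: a quadratic objective stands in for the trained transfer model.
x0 = torch.tensor([25.0, 0.60, 6.5])           # e.g. temperature, humidity, soil pH
toy_target = torch.tensor([24.0, 0.55, 6.8])
toy_objective = lambda x: ((x - toy_target) ** 2).sum()
print(adjust_monitoring_data(x0, toy_objective, threshold=0.5))
```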
In particular, there may be significant environmental differences between different agricultural areas, such as climate, soil characteristics and crop species. These differences lead to different agricultural monitoring and management needs. The positive and reverse training processes help the model adapt to these domain differences. The positive training process involves knowledge migration from the source domain to the target domain (from the known region to the new region), while the reverse training process involves knowledge migration from the target domain to the source domain (from the new region to the known region). Combining the two enables the model to fully understand and apply knowledge from different domains. Domain adaptation is a key concept in transfer learning: it involves bringing the data distributions of different domains closer together so that the model generalizes better to the target domain. Both the positive and the reverse training include a domain adaptation step to reduce the impact of domain differences. The positive and reverse training processes allow the model to leverage information from multiple domains; by migrating knowledge back and forth between the source and target domains, the model can learn and adapt more fully to the characteristics and needs of different domains. The goal of the positive and reverse training processes is to improve the performance of the model in the target domain: the positive training gives the model a basic agricultural management strategy, while the reverse training further improves the model's adaptability to the specific requirements of the target domain.
The main purpose of the positive training process is to let the model learn how to make decisions efficiently within the source domain (the training area). In this process, the model learns to identify and interpret the characteristics of the source domain data and to make optimal decisions in that environment. During positive training, the model uses data from the source domain to train the policy network and the value network: the policy network learns which actions to take in a given state, and the value network evaluates the expected benefit of those actions. Through this training, the model can understand the characteristics of the source domain and make efficient decisions. This stage is critical for building the model's underlying knowledge; it ensures the model can operate effectively in a known environment and lays the foundation for the subsequent transfer learning process.
The goal of the reverse training process is to improve the performance of the model in the target domain (the application area). In this process, the model learns how to adapt the knowledge learned in the source domain to a new, unknown environment. During reverse training, the model uses data from the target domain to adjust the policy network and the value network: the networks trained on the source domain are applied to the target domain data, and through repeated training the model gradually adapts to the characteristics of the new environment. Reverse training is the core of the transfer learning; it allows the model to go beyond being effective only in the source domain and enhances its applicability and generalization in the new domain.
The positive and reverse training processes complement each other: the positive training establishes the model's foundation in the source domain, and the reverse training enables the model to adapt to the new target domain. Together, the two processes achieve domain adaptation, i.e. the model can learn from data in one domain and apply that learning to another. In this way, the generalization capability of the model is improved, and different environments and data sets can be handled and adapted to more effectively.
Example 2: The agricultural monitoring data include at least: temperature, illumination intensity, precipitation, humidity, soil type value, soil pH, soil nitrogen content, soil phosphorus content, soil potassium content, average crop height and average crop leaf number.
In particular, temperature is a critical factor in agricultural activities. Different plants have different temperature requirements, so monitoring temperature can help farmers decide when to plant, irrigate, and so on; temperatures that are too high or too low may affect crop growth and yield. The intensity of light directly influences plant photosynthesis and thus plant growth; monitoring illumination intensity helps determine the level of sunlight required by the plants and can be used to optimize the growth conditions of the crop. Precipitation monitoring can help farmers determine when irrigation or drainage is needed, and can also help anticipate extreme weather events such as drought or flooding. Humidity is very important for crop growth and disease control: high humidity may cause diseases such as mold, while low humidity may cause crop moisture to evaporate too quickly. Different soil types have different water retention and permeability, so monitoring soil type helps select appropriate crop varieties and irrigation schemes. The pH of the soil affects the availability of nutrients; different plants have different pH requirements, so monitoring pH helps regulate soil nutrients. The nitrogen, phosphorus and potassium contents of the soil are the main nutrients required by plants; monitoring these nutrient contents can help farmers optimize the fertilization scheme to meet plant needs. The average height and leaf number of the crop help farmers understand the growth state of the crop; by monitoring them, irrigation, fertilization and other agricultural operations can be adjusted in time to ensure optimal growth conditions for the crop.
Example 3: the number or variety of agricultural monitoring data for the different agricultural areas may be different; when the obtained agricultural monitoring data are subjected to normalization processing, a template item is constructed, the template item contains all kinds of agricultural monitoring data, and template values are set for each kind of agricultural monitoring data; and if the obtained agricultural monitoring data does not contain all types, the agricultural monitoring data of the missing types are supplemented according to the corresponding template values in the template items.
In particular, agricultural monitoring data can be standardized to the same data structure regardless of their type and quantity. This is very useful for building a versatile agricultural management system, as it can accommodate a variety of data sources. By mapping all data into the same template item, the system ensures standardization and consistency of the data, which makes the data easier to process and analyze and allows cross-regional comparisons and decisions. Missing data items are automatically identified and supplemented, reducing the workload of the user and helping to improve the integrity and usability of the data. Standardized and consistent agricultural monitoring data provide a reliable basis for the intelligent agricultural management system; this helps farmers and farm managers make better decisions, optimize agricultural activities, and improve crop yield and resource utilization efficiency.
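For illustration, a minimal sketch of the template-item idea in Example 3 follows; the field names, template (default) values and normalization ranges are assumptions for the example, not values given by the patent.

```python
# Hypothetical template item: every kind of monitoring data with a default
# (template) value and an assumed value range used for min-max normalization.
TEMPLATE = {
    "temperature":   {"default": 20.0, "min": -10.0, "max": 45.0},
    "humidity":      {"default": 0.6,  "min": 0.0,   "max": 1.0},
    "soil_ph":       {"default": 6.5,  "min": 3.0,   "max": 9.0},
    "precipitation": {"default": 2.0,  "min": 0.0,   "max": 200.0},
}

def normalize_record(record: dict) -> dict:
    """Complete a monitoring record against the template, then min-max
    normalize every field to the range [0, 1]."""
    completed = {k: record.get(k, spec["default"]) for k, spec in TEMPLATE.items()}
    return {
        k: (v - TEMPLATE[k]["min"]) / (TEMPLATE[k]["max"] - TEMPLATE[k]["min"])
        for k, v in completed.items()
    }

# A record missing soil_ph and precipitation is filled from the template.
print(normalize_record({"temperature": 28.0, "humidity": 0.45}))
```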
Example 4: The source domain includes state-action pairs $(s, a)$ and the corresponding reward signal $r$; the data distribution of the source domain is denoted $\mathcal{D}_S$, and the data distribution of the target domain is denoted $\mathcal{D}_T$; the state set is denoted $S$ and contains the states shared by the source domain and the target domain; the action set is denoted $A$ and contains the actions shared by the source domain and the target domain; two reward functions, $R_S$ and $R_T$, are defined for the source domain and the target domain, respectively; a reward function defines the reward signal for a given state and action; the state transition function is denoted $P(s_{t+1} \mid s_t, a_t)$ and defines the probability distribution of the next state $s_{t+1}$ given the state $s_t$ and action $a_t$; the policy is denoted $\pi(a \mid s)$ and defines the probability of taking action $a$ in a given state $s$. During the positive training process, the goal is to maximize the cumulative reward on the target domain, i.e. to maximize the expected return in the target domain, expressed as:

$$J_T(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} R_T(s_t, a_t)\right]$$

where $\tau$ denotes a trajectory, $s_t$ and $a_t$ denote the state and action at time step $t$, and $T$ denotes the length of the trajectory; the final goal is to learn a policy $\pi^*$ that maximizes the expected return on the target domain.

During the reverse training process, the goal is to maximize the cumulative reward on the source domain, i.e. to maximize the expected return in the source domain, expressed as:

$$J_S(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} R_S(s_t, a_t)\right]$$

where $\tau$ denotes a trajectory, $s_t$ and $a_t$ denote the state and action at time step $t$, and $T$ denotes the length of the trajectory; the final goal is to learn a policy $\pi^*$ that maximizes the expected return on the source domain.
Specifically, regarding the state-action pairs $(s, a)$ and the reward signal $r$: in an agricultural monitoring and management system, the state $s$ can represent the current farmland environmental conditions such as temperature, humidity and soil nutrients, and the action $a$ can represent agricultural operations such as irrigation and fertilization. The reward signal $r$ indicates the effect obtained in a given state after an action is taken, such as an increase or decrease in yield. The policy $\pi(a \mid s)$ is the probability distribution from which the system selects an action $a$ based on the current state $s$; in agricultural management, the policy decides when to irrigate, when to fertilize, and how to manage the farmland so as to optimize crop growth and yield. The positive training objective function $J_T(\pi)$ represents the expected return in the target domain, i.e. maximizing the cumulative reward in the new agricultural area; this means the policy should be able to select the best agricultural operations to enhance crop production in the target area. The principle behind the positive training is that, by using the experience and knowledge of the source domain, the system attempts to learn a policy adapted to the characteristics of the target domain. In agriculture, this helps farmers better manage crops on new farmland and make decisions based on different environmental conditions, such as adjusting irrigation and fertilization plans according to temperature, humidity and soil nutrients. The main objective of the reverse training process is to learn a policy in the source domain that maximizes the cumulative reward in the source domain; this process is used to continue improving the system's agricultural management strategy on known farmland. The reverse training objective function $J_S(\pi)$ represents the expected return in the source domain, i.e. maximizing the cumulative reward on known farmland; this means the policy should be able to continue optimizing crop production in the source domain. The principle of the reverse training is that the system uses the data and reward function of the source domain to further improve the agricultural management strategy for known farmland, which helps improve crop yield and resource utilization efficiency in the known domain. The principle of the positive and reverse training processes together is that, through transfer learning, the system can learn and optimize agricultural management strategies across different agricultural areas. The system continuously adapts to the requirements and characteristics of different domains, improving the performance of the whole intelligent agricultural monitoring and management system. This enables farmers to better cope with the challenges of different geographical locations, meteorological conditions and crop requirements, thereby improving agricultural yield and resource utilization efficiency.
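To make the reward formulation concrete, the following sketch computes the return of one trajectory under separate source-domain and target-domain reward functions, matching the sums inside $J_S(\pi)$ and $J_T(\pi)$ above; the state layout, action coding and reward functions are illustrative assumptions only.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float, float]    # e.g. (temperature, humidity, soil nitrogen)
Action = int                          # e.g. 0 = do nothing, 1 = irrigate, 2 = fertilize

def trajectory_return(trajectory: List[Tuple[State, Action]],
                      reward_fn: Callable[[State, Action], float]) -> float:
    """Sum of rewards R(s_t, a_t) over one trajectory, as inside J_S / J_T."""
    return sum(reward_fn(s, a) for s, a in trajectory)

# Toy reward functions standing in for R_S and R_T (not taken from the patent).
def reward_source(s: State, a: Action) -> float:
    temp, hum, nitrogen = s
    return 1.0 - abs(hum - 0.55) - 0.01 * abs(temp - 25.0) - 0.1 * (a != 0)

def reward_target(s: State, a: Action) -> float:
    temp, hum, nitrogen = s
    return 1.0 - abs(hum - 0.65) - 0.01 * abs(temp - 22.0) - 0.1 * (a != 0)

traj = [((26.0, 0.50, 0.3), 1), ((25.0, 0.60, 0.3), 0)]
print(trajectory_return(traj, reward_source), trajectory_return(traj, reward_target))
```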
Example 5: Features are extracted from the source domain and the target domain using a recurrent neural network, obtaining source domain features and target domain features, respectively;

the source domain is denoted:

$$D_S = \{x_1^S, x_2^S, \ldots, x_{n_S}^S\}$$

and the target domain is denoted:

$$D_T = \{x_1^T, x_2^T, \ldots, x_{n_T}^T\}$$

where $n_S$ and $n_T$ are the numbers of samples in the source domain and the target domain, respectively; a forward pass of the recurrent neural network is performed on each sequence to obtain a feature representation of the sequence; this feature representation is the last of the hidden states of the recurrent neural network, or a summary representation;

the source domain features are:

$$F_S = \{h_1^S, h_2^S, \ldots, h_{n_S}^S\}$$

and the target domain features are:

$$F_T = \{h_1^T, h_2^T, \ldots, h_{n_T}^T\}$$
in particular, in the agricultural field, the monitored data typically includes various environmental parameters such as temperature, humidity, soil nutrient levels, precipitation, and the like. Variations in these parameters have a significant impact on crop growth and yield. Each data sample may represent observations of these parameters over a particular period of time. Since agricultural data is typically of a time-series nature, RNN is a suitable choice because it can capture time-dependent relationships in the data. The cyclic structure of the RNN allows it to take into account previous information when processing sequence data. At each time step, the RNN receives as input the monitoring data at a point in time and updates its internal hidden state. This hidden state contains information about past points in time and gradually captures patterns and relationships in the sequence. This is very useful in the agricultural field, as the growth of crops is affected by seasonal and weather conditions. After the forward propagation of the RNN is completed over the entire sequence, a characteristic representation, typically the last state of the RNN's hidden state or a summary representation, can be obtained. This feature represents that critical information about the data sequence, such as trend of temperature and humidity changes, fluctuation of soil nutrient levels, etc., will be contained. By inputting agricultural monitoring data of different domains into the RNN, source domain features and target domain features can be obtained. These characteristics represent characteristics that will reflect different farms, for example, the source field may be located in a particular climate zone and the target field may have different soil types. These characteristic representations can be used to improve agricultural management systems. For example, features of the source domain may be used to learn an initial agricultural management strategy, and then adapt these strategies to the target domain by migration learning to optimize crop growth and yield. This allows the system to be flexibly adapted under different farmland and environmental conditions to achieve higher agricultural efficiency. The use of RNNs to extract features from agricultural monitoring data in both source and target areas can help the agricultural management system better understand and utilize data for different farms. The method combines time series data and migration learning principles, and provides more accurate and personalized agricultural management strategies for farmers and farmers to cope with challenges under different geographic positions and meteorological conditions. This helps to improve the efficiency and quality of agricultural production.
Example 6: In the positive training process, the constructed policy network is expressed using the following formula:

$$\pi_\theta(a \mid s) = \mathcal{N}\big(\mu_\theta(s), \sigma_\theta(s)\big)$$

where $\pi_\theta(a \mid s)$ denotes the probability distribution over the action $a$ taken in state $s$, $\mu_\theta(s)$ denotes the mean output by the policy network, and $\sigma_\theta(s)$ denotes the standard deviation of the action distribution; the parameters of the policy network are denoted $\theta$ and are updated by an optimization algorithm so as to maximize the first objective function $J_1(\theta)$.
In particular, the main reason for using a Gaussian distribution is that it can model a continuous action space, which is important for agricultural decision problems because agricultural operations tend to be continuous, such as determining the amount of irrigation water or fertilizer. The Gaussian distribution represents the probability distribution over different actions, and by adjusting the mean $\mu_\theta(s)$ and the standard deviation $\sigma_\theta(s)$ the generation of actions can be flexibly controlled. The parameters $\theta$ of the policy network are adjusted by an optimization algorithm to maximize the first objective function $J_1(\theta)$. In agriculture, this means the system learns how to generate optimal agricultural actions $a$ from the current environmental state $s$ through the policy network. By maximizing the objective function, the policy network automatically adjusts its parameters so that the resulting agricultural operations are more likely to produce good outcomes, such as increased crop yield or reduced resource waste. The standard deviation $\sigma_\theta(s)$ controls the randomness and flexibility of the actions: the system can increase or reduce randomness under different farmland and environmental conditions to adapt to different requirements. This makes the policy network more adaptable, allowing different agricultural policies to be implemented in different geographical locations and meteorological conditions. The policy network can be used in an intelligent decision support system to help farmers and farm managers draw up agricultural operation plans: by generating a probability distribution from $\pi_\theta(a \mid s)$ and selecting agricultural operations from it, the system can optimize resource utilization, improve crop productivity, and adjust strategies as needed.
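A minimal sketch of such a Gaussian policy follows; the state dimension (the eleven monitored quantities), the two-dimensional continuous action (e.g. irrigation amount and fertilizer amount) and the network sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Outputs mean and standard deviation of a Gaussian over continuous
    agricultural actions (e.g. irrigation amount, fertilizer amount)."""
    def __init__(self, state_dim: int = 11, action_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.mu_head = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))  # state-independent std

    def forward(self, state: torch.Tensor) -> torch.distributions.Normal:
        mu = self.mu_head(self.body(state))
        std = self.log_std.exp()
        return torch.distributions.Normal(mu, std)

policy = GaussianPolicy()
s = torch.randn(11)                      # one normalized state vector
dist = policy(s)
a = dist.sample()                        # sampled action
log_prob = dist.log_prob(a).sum()        # log pi_theta(a|s), used in the updates below
print(a, log_prob)
```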
Example 7: A value network is defined for estimating the long-term cumulative reward of taking action $a$ in a given state $s$, i.e. the value of the state-action pair $(s, a)$; the value network takes the form of a deep Q-network, accepts the state $s$ and the action $a$ as inputs, and outputs an estimated value of the state-action pair, expressed as:

$$Q(s, a; \phi)$$

where $Q(s, a; \phi)$ denotes the estimated value of taking action $a$ in state $s$, and $\phi$ denotes the parameters of the value network, i.e. its weights and bias terms; these parameters are trained so that the value of the state-action pair is estimated more accurately; the training process uses the Huber loss of the second objective function to measure the gap between the value network's estimate and the actual cumulative reward, and the parameters $\phi$ of the value network are updated by minimizing the second objective function so as to better approximate the true value function; the cumulative reward with a discount factor is used to estimate the value function, as follows:

$$G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}$$

where $\gamma$ is the discount factor and $r_{t+k}$ is the reward obtained at time step $t+k$; the training objective of the value network is to minimize the error between the estimate and the cumulative reward, expressed using a mean squared error loss function:

$$L(\phi) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{B}}\left[\Big(r + \gamma \max_{a'} Q(s', a'; \phi) - Q(s, a; \phi)\Big)^{2}\right]$$

where $\mathcal{B}$ is the experience replay buffer; $L(\phi)$ is the loss function representing the target to be minimized, and $\phi$ are the parameters to be optimized, i.e. the parameters of the value function; $\mathbb{E}$ denotes the expectation over samples $(s, a, r, s')$ drawn from $\mathcal{B}$, where $s$, $a$, $r$ and $s'$ denote the current state, the action taken, the reward obtained and the next state, respectively; $Q(s, a; \phi)$ is the value function, representing the value of taking action $a$ in state $s$, controlled by the parameters $\phi$; the goal of this value function is to estimate the expected cumulative reward of taking each action in each state; $r$ is the reward obtained after taking action $a$ in the current state; $\gamma$ is the discount factor, a value between 0 and 1 representing the importance of future rewards, used to down-weight future rewards so that more attention is paid to immediate rewards; $\max_{a'} Q(s', a'; \phi)$ is the maximum value over all possible actions $a'$ taken in the next state $s'$, representing the optimal action value of the next step.
Specifically, the value network takes the form of a deep Q-network, accepting the state $s$ and the action $a$ as inputs and outputting the estimated value of the state-action pair, expressed as $Q(s, a; \phi)$. This value represents an estimate of the long-term cumulative reward of taking action $a$ in state $s$. In agriculture, this can be interpreted as the long-term impact of taking a certain agricultural action (e.g. irrigation or fertilization) under certain farmland conditions, including the impact on crop yield, quality and resource utilization. The discount factor $\gamma$ weighs the importance of future rewards. In agricultural decisions, this means the system can trade off immediate rewards against future rewards: a higher $\gamma$ pays more attention to long-term benefits, while a lower $\gamma$ pays more attention to immediate benefits. This enables the system to tailor agricultural strategies to specific needs and goals. The goal of training the value network is to minimize the error between the estimate and the cumulative reward, using the mean squared error loss; that is, the output $Q(s, a; \phi)$ of the value network should be as close as possible to the true cumulative reward. By minimizing the loss function, the network parameters $\phi$ are adjusted so that the value of state-action pairs is estimated more accurately, which helps the system better predict the long-term impact of different agricultural operations. Training the value network requires a large number of data samples, so an experience replay buffer $\mathcal{B}$ is usually used to store previous observations; the system randomly selects samples from the buffer for training, improving training efficiency and stability. In an agricultural monitoring and management system, the value network can be used to support agricultural decisions: by estimating the long-term cumulative reward of different agricultural operations, the system can assist farmers and farm managers in making decisions, such as determining the optimal irrigation strategy, fertilization scheme or harvest time, which helps optimize resource utilization and improve crop yield and quality. This embodiment describes the use of a deep Q-network in agricultural monitoring and management to estimate the value of state-action pairs, supporting agricultural decision-making and resource management. By optimizing the network parameters, the system can more accurately predict the long-term impact of different agricultural operations, thereby improving the production efficiency and quality of farmland. The method combines deep learning and reinforcement learning principles, bringing more intelligence and sustainability to agricultural management.
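The following sketch illustrates a state-action value network and its one-step temporal-difference target in the spirit of Example 7; since the actions here are continuous, the max over next actions is replaced by a single candidate next action, which is a simplification, and the dimensions and the sample transition are assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Estimates Q(s, a; phi) for a state vector and a continuous action vector."""
    def __init__(self, state_dim: int = 11, action_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

q_net = QNetwork()
gamma = 0.99
# One fictitious transition (s, a, r, s') and a candidate next action a'.
s, a = torch.randn(11), torch.randn(2)
r, s_next, a_next = torch.tensor(1.0), torch.randn(11), torch.randn(2)

# TD target r + gamma * Q(s', a') and the squared error driving L(phi).
with torch.no_grad():
    target = r + gamma * q_net(s_next, a_next)
loss = (q_net(s, a) - target) ** 2
loss.backward()                          # gradients with respect to phi
print(float(loss))
```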
Example 8: When reinforcement learning training is performed using the source domain and the policy network, a third objective function is defined as follows:

$$J_3(\theta) = \mathbb{E}_{s \sim \mathcal{D}_S,\; a \sim \pi_\theta}\big[Q^{\pi_\theta}(s, a)\big]$$

where $J_3(\theta)$ denotes the objective function and $Q^{\pi_\theta}(s, a)$ denotes the expected cumulative reward of taking action $a$ in state $s$ under the policy $\pi_\theta$; the purpose of the third objective function is to maximize the cumulative reward on the source domain; the gradient of the third objective function is calculated by the following formula:

$$\nabla_\theta J_3(\theta) = \mathbb{E}_{s \sim \mathcal{D}_S,\; a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a)\big]$$

The parameters $\theta$ of the policy network are updated by gradient ascent using the following formula, to improve performance in the source domain:

$$\theta \leftarrow \theta + \alpha\, \nabla_\theta J_3(\theta)$$

where $\alpha$ is the learning rate.
In particular, the purpose of the third objective function is to maximize the cumulative reward on the source domain. It represents the expected cumulative reward $Q^{\pi_\theta}(s, a)$ of taking action $a$ in state $s$ under the policy $\pi_\theta$. In agricultural decision-making, this can be understood as the long-term cumulative reward, such as the long-term production benefit and resource utilization efficiency of a farm, obtained after performing agricultural operations according to the current policy under known conditions. To optimize the parameters $\theta$ of the policy network, the gradient of the objective function $J_3(\theta)$ with respect to $\theta$ must be calculated; this gradient tells the system how to fine-tune the policy network so that it performs better in the source domain. In the gradient calculation, the product of the policy term $\nabla_\theta \log \pi_\theta(a \mid s)$ and the cumulative reward $Q^{\pi_\theta}(s, a)$ is used, and the expectation is estimated by sampling states. Once the gradient $\nabla_\theta J_3(\theta)$ has been calculated, the parameters $\theta$ of the policy network can be updated by gradient ascent. The goal of gradient ascent is to increase the value of the objective function along the gradient direction so as to improve policy performance; the learning rate $\alpha$ controls the step size of the parameter update. By repeatedly iterating this update, the policy network is gradually optimized to maximize the cumulative reward on the source domain. In an agricultural monitoring and management system, this method can be used to optimize agricultural strategies: by maximizing the cumulative reward in the source domain, the system learns how to formulate an optimal agricultural operation plan under known environmental conditions, which helps improve agricultural production efficiency, resource utilization and farmland management. This embodiment describes the use of a reinforcement learning method to optimize agricultural decision strategies on the source domain. By maximizing the cumulative reward, the system can automatically learn adaptive agricultural strategies to improve the production efficiency and quality of farmland while reducing resource waste. The method combines the principles of reinforcement learning and gradient ascent, bringing more intelligence and sustainability to agricultural management.
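A minimal sketch of the gradient-ascent update on the source domain follows, reusing the GaussianPolicy and QNetwork classes sketched above (so those blocks are assumed to have been run); the batch of source-domain states is random placeholder data, and the surrogate E[log π · Q] is maximized by minimizing its negative.

```python
import torch

policy = GaussianPolicy()        # class from the Example 6 sketch
q_net = QNetwork()               # class from the Example 7 sketch
alpha = 1e-3
optimizer = torch.optim.SGD(policy.parameters(), lr=alpha)

# A fictitious batch of source-domain states.
states = torch.randn(16, 11)
dist = policy(states)
actions = dist.sample()
log_probs = dist.log_prob(actions).sum(dim=-1)         # log pi_theta(a|s)
with torch.no_grad():
    q_values = q_net(states, actions)                   # Q(s, a) as the weight

# Maximizing J3 = E[log pi * Q] is done by minimizing its negative.
loss = -(log_probs * q_values).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```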
Example 9: When the policy network and the value network trained on the source domain are further trained in combination with the target domain, a fourth objective function is defined as follows:

$$J_4(\theta, \phi) = \mathbb{E}_{s \sim \mathcal{D}_T,\; a \sim \pi_\theta}\big[Q^{\pi_\theta}(s, a) - V_\phi(s)\big]$$

where $J_4(\theta, \phi)$ denotes the objective function, $Q^{\pi_\theta}(s, a)$ denotes the expected cumulative reward of taking action $a$ in state $s$ under the policy $\pi_\theta$, and $V_\phi(s)$ denotes the value of state $s$ estimated by the value network; the purpose of the fourth objective function is to maximize the cumulative reward on the target domain while minimizing the gap between the state value estimated by the value network and the cumulative reward estimated by the policy; the gradient of the fourth objective function is then calculated by the following formula:

$$\nabla_{\theta, \phi} J_4(\theta, \phi) = \mathbb{E}_{s \sim \mathcal{D}_T,\; a \sim \pi_\theta}\Big[\nabla_\theta \log \pi_\theta(a \mid s)\,\big(Q^{\pi_\theta}(s, a) - V_\phi(s)\big) - \nabla_\phi V_\phi(s)\Big]$$

and the parameters $\theta$ of the policy network and the parameters $\phi$ of the value network are updated simultaneously by the following formulas, so that the objective function $J_4(\theta, \phi)$ increases:

$$\theta \leftarrow \theta + \alpha\, \nabla_\theta J_4(\theta, \phi), \qquad \phi \leftarrow \phi + \alpha\, \nabla_\phi J_4(\theta, \phi)$$
in particular, the purpose of the fourth objective function is to maximize the jackpot over the target area while minimizing the gap between the state value of the value network estimate and the strategically estimated jackpot. It is composed of two parts:expressed in policy->Lower in state->Take action->Is (are) expected jackpot, ">Status representing value network estimation->Is of value (c). In the agricultural field, this can be understood as a jackpot after performing agricultural operations according to current policies in the target field, as well as The value network estimates the state value. For simultaneous updating of parameters of the policy network +.>And parameters of the value network->It is necessary to calculate the objective function +.>Gradients relative to the two sets of parameters. This gradient tells how to fine-tune the strategy and value network to maximize the objective function. In gradient computation, strategies are usedJackpot difference->And gradient->. The calculation of the gradient is via the sampling state->To estimate the expected value. Using the calculated gradient ∈ ->The parameters of the policy network can be updated simultaneously +.>And parameters of the value network->. This can be achieved by a gradient-increasing method, wherein the learning rate is +.>The step size of the parameter update is controlled. By iteratively updating the parameters repeatedly, the strategy network and the value network are gradually optimized such that the objective function +.>Increased, i.e. better performance is achieved in the target area. In an agricultural monitoring and management system, this approach can be used to further optimize agricultural strategies while taking into account the jackpot and status value of the target area. By maximizing the objective function, the system can learn how to formulate a more efficient agricultural operation plan in the objective area to improve the production efficiency, quality and sustainability of the farmland. This embodiment describes how to further optimize agricultural decision strategies on the target domain in combination with source domain trained strategy networks and value networks in deep reinforcement learning. By considering both the jackpot and the state value, the system can better adapt to the needs of the target field, improving the efficiency and sustainability of agricultural production and resource management. The method combines the principles of reinforcement learning and multi-objective optimization, and provides more intelligent and decision support for agricultural management.
Example 10: the migration policy is further executed, using KL divergence as a metric by minimizing policy differences between the source domain and the target domain. The specific process is as follows: is to migrate a policy such that policies on the target domainApproach strategy on source domain>. It is desirable to minimize the difference between the two policies during migration to improve performance over the target area. The difference between the two strategies was measured using a KL divergence (Kullback-leibler divergence) defined as follows:
wherein,representing the source Domain policy in State->Take action down->Is a function of the probability of (1),representing the state of the target Domain policy->Take action down->Is a probability of (2).
The policy is migrated by minimizing the KL divergence, that is, by minimizing the following objective function:

$J_{\mathrm{KL}} = \mathbb{E}_{s}\!\left[ D_{\mathrm{KL}}\big(\pi_{\mathrm{source}}(\cdot \mid s) \,\|\, \pi_{\mathrm{target}}(\cdot \mid s)\big) \right]$
This can be achieved with gradient descent or another optimization algorithm, the aim being to update the parameters of the target-domain policy $\pi_{\mathrm{target}}$ so as to reduce the KL divergence from the source-domain policy $\pi_{\mathrm{source}}$. In the training process, this term can be combined with the fourth objective function $J_4(\theta,\phi)$, minimizing the KL divergence while maximizing the cumulative reward on the target domain. The above process is iterated to update the target-domain policy $\pi_{\mathrm{target}}$ multiple times, so that it gradually approaches the source-domain policy $\pi_{\mathrm{source}}$ while achieving better performance on the target domain. Through policy migration, the policy difference between the source domain and the target domain is balanced during migration so that performance can be improved. Using the KL divergence as the measurement helps control the degree of policy migration and gives better adaptability between the two domains.
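As an illustrative sketch only, the KL regularizer of Example 10 could be computed as follows and added to the target-domain training loss; the policy interfaces and the weighting factor `beta` are assumptions.

```python
import torch
from torch.distributions import kl_divergence

def kl_migration_loss(policy_src, policy_tgt, states, beta=0.1):
    """KL term pulling the target-domain policy toward the frozen source-domain
    policy; both networks are assumed to map a state batch to a
    torch.distributions object (e.g. a Normal)."""
    with torch.no_grad():
        dist_src = policy_src(states)          # source policy, held fixed
    dist_tgt = policy_tgt(states)
    kl = kl_divergence(dist_src, dist_tgt)     # D_KL(pi_source || pi_target)
    if kl.dim() > 1:
        kl = kl.sum(-1)
    return beta * kl.mean()
```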
Example 11: the domain adaptation method is used to better adapt to the target domain, reducing the differences between the source domain and the target domain by aligning their distributions. The difference between the distributions may be quantified using the maximum mean difference (MaximumMeanDiscrepancy, MMD) as a metric. The specific process is as follows: domain adaptation target: the goal of (2) is to reduce the feature distribution differences between the source domain and the target domain by domain adaptation methods to improve performance on the target domain. Typically, the data distribution is different in the source domain and the target domain, which may lead to performance degradation. Maximum Mean Difference (MMD) is used to measure the distribution difference between the source domain and the target domain. MMD was calculated as follows:
wherein $H_S$ denotes the feature set of the source domain, $H_T$ denotes the feature set of the target domain, $n_S$ and $n_T$ are the numbers of samples in the source domain and the target domain respectively, and $\varphi(\cdot)$ is a function mapping the features into the feature space. The MMD metric measures the distribution difference between the sample features of the two domains by computing the squared norm of the difference between their mean embeddings in the feature space. If the MMD tends to zero, the distributions of the two domains tend to agree. The MMD is added to the training objective function as a domain adaptation loss to reduce the distribution difference between the source domain and the target domain. The specific objective function can be expressed as:

$J(\theta,\phi) = J_{\mathrm{target}}(\theta,\phi) + \lambda\,\mathrm{MMD}^{2}(H_S, H_T)$
wherein $J_{\mathrm{target}}(\theta,\phi)$ is the training objective function on the target domain (possibly the objective defined in step 6), and $\lambda$ is a hyperparameter controlling the weight of the domain adaptation loss. During training, the parameters $\theta$ of the policy network, the parameters $\phi$ of the value network and the weight $\lambda$ of the domain adaptation loss are optimized simultaneously to minimize the objective function. Typically these parameters are updated with gradient descent or another optimization algorithm, so that the feature distributions of the source domain and the target domain become gradually aligned. The goal of domain adaptation is to make the model generalize better in the target domain and to improve performance by reducing the feature distribution difference between the source domain and the target domain. The choice of domain adaptation method and the adjustment of $\lambda$ generally depend on the specific task and data. The first overall optimization objective function is:
Since the reverse training process mirrors the forward training process, the reverse training process is not described in detail here.
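For Example 11, the following minimal sketch computes the MMD domain-adaptation term with a linear (mean-embedding) feature map and adds it to an assumed target-domain loss; the tensor shapes and the names `target_loss` and `lam` are illustrative assumptions.

```python
import torch

def mmd_squared(feat_src, feat_tgt):
    """Squared Maximum Mean Discrepancy between source features (n_s, d) and
    target features (n_t, d), using the difference of their mean embeddings."""
    mean_src = feat_src.mean(dim=0)
    mean_tgt = feat_tgt.mean(dim=0)
    return torch.sum((mean_src - mean_tgt) ** 2)

# Assumed combination with the target-domain objective:
# loss = target_loss + lam * mmd_squared(feat_src, feat_tgt)
```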
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it; although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An intelligent agricultural monitoring and management system, characterized in that the system comprises:
the data acquisition unit is used for acquiring agricultural monitoring data of a plurality of different agricultural areas through the sensor, and carrying out normalization processing on the acquired agricultural monitoring data to obtain normalized agricultural monitoring data;
the migration learning unit is used for selecting the normalized agricultural monitoring data of one area from the normalized agricultural monitoring data of the different agricultural areas as the source domain, performing migration learning with the normalized agricultural monitoring data of the other agricultural areas as the target domain, and establishing a migration learning model, and specifically comprises: a forward training process comprising the following steps: extracting features from the source domain and the target domain to obtain source domain features and target domain features respectively; constructing a policy network for generating the policy to be executed on the target domain; constructing a value network for estimating the value of taking different actions on the target domain; performing reinforcement learning training using the source domain and the policy network to obtain a policy network trained on the source domain; further training the policy network and the value network trained on the source domain in conjunction with the target domain; aligning the distributions of the source domain and the target domain using a domain adaptation loss; and training the parameters of the first overall optimization objective function with a gradient descent method, with minimization of the first overall optimization objective function as the goal; and a reverse training process comprising the following steps: extracting features from the source domain and the target domain to obtain source domain features and target domain features respectively; constructing a policy network for generating the policy to be executed on the source domain; constructing a value network for estimating the value of taking different actions on the source domain; performing reinforcement learning training using the target domain and the policy network to obtain a policy network trained on the target domain; further training the policy network and the value network trained on the target domain in conjunction with the source domain; aligning the distributions of the source domain and the target domain using a domain adaptation loss; and training the parameters of the second overall optimization objective function with a gradient descent method, with minimization of the second overall optimization objective function as the goal;
the monitoring unit is used for inputting the acquired agricultural monitoring data of a new agricultural area into the migration learning model, taking the overall optimization objective function formed by the first overall optimization objective function and the second overall optimization objective function as the objective, and iteratively adjusting the values of the agricultural monitoring data with minimization of that objective function as the goal, such that the sum of the absolute differences between the adjusted agricultural monitoring data and the corresponding items of the original agricultural monitoring data remains smaller than a set threshold; the adjusted agricultural monitoring data at that point are taken as the operation standard values of the new agricultural area.
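Purely as an illustration of the monitoring unit's iterative adjustment, the sketch below nudges new monitoring data by gradient steps on an assumed overall objective while keeping the summed absolute deviation from the original readings under a threshold; the function names, optimizer, step count and projection scheme are all assumptions.

```python
import torch

def derive_operation_standard(overall_objective, x_new, max_abs_sum=5.0,
                              lr=0.01, steps=200):
    """Iteratively adjusts new monitoring data to lower `overall_objective`
    (a callable returning a scalar tensor) while the sum of absolute
    deviations from the original readings stays below `max_abs_sum`."""
    x_orig = x_new.detach().clone()
    x = x_new.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        overall_objective(x).backward()
        opt.step()
        with torch.no_grad():
            dev = x - x_orig
            total = dev.abs().sum()
            if total > max_abs_sum:                            # project back into
                x.copy_(x_orig + dev * (max_abs_sum / total))  # the deviation budget
    return x.detach()   # adjusted readings = operation standard values
```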
2. The intelligent agricultural monitoring and management system of claim 1, wherein the agricultural monitoring data include at least: temperature, illumination intensity, precipitation, humidity, soil type value, soil pH, soil nitrogen content, soil phosphorus content, soil potassium content, average crop height and average crop leaf number.
3. The intelligent agricultural monitoring and management system of claim 2, wherein the number or variety of the agricultural monitoring data of the different agricultural areas may differ; when the obtained agricultural monitoring data are normalized, a template item is constructed, the template item containing all kinds of agricultural monitoring data, and a template value is set for each kind of agricultural monitoring data; if the obtained agricultural monitoring data do not contain all kinds, the agricultural monitoring data of the missing kinds are supplemented according to the corresponding template values in the template item.
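As an illustration of the template-item completion of claim 3, the sketch below fills any missing kinds of monitoring data with template values before normalization; the field names and numeric template values are invented for the example.

```python
# Hypothetical template item: one default value per kind of monitoring data.
TEMPLATE = {
    "temperature": 22.0, "illumination_intensity": 30000.0, "precipitation": 2.5,
    "humidity": 60.0, "soil_type_value": 1.0, "soil_ph": 6.5,
    "soil_nitrogen": 40.0, "soil_phosphorus": 20.0, "soil_potassium": 30.0,
    "avg_crop_height": 50.0, "avg_crop_leaf_count": 12.0,
}

def complete_with_template(record: dict) -> dict:
    """Returns a record containing every kind of monitoring data, filling the
    missing kinds with the corresponding template values."""
    return {key: record.get(key, default) for key, default in TEMPLATE.items()}
```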
4. The intelligent agricultural monitoring and management system according to claim 3, wherein the source domain comprises state-action pairs $(s, a)$ and corresponding reward signals $r$; $\mathcal{D}_S$ denotes the data distribution of the source domain and $\mathcal{D}_T$ denotes the data distribution of the target domain; the state set is denoted $S$ and includes the states shared by the source domain and the target domain; the action set is denoted $A$ and includes the actions shared by the source domain and the target domain; two reward functions $R_{\mathrm{source}}(s,a)$ and $R_{\mathrm{target}}(s,a)$ are set for the source domain and the target domain respectively, a reward function defining the reward signal for a given state and action; the state transition function is denoted $P(s' \mid s, a)$ and defines the probability distribution of the next state $s'$ after taking action $a$ in state $s$; the policy is denoted $\pi(a \mid s)$ and defines the probability of taking action $a$ in state $s$; during training, the goal is to maximize the cumulative reward on the target domain, i.e. to maximize the expected return in the target domain, expressed as:

$J_{\mathrm{target}}(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T} R_{\mathrm{target}}(s_t, a_t)\right]$
wherein $\tau$ denotes a trajectory, $s_t$ and $a_t$ denote the state and action at time step $t$, and $T$ denotes the length of the trajectory; the final goal is to learn a policy $\pi^{*}$ that maximizes the expected return on the target domain;
in the reverse training process, the goal is to maximize the cumulative reward on the source domain, i.e. to maximize the expected return in the source domain, expressed as:

$J_{\mathrm{source}}(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T} R_{\mathrm{source}}(s_t, a_t)\right]$
wherein $\tau$ denotes a trajectory, $s_t$ and $a_t$ denote the state and action at time step $t$, and $T$ denotes the length of the trajectory; the final goal is to learn a policy $\pi^{*}$ that maximizes the expected return on the source domain.
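Purely for illustration, the expected return defined in claim 4 can be estimated by Monte-Carlo averaging over sampled trajectories; the data layout below (a trajectory as a list of state-action pairs) is an assumption of the sketch.

```python
def expected_return(trajectories, reward_fn):
    """Monte-Carlo estimate of the expected return J(pi): average the summed
    rewards of sampled trajectories, each a list of (state, action) pairs."""
    totals = [sum(reward_fn(s, a) for s, a in tau) for tau in trajectories]
    return sum(totals) / len(totals)
```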
5. The intelligent agricultural monitoring and management system of claim 4, wherein the source domain features and the target domain features are obtained by extracting features from the source domain and the target domain using a recurrent neural network, respectively;
the source domain is denoted as:

$X_S = \{x^{S}_{1}, x^{S}_{2}, \ldots, x^{S}_{n_S}\}$
the target domain is denoted as:

$X_T = \{x^{T}_{1}, x^{T}_{2}, \ldots, x^{T}_{n_T}\}$
wherein $n_S$ and $n_T$ are the numbers of samples in the source domain and the target domain, respectively; forward propagation of the recurrent neural network is performed on each sequence to obtain a feature representation of the sequence; this feature representation is the last hidden state of the recurrent neural network or a summary representation;
wherein the source domain features are:

$H_S = \{h^{S}_{1}, h^{S}_{2}, \ldots, h^{S}_{n_S}\}$
and the target domain features are:

$H_T = \{h^{T}_{1}, h^{T}_{2}, \ldots, h^{T}_{n_T}\}$
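One possible sketch of the recurrent feature extractor of claim 5, taking the last hidden state as the feature representation; the use of a GRU, the hidden size and the 11-dimensional input (one dimension per kind of monitoring data in claim 2) are assumptions.

```python
import torch
import torch.nn as nn

class SequenceEncoder(nn.Module):
    """Encodes one monitoring-data sequence into a feature vector h by running
    a recurrent network and keeping its last hidden state."""
    def __init__(self, input_dim=11, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)

    def forward(self, x):              # x: (batch, seq_len, input_dim)
        _, h_last = self.rnn(x)        # h_last: (1, batch, hidden_dim)
        return h_last.squeeze(0)       # (batch, hidden_dim) feature

# H_source = SequenceEncoder()(source_batch); H_target is obtained likewise.
```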
6. The intelligent agricultural monitoring and management system of claim 5, wherein the policy network is constructed during the training process using the following formula:

$\pi_{\theta}(a \mid s) = \mathcal{N}\!\big(a;\ \mu_{\theta}(s),\ \sigma^{2}\big)$
wherein $\pi_{\theta}(a \mid s)$ denotes the probability distribution of taking action $a$ in state $s$, $\mu_{\theta}(s)$ denotes the mean output by the policy network, and $\sigma$ denotes the standard deviation of the action distribution; the parameters of the policy network are denoted $\theta$, and these parameters are updated by an optimization algorithm to maximize the first objective function $J_1(\theta)$.
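A minimal sketch of the Gaussian policy network of claim 6; the state-independent learned standard deviation, the layer sizes and the action dimension are assumptions of the sketch.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianPolicy(nn.Module):
    """Policy pi_theta(a|s): a network outputs the mean mu_theta(s) of a normal
    action distribution; the log standard deviation is a learned parameter."""
    def __init__(self, state_dim=64, action_dim=4, hidden=128):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        return Normal(self.mu(state), self.log_std.exp())
```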
7. The intelligent agricultural monitoring and management system according to claim 6, wherein a value network is defined for estimating the long-term cumulative reward of taking action $a$ in a given state $s$, i.e. the value of the state-action pair $(s, a)$; the value network adopts the form of a deep $Q$ network, accepts the state $s$ and the action $a$ as input, and outputs an estimate of the value of the state-action pair, expressed as:

$Q_{\phi}(s, a)$
wherein $Q_{\phi}(s, a)$ denotes the estimated value of taking action $a$ in state $s$, and $\phi$ denotes the parameters of the value network, namely the weights and bias terms in the value network; these parameters are trained so that the value of the state-action pair is estimated more accurately; the training process uses the Huber loss of the second objective function to measure the gap between the value network's estimate and the actual cumulative reward, and the parameters $\phi$ of the value network are updated by minimizing the second objective function so as to better approximate the true value function; the cumulative reward with a discount factor is used to estimate the value function as follows:

$G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}$
wherein $\gamma$ is the discount factor and $r_t$ is the reward obtained at time step $t$; the training objective of the value network is to minimize the error between the estimate and the cumulative reward, expressed using a mean squared error loss function:

$L(\phi) = \mathbb{E}_{(s,a,r,s') \sim D}\!\left[\big(r + \gamma \max_{a'} Q_{\phi}(s', a') - Q_{\phi}(s, a)\big)^{2}\right]$
wherein $D$ is the experience replay buffer; $L(\phi)$ is the loss function representing the target to be minimized, and the parameters to be optimized are the parameters $\phi$ of the value function; $\mathbb{E}$ denotes the expected value, taken over samples $(s, a, r, s')$ drawn from $D$, where $s$, $a$, $r$ and $s'$ respectively denote the current state, the action taken, the reward obtained and the next state; $Q_{\phi}(s, a)$ is the value function, denoting the value of taking action $a$ in state $s$ and controlled by the parameters $\phi$; the goal of this value function is to estimate the expected cumulative reward of taking each action in each state; $r$ is the reward obtained after taking action $a$ in the current state; $\gamma$ is the discount factor representing the importance of future rewards, a value between 0 and 1 used to down-weight future rewards so that more attention is paid to immediate rewards; $\max_{a'} Q_{\phi}(s', a')$ is the maximum value over all possible actions $a'$ taken in the next state $s'$, representing the optimal action value of the next step.
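An illustrative sketch of the replay-buffer loss of claim 7 for a discrete action set, where the value network is assumed to output Q(s, a) for every action at once; this discrete form, the batch layout and the discount value are assumptions.

```python
import torch
import torch.nn.functional as F

def td_loss(q_net, batch, gamma=0.9):
    """Mean-squared TD error over a replay-buffer batch (s, a, r, s').

    Assumes q_net(states) has shape (B, num_actions) and `actions` is a
    LongTensor of shape (B,) holding the chosen action indices.
    """
    states, actions, rewards, next_states = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * q_net(next_states).max(dim=1).values
    return F.mse_loss(q_sa, target)
```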
8. The intelligent agricultural monitoring and management system of claim 7, wherein, when performing reinforcement learning training using the source domain and the policy network, a third objective function is defined as:

$J_3(\theta) = \mathbb{E}_{(s,a) \sim \pi_{\theta}}\!\left[Q^{\pi_{\theta}}(s, a)\right]$
wherein $J_3(\theta)$ denotes the objective function and $Q^{\pi_{\theta}}(s, a)$ denotes the expected cumulative reward of taking action $a$ in state $s$ under policy $\pi_{\theta}$; the purpose of the third objective function is to maximize the cumulative reward on the source domain; the gradient of the third objective function is calculated by the following formula:

$\nabla_{\theta} J_3(\theta) = \mathbb{E}_{(s,a) \sim \pi_{\theta}}\!\left[\nabla_{\theta} \log \pi_{\theta}(a \mid s)\; Q^{\pi_{\theta}}(s, a)\right]$
the parameters $\theta$ of the policy network are updated by gradient ascent using the following formula to improve performance on the source domain:

$\theta \leftarrow \theta + \alpha\,\nabla_{\theta} J_3(\theta)$
wherein $\alpha$ is the learning rate.
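A minimal sketch of the gradient-ascent update of claim 8; the use of sampled Q estimates, the policy interface and the optimizer are assumptions of the sketch.

```python
import torch

def policy_gradient_step(policy_net, optimizer, states, actions, q_values):
    """One gradient-ascent step on J_3(theta) from source-domain samples;
    q_values are estimates of Q(s, a) (Monte-Carlo returns or a critic)."""
    log_prob = policy_net(states).log_prob(actions)   # log pi_theta(a|s)
    if log_prob.dim() > 1:
        log_prob = log_prob.sum(-1)
    loss = -(log_prob * q_values.detach()).mean()     # minimizing -J_3 ascends J_3
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                  # theta <- theta + alpha * grad
```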
9. The intelligent agricultural monitoring and management system of claim 8, wherein, when the policy network and the value network trained on the source domain are further trained in conjunction with the target domain, a fourth objective function is defined as:

$J_4(\theta,\phi) = \mathbb{E}_{(s,a) \sim \pi_{\theta}}\!\left[Q^{\pi_{\theta}}(s, a)\right] - \mathbb{E}_{(s,a) \sim \pi_{\theta}}\!\left[\big(Q^{\pi_{\theta}}(s, a) - V_{\phi}(s)\big)^{2}\right]$
wherein $J_4(\theta,\phi)$ denotes the objective function, $Q^{\pi_{\theta}}(s, a)$ denotes the expected cumulative reward of taking action $a$ in state $s$ under policy $\pi_{\theta}$, and $V_{\phi}(s)$ denotes the value of state $s$ estimated by the value network; the purpose of the fourth objective function is to maximize the cumulative reward on the target domain while minimizing the gap between the state value estimated by the value network and the cumulative reward estimated under the policy; the gradient of the fourth objective function is then calculated by the following formula:

$\nabla_{\theta} J_4(\theta,\phi) = \mathbb{E}_{(s,a) \sim \pi_{\theta}}\!\left[\big(Q^{\pi_{\theta}}(s, a) - V_{\phi}(s)\big)\,\nabla_{\theta} \log \pi_{\theta}(a \mid s)\right]$
the parameters $\theta$ of the policy network and the parameters $\phi$ of the value network are updated simultaneously by the following formulas, so that the objective function $J_4(\theta,\phi)$ increases:

$\theta \leftarrow \theta + \alpha\,\nabla_{\theta} J_4(\theta,\phi), \qquad \phi \leftarrow \phi + \alpha\,\nabla_{\phi} J_4(\theta,\phi)$
CN202410051024.3A 2024-01-15 2024-01-15 Intelligent agricultural monitoring and management system Active CN117575174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410051024.3A CN117575174B (en) 2024-01-15 2024-01-15 Intelligent agricultural monitoring and management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410051024.3A CN117575174B (en) 2024-01-15 2024-01-15 Intelligent agricultural monitoring and management system

Publications (2)

Publication Number Publication Date
CN117575174A true CN117575174A (en) 2024-02-20
CN117575174B CN117575174B (en) 2024-04-02

Family

ID=89864562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410051024.3A Active CN117575174B (en) 2024-01-15 2024-01-15 Intelligent agricultural monitoring and management system

Country Status (1)

Country Link
CN (1) CN117575174B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
WO2021069309A1 (en) * 2019-10-10 2021-04-15 Telefonaktiebolaget Lm Ericsson (Publ) Reinforcement learning systems for controlling wireless communication networks
CN114918919A (en) * 2022-05-25 2022-08-19 北京理工大学 Robot motor skill learning method and system
CN115762199A (en) * 2022-09-20 2023-03-07 东南大学 Traffic light control method based on deep reinforcement learning and inverse reinforcement learning
CN116820883A (en) * 2023-06-28 2023-09-29 电子科技大学(深圳)高等研究院 Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021069309A1 (en) * 2019-10-10 2021-04-15 Telefonaktiebolaget Lm Ericsson (Publ) Reinforcement learning systems for controlling wireless communication networks
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN114918919A (en) * 2022-05-25 2022-08-19 北京理工大学 Robot motor skill learning method and system
CN115762199A (en) * 2022-09-20 2023-03-07 东南大学 Traffic light control method based on deep reinforcement learning and inverse reinforcement learning
CN116820883A (en) * 2023-06-28 2023-09-29 电子科技大学(深圳)高等研究院 Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
金妙光: "Forecasting agricultural machinery power demand from the development of agricultural production", 中国农机化, no. 03, 25 May 2009 (2009-05-25), pages 37-45 *

Also Published As

Publication number Publication date
CN117575174B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
US11516976B2 (en) Irrigation system control with predictive water balance capabilities
Navarro-Hellín et al. A decision support system for managing irrigation in agriculture
Sun et al. Reinforcement learning control for water-efficient agricultural irrigation
CN102736596B (en) Multi-scale greenhouse environment control system based on crop information fusion
CN106842923B (en) Greenhouse environment multi-factor coordination control method based on crop physiology and energy consumption optimization
EP3179319A1 (en) Method for irrigation planning and system for its implementation
US20220248616A1 (en) Irrigation control with deep reinforcement learning and smart scheduling
CN104521699A (en) Field intelligent irrigation on-line control management method
Verma et al. A machine learning approach for prediction system and analysis of nutrients uptake for better crop growth in the Hydroponics system
Yang et al. Deep reinforcement learning-based irrigation scheduling
CN111459033A (en) Grey prediction fuzzy PID control method and equipment for water and fertilizer irrigation
Xie et al. Smart fuzzy irrigation system for litchi orchards
CN201595053U (en) Fuzzy irrigation control system
CN116595333B (en) Soil-climate intelligent rice target yield and nitrogen fertilizer consumption determination method
CN109934400B (en) Rain collecting, regulating and deficiency crop water demand prediction method based on improved neural network
CN117575174B (en) Intelligent agricultural monitoring and management system
Chen et al. A water-saving irrigation decision-making model for greenhouse tomatoes based on genetic optimization TS fuzzy neural network
Benzaouia et al. Fuzzy-IoT smart irrigation system for precision scheduling and monitoring
Macabiog et al. Soil moisture and rain prediction based irrigation controller for the strawberry farm of La Trinidad, Benguet
Jing et al. Prediction of crop phenology—A component of parallel agriculture management
Mezouari et al. Towards Smart Farming through Machine Learning-Based Automatic Irrigation Planning
Zhao et al. A Review of Scientific Irrigation Scheduling Methods
Son et al. Advances in nutrient management modelling and nutrient concentration prediction for soilless culture systems
Tomar et al. Implementation of Artificial Intelligence Technology for Better Irrigation System
Aswathy et al. Multiparameter Optimization Technique Based on Enhanced Neuro Genetic Model (ENGM) for Precision Agriculture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant