CN114780742B

CN114780742B - Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area

Info

Publication number: CN114780742B
Application number: CN202210409530.6A
Authority: CN
Inventors: 苏楠; 章少辉; 白美健; 张宝忠; 陈皓锐
Original assignee: China Institute of Water Resources and Hydropower Research
Current assignee: China Institute of Water Resources and Hydropower Research
Priority date: 2022-04-19
Filing date: 2022-04-19
Publication date: 2023-02-24
Anticipated expiration: 2042-04-19
Also published as: CN114780742A

Abstract

The invention discloses a method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling, belonging to the technical field of irrigation area flow scheduling. The method comprises the following steps: obtaining a plurality of different types of flow scheduling characteristic data; constructing a scheduling scene machine learning model, and inputting the different types of flow scheduling characteristic data into it for training and parameter tuning to obtain a first machine learning model; obtaining an optimal characteristic value combination and scheduling experience data; inputting the optimal characteristic value combination into the first machine learning model for training and parameter tuning to obtain a second machine learning model; obtaining the scheduling flow gradient corresponding to each type of flow scheduling characteristic data; constructing an irrigation area flow scheduling knowledge question-answering system; and constructing question templates for irrigation area flow scheduling, matching a question set to the question templates by probability using a naive Bayes classifier, and completing the use of the question-answering system in combination with a HanLP word segmenter. The scheme addresses the low reliability and convenience of water flow scheduling in irrigation areas.

Description

Construction and use method of irrigation area flow scheduling knowledge graph question-answering system

Technical Field

The invention belongs to the technical field of irrigation area flow scheduling, and particularly relates to a method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling.

Background

An irrigation area is a typical complex water resource system whose evolution is driven by both natural and social factors, and it is an important component of national water network construction. Reasonable water use scheduling in irrigation areas is of great significance for improving overall water use efficiency and achieving sustainable development of water resources.

In practice, water use scheduling in an irrigation area relies mainly on the years of historical experience of its water use dispatchers. The target flow of water use scheduling is the decision result, and it is influenced by many factors, so a plurality of characteristic variables is needed to describe it; moreover, a single characteristic variable may originate from measuring points at different spatial locations. Together these form a complex logical decision network that depends on historical causal experience. This decision network becomes fixed in the dispatchers' subconscious thinking through long-term learning and trial-and-error practice, and manifests externally as the ability to predict the water use scheduling process of the irrigation area scientifically. Two problems in irrigation area water use scheduling urgently need to be solved:

(1) Irrigation area water flow scheduling is mostly based on existing hydrological or coupled hydrological-hydraulic models, which can yield flow scheduling values that are reasonable in terms of physical theory but struggle to solve the scheduling problem of an irrigation area that also has social attributes;

(2) Owing to limited reliability and convenience, almost no scheduling model is actually applied to the daily scheduling of irrigation areas today.

Regarding the first problem, a machine learning model can learn from historical data to summarize scheduling rules that embody a large amount of scheduling experience, and can obtain a reasonable scheduling flow that takes both natural and human factors into account. However, the prior art only uses machine learning to solve for the water quantity and does not interpret the scheduling experience contained in the large body of historical scheduling data. Meanwhile, for dispatchers who actually manage an irrigation area, applying a complex model is not simple enough and the process of solving for the water quantity is too complicated; moreover, a model that has not been tested in actual scheduling of the irrigation area is less trusted than scheduling experience handed down over many years. This is the second problem in irrigation area water flow scheduling.

Disclosure of Invention

In view of the above defects in the prior art, the invention provides a method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling. By combining a machine learning model with the SHAP model interpretation method, the method learns and interprets scheduling data that embody the years of historical experience of irrigation area water use dispatchers, turns the scheduling experience into explicit knowledge, and thereby addresses the low reliability and convenience of irrigation area water flow scheduling.

In order to achieve the purpose of the invention, the invention adopts the technical scheme that:

The invention provides a method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling, comprising the following steps:

S1, acquiring a data set of irrigation area flow scheduling characteristic quantities, and classifying the data set according to different scheduling scenes to obtain a plurality of different types of flow scheduling characteristic data;

S2, constructing a scheduling scene machine learning model, and inputting the different types of flow scheduling characteristic data into the scheduling scene machine learning model for training and parameter tuning to obtain a first machine learning model;

S3, interpreting the first machine learning model by using the SHAP model interpretation method to obtain an optimal characteristic value combination and scheduling experience data;

S4, inputting the optimal characteristic value combination into the first machine learning model for training and parameter tuning to obtain a second machine learning model;

S5, predicting on the different types of flow scheduling characteristic data by using the second machine learning model, and obtaining the scheduling flow gradient corresponding to each type of flow scheduling characteristic data;

S6, constructing an irrigation area flow scheduling knowledge question-answering system based on the scheduling experience data, the different types of flow scheduling characteristic data, and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data;

S7, constructing question templates for irrigation area flow scheduling, matching a question set to the question templates by probability using a naive Bayes classifier, and completing the use of the irrigation area flow scheduling knowledge question-answering system by combining the system with a HanLP word segmenter.

The beneficial effects of the invention are as follows. The invention provides a method for constructing and using a knowledge-graph question-answering system for irrigation area water flow scheduling. By combining a machine learning model with the SHAP model interpretation method, scheduling data that embody the years of historical experience of irrigation area water use dispatchers are learned and interpreted, and the scheduling experience is turned into explicit knowledge. At the same time, scheduling scenes are simulated and the machine learning model is used to obtain predicted scheduling flow gradient values, so that an irrigation area water flow scheduling graph database in Neo4j is formed, consisting mainly of scheduling experience and predicted scheduling flows. An irrigation area flow scheduling knowledge question-answering system is then built on the Neo4j graph database. In this way the water use scheduling experience of the irrigation area is converted into knowledge, and irrigation area managers are given a question-answering system in which scheduling experience and recommended flows are convenient to look up.

Further, constructing the scheduling scene machine learning model includes the following steps:

A1, obtaining the water use scheduling target flow of the irrigation area;

A2, constructing a nonlinear regression mapping between the irrigation area water use scheduling target flow and the different types of flow scheduling characteristic data;

A3, obtaining a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree according to the nonlinear regression mapping;

A4, constructing a decision tree forest network based on each optimal decision tree and its corresponding predicted value, thereby completing construction of the machine learning model.

The beneficial effect of this further scheme is as follows: a scheduling scene machine learning model is constructed through nonlinear regression mapping and trained on the irrigation area characteristic data, so that the scheduling experience in the historical data is captured and the credibility of the irrigation area flow scheduling knowledge-graph question-answering system is increased.

Further, the step A3 includes the following steps:

B1, randomly extracting m characteristic variables from the different types of flow scheduling characteristic data according to the nonlinear regression mapping, wherein m represents the number of extracted characteristic variables;

B2, selecting s characteristic variables from the m characteristic variables as decision tree nodes, wherein s represents the number of selected characteristic variables and s is smaller than m;

B3, recursively splitting the decision tree nodes at the point where the sum of the mean square errors of adjacent decision tree nodes is minimal, to obtain an optimal decision tree;

B4, taking the mean value of the leaf nodes of each optimal decision tree as the corresponding predicted value;

B5, repeating steps B1 to B4 to obtain a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree.

The beneficial effect of this further scheme is as follows: with a tree-structured machine learning model, once a plurality of decision trees exists, the mean of the predicted values of all trees is used as the prediction of the target variable. The number of characteristic variables and the number of decision trees are optimized continuously according to the prediction results, so that a plurality of optimal decision trees and their corresponding predicted values are obtained, forming the nonlinear regression mapping between the irrigation area water use scheduling target flow and the characteristic variables.
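Steps B1 to B5 and A4 resemble a random-forest regressor, and a minimal sketch may make the procedure easier to follow. The sketch below is an illustration only and is not taken from the patent: it assumes that every column is a candidate characteristic variable, that each tree is grown on a bootstrap sample, and that splitting stops at a fixed depth. The column meanings (rainfall, temperature, irrigation-period flag), the toy numbers, and all function names are hypothetical.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors of the values around their mean (0 for an empty list)."""
    if not values:
        return 0.0
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def best_split(rows, targets, features):
    """Return the (feature, threshold) whose two children have the smallest summed error."""
    best, best_err = None, sse(targets)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if not left or not right:
                continue
            err = sse(left) + sse(right)
            if err < best_err:
                best, best_err = (f, t), err
    return best


def build_tree(rows, targets, features, depth=0, max_depth=3):
    """Split nodes recursively (cf. B3); a leaf stores the mean target value (cf. B4)."""
    split = best_split(rows, targets, features) if depth < max_depth else None
    if split is None:
        return mean(targets)
    f, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], features, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], features, depth + 1, max_depth))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def build_forest(rows, targets, n_trees=10, s=2, seed=0):
    """Grow n_trees trees, each on s randomly chosen feature columns (cf. B1, B2, B5)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample (assumption)
        feats = rng.sample(range(n_features), s)          # s of the candidate features
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Forest prediction is the mean of the individual tree predictions (cf. A4)."""
    return mean(predict_tree(tree, row) for tree in forest)


# Hypothetical toy data: [rainfall, temperature, irrigation-period flag] -> target flow
ROWS = [[0, 30, 1], [5, 28, 1], [20, 22, 0], [25, 18, 0], [2, 33, 1], [18, 20, 0]]
FLOWS = [8.0, 7.0, 2.0, 1.0, 9.0, 3.0]
forest = build_forest(ROWS, FLOWS, n_trees=5, s=2, seed=1)
print(predict_forest(forest, [3, 31, 1]))
```

In practice an established library implementation would normally be preferred; the hand-written version is shown only to expose the individual steps.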

Further, the step S3 includes the following steps:

S31, interpreting the first machine learning model by using the SHAP model interpretation method to obtain the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow;

S32, selecting the top-ranked characteristic values according to the importance ranking, and removing redundancy among the characteristic value combinations of the different scheduling scenes to obtain the optimal characteristic value combination;

S33, analyzing the directions in which the different characteristic values influence the scheduling flow to obtain the scheduling experience data.

The beneficial effect of this further scheme is as follows: the top-ranked characteristic values are selected according to the importance ranking, redundancy among the characteristic value combinations of the different scheduling scenes is removed to obtain the optimal characteristic value combination, and the directions in which the different characteristic values influence the scheduling flow are analyzed to obtain the scheduling experience contained in the historical data.
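As a rough illustration of steps S31 to S33, the sketch below ranks characteristic values and derives an influence direction from a table of per-sample SHAP values. Several choices here are assumptions rather than statements from the patent: importance is taken as the mean absolute SHAP value (a common convention), the direction is taken as the sign of the covariance between a characteristic value and its SHAP value, and "removing redundancy" is read as de-duplicating the selected names across scheduling scenes. All numbers and function names are invented for the example.

```python
from statistics import mean


def rank_features(shap_rows, names):
    """Order feature names by mean absolute SHAP value, largest first."""
    scores = {n: mean(abs(row[i]) for row in shap_rows) for i, n in enumerate(names)}
    return sorted(names, key=lambda n: scores[n], reverse=True)


def influence_direction(feature_values, shap_values):
    """'+' if larger feature values go with larger SHAP values, '-' if the opposite, else '0'."""
    fx, sx = mean(feature_values), mean(shap_values)
    cov = sum((f - fx) * (s - sx) for f, s in zip(feature_values, shap_values))
    return "+" if cov > 0 else "-" if cov < 0 else "0"


def merge_unique(*ranked_lists):
    """Combine the top features of several scheduling scenes, keeping each name once."""
    seen, merged = set(), []
    for ranked in ranked_lists:
        for name in ranked:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Hypothetical SHAP values for three samples and three characteristic values
NAMES = ["T1", "R", "S"]
SHAP_ROWS = [[0.5, -0.2, 0.1], [-0.4, 0.3, 0.0], [0.6, -0.1, 0.1]]
RAINFALL = [5.0, 0.0, 3.0]  # raw rainfall values for the same three samples

ranking = rank_features(SHAP_ROWS, NAMES)
rain_direction = influence_direction(RAINFALL, [row[1] for row in SHAP_ROWS])
top_combined = merge_unique(ranking[:2], ["R", "G"])
print(ranking, rain_direction, top_combined)
```

With these toy values, a negative direction for rainfall would be read as a piece of scheduling experience of the form "more rainfall, lower scheduled flow".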

Further, the step S31 includes the following steps:

S311, calculating the Shapley value of each characteristic variable by using the SHAP model;

S312, calculating the SHAP value g(z') of the first machine learning model from the Shapley values of the characteristic variables;

S313, obtaining the importance ranking of the characteristic values and the directions in which the different characteristic values influence the scheduling flow according to the SHAP value g(z') of the first machine learning model.

The beneficial effect of this further scheme is as follows: the characteristic variables are evaluated with the SHAP model to obtain the importance ranking of the characteristic values and the directions in which the different characteristic values influence the scheduling flow. The relative importance of each influencing element or characteristic variable is weighed continuously, so that decisions can be combined dynamically and the credibility of the irrigation area flow scheduling knowledge-graph question-answering system is increased.

Further, the Shapley value of a characteristic variable is calculated as follows:

$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl[f(S \cup \{j\}) - f(S)\bigr]$$

wherein φ_j represents the Shapley value of the j-th characteristic variable, f(·) represents the nonlinear regression mapping, N represents the characteristic variable sample set, M represents the dimension of the characteristic variable sample set, S represents a characteristic variable sample subset extracted from the characteristic variable sample set N, |S| represents the dimension of the characteristic variable sample subset S, f(S ∪ {j}) represents the average of the sample predicted values after the j-th characteristic variable is merged into the sample subset S, and f(S) represents the predicted value of the sample subset S.

The beneficial effect of this further scheme is as follows: a method for calculating the Shapley value of a characteristic variable is provided.
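To make the Shapley value calculation concrete, the following sketch enumerates every subset S of the remaining characteristic variables and applies the standard Shapley weighting |S|!(M - |S| - 1)!/M! to the marginal contribution f(S ∪ {j}) - f(S). It assumes that this standard form is the intended expression. The set function `toy_flow` is a hypothetical stand-in for the model prediction f and is not part of the patent; exhaustive enumeration is only practical for a handful of variables.

```python
from itertools import combinations
from math import factorial


def shapley_value(j, features, f):
    """Exact Shapley value of feature j, where f maps a frozenset of features to a prediction."""
    others = [x for x in features if x != j]
    m = len(features)  # M: dimension of the full feature set N
    total = 0.0
    for size in range(len(others) + 1):
        # Weight |S|! (M - |S| - 1)! / M! shared by every subset S of this size
        weight = factorial(size) * factorial(m - size - 1) / factorial(m)
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += weight * (f(s | {j}) - f(s))  # marginal contribution of j
    return total


def toy_flow(subset):
    """Hypothetical set function: rainfall R and temperature T1 interact, G has no effect."""
    value = 0.0
    if "R" in subset:
        value += 2.0
    if "T1" in subset:
        value += 3.0
    if "R" in subset and "T1" in subset:
        value += 1.0
    return value


FEATURES = ["R", "T1", "G"]
phi = {j: shapley_value(j, FEATURES, toy_flow) for j in FEATURES}
print(phi)
```

The three values sum to `toy_flow` of the full set minus `toy_flow` of the empty set, which is the efficiency property that makes Shapley values usable as an additive attribution.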

Further, the SHAP value g(z') of the first machine learning model is expressed as follows:

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j, \qquad z' \in \{0, 1\}^M, \qquad \phi_j \in \mathbb{R}$$

wherein z' represents the joint vector; z'_j = 0 indicates that the j-th characteristic variable is not located in the decision path of the joint vector, and z'_j = 1 indicates that the j-th characteristic variable is located in the decision path of the joint vector; φ_0 represents the decision parameter; and R represents the real numbers.

The beneficial effect of adopting the above further scheme is that: a method of calculating the SHAP value g (z') of the first machine learning model is provided.
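A small numerical sketch can show how g(z') is assembled, assuming the usual additive SHAP form g(z') = φ_0 + Σ_j φ_j z'_j with z'_j equal to 0 or 1. The function name and the numbers below are hypothetical; the Shapley values 2.5, 3.5 and 0.0 and the base value 1.0 are chosen only for illustration.

```python
def additive_explanation(phi0, phi, z):
    """Evaluate g(z') = phi0 + sum_j phi_j * z'_j for a binary joint vector z'."""
    if len(phi) != len(z):
        raise ValueError("phi and z must have the same length M")
    if any(bit not in (0, 1) for bit in z):
        raise ValueError("each entry of z must be 0 or 1")
    return phi0 + sum(p * bit for p, bit in zip(phi, z))


PHI0 = 1.0              # hypothetical decision parameter (base value)
PHI = [2.5, 3.5, 0.0]   # hypothetical Shapley values of three characteristic variables

all_present = additive_explanation(PHI0, PHI, [1, 1, 1])
only_first = additive_explanation(PHI0, PHI, [1, 0, 0])
print(all_present, only_first)
```

Setting every z'_j to 1 recovers the full model output for the sample, while setting an entry to 0 removes that variable's contribution.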

Further, the step S6 includes the following steps:

S61, taking the gate, the scheduling flow, the scheduling experience data, the different types of flow scheduling characteristic data, and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data as entities, and storing the entities in a Neo4j graph database;

S62, constructing the irrigation area flow scheduling knowledge question-answering system based on the Neo4j graph database.

The beneficial effect of this further scheme is as follows: the irrigation area flow scheduling knowledge-graph question-answering system is built from these entities, so that irrigation area dispatchers can conveniently retrieve the recommended scheduling flows that the system provides.
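One possible way to express step S61 is to generate a parameterized Cypher statement for each scheduling scene. The sketch below only builds the query text and its parameters and does not connect to a database. The node labels (`Gate`, `Scenario`, `FlowGradient`, `Experience`), the relationship types, the property names, and the sample values are all hypothetical; the patent names the entity kinds but not a schema.

```python
def scenario_to_cypher(gate, features, flow_gradient, experience):
    """Build a Cypher MERGE statement and parameters for one scheduling scene."""
    # A deterministic key so that the same scene is merged rather than duplicated
    key = "|".join(f"{name}={features[name]}" for name in sorted(features))
    query = (
        "MERGE (g:Gate {name: $gate}) "
        "MERGE (s:Scenario {key: $key}) "
        "SET s += $features "
        "MERGE (f:FlowGradient {level: $level}) "
        "MERGE (e:Experience {text: $experience}) "
        "MERGE (g)-[:HAS_SCENARIO]->(s) "
        "MERGE (s)-[:RECOMMENDS]->(f) "
        "MERGE (s)-[:EXPLAINED_BY]->(e)"
    )
    params = {
        "gate": gate,
        "key": key,
        "features": dict(features),
        "level": flow_gradient,
        "experience": experience,
    }
    return query, params


# Hypothetical scene: low rainfall outside the irrigation period, flow gradient 2
query, params = scenario_to_cypher(
    gate="gate-1",
    features={"rainfall": "low", "irrigation_period": "no"},
    flow_gradient=2,
    experience="more rainfall lowers the scheduled flow",
)
print(query)
print(params)
```

The returned pair could then be passed to whatever Neo4j client is in use; passing values as parameters rather than formatting them into the query text avoids quoting problems.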

Further, the step S7 includes the following steps:

S71, constructing question templates for irrigation area flow scheduling;

S72, matching the question set to the question templates by probability using a naive Bayes classifier;

S73, obtaining the user's question data through the irrigation area flow scheduling knowledge question-answering system, and matching the question data to a question template by means of the HanLP word segmenter to obtain the corresponding question in the question set;

S74, searching the Neo4j graph database according to the corresponding question, and returning the search results of the Neo4j graph database through the irrigation area flow scheduling knowledge question-answering system, thereby completing the use of the system.

The beneficial effect of this further scheme is as follows: question templates are constructed and matched against the question set and the user's question data, the Neo4j graph database is searched through the irrigation area flow scheduling knowledge question-answering system, and the corresponding search results are displayed, which makes it easier for irrigation area dispatchers to look up scheduling questions.
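Steps S71 to S73 can be sketched as a small multinomial naive Bayes classifier that assigns an incoming question to a question template. This is an illustration under stated assumptions: the template labels and training questions are invented, Laplace (add-one) smoothing is assumed, and a plain lowercase whitespace split stands in for the HanLP word segmenter, which would be needed for Chinese text.

```python
import math
from collections import Counter, defaultdict


def tokenize(question):
    """Stand-in for HanLP word segmentation (assumption): lowercase whitespace split."""
    return question.lower().split()


class TemplateClassifier:
    """Multinomial naive Bayes over question tokens with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # template label -> token counts
        self.doc_counts = Counter()              # template label -> number of questions
        self.vocab = set()

    def fit(self, questions, labels):
        for question, label in zip(questions, labels):
            tokens = tokenize(question)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def log_scores(self, question):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            n_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior of the template
            for token in tokenize(question):
                score += math.log((counts[token] + 1) / (n_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, question):
        scores = self.log_scores(question)
        return max(scores, key=scores.get)


# Hypothetical question set and template labels
QUESTIONS = [
    "what is the recommended flow for gate a",
    "recommended scheduling flow when rainfall is low",
    "which factors influence the scheduling flow",
    "how does temperature influence the flow",
]
LABELS = ["flow_by_scenario", "flow_by_scenario", "experience", "experience"]

clf = TemplateClassifier()
clf.fit(QUESTIONS, LABELS)
print(clf.predict("what flow is recommended when temperature is high"))
```

Once a template is chosen, the entities recognized in the question (for example a gate name or a rainfall range) would fill the template's slots to form the graph database query of step S74.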

Drawings

Fig. 1 is a flow chart of the steps of the method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling in an embodiment of the invention.

Fig. 2 is a diagram illustrating the importance ranking of the characteristic values in an embodiment of the invention.

Detailed Description

The following description of specific embodiments is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

As shown in Fig. 1, in an embodiment of the invention, a method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling is provided, including the following steps:

The common scheduling scenes are shown in Table 1:

TABLE 1

Because data such as rainfall and soil moisture content can come from measuring points at different spatial locations, the number of characteristic values initially input to the model is more than 30. Here T1 is the temperature, G is the empirical division of the irrigation period, S represents the water storage of small and medium-sized reservoirs and small reservoirs around the main canal, R represents the daily rainfall measured at a rainfall site, C represents the water supply and replenishment for urban and ecological use, and Y represents the water demand;

The construction of the scheduling scene machine learning model comprises the following steps:

A1, obtaining the water use scheduling target flow of the irrigation area;

A2, constructing a nonlinear regression mapping between the irrigation area water use scheduling target flow and the different types of flow scheduling characteristic data;

The step A3 comprises the following steps:

B5, repeating steps B1 to B4 to obtain a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree;

A4, constructing a decision tree forest network based on each optimal decision tree and its corresponding predicted value, thereby completing construction of the machine learning model;

S3, interpreting the first machine learning model by using the SHAP model interpretation method to obtain an optimal characteristic value combination and scheduling experience data;

The step S3 includes the following steps:

S31, interpreting the first machine learning model by using the SHAP model interpretation method to obtain the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow;

The step S31 includes the following steps:

The Shapley value of a characteristic variable is calculated as follows:

$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl[f(S \cup \{j\}) - f(S)\bigr]$$

wherein φ_j represents the Shapley value of the j-th characteristic variable, f(·) represents the nonlinear regression mapping, N represents the characteristic variable sample set, M represents the dimension of the characteristic variable sample set, S represents a characteristic variable sample subset extracted from the characteristic variable sample set N, |S| represents the dimension of the characteristic variable sample subset S, f(S ∪ {j}) represents the average of the sample predicted values after the j-th characteristic variable is merged into the sample subset S, and f(S) represents the predicted value of the sample subset S;

S312, calculating the SHAP value g(z') of the first machine learning model from the Shapley values of the characteristic variables;

The SHAP value g(z') of the first machine learning model is expressed as follows:

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j, \qquad z' \in \{0, 1\}^M, \qquad \phi_j \in \mathbb{R}$$

wherein z' represents the joint vector; z'_j = 0 indicates that the j-th characteristic variable is not located in the decision path of the joint vector, and z'_j = 1 indicates that the j-th characteristic variable is located in the decision path of the joint vector; φ_0 represents the decision parameter; and R represents the real numbers;

The SHAP value is based on the Shapley value, which is a concept from game theory;

S313, obtaining the importance ranking of the characteristic values and the directions in which the different characteristic values influence the scheduling flow according to the SHAP value g(z') of the first machine learning model;

As shown in Fig. 2, the characteristic values ranked by importance from top to bottom are: the daily temperature data T1 recorded at meteorological sites; the time T; the water storage S of small and medium-sized reservoirs around the main canal; the empirical division G into irrigation and non-irrigation periods; the staged water supply and replenishment C for urban and ecological use; the field moisture content SK; the daily rainfall R measured at rainfall sites; and the scheduling requests Y sent to the irrigation area scheduling bureau and the defense office;

S33, analyzing the directions in which the different characteristic values influence the scheduling flow to obtain the scheduling experience data;

Different scheduling scenes are simulated to predict on the different types of flow scheduling characteristic data, and the resulting scheduling flow gradient for each type is shown in Table 2:

TABLE 2

The step S6 includes the following steps:

S61, taking the gate, the scheduling flow, the scheduling experience data, the different types of flow scheduling characteristic data, and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data as entities, and storing the entities in a Neo4j graph database;

S62, constructing the irrigation area flow scheduling knowledge question-answering system based on the Neo4j graph database;

S7, constructing question templates for irrigation area flow scheduling, matching a question set to the question templates by probability using a naive Bayes classifier, and completing the use of the irrigation area flow scheduling knowledge question-answering system by combining the system with a HanLP word segmenter;

The HanLP word segmenter performs shortest-path word segmentation. It provides Chinese word segmentation, part-of-speech tagging, new word recognition, named entity recognition, automatic summarization, text clustering, sentiment analysis, word vectors and other functions, and supports user-defined dictionaries;

The step S7 includes the following steps:

S71, constructing question templates for irrigation area flow scheduling;

S74, searching the Neo4j graph database according to the corresponding question, and returning the search results of the Neo4j graph database through the irrigation area flow scheduling knowledge question-answering system, thereby completing the use of the system.

For example, if the input is: rainfall is-20, irrigation period is negative, and temperature is-30; the answer obtained is: predicted scheduling flow: 2.

The beneficial effects of the invention are as follows. The invention provides a method for constructing and using a knowledge-graph question-answering system for irrigation area water flow scheduling. By combining a machine learning model with the SHAP model interpretation method, scheduling data that embody the years of historical experience of irrigation area water use dispatchers are learned and interpreted, and the scheduling experience is turned into explicit knowledge. At the same time, scheduling scenes are simulated and the machine learning model is used to obtain predicted scheduling flow gradient values, so that an irrigation area water flow scheduling graph database in Neo4j is formed, consisting mainly of scheduling experience and predicted scheduling flows. An irrigation area flow scheduling knowledge question-answering system is then built on the Neo4j graph database. In this way the water use scheduling experience of the irrigation area is converted into knowledge, and irrigation area managers are given a question-answering system in which scheduling experience and recommended flows are convenient to look up.

Claims

1. A method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling, characterized by comprising the following steps:

the step S3 includes the following steps:

the step S31 includes the following steps:

S7, constructing question templates for irrigation area flow scheduling, matching a question set to the question templates by probability using a naive Bayes classifier, and completing the use of the irrigation area flow scheduling knowledge question-answering system by combining the system with a HanLP word segmenter.

2. The method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling according to claim 1, wherein constructing the scheduling scene machine learning model comprises the following steps:

A1, obtaining the water use scheduling target flow of the irrigation area;

A4, constructing a decision tree forest network based on each optimal decision tree and its corresponding predicted value, thereby completing construction of the machine learning model.

3. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 2, wherein the step A3 comprises the following steps:

4. The method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling according to claim 1, wherein the Shapley value of a characteristic variable is calculated as follows:

$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl[f(S \cup \{j\}) - f(S)\bigr]$$

wherein φ_j represents the Shapley value of the j-th characteristic variable, f(·) represents the nonlinear regression mapping, N represents the characteristic variable sample set, M represents the dimension of the characteristic variable sample set, S represents a characteristic variable sample subset extracted from the characteristic variable sample set N, |S| represents the dimension of the characteristic variable sample subset S, f(S ∪ {j}) represents the average of the sample predicted values after the j-th characteristic variable is merged into the sample subset S, and f(S) represents the predicted value of the sample subset S.

5. The method for constructing and using a knowledge-graph question-answering system for irrigation area flow scheduling according to claim 1, wherein the SHAP value g(z') of the first machine learning model is expressed as follows:

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j, \qquad z' \in \{0, 1\}^M, \qquad \phi_j \in \mathbb{R}$$

wherein z' represents the joint vector; z'_j = 0 indicates that the j-th characteristic variable is not located in the decision path of the joint vector, and z'_j = 1 indicates that the j-th characteristic variable is located in the decision path of the joint vector; φ_0 represents the decision parameter; and R represents the real numbers.

6. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 1, wherein the step S6 comprises the following steps:

7. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 6, wherein the step S7 comprises the steps of:

S71, constructing question templates for irrigation area flow scheduling;