CN114780742B - Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area - Google Patents
- Publication number
- CN114780742B (CN202210409530.6A)
- Authority
- CN
- China
- Prior art keywords
- scheduling
- characteristic
- flow
- irrigation area
- question
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Animal Behavior & Ethology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for constructing and using a knowledge-graph question-answering system for flow scheduling in an irrigation area, belonging to the technical field of irrigation area flow scheduling. The method comprises the following steps: obtaining a plurality of different types of flow scheduling characteristic data; constructing a scheduling scene machine learning model, and inputting the different types of flow scheduling characteristic data into it for training and parameter adjustment to obtain a first machine learning model; obtaining an optimal characteristic value combination and scheduling experience data; inputting the optimal characteristic value combination into the first machine learning model for training and parameter adjustment to obtain a second machine learning model; obtaining the scheduling flow gradient corresponding to each type of flow scheduling characteristic data; constructing an irrigation area flow scheduling knowledge question-answering system; and constructing question templates for irrigation area flow scheduling, matching a question set to the question templates by probability using a naive Bayes classifier, and completing the use of the irrigation area flow scheduling knowledge question-answering system in combination with a HanLP word segmenter. The scheme addresses the low reliability and poor convenience of water flow scheduling in irrigation areas.
Description
Technical Field
The invention belongs to the technical field of irrigation area flow scheduling, and particularly relates to a method for constructing and using a knowledge-graph question-answering system for flow scheduling in an irrigation area.
Background
An irrigation area is a typical complex water resource system whose evolution is driven by both natural and social factors, and it is an important part of the construction of the national water network. Reasonable water use scheduling in irrigation areas is of great significance for improving overall water use efficiency and achieving sustainable development of water resources.
In practice, water use scheduling in an irrigation area relies mainly on the many years of historical experience of its water use dispatchers. The target flow of a scheduling operation is the decision result, and many factors influence it, so a number of characteristic variables are needed to describe it; a single characteristic variable may in turn come from measuring points at different spatial locations. Together these form a complex logical decision network that depends on historical causal experience. This decision network becomes fixed in the dispatchers' subconscious thinking through long-term learning and trial and error, and it shows outwardly as the ability to predict the irrigation area water use scheduling process scientifically. Two problems in irrigation area water use scheduling urgently need to be solved:
(1) Water flow scheduling in irrigation areas is mostly based on existing hydrological or coupled hydrological-hydraulic models. These can produce a flow scheduling value that is reasonable in terms of physical theory, but they struggle to handle the social attributes of irrigation area water flow scheduling;
(2) Because of shortcomings in reliability and convenience, almost no scheduling model is actually applied to the daily scheduling of irrigation areas today.
Regarding the first problem, a machine learning model can learn from historical data the scheduling rules that embody a large amount of scheduling experience, and can produce a reasonable scheduling flow that takes both natural and human factors into account. The prior art, however, uses machine learning only to solve for the water quantity and does not interpret the scheduling experience contained in the large body of historical scheduling data. At the same time, for the dispatchers who actually manage an irrigation area, applying a complex model is not simple enough and the process of solving for the water quantity is too complicated, and a model that has not been tested in the actual scheduling of the irrigation area is less trusted than scheduling experience accumulated over many years. This is the second problem in irrigation area water flow scheduling.
Disclosure of Invention
To address the above deficiencies in the prior art, the invention provides a method for constructing and using a knowledge-graph question-answering system for flow scheduling in an irrigation area. By combining a machine learning model with the SHAP model interpretation method, it learns and interprets scheduling data that embodies the many years of historical experience of irrigation area water use dispatchers, turns that scheduling experience into knowledge, and thereby addresses the low reliability and poor convenience of water flow scheduling in irrigation areas.
To achieve the purpose of the invention, the invention adopts the following technical scheme:
The invention provides a method for constructing and using a knowledge-graph question-answering system for flow scheduling in an irrigation area, comprising the following steps:
S1, acquiring a data set of irrigation area flow scheduling characteristic quantities, and classifying the data set according to different scheduling scenes to obtain a plurality of different types of flow scheduling characteristic data;
S2, constructing a scheduling scene machine learning model, and inputting the different types of flow scheduling characteristic data into the scheduling scene machine learning model for training and parameter adjustment to obtain a first machine learning model;
S3, interpreting the first machine learning model using the SHAP model interpretation method to obtain an optimal characteristic value combination and scheduling experience data;
S4, inputting the optimal characteristic value combination into the first machine learning model for training and parameter adjustment to obtain a second machine learning model;
S5, predicting on the different types of flow scheduling characteristic data using the second machine learning model, and obtaining the scheduling flow gradient corresponding to each type of flow scheduling characteristic data;
S6, constructing an irrigation area flow scheduling knowledge question-answering system based on the scheduling experience data, the different types of flow scheduling characteristic data, and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data;
S7, constructing question templates for irrigation area flow scheduling, matching a question set to the question templates by probability using a naive Bayes classifier, and completing the use of the irrigation area flow scheduling knowledge question-answering system in combination with a HanLP word segmenter.
The beneficial effects of the invention are as follows: the invention provides a method for constructing and using a knowledge-graph question-answering system for irrigation area water flow scheduling. A machine learning model is combined with the SHAP model interpretation method to learn and interpret scheduling data that embodies the many years of historical experience of irrigation area water use dispatchers, turning the scheduling experience into knowledge. At the same time, the machine learning model is used to simulate scheduling scenes and obtain predicted scheduling flow gradient values, forming an irrigation area water flow scheduling graph database in Neo4j that consists mainly of scheduling experience and predicted scheduling flows. An irrigation area flow scheduling knowledge question-answering system is then built on the Neo4j graph database, so that irrigation area water use scheduling experience is finally turned into knowledge and irrigation area managers are given a question-answering system in which scheduling experience and recommended flows are easy to look up.
Further, constructing the scheduling scene machine learning model includes the following steps:
A1, obtaining the irrigation area water use scheduling target flow;
A2, constructing a nonlinear regression mapping between the irrigation area water use scheduling target flow and the different types of flow scheduling characteristic data;
A3, obtaining a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree according to the nonlinear regression mapping;
A4, constructing a decision tree forest network based on the optimal decision trees and their corresponding predicted values, completing the construction of the machine learning model.
The beneficial effect of this further scheme is as follows: the scheduling scene machine learning model is constructed through a nonlinear regression mapping and is trained on the irrigation area characteristic data, so that the scheduling experience in the historical data is captured and the credibility of the irrigation area flow scheduling knowledge-graph question-answering system is increased.
Further, step A3 includes the following steps:
B1, randomly extracting m characteristic variables from the different types of flow scheduling characteristic data according to the nonlinear regression mapping, where m is the number of characteristic variables extracted;
B2, selecting s characteristic variables from the m characteristic variables as decision tree nodes, where s is the number of characteristic variables selected and s is smaller than m;
B3, splitting the decision tree nodes recursively, each time choosing the split for which the sum of the mean square errors of the resulting adjacent nodes is smallest, to obtain an optimal decision tree;
B4, taking the mean value of the leaf nodes of each optimal decision tree as the corresponding predicted value;
B5, repeating steps B1 to B4 to obtain a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree.
The beneficial effect of this further scheme is as follows: with a tree-structured machine learning model, once a plurality of decision trees exist, the average of the predicted values of all trees is used as the prediction for the target variable. The number of characteristic variables and the number of decision trees are tuned continually according to the prediction results, yielding a plurality of optimal decision trees and their corresponding predicted values and forming the nonlinear regression mapping between the irrigation area water use scheduling target flow and the characteristic variables.
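Steps B1 to B5 read much like a random-forest regressor, and the following pure-Python sketch illustrates them under that assumption. It is not the patented implementation: the bootstrap resampling of rows, the depth limit, the minimum leaf size, and the toy data set (three made-up features standing in for temperature, rainfall and an irrigation-period flag) are all assumptions added for illustration.

```python
import random

def sse(ys):
    """Sum of squared errors of the targets around their mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def build_tree(rows, ys, feats, s, rng, depth=0, max_depth=3, min_leaf=2):
    """Grow one regression tree; a leaf is a float, a split is a tuple."""
    if depth >= max_depth or len(ys) < 2 * min_leaf:
        return sum(ys) / len(ys)              # B4: leaf predicts the mean target
    best = None
    for j in rng.sample(feats, s):            # B2: try s of the tree's m features
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)     # B3: summed error of the two children
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return sum(ys) / len(ys)
    _, j, t = best
    lo = [(r, y) for r, y in zip(rows, ys) if r[j] <= t]
    hi = [(r, y) for r, y in zip(rows, ys) if r[j] > t]
    args = (feats, s, rng, depth + 1, max_depth, min_leaf)
    return (j, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], *args),
            build_tree([r for r, _ in hi], [y for _, y in hi], *args))

def predict_tree(tree, row):
    """Follow the splits until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def build_forest(rows, ys, m, s, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):                              # B5: repeat B1 to B4
        feats = rng.sample(range(len(rows[0])), m)        # B1: draw m features
        idx = [rng.randrange(len(rows)) for _ in rows]    # assumption: bootstrap rows
        forest.append(build_tree([rows[i] for i in idx],
                                 [ys[i] for i in idx], feats, s, rng))
    return forest

def predict_forest(forest, row):
    """Forest prediction is the average of the tree predictions."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)

# Synthetic toy data: (temperature, rainfall, irrigation-period flag) -> target flow.
rows = [(10 + 2 * i, 22 - 2 * i, 1 if i >= 6 else 0) for i in range(12)]
ys = [1.0 + 0.5 * i for i in range(12)]
forest = build_forest(rows, ys, m=2, s=1)
print(predict_forest(forest, rows[0]), predict_forest(forest, rows[-1]))
```

In the toy data a hot, dry day in the irrigation period is given the largest target flow, so the forest should predict a larger flow for the last row than for the first.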
Further, step S3 includes the following steps:
S31, interpreting the first machine learning model using the SHAP model interpretation method to obtain the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow;
S32, selecting the top-ranked characteristic values according to the importance ranking, and removing redundant entries from the characteristic value combinations of the different scheduling scenes to obtain an optimal characteristic value combination;
S33, analyzing the direction in which each characteristic value influences the scheduling flow to obtain the scheduling experience data.
The beneficial effect of this further scheme is as follows: the top-ranked characteristic values are selected according to the importance ranking, redundancy among the characteristic value combinations of the different scheduling scenes is removed to obtain an optimal characteristic value combination, and the direction in which each characteristic value influences the scheduling flow is analyzed to extract the scheduling experience contained in the historical data.
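A minimal sketch of steps S31 to S33, assuming per-sample SHAP values are already available. Importance is taken as the mean absolute SHAP value, and the influence direction as the sign of the covariance between a feature's value and its SHAP value; both are common conventions rather than something the text specifies. The function names, scene names and all numbers are hypothetical; T1, R and S are the temperature, rainfall and reservoir storage symbols used in this document.

```python
def rank_features(shap_by_feature):
    """Order features by mean absolute SHAP value, most important first."""
    importance = {name: sum(abs(v) for v in vals) / len(vals)
                  for name, vals in shap_by_feature.items()}
    return sorted(importance, key=importance.get, reverse=True)

def influence_direction(xs, phis):
    """Sign of the covariance between feature values and their SHAP values."""
    mx, mp = sum(xs) / len(xs), sum(phis) / len(phis)
    cov = sum((x - mx) * (p - mp) for x, p in zip(xs, phis))
    return "positive" if cov > 0 else "negative" if cov < 0 else "neutral"

def optimal_combination(rankings_by_scene, top_k):
    """Union of each scene's top_k features, with duplicates removed (S32)."""
    combo = []
    for ranking in rankings_by_scene.values():
        for name in ranking[:top_k]:
            if name not in combo:
                combo.append(name)
    return combo

# Made-up SHAP values for three samples in two hypothetical scheduling scenes.
scene_a = {"T1": [-1.0, 0.0, 1.0], "R": [0.4, 0.0, -0.4], "S": [0.1, 0.0, -0.1]}
scene_b = {"R": [1.0, 0.0, -1.0], "S": [-0.5, 0.0, 0.5], "T1": [0.1, 0.0, -0.1]}
rankings = {"scene_a": rank_features(scene_a), "scene_b": rank_features(scene_b)}

combo = optimal_combination(rankings, top_k=2)
t1_direction = influence_direction([10, 20, 30], scene_a["T1"])
r_direction = influence_direction([0, 10, 20], scene_a["R"])
print(rankings, combo, t1_direction, r_direction)
```

A "positive" direction for T1 would be read as scheduling experience of the form "higher temperature raises the scheduled flow".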
Further, step S31 includes the following steps:
S311, calculating the Shapley value of each characteristic variable using the SHAP model;
S312, calculating the SHAP value g(z') of the first machine learning model from the Shapley values of the characteristic variables;
S313, obtaining the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow from the SHAP value g(z') of the first machine learning model.
The beneficial effect of this further scheme is as follows: the SHAP model is applied to the characteristic variables to obtain the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow. The importance of each influencing factor or characteristic variable is continually weighed so that decisions can be combined dynamically, which increases the credibility of the irrigation area flow scheduling knowledge-graph question-answering system.
Further, the Shapley value of a characteristic variable is calculated as follows:
$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[ f(S \cup \{j\}) - f(S) \right]$$
where φ_j is the Shapley value of the j-th characteristic variable, f(·) is the nonlinear regression mapping, N is the characteristic variable sample set, M is the dimension of the characteristic variable sample set, S is a characteristic variable sample subset extracted from N, |S| is the dimension of the subset S, f(S ∪ {j}) is the average of the sample predicted values after the j-th characteristic variable is merged into the subset S, and f(S) is the predicted value of the subset S.
The beneficial effect of this further scheme is as follows: a method for calculating the Shapley value of a characteristic variable is provided.
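As a worked illustration, the Shapley value can be computed exactly by enumerating every subset S that excludes feature j and weighting the marginal contribution f(S ∪ {j}) - f(S) by |S|!(M - |S| - 1)!/M!, which is the standard game-theoretic definition. The value function `toy_flow` and its numbers are made up for this example and do not come from the patent; exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley value of each feature for a coalition value function."""
    m = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(m):
            # Weight of every subset of this size: |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi

def toy_flow(subset):
    """Hypothetical predicted flow when only the features in `subset` are known."""
    flow = 2.0                                   # baseline flow
    if "temperature" in subset:
        flow += 1.5
    if "rainfall" in subset:
        flow -= 1.0
    if "temperature" in subset and "irrigation_period" in subset:
        flow += 1.0                              # interaction, shared equally
    return flow

features = ["temperature", "rainfall", "irrigation_period"]
phi = shapley_values(features, toy_flow)
print(phi)
```

The three values sum to the difference between the prediction with all features and the baseline, which is the efficiency property that makes Shapley values usable as an additive explanation.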
Further, the SHAP value g(z') of the first machine learning model is expressed as follows:
$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j$$
$$z' \in \{0, 1\}^M$$
$$\phi_j \in \mathbb{R}$$
where z' is the joint vector, z'_j = 0 indicates that the j-th characteristic variable is not on the decision path, z'_j = 1 indicates that the j-th characteristic variable is on the decision path, φ_0 is the decision parameter, and R is the set of real numbers.
The beneficial effect of this further scheme is as follows: a method for calculating the SHAP value g(z') of the first machine learning model is provided.
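A short sketch of how g(z') can be evaluated, assuming the usual additive SHAP form g(z') = φ_0 + Σ_j φ_j z'_j, in which each z'_j switches the contribution of one characteristic variable on or off. The numeric values of φ_0 and φ_j below are illustrative only.

```python
def additive_explanation(phi0, phi, z):
    """g(z') = phi0 + sum_j phi_j * z'_j, with every z'_j equal to 0 or 1."""
    if any(zj not in (0, 1) for zj in z):
        raise ValueError("z' must be a 0/1 vector")
    return phi0 + sum(p * zj for p, zj in zip(phi, z))

phi0 = 2.0                      # decision parameter: output with no feature present
phi = [2.0, -1.0, 0.5]          # hypothetical Shapley values of three features

all_present = additive_explanation(phi0, phi, [1, 1, 1])
first_only = additive_explanation(phi0, phi, [1, 0, 0])
none_present = additive_explanation(phi0, phi, [0, 0, 0])
print(all_present, first_only, none_present)
```

With every characteristic variable present, g(z') reproduces the model output being explained; with none present it falls back to φ_0.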
Further, step S6 includes the following steps:
S61, taking the gate, the scheduling flow, the scheduling experience data, the different types of flow scheduling characteristic data, and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data as entities, and storing them in the graph database Neo4j;
S62, constructing the irrigation area flow scheduling knowledge question-answering system based on the graph database Neo4j.
The beneficial effect of this further scheme is as follows: the irrigation area flow scheduling knowledge-graph question-answering system is built from these entities, so that irrigation area dispatchers can conveniently retrieve the recommended scheduling flows that the system provides.
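One way step S61 might look in practice is to turn each scheduling scene into parameterised Cypher `MERGE` statements. The sketch below only builds the statements and their parameters and does not connect to a database. The node labels (`Gate`, `Scene`, `Feature`, `Experience`, `FlowGradient`), the relationship types, the gate and scene names, and the values are all hypothetical; the text does not specify a graph schema.

```python
def scene_to_cypher(gate, scene, features, experience, flow_gradient):
    """Return (query, params) pairs that store one scheduling scene as a graph."""
    statements = [(
        "MERGE (g:Gate {name: $gate}) "
        "MERGE (s:Scene {name: $scene}) "
        "MERGE (g)-[:HAS_SCENE]->(s) "
        "MERGE (q:FlowGradient {value: $flow}) "
        "MERGE (s)-[:RECOMMENDS]->(q) "
        "MERGE (e:Experience {text: $experience}) "
        "MERGE (s)-[:EXPLAINED_BY]->(e)",
        {"gate": gate, "scene": scene, "flow": flow_gradient,
         "experience": experience},
    )]
    for name, value in features.items():
        # One Feature node per characteristic value, linked to its scene.
        statements.append((
            "MATCH (s:Scene {name: $scene}) "
            "MERGE (f:Feature {name: $name, value: $value}) "
            "MERGE (s)-[:HAS_FEATURE]->(f)",
            {"scene": scene, "name": name, "value": value},
        ))
    return statements

statements = scene_to_cypher(
    gate="main canal head gate",                 # hypothetical gate name
    scene="dry irrigation period",               # hypothetical scene name
    features={"temperature": 30, "rainfall": 0, "irrigation_period": True},
    experience="higher temperature raises the scheduled flow",
    flow_gradient=2,
)
for query, params in statements:
    print(query, params)
```

Each pair could then be passed to a Neo4j session, for example `session.run(query, params)` with the official Python driver. Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating entities.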
Further, step S7 includes the following steps:
S71, constructing question templates for irrigation area flow scheduling;
S72, matching the question set to the question templates by probability using a naive Bayes classifier;
S73, obtaining the user's question through the irrigation area flow scheduling knowledge question-answering system, and matching the question against the question templates with the help of the HanLP word segmenter to obtain the corresponding question in the question set;
S74, searching the graph database Neo4j according to the corresponding question, and returning the search results from Neo4j through the irrigation area flow scheduling knowledge question-answering system, completing the use of the irrigation area flow scheduling knowledge question-answering system.
The beneficial effect of this further scheme is as follows: by constructing question templates, matching the question templates, the question set and the user's question, searching the graph database Neo4j through the irrigation area flow scheduling knowledge question-answering system, and displaying the corresponding search results, irrigation area dispatchers can easily look up answers to scheduling questions.
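Steps S72 and S73 can be illustrated with a small multinomial naive Bayes classifier that assigns a tokenised question to the most probable question template. This is a sketch under stated assumptions: whitespace splitting stands in for HanLP word segmentation, Laplace (add-one) smoothing is assumed, and the class name, template names and training questions are hypothetical English examples.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTemplateMatcher:
    """Multinomial naive Bayes over question tokens with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)   # template -> token counts
        self.doc_counts = Counter()               # template -> number of questions
        self.vocab = set()

    @staticmethod
    def tokenize(question):
        # Stand-in for a real word segmenter such as HanLP.
        return question.lower().split()

    def fit(self, questions, templates):
        for question, template in zip(questions, templates):
            tokens = self.tokenize(question)
            self.word_counts[template].update(tokens)
            self.doc_counts[template] += 1
            self.vocab.update(tokens)

    def predict(self, question):
        tokens = self.tokenize(question)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for template, n_docs in self.doc_counts.items():
            n_words = sum(self.word_counts[template].values())
            score = math.log(n_docs / total_docs)             # log prior
            for tok in tokens:                                # log likelihoods
                count = self.word_counts[template][tok]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best, best_score = template, score
        return best

matcher = NaiveBayesTemplateMatcher()
matcher.fit(
    ["what flow is recommended for this scene",
     "which scheduling flow should the gate use",
     "what experience explains this scene",
     "why does temperature raise the flow"],
    ["recommended_flow", "recommended_flow",
     "scheduling_experience", "scheduling_experience"],
)
print(matcher.predict("which flow should be recommended"))
print(matcher.predict("why does rainfall lower the flow"))
```

The predicted template would then select the graph query to run against Neo4j in step S74, with the recognised entities (gate, characteristic values) filled in as query parameters.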
Drawings
Fig. 1 is a flow chart of the steps of the method for constructing and using a knowledge-graph question-answering system for flow scheduling in an irrigation area according to an embodiment of the invention.
Fig. 2 is a diagram of the importance ranking of the characteristic values in an embodiment of the invention.
Detailed Description
The following description of specific embodiments is provided to help those skilled in the art understand the invention. It should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they remain within the spirit and scope of the invention as defined by the appended claims, and everything created using the inventive concept is protected.
As shown in Fig. 1, in an embodiment of the invention, the invention provides a method for constructing and using a knowledge-graph question-answering system for flow scheduling in an irrigation area, including the following steps:
S1, acquiring a data set of irrigation area flow scheduling characteristic quantities, and classifying the data set according to different scheduling scenes to obtain a plurality of different types of flow scheduling characteristic data;
Common scheduling scenes are shown in Table 1:
TABLE 1
Because data such as rainfall and soil moisture content can come from measuring points at different spatial locations, the initial model input has more than 30 characteristic values, where T1 is the temperature, G is the empirical division of the irrigation period, S is the water storage of the small and medium-sized reservoirs around the main canal, R is the daily rainfall measured at the rainfall stations, C is the water supply and replenishment amount for urban and ecological use, and Y is the water demand;
S2, constructing a scheduling scene machine learning model, and inputting the different types of flow scheduling characteristic data into the scheduling scene machine learning model for training and parameter adjustment to obtain a first machine learning model;
Constructing the scheduling scene machine learning model includes the following steps:
A1, obtaining the irrigation area water use scheduling target flow;
A2, constructing a nonlinear regression mapping between the irrigation area water use scheduling target flow and the different types of flow scheduling characteristic data;
A3, obtaining a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree according to the nonlinear regression mapping;
Step A3 includes the following steps:
B1, randomly extracting m characteristic variables from the different types of flow scheduling characteristic data according to the nonlinear regression mapping, where m is the number of characteristic variables extracted;
B2, selecting s characteristic variables from the m characteristic variables as decision tree nodes, where s is the number of characteristic variables selected and s is smaller than m;
B3, splitting the decision tree nodes recursively, each time choosing the split for which the sum of the mean square errors of the resulting adjacent nodes is smallest, to obtain an optimal decision tree;
B4, taking the mean value of the leaf nodes of each optimal decision tree as the corresponding predicted value;
B5, repeating steps B1 to B4 to obtain a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree;
A4, constructing a decision tree forest network based on the optimal decision trees and their corresponding predicted values, completing the construction of the machine learning model;
S3, interpreting the first machine learning model using the SHAP model interpretation method to obtain an optimal characteristic value combination and scheduling experience data;
Step S3 includes the following steps:
S31, interpreting the first machine learning model using the SHAP model interpretation method to obtain the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow;
Step S31 includes the following steps:
S311, calculating the Shapley value of each characteristic variable using the SHAP model;
The Shapley value of a characteristic variable is calculated as follows:
$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[ f(S \cup \{j\}) - f(S) \right]$$
where φ_j is the Shapley value of the j-th characteristic variable, f(·) is the nonlinear regression mapping, N is the characteristic variable sample set, M is the dimension of the characteristic variable sample set, S is a characteristic variable sample subset extracted from N, |S| is the dimension of the subset S, f(S ∪ {j}) is the average of the sample predicted values after the j-th characteristic variable is merged into the subset S, and f(S) is the predicted value of the subset S;
S312, calculating the SHAP value g(z') of the first machine learning model from the Shapley values of the characteristic variables;
The SHAP value g(z') of the first machine learning model is expressed as follows:
$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j$$
$$z' \in \{0, 1\}^M$$
$$\phi_j \in \mathbb{R}$$
where z' is the joint vector, z'_j = 0 indicates that the j-th characteristic variable is not on the decision path, z'_j = 1 indicates that the j-th characteristic variable is on the decision path, φ_0 is the decision parameter, and R is the set of real numbers;
The SHAP value is based on the Shapley value, a concept from game theory;
S313, obtaining the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow from the SHAP value g(z') of the first machine learning model;
As shown in Fig. 2, the characteristic values ranked by importance from top to bottom are: the daily temperature data T1 recorded at the meteorological stations, the time T, the water storage S of the small and medium-sized reservoirs around the main canal, the empirical division G into irrigation and non-irrigation periods, the staged water supply and replenishment amount C for urban and ecological use, the field soil moisture content SK, the daily rainfall R measured at the rainfall stations, and the scheduling requests Y sent to the irrigation area scheduling bureau and the flood control office;
S32, selecting the top-ranked characteristic values according to the importance ranking, and removing redundant entries from the characteristic value combinations of the different scheduling scenes to obtain an optimal characteristic value combination;
S33, analyzing the direction in which each characteristic value influences the scheduling flow to obtain the scheduling experience data;
S4, inputting the optimal characteristic value combination into the first machine learning model for training and parameter adjustment to obtain a second machine learning model;
S5, predicting on the different types of flow scheduling characteristic data using the second machine learning model, and obtaining the scheduling flow gradient corresponding to each type of flow scheduling characteristic data;
Different scheduling scenes are simulated in order to predict on the different types of flow scheduling characteristic data, and the scheduling flow gradient obtained for each type is shown in Table 2:
TABLE 2
S6, constructing an irrigation area flow scheduling knowledge question-answering system based on the scheduling experience data, the different types of flow scheduling characteristic data, and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data;
Step S6 includes the following steps:
S61, taking the gate, the scheduling flow, the scheduling experience data, the different types of flow scheduling characteristic data, and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data as entities, and storing them in the graph database Neo4j;
S62, constructing the irrigation area flow scheduling knowledge question-answering system based on the graph database Neo4j;
S7, constructing question templates for irrigation area flow scheduling, matching a question set to the question templates by probability using a naive Bayes classifier, and completing the use of the irrigation area flow scheduling knowledge question-answering system in combination with a HanLP word segmenter;
The HanLP word segmenter performs shortest-path word segmentation, provides functions such as Chinese word segmentation, part-of-speech tagging, new word recognition, named entity recognition, automatic summarization, text clustering, sentiment analysis and word vectors, and supports user-defined dictionaries;
Step S7 includes the following steps:
S71, constructing question templates for irrigation area flow scheduling;
S72, matching the question set to the question templates by probability using a naive Bayes classifier;
S73, obtaining the user's question through the irrigation area flow scheduling knowledge question-answering system, and matching the question against the question templates with the help of the HanLP word segmenter to obtain the corresponding question in the question set;
S74, searching the graph database Neo4j according to the corresponding question, and returning the search results from Neo4j through the irrigation area flow scheduling knowledge question-answering system, completing the use of the irrigation area flow scheduling knowledge question-answering system.
For example, if the input is: rainfall is -20, irrigation period is negative, and temperature is -30, the answer obtained is: predicted scheduling flow: 2.
The beneficial effects of the invention are as follows: the invention provides a method for constructing and using a knowledge-graph question-answering system for irrigation area water flow scheduling. A machine learning model is combined with the SHAP model interpretation method to learn and interpret scheduling data that embodies the many years of historical experience of irrigation area water use dispatchers, turning the scheduling experience into knowledge. At the same time, the machine learning model is used to simulate scheduling scenes and obtain predicted scheduling flow gradient values, forming an irrigation area water flow scheduling graph database in Neo4j that consists mainly of scheduling experience and predicted scheduling flows. An irrigation area flow scheduling knowledge question-answering system is then built on the Neo4j graph database, so that irrigation area water use scheduling experience is finally turned into knowledge and irrigation area managers are given a question-answering system in which scheduling experience and recommended flows are easy to look up.
Claims (7)
1. A construction and use method of a flow scheduling knowledge-graph question-answering system for an irrigation area, characterized by comprising the following steps:
S1, acquiring a data set of irrigation area flow scheduling characteristic quantities, and classifying the data set according to different scheduling scenes to obtain a plurality of different types of flow scheduling characteristic data;
S2, constructing a scheduling scene machine learning model, inputting the different types of flow scheduling characteristic data into the trained scheduling scene machine learning model to adjust parameters, and obtaining a first machine learning model;
S3, interpreting the first machine learning model by using the SHAP model interpretation method to obtain an optimal characteristic value combination and scheduling experience data;
the step S3 includes the steps of:
S31, interpreting the first machine learning model by using the SHAP model interpretation method to obtain, respectively, the importance ranking of the characteristic values and the influence directions of different characteristic values on the scheduling flow;
the step S31 includes the steps of:
S311, calculating the Shapley value of each characteristic variable by using the SHAP model;
S312, calculating the SHAP value g(z') of the first machine learning model according to the Shapley values of the characteristic variables;
S313, obtaining the importance ranking of the characteristic values and the influence directions of different characteristic values on the scheduling flow according to the SHAP value g(z') of the first machine learning model;
S32, selecting the top-ranked characteristic values according to the importance ranking of the characteristic values, and removing redundancy among the characteristic value combinations of different scheduling scenes to obtain an optimal characteristic value combination;
S33, analyzing the influence directions of different characteristic values on the scheduling flow to obtain scheduling experience data;
S4, inputting the optimal characteristic value combination into the first machine learning model for training and parameter adjustment to obtain a second machine learning model;
S5, predicting the different types of flow scheduling characteristic data by using the second machine learning model, and respectively obtaining the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data;
S6, constructing an irrigation area flow scheduling knowledge question-answering system based on the scheduling experience data, the different types of flow scheduling characteristic data and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data;
and S7, constructing question templates for irrigation area flow scheduling, matching the question set against the question templates by probability using a naive Bayes classifier, and completing the use of the irrigation area flow scheduling knowledge question-answering system in combination with the HanLP word segmenter.
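The feature selection of step S32 can be pictured with a short Python sketch. It rests on one reading of the claim that is an assumption here: "removing redundancy" is taken to mean dropping features that were already selected for another scheduling scene. The scene names, feature names and the function `optimal_feature_combination` are hypothetical.

```python
def optimal_feature_combination(rankings, top_k):
    """Keep the top_k features of every scene and drop repeats across scenes.

    rankings maps a scene name to its feature names, most important first.
    Scenes are visited in sorted order so the result is deterministic.
    """
    selected = []
    seen = set()
    for scene in sorted(rankings):
        for feature in rankings[scene][:top_k]:
            if feature not in seen:
                seen.add(feature)
                selected.append(feature)
    return selected


# Hypothetical importance rankings for two scheduling scenes.
example_rankings = {
    "dry_season": ["rainfall", "temperature", "upstream_level"],
    "flood_season": ["upstream_level", "rainfall", "irrigation_period"],
}
combination = optimal_feature_combination(example_rankings, top_k=2)
```

With these made-up rankings, `rainfall` appears in the top two of both scenes and is therefore kept only once.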
2. The irrigation area flow scheduling knowledge-graph question-answering system building and using method according to claim 1, wherein the building of the scheduling scene machine learning model comprises the following steps:
A1, obtaining the water-use scheduling target flow of the irrigation area;
A2, constructing a nonlinear regression mapping between the water-use scheduling target flow of the irrigation area and the different types of flow scheduling characteristic data;
A3, obtaining a plurality of optimal decision trees and the predicted values corresponding to the optimal decision trees according to the nonlinear regression mapping;
and A4, constructing a decision tree forest network based on the optimal decision trees and the corresponding predicted values, thereby completing the construction of the machine learning model.
3. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 2, wherein the step A3 comprises the following steps:
B1, randomly extracting m characteristic variables from the different types of flow scheduling characteristic data according to the nonlinear regression mapping, wherein m represents the number of extracted characteristic variables;
B2, selecting s characteristic variables from the m characteristic variables as decision tree nodes, wherein s represents the number of selected characteristic variables and is smaller than m;
B3, splitting the decision tree nodes recursively, each split being chosen so that the sum of the mean square errors of the adjacent decision tree nodes is minimal, to obtain an optimal decision tree;
B4, taking the mean value of the leaf nodes of each optimal decision tree as the corresponding predicted value;
and B5, repeating the steps B1 to B4 to obtain a plurality of optimal decision trees and the predicted values corresponding to the optimal decision trees.
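Steps B1 to B5 resemble the usual construction of a regression forest, and a minimal pure-Python sketch may make them concrete. It is written under stated assumptions: the split criterion is taken to be the sum of squared errors of the two child nodes, no bootstrap sampling is performed because the claim does not mention it, and all function names and the toy data are hypothetical.

```python
import random


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, feature_ids):
    """Return (total_sse, feature, threshold) minimising left SSE + right SSE."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, f, threshold)
    return best


def build_tree(rows, targets, n_candidates, rng, min_size=2):
    """Split recursively; a leaf stores the mean of its targets (step B4)."""
    leaf = sum(targets) / len(targets)
    if len(rows) < min_size or len(set(targets)) == 1:
        return leaf
    feature_ids = rng.sample(range(len(rows[0])), n_candidates)  # steps B1/B2
    split = best_split(rows, targets, feature_ids)               # step B3
    if split is None:
        return leaf
    _, f, thr = split
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([r for r, _ in left], [t for _, t in left], n_candidates, rng, min_size),
        "right": build_tree([r for r, _ in right], [t for _, t in right], n_candidates, rng, min_size),
    }


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)


# Toy data: feature 0 separates low and high flows, feature 1 is constant.
rows = [(1.0, 5.0), (2.0, 5.0), (8.0, 5.0), (9.0, 5.0)]
targets = [1.0, 1.0, 3.0, 3.0]
rng = random.Random(0)
forest = [build_tree(rows, targets, 2, rng) for _ in range(3)]  # step B5
```

A production system would more likely rely on an existing random forest implementation; the sketch only spells out the recursion that the claim describes.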
4. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 1, wherein the Shapley value of a characteristic variable is calculated as follows:
$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl[f(S \cup \{j\}) - f(S)\bigr]$$
wherein $\phi_j$ represents the Shapley value of the j-th characteristic variable, $f(\cdot)$ represents the nonlinear regression mapping, $N$ represents the characteristic variable sample set, $M$ represents the dimension of the characteristic variable sample set, $S$ represents a characteristic variable sample subset extracted from the characteristic variable sample set $N$, $|S|$ represents the dimension of the characteristic variable sample subset $S$, $f(S \cup \{j\})$ represents the average of the sample prediction values after the j-th characteristic variable is merged into the sample subset $S$, and $f(S)$ represents the prediction value of the sample subset $S$.
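The Shapley value of a characteristic variable, as defined in this claim, is a weighted average of its marginal contributions over all subsets that exclude it. The following Python sketch computes it exactly by enumeration, which is only feasible for a handful of features. The toy value function `toy_value` and its numbers are invented for illustration and are not taken from the patent.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every subset S that excludes j."""
    features = range(n_features)
    phis = []
    for j in features:
        others = [k for k in features if k != j]
        phi = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (M - |S| - 1)! / M! for subsets of this size.
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value_fn(s | {j}) - value_fn(s))
        phis.append(phi)
    return phis


def toy_value(subset):
    """Hypothetical f(S): a base flow plus per-feature effects and one interaction."""
    contrib = {0: 2.0, 1: -1.0, 2: 0.5}
    value = 10.0 + sum(contrib[k] for k in subset)
    if 0 in subset and 1 in subset:
        value += 1.0  # interaction between features 0 and 1
    return value


phis = shapley_values(toy_value, 3)
```

In this toy case the interaction term is shared equally between features 0 and 1, and the values add up to the difference between the prediction with all features and the prediction with none.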
5. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 1, wherein the expression of the SHAP value g(z') of the first machine learning model is as follows:
$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j$$
$$z' \in \{0, 1\}^M$$
$$\phi_j \in \mathbb{R}$$
wherein $z'$ represents the joint vector, $z'_j = 0$ indicates that the j-th characteristic variable is not located in the decision path, $z'_j = 1$ indicates that the j-th characteristic variable is located in the decision path, $\phi_0$ represents the decision parameter, and $\mathbb{R}$ represents the set of real numbers.
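The additive form of g(z') and its use in step S313 can be sketched in a few lines of Python. The numbers below are hypothetical, and ranking features by the absolute value of their Shapley value is an assumed (though common) way to derive the importance ranking; the sign is used as the influence direction.

```python
def explanation_model(phi0, phis, z):
    """g(z') = phi0 + sum_j phis[j] * z'[j], with every z'[j] equal to 0 or 1."""
    if len(phis) != len(z) or any(bit not in (0, 1) for bit in z):
        raise ValueError("z must be a 0/1 vector with one entry per feature")
    return phi0 + sum(p * bit for p, bit in zip(phis, z))


def importance_and_direction(phis):
    """Rank features by |phi| (largest first) and report the sign of each phi."""
    ranking = sorted(range(len(phis)), key=lambda j: abs(phis[j]), reverse=True)
    directions = ["+" if p >= 0 else "-" for p in phis]
    return ranking, directions


# Hypothetical values: base value 12.0 and three feature attributions.
phi0 = 12.0
phis = [2.5, -1.5, 0.5]
full = explanation_model(phi0, phis, [1, 1, 1])
ranking, directions = importance_and_direction(phis)
```

Here the second feature lowers the predicted flow, which is the kind of statement the method stores as scheduling experience.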
6. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 1, wherein the step S6 comprises the following steps:
S61, taking the gate, the scheduling flow, the scheduling experience data, the different types of flow scheduling characteristic data and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data respectively as entities, and storing the entities in the graph database Neo4j;
S62, constructing the irrigation area flow scheduling knowledge question-answering system based on the graph database Neo4j.
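One way to picture step S61 is as a parameterised Cypher statement that writes a gate, a scheduling scene and a predicted flow gradient as nodes. The sketch below only builds the statement and its parameters and does not connect to a database. The node labels (`Gate`, `Scene`, `FlowGradient`), the relationship types, the property names and the gate name are all assumptions, since the patent does not specify a schema; only a subset of the entities listed in S61 is covered.

```python
# Hypothetical schema: Gate -[:HAS_SCENE]-> Scene -[:PREDICTS]-> FlowGradient.
SCENE_QUERY = """
MERGE (g:Gate {name: $gate})
MERGE (s:Scene {rainfall: $rainfall, irrigation_period: $irrigation_period, temperature: $temperature})
MERGE (f:FlowGradient {level: $gradient})
MERGE (g)-[:HAS_SCENE]->(s)
MERGE (s)-[:PREDICTS]->(f)
""".strip()

REQUIRED_FIELDS = ("gate", "rainfall", "irrigation_period", "temperature", "gradient")


def scene_statement(record):
    """Return (cypher, params) for one scheduling record, checking required fields."""
    missing = [k for k in REQUIRED_FIELDS if k not in record]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return SCENE_QUERY, {k: record[k] for k in REQUIRED_FIELDS}


# Example record; the gate name is invented, the scene values follow the
# question-and-answer example given in the description.
query, params = scene_statement({
    "gate": "gate_A",
    "rainfall": "-20",
    "irrigation_period": "negative",
    "temperature": "-30",
    "gradient": 2,
})
```

With the official Neo4j Python driver, such a pair would typically be passed to `session.run(query, params)`; `MERGE` is used instead of `CREATE` so that re-running the import does not duplicate nodes.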
7. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 6, wherein the step S7 comprises the steps of:
S71, constructing question templates for irrigation area flow scheduling;
S72, matching the question set against the question templates by probability using a naive Bayes classifier;
S73, obtaining question data through the irrigation area flow scheduling knowledge question-answering system, and matching the question data with a question template through the HanLP word segmenter to obtain the corresponding question in the question set;
and S74, searching the graph database Neo4j according to the corresponding question, and feeding back the search result of the graph database Neo4j through the irrigation area flow scheduling knowledge question-answering system, thereby completing the use of the irrigation area flow scheduling knowledge question-answering system.
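The template matching of step S72 can be sketched as a small multinomial naive Bayes classifier with add-one smoothing. This is an illustration under assumptions: the training questions and the template names `predict_flow` and `experience` are invented, and a whitespace split stands in for the HanLP word segmentation named in step S73.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training questions, each labelled with a question template.
TRAINING = [
    ("what is the predicted scheduling flow for gate A", "predict_flow"),
    ("predicted flow when rainfall is low", "predict_flow"),
    ("which features influence scheduling flow most", "experience"),
    ("how does rainfall influence the scheduling flow", "experience"),
]


def train(samples):
    """Collect per-template word counts, template frequencies and the vocabulary."""
    word_counts = defaultdict(Counter)
    template_counts = Counter()
    vocab = set()
    for text, template in samples:
        tokens = text.split()  # stand-in for HanLP segmentation
        word_counts[template].update(tokens)
        template_counts[template] += 1
        vocab.update(tokens)
    return word_counts, template_counts, vocab


def classify(question, model):
    """Return the template with the highest posterior log-probability."""
    word_counts, template_counts, vocab = model
    total = sum(template_counts.values())
    best, best_score = None, -math.inf
    for template, count in template_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[template].values()) + len(vocab)
        for tok in question.split():
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((word_counts[template][tok] + 1) / denom)
        if score > best_score:
            best, best_score = template, score
    return best


MODEL = train(TRAINING)
```

Once a template is chosen, the slots of the question (for example rainfall, irrigation period and temperature) would be filled in and used to search the graph database as in step S74.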
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210409530.6A CN114780742B (en) | 2022-04-19 | 2022-04-19 | Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114780742A (en) | 2022-07-22 |
CN114780742B (en) | 2023-02-24 |
Family
ID=82431536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210409530.6A Active CN114780742B (en) | 2022-04-19 | 2022-04-19 | Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114780742B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117235219A (en) * | 2023-09-15 | 2023-12-15 | 宁波市水利水电规划设计研究院有限公司 | Reservoir knowledge intelligent question-answering system based on flood prevention demands |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111552820A (en) * | 2020-04-30 | 2020-08-18 | 江河瑞通(北京)技术有限公司 | Water engineering scheduling data processing method and device |
CN112508442A (en) * | 2020-12-18 | 2021-03-16 | 湖南大学 | Transient stability evaluation method and system based on automation and interpretable machine learning |
CN112581172A (en) * | 2020-12-18 | 2021-03-30 | 四川中电启明星信息技术有限公司 | Multi-model fusion electricity sales quantity prediction method based on empirical mode decomposition |
CN113918512A (en) * | 2021-10-22 | 2022-01-11 | 国家电网公司华中分部 | Power grid operation rule knowledge graph construction system and method |
CN113919886A (en) * | 2021-11-11 | 2022-01-11 | 重庆邮电大学 | Data characteristic combination pricing method and system based on Shapley value and electronic equipment |
CN114116915A (en) * | 2021-10-28 | 2022-03-01 | 天津大学 | Hydraulic engineering intelligent map system based on three-dimensional digital platform |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4170520A4 (en) * | 2020-06-17 | 2023-11-29 | The 4th Paradigm Technology Co., Ltd | Method and device for constructing knowledge graph, computer device, and storage medium |
CN112613720B (en) * | 2020-12-17 | 2023-03-24 | 湖北工业大学 | Reservoir irrigation optimal scheduling method considering multiple uncertainties |
CN112668773A (en) * | 2020-12-24 | 2021-04-16 | 北京百度网讯科技有限公司 | Method and device for predicting warehousing traffic and electronic equipment |
CN113377966B (en) * | 2021-08-11 | 2021-11-19 | 长江水利委员会水文局 | Water conservancy project scheduling regulation reasoning method based on knowledge graph |
CN113918725A (en) * | 2021-08-31 | 2022-01-11 | 南京中禹智慧水利研究院有限公司 | Construction method of knowledge graph in water affairs field |
CN114048900A (en) * | 2021-11-07 | 2022-02-15 | 天津大学 | Irrigated area reservoir dispatching management system |
Non-Patent Citations (2)
Title |
---|
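Application of data warehouse in the construction of the Dujiangyan irrigation area data center; Song Hairui et al.; Computer Engineering; 20070505 (No. 09); full text * |
Discussion on irrigation area informatization construction; Zhang Zeliang et al.; Shanxi Water Resources; 20031230 (No. 06); full text * |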
Also Published As
Publication number | Publication date |
---|---|
CN114780742A (en) | 2022-07-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||