CN114780742B - Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area - Google Patents

Publication number
CN114780742B
Authority
CN
China
Prior art keywords
scheduling
characteristic
flow
irrigation area
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210409530.6A
Other languages
Chinese (zh)
Other versions
CN114780742A (en
Inventor
苏楠
章少辉
白美健
张宝忠
陈皓锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Institute of Water Resources and Hydropower Research
Original Assignee
China Institute of Water Resources and Hydropower Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Institute of Water Resources and Hydropower Research
Priority to CN202210409530.6A
Publication of CN114780742A
Application granted
Publication of CN114780742B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Animal Behavior & Ethology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for constructing and using a knowledge graph question-answering system for irrigation area flow scheduling, belonging to the technical field of irrigation area flow scheduling. The method comprises: obtaining a plurality of different types of flow scheduling characteristic data; constructing a scheduling scenario machine learning model and inputting the different types of flow scheduling characteristic data into it for training and parameter tuning to obtain a first machine learning model; obtaining an optimal characteristic value combination and scheduling experience data; inputting the optimal characteristic value combination into the first machine learning model for training and parameter tuning to obtain a second machine learning model; obtaining the scheduling flow gradient corresponding to each type of flow scheduling characteristic data; constructing an irrigation area flow scheduling knowledge question-answering system; and constructing question templates for irrigation area flow scheduling, using a naive Bayes classifier to match a question set to the question templates by probability, and completing use of the question-answering system in combination with the HanLP word segmenter. The scheme addresses the low reliability and convenience of water flow scheduling in irrigation areas.

Description

Construction and use method of irrigation area flow scheduling knowledge graph question-answering system
Technical Field
The invention belongs to the technical field of irrigation area flow scheduling, and particularly relates to a method for constructing and using a knowledge graph question-answering system for irrigation area flow scheduling.
Background
An irrigation area is a typical complex water resource system that evolves under both natural and social drivers, and is an important part of national water network construction. Reasonable water use scheduling in irrigation areas is important for improving overall water use efficiency and for the sustainable development of water resources.
In practice, water use scheduling in an irrigation area relies mainly on the years of experience of its dispatchers. The target flow is the decision result, and many factors influence it, so many characteristic variables are needed to describe it; a single characteristic variable may in turn come from several spatially distributed measuring points. Together these form a complex logical decision network grounded in historical cause-and-effect experience. This network becomes ingrained in dispatchers' intuition through long-term learning and trial and error, and shows outwardly as the ability to predict the scheduling process soundly. Two problems in irrigation area water scheduling urgently need to be solved:
(1) Water flow scheduling in irrigation areas is mostly based on existing hydrological or coupled hydrological-hydraulic models, which can produce flow scheduling values that are physically reasonable in theory but struggle to handle irrigation area scheduling problems that have social attributes;
(2) For reasons of reliability and convenience, almost no scheduling model is actually applied in the daily scheduling of irrigation areas today.
For the first problem, a machine learning model can learn scheduling rules that embody a large amount of scheduling experience from historical data, and can produce reasonable scheduling flows that account for both natural and human factors. The prior art, however, only uses machine learning to solve for water quantity and does not interpret the scheduling experience contained in the large body of historical scheduling data. Meanwhile, for dispatchers who actually manage an irrigation area, applying a complex model is not simple enough, the process of solving for water quantity is too complicated, and models that have not been tested in actual scheduling of the irrigation area are less trusted than scheduling experience accumulated over many years. This is the second problem in irrigation area water flow scheduling.
Disclosure of Invention
To address the above shortcomings of the prior art, the invention provides a method for constructing and using a knowledge graph question-answering system for irrigation area flow scheduling. It combines a machine learning model with the SHAP model interpretation method to learn and interpret scheduling data that embody the years of experience of irrigation area dispatchers, turning that scheduling experience into explicit knowledge and addressing the low reliability and convenience of irrigation area water flow scheduling.
To achieve this purpose, the invention adopts the following technical scheme:
The invention provides a method for constructing and using a knowledge graph question-answering system for irrigation area flow scheduling, comprising the following steps:
S1, acquiring an irrigation area flow scheduling characteristic data set, and classifying it according to different scheduling scenarios to obtain a plurality of different types of flow scheduling characteristic data;
S2, constructing a scheduling scenario machine learning model, and inputting the different types of flow scheduling characteristic data into it for training and parameter tuning to obtain a first machine learning model;
S3, interpreting the first machine learning model using the SHAP model interpretation method to obtain an optimal characteristic value combination and scheduling experience data;
S4, inputting the optimal characteristic value combination into the first machine learning model for training and parameter tuning to obtain a second machine learning model;
S5, predicting on the different types of flow scheduling characteristic data using the second machine learning model to obtain the scheduling flow gradient corresponding to each type of flow scheduling characteristic data;
S6, constructing an irrigation area flow scheduling knowledge question-answering system based on the scheduling experience data, the different types of flow scheduling characteristic data, and the scheduling flow gradients corresponding to them;
S7, constructing question templates for irrigation area flow scheduling, using a naive Bayes classifier to match a question set to the question templates by probability, and completing use of the irrigation area flow scheduling knowledge question-answering system in combination with the HanLP word segmenter.
The beneficial effects of the invention are as follows: the invention provides a method for constructing and using a knowledge graph question-answering system for irrigation area water flow scheduling. A machine learning model is combined with the SHAP model interpretation method to learn and interpret scheduling data that embody the years of experience of irrigation area dispatchers, turning the scheduling experience into knowledge. At the same time, scheduling scenarios are simulated and the machine learning model is used to obtain predicted scheduling flow gradient values, so that an irrigation area water flow scheduling graph database (Neo4j) consisting mainly of scheduling experience and predicted scheduling flows is formed. An irrigation area flow scheduling knowledge question-answering system is then built on the Neo4j graph database, so that irrigation area water use scheduling experience is turned into knowledge and irrigation area managers obtain a question-answering system for conveniently looking up scheduling experience and recommended flows.
Further, constructing the scheduling scenario machine learning model comprises the following steps:
A1, obtaining the irrigation area water use scheduling target flow;
A2, constructing a nonlinear regression mapping between the irrigation area water use scheduling target flow and the different types of flow scheduling characteristic data;
A3, obtaining a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree according to the nonlinear regression mapping;
A4, constructing a decision tree forest network from the optimal decision trees and their corresponding predicted values, completing construction of the machine learning model.
The beneficial effect of this further scheme is as follows: the scheduling scenario machine learning model is constructed through nonlinear regression mapping and trained on irrigation area characteristic data, so that the scheduling experience in the historical data is captured and the credibility of the irrigation area flow scheduling knowledge graph question-answering system is increased.
Further, step A3 comprises the following steps:
B1, randomly extracting m characteristic variables from the different types of flow scheduling characteristic data according to the nonlinear regression mapping, where m is the number of characteristic variables extracted;
B2, selecting s characteristic variables from the m characteristic variables as decision tree nodes, where s is the number of characteristic variables selected and s is smaller than m;
B3, splitting the decision tree nodes recursively, each time at the point where the sum of the mean square errors of the adjacent child nodes is minimal, to obtain an optimal decision tree;
B4, taking the mean value of the leaf nodes of each optimal decision tree as the corresponding predicted value;
B5, repeating steps B1 to B4 to obtain a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree.
The beneficial effect of this further scheme is as follows: with a tree-structured machine learning model, once a plurality of decision trees exist, the average of all tree predictions is used as the prediction for the target variable. The number of characteristic variables and the number of decision trees are tuned continuously according to the prediction results, yielding a plurality of optimal decision trees with their corresponding predicted values and forming the nonlinear regression mapping between the irrigation area water use scheduling target flow and the characteristic variables.
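As a non-limiting illustration, steps B1 to B5 and A4 can be sketched in plain Python as a small bagged forest of regression trees. This is only a sketch under stated assumptions: the function names, the bootstrap resampling, the `min_leaf` and `max_depth` limits, and the choice to draw the s candidate variables once per tree are all hypothetical simplifications, not details taken from the source.

```python
import random
from statistics import mean

def sse(ys):
    """Sum of squared errors of ys around their mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mu = mean(ys)
    return sum((y - mu) ** 2 for y in ys)

def build_tree(rows, ys, features, min_leaf=2, depth=0, max_depth=5):
    """Recursively split on the (feature, threshold) pair that minimizes the
    summed squared error of the two child nodes (in the spirit of step B3)."""
    if depth >= max_depth or len(ys) < 2 * min_leaf or len(set(ys)) == 1:
        return mean(ys)  # leaf: mean of the targets that reach it (step B4)
    best = None
    for j in features:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return mean(ys)  # no admissible split: stop here
    _, j, t = best
    li = [i for i, r in enumerate(rows) if r[j] <= t]
    ri = [i for i, r in enumerate(rows) if r[j] > t]
    left = build_tree([rows[i] for i in li], [ys[i] for i in li],
                      features, min_leaf, depth + 1, max_depth)
    right = build_tree([rows[i] for i in ri], [ys[i] for i in ri],
                       features, min_leaf, depth + 1, max_depth)
    return (j, t, left, right)

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def build_forest(rows, ys, n_trees=10, s=2, seed=0):
    """Grow n_trees trees, each on a bootstrap sample and s random features (step B5)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # assumed bootstrap sample
        feats = rng.sample(range(len(rows[0])), s)      # s candidate variables
        forest.append(build_tree([rows[i] for i in idx], [ys[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Average the tree predictions to get the forest prediction (step A4)."""
    return mean(predict_tree(tree, row) for tree in forest)
```

In practice a library random forest regressor would normally be used instead; the sketch only makes the split criterion and the averaging of tree predictions explicit.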
Further, step S3 comprises the following steps:
S31, interpreting the first machine learning model using the SHAP model interpretation method to obtain the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow;
S32, selecting the top-ranked characteristic values according to the importance ranking, and removing redundancy among the characteristic value combinations of the different scheduling scenarios, to obtain the optimal characteristic value combination;
S33, analyzing the directions in which the different characteristic values influence the scheduling flow to obtain the scheduling experience data.
The beneficial effect of this further scheme is as follows: the top-ranked characteristic values are selected according to the importance ranking, redundancy among the characteristic value combinations of the different scheduling scenarios is removed to obtain the optimal characteristic value combination, and the directions in which the different characteristic values influence the scheduling flow are analyzed to obtain the scheduling experience contained in the historical data.
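The ranking, selection and direction analysis of steps S31 to S33 might look like the following sketch. It assumes that per-sample SHAP values are already available as a matrix, ranks characteristic values by mean absolute SHAP value (a common convention, not stated in the source), and reads the influence direction from the sign of the covariance between a characteristic value and its SHAP value. All function names and sample numbers are hypothetical.

```python
from statistics import mean

def rank_features(shap_rows, feature_rows, names):
    """Return (name, importance, direction) tuples sorted by importance.

    shap_rows[i][j] is the SHAP value of feature j for sample i;
    feature_rows[i][j] is the raw value of feature j for sample i.
    """
    summary = []
    for j, name in enumerate(names):
        phis = [row[j] for row in shap_rows]
        xs = [row[j] for row in feature_rows]
        importance = mean(abs(p) for p in phis)  # assumed importance measure
        mx, mp = mean(xs), mean(phis)
        cov = sum((x - mx) * (p - mp) for x, p in zip(xs, phis))
        if cov > 0:
            direction = "increases flow"
        elif cov < 0:
            direction = "decreases flow"
        else:
            direction = "no clear direction"
        summary.append((name, importance, direction))
    return sorted(summary, key=lambda item: item[1], reverse=True)

def select_top(scenario_rankings, k):
    """Union of the top-k features of each scenario, duplicates removed (step S32)."""
    selected = []
    for ranking in scenario_rankings:
        for name, _, _ in ranking[:k]:
            if name not in selected:
                selected.append(name)
    return selected

# Hypothetical data: temperature T1 pushes flow up, rainfall R pushes it down.
ranking = rank_features(
    shap_rows=[[-2.0, 0.5], [0.0, 0.0], [2.0, -0.5]],
    feature_rows=[[10, 0], [20, 5], [30, 10]],
    names=["T1", "R"],
)
```

The `direction` strings are one possible way of turning SHAP output into the plain-language scheduling experience that is later stored for retrieval.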
Further, step S31 comprises the following steps:
S311, calculating the Shapley value of each characteristic variable using the SHAP model;
S312, calculating the SHAP value g(z') of the first machine learning model from the Shapley values of the characteristic variables;
S313, obtaining the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow from the SHAP value g(z') of the first machine learning model.
The beneficial effect of this further scheme is as follows: the characteristic variables are evaluated with the SHAP model to obtain the importance ranking of the characteristic values and the direction in which each influences the scheduling flow. The importance of each influencing factor or characteristic variable is weighed continuously, so that decisions are combined dynamically and the credibility of the irrigation area flow scheduling knowledge graph question-answering system is increased.
Further, the Shapley value of a characteristic variable is calculated as follows:
$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl[f(S \cup \{j\}) - f(S)\bigr]$$
where $\phi_j$ denotes the Shapley value of the j-th characteristic variable, $f(\cdot)$ denotes the nonlinear regression mapping, $N$ denotes the characteristic variable sample set, $M$ denotes the dimension of the characteristic variable sample set, $S$ denotes a characteristic variable sample subset drawn from $N$, $|S|$ denotes the dimension of the subset $S$, $f(S \cup \{j\})$ denotes the average of the sample predicted values after the j-th characteristic variable is merged into the subset $S$, and $f(S)$ denotes the predicted value of the subset $S$.
The beneficial effect of this further scheme is as follows: a method for calculating the Shapley value of a characteristic variable is provided.
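For a small number of characteristic variables, the Shapley value can be computed exactly by enumerating every subset S and weighting each marginal contribution f(S with j) minus f(S) by |S|!(M - |S| - 1)!/M!. The sketch below does this in plain Python; the set function `toy_flow` and its coefficients are hypothetical stand-ins for the model's prediction on a subset of characteristic variables, and exhaustive enumeration is only practical for small M.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    M = n_features
    phi = []
    for j in range(M):
        others = [k for k in range(M) if k != j]
        total = 0.0
        for size in range(M):
            # Weight shared by every subset S of this size that excludes j.
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for subset in combinations(others, size):
                S = frozenset(subset)
                total += weight * (value(S | {j}) - value(S))
        phi.append(total)
    return phi

def toy_flow(S):
    """Hypothetical prediction: two additive effects plus one interaction."""
    flow = 0.0
    if 0 in S:
        flow += 10.0
    if 1 in S:
        flow += 4.0
    if 0 in S and 2 in S:
        flow += 6.0  # interaction is shared equally between features 0 and 2
    return flow

phi = shapley_values(toy_flow, 3)  # approximately [13.0, 4.0, 3.0]
```

The values sum to `toy_flow({0, 1, 2}) - toy_flow({})`, which is the efficiency property that makes Shapley values usable as an additive explanation of a single prediction.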
Further, the SHAP value g(z') of the first machine learning model is expressed as follows:
$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j$$
$$z' \in \{0, 1\}^M$$
$$\phi_j \in \mathbb{R}$$
where $z'$ denotes the joint vector; $z'_j = 0$ indicates that the j-th characteristic variable is not in the decision path of the joint vector and $z'_j = 1$ indicates that it is; $\phi_0$ denotes the decision parameter; and $\mathbb{R}$ denotes the real numbers.
The beneficial effect of this further scheme is as follows: a method for calculating the SHAP value g(z') of the first machine learning model is provided.
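As a worked illustration, assume the standard additive form used by SHAP, in which g(z') is the decision parameter plus the Shapley values of the characteristic variables that are present. The numbers below (three characteristic variables, a baseline of 12 and contributions of +3, -1 and 0) are hypothetical and chosen only to show the arithmetic.

```latex
\begin{align*}
g(z') &= \phi_0 + \sum_{j=1}^{M} \phi_j z'_j,
  \qquad M = 3,\ \phi_0 = 12,\ \phi_1 = 3,\ \phi_2 = -1,\ \phi_3 = 0 \\
g(1, 1, 1) &= 12 + 3 - 1 + 0 = 14 \\
g(1, 0, 0) &= 12 + 3 = 15
\end{align*}
```

Here the sign of each contribution gives the direction of influence on the scheduling flow and its magnitude gives the importance, which is the information used in step S313.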
Further, step S6 comprises the following steps:
S61, taking the gate, the scheduling flow, the scheduling experience data, the different types of flow scheduling characteristic data, and the scheduling flow gradients corresponding to them as entities, and storing the entities in the graph database Neo4j;
S62, constructing the irrigation area flow scheduling knowledge question-answering system based on the graph database Neo4j.
The beneficial effect of this further scheme is as follows: the irrigation area flow scheduling knowledge graph question-answering system is constructed from these entities, so that irrigation area dispatchers can conveniently retrieve the recommended scheduling flows that the system provides.
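One possible way to store the entities of step S61 is to generate parameterized Cypher `MERGE` statements, as sketched below. The node labels (`Gate`, `Scenario`, `FlowGradient`, `Experience`), the relationship types, the property names and the sample values are all assumptions made for illustration; the source does not specify a graph schema. The sketch only builds the statement and its parameters and does not connect to a database.

```python
def scenario_statement(gate, scenario_id, features, flow_gradient, experience):
    """Return (cypher, params) that upserts one scheduling scenario and its links.

    MERGE creates a node or relationship only if it does not exist yet,
    so running the statement twice does not duplicate entities.
    """
    cypher = (
        "MERGE (g:Gate {name: $gate}) "
        "MERGE (s:Scenario {id: $scenario_id}) "
        "SET s += $features "
        "MERGE (f:FlowGradient {level: $flow_gradient}) "
        "MERGE (e:Experience {text: $experience}) "
        "MERGE (g)-[:HAS_SCENARIO]->(s) "
        "MERGE (s)-[:RECOMMENDS]->(f) "
        "MERGE (s)-[:EXPLAINED_BY]->(e)"
    )
    params = {
        "gate": gate,
        "scenario_id": scenario_id,
        "features": features,            # map of characteristic values
        "flow_gradient": flow_gradient,  # predicted scheduling flow gradient
        "experience": experience,        # plain-language scheduling experience
    }
    return cypher, params

# Hypothetical scenario record.
cypher, params = scenario_statement(
    gate="main-canal-gate",
    scenario_id="scenario-1",
    features={"rainfall_class": "low", "irrigation_period": False},
    flow_gradient=2,
    experience="Higher temperature tends to raise the scheduled flow.",
)
```

The resulting pair could then be passed to a Neo4j session's `run` method; keeping values in parameters rather than in the query string avoids quoting problems with free-text experience descriptions.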
Further, step S7 comprises the following steps:
S71, constructing question templates for irrigation area flow scheduling;
S72, using a naive Bayes classifier to match the question set to the question templates by probability;
S73, obtaining the user's question through the irrigation area flow scheduling knowledge question-answering system, and matching the question to a question template using the HanLP word segmenter to obtain the corresponding question in the question set;
S74, querying the graph database Neo4j according to the corresponding question, and returning the query result of the graph database Neo4j through the irrigation area flow scheduling knowledge question-answering system, completing use of the irrigation area flow scheduling knowledge question-answering system.
The beneficial effect of this further scheme is as follows: by constructing the question templates, matching the question templates, the question set and the user's question, querying the graph database Neo4j through the irrigation area flow scheduling knowledge question-answering system, and displaying the corresponding result, dispatchers can look up scheduling questions conveniently.
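The template matching of steps S72 and S73 can be illustrated with a small multinomial naive Bayes classifier with add-one smoothing. This is a sketch under assumptions: whitespace splitting stands in for the HanLP word segmenter (which would be needed for Chinese text), and the class name, template identifiers and training questions are hypothetical.

```python
from collections import Counter, defaultdict
from math import log

class TemplateClassifier:
    """Multinomial naive Bayes over tokens, with add-one (Laplace) smoothing."""

    def __init__(self, tokenize=str.split):
        self.tokenize = tokenize               # stand-in for a real word segmenter
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, questions, template_ids):
        for question, template in zip(questions, template_ids):
            tokens = self.tokenize(question.lower())
            self.word_counts[template].update(tokens)
            self.doc_counts[template] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, question):
        """Return the template with the highest posterior log-probability."""
        tokens = self.tokenize(question.lower())
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for template, n_docs in self.doc_counts.items():
            n_words = sum(self.word_counts[template].values())
            score = log(n_docs / total_docs)   # log prior of the template
            for word in tokens:
                count = self.word_counts[template][word]
                score += log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best, best_score = template, score
        return best

# Hypothetical question set labelled with hypothetical template identifiers.
clf = TemplateClassifier().fit(
    [
        "what flow is recommended when rainfall is low",
        "recommended flow for high temperature",
        "why does rainfall reduce the flow",
        "why does temperature raise the flow",
    ],
    ["recommend_flow", "recommend_flow", "explain_experience", "explain_experience"],
)
```

Once a question is mapped to a template, the entities recognized in it (for example a rainfall class or an irrigation period flag) would fill the template's slots to form the graph query of step S74.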
Drawings
Fig. 1 is a flow chart of the steps of the method for constructing and using a knowledge graph question-answering system for irrigation area flow scheduling in an embodiment of the invention.
Fig. 2 is a diagram of the importance ranking of the characteristic values in an embodiment of the invention.
Detailed Description
The following description of embodiments is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.
As shown in Fig. 1, in an embodiment of the invention, the invention provides a method for constructing and using a knowledge graph question-answering system for irrigation area flow scheduling, comprising the following steps:
S1, acquiring an irrigation area flow scheduling characteristic data set, and classifying it according to different scheduling scenarios to obtain a plurality of different types of flow scheduling characteristic data;
Common scheduling scenarios are shown in Table 1:
TABLE 1
[Table 1 is an image in the original publication and is not reproduced in this text.]
Because data such as rainfall and soil moisture content can come from different spatially distributed measuring points, the model initially takes more than 30 input characteristic values, where T1 is the temperature, G is the empirical division of irrigation periods, S is the water storage of the small and medium-sized reservoirs around the main canal, R is the daily rainfall measured at rainfall stations, C is the water supply and replenishment for urban and ecological use, and Y is the water demand;
S2, constructing a scheduling scenario machine learning model, and inputting the different types of flow scheduling characteristic data into it for training and parameter tuning to obtain a first machine learning model;
Constructing the scheduling scenario machine learning model comprises the following steps:
A1, obtaining the irrigation area water use scheduling target flow;
A2, constructing a nonlinear regression mapping between the irrigation area water use scheduling target flow and the different types of flow scheduling characteristic data;
A3, obtaining a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree according to the nonlinear regression mapping;
Step A3 comprises the following steps:
B1, randomly extracting m characteristic variables from the different types of flow scheduling characteristic data according to the nonlinear regression mapping, where m is the number of characteristic variables extracted;
B2, selecting s characteristic variables from the m characteristic variables as decision tree nodes, where s is the number of characteristic variables selected and s is smaller than m;
B3, splitting the decision tree nodes recursively, each time at the point where the sum of the mean square errors of the adjacent child nodes is minimal, to obtain an optimal decision tree;
B4, taking the mean value of the leaf nodes of each optimal decision tree as the corresponding predicted value;
B5, repeating steps B1 to B4 to obtain a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree;
A4, constructing a decision tree forest network from the optimal decision trees and their corresponding predicted values, completing construction of the machine learning model;
S3, interpreting the first machine learning model using the SHAP model interpretation method to obtain an optimal characteristic value combination and scheduling experience data;
Step S3 comprises the following steps:
S31, interpreting the first machine learning model using the SHAP model interpretation method to obtain the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow;
Step S31 comprises the following steps:
S311, calculating the Shapley value of each characteristic variable using the SHAP model;
The Shapley value of a characteristic variable is calculated as follows:
$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl[f(S \cup \{j\}) - f(S)\bigr]$$
where $\phi_j$ denotes the Shapley value of the j-th characteristic variable, $f(\cdot)$ denotes the nonlinear regression mapping, $N$ denotes the characteristic variable sample set, $M$ denotes the dimension of the characteristic variable sample set, $S$ denotes a characteristic variable sample subset drawn from $N$, $|S|$ denotes the dimension of the subset $S$, $f(S \cup \{j\})$ denotes the average of the sample predicted values after the j-th characteristic variable is merged into the subset $S$, and $f(S)$ denotes the predicted value of the subset $S$;
S312, calculating the SHAP value g(z') of the first machine learning model from the Shapley values of the characteristic variables;
The SHAP value g(z') of the first machine learning model is expressed as follows:
$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j$$
$$z' \in \{0, 1\}^M$$
$$\phi_j \in \mathbb{R}$$
where $z'$ denotes the joint vector; $z'_j = 0$ indicates that the j-th characteristic variable is not in the decision path of the joint vector and $z'_j = 1$ indicates that it is; $\phi_0$ denotes the decision parameter; and $\mathbb{R}$ denotes the real numbers;
The SHAP value is based on the Shapley value, a concept from game theory;
S313, obtaining the importance ranking of the characteristic values and the direction in which each characteristic value influences the scheduling flow from the SHAP value g(z') of the first machine learning model;
As shown in Fig. 2, the characteristic values ranked by importance from top to bottom are: the daily temperature T1 recorded at meteorological stations, the time T, the water storage S of the small and medium-sized reservoirs around the main canal, the empirical division G into irrigation and non-irrigation periods, the staged urban and ecological water supply and replenishment C, the field moisture content SK, the daily rainfall R measured at rainfall stations, and the scheduling requests Y sent to the irrigation area scheduling bureau and the defense office;
S32, selecting the top-ranked characteristic values according to the importance ranking, and removing redundancy among the characteristic value combinations of the different scheduling scenarios, to obtain the optimal characteristic value combination;
S33, analyzing the directions in which the different characteristic values influence the scheduling flow to obtain the scheduling experience data;
S4, inputting the optimal characteristic value combination into the first machine learning model for training and parameter tuning to obtain a second machine learning model;
S5, predicting on the different types of flow scheduling characteristic data using the second machine learning model to obtain the scheduling flow gradient corresponding to each type of flow scheduling characteristic data;
Different scheduling scenarios are simulated to predict on the different types of flow scheduling characteristic data, and the scheduling flow gradient obtained for each type is shown in Table 2:
TABLE 2
[Table 2 is an image in the original publication and is not reproduced in this text.]
S6, constructing an irrigation district flow scheduling knowledge question-answering system based on scheduling experience data, different types of flow scheduling characteristic data and scheduling flow gradients corresponding to the different types of flow scheduling characteristic data;
the step S6 includes the steps of:
s61, respectively taking a gate, scheduling traffic, scheduling experience data, scheduling characteristic data of different types of traffic and scheduling traffic gradients corresponding to the scheduling characteristic data of the different types of traffic as entities and storing the entities into a graphic database Neo4j;
s62, constructing an irrigation area flow scheduling knowledge question-answering system based on a graphic database Neo4j;
s7, constructing a problem template of the flow scheduling of the irrigation area, utilizing a naive Bayes classifier to perform probability matching on a problem set and the problem template, and combining a HanLP word segmentation device through an irrigation area flow scheduling knowledge question-answering system to finish the use of the irrigation area flow scheduling knowledge question-answering system;
the Hanlp participle device is used for the shortest path participle, has the functions of Chinese participle, part of speech tagging, new word recognition, named entity recognition, automatic abstraction, text clustering, emotion analysis, word vector and the like, and supports a user-defined dictionary;
The step S7 includes the following steps:
S71, constructing question templates for irrigation area flow scheduling;
S72, matching the question set against the question templates by probability using a naive Bayes classifier;
S73, obtaining question data through the irrigation area flow scheduling knowledge question-answering system, and matching the question data against the question templates through the HanLP word segmenter to obtain the corresponding question in the question set;
S74, searching the graph database Neo4j according to the corresponding question, and returning the search result of the graph database Neo4j through the irrigation area flow scheduling knowledge question-answering system, thereby completing the use of the irrigation area flow scheduling knowledge question-answering system.
For example, if the input is "rainfall is-20, irrigation period is negative, temperature is-30", the answer obtained is "predicted scheduling flow: 2".
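The question-answering flow of steps S71 to S74 can be illustrated with a minimal sketch. It is not the patented implementation: the two question templates, the training questions, the node labels and relationship names in the Cypher strings, and the parameter values are all hypothetical, and plain whitespace splitting stands in for the HanLP word segmenter. The sketch only classifies a question with a small naive Bayes model and selects a parameterized Cypher query; running the query against Neo4j is left out.

```python
import math
from collections import Counter, defaultdict

# Hypothetical question templates, each with a few example questions (S71).
TRAINING = {
    "predicted_flow": [
        "what is the predicted scheduling flow",
        "predict scheduling flow for rainfall and temperature",
        "recommended flow for this scene",
    ],
    "scheduling_experience": [
        "how does rainfall affect scheduling flow",
        "what is the influence of temperature on flow",
        "scheduling experience for irrigation period",
    ],
}

# Hypothetical Cypher query per template; labels and properties are assumptions.
CYPHER = {
    "predicted_flow": (
        "MATCH (s:Scene {rainfall: $rainfall, temperature: $temperature, "
        "irrigation_period: $irrigation_period})-[:HAS_GRADIENT]->(g:FlowGradient) "
        "RETURN g.value"
    ),
    "scheduling_experience": (
        "MATCH (f:Feature {name: $feature})-[:HAS_EXPERIENCE]->(e:Experience) "
        "RETURN e.text"
    ),
}


def tokenize(text):
    # Stand-in for the HanLP word segmenter: whitespace splitting only.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (S72)."""

    def fit(self, samples):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()
        for label, questions in samples.items():
            for question in questions:
                self.doc_counts[label] += 1
                for word in tokenize(question):
                    self.word_counts[label][word] += 1
                    self.vocab.add(word)
        return self

    def predict(self, question):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(question):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


classifier = NaiveBayes().fit(TRAINING)


def build_query(question, params):
    # S73: map the question to a template; S74: pick the query to send to Neo4j.
    label = classifier.predict(question)
    return label, CYPHER[label], params


label, cypher, params = build_query(
    "predicted scheduling flow when rainfall is 20",
    {"rainfall": 20, "temperature": 30, "irrigation_period": "no"},
)
print(label)
print(cypher)
```

In a real deployment the returned query string and parameters would be passed to a Neo4j session, and the training questions would be segmented Chinese text rather than English phrases.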
The beneficial effects of the invention are as follows: the invention provides a construction and use method of a flow scheduling knowledge-graph question-answering system for an irrigation area. By combining a machine learning model with the SHAP model interpretation method, scheduling data that embody many years of experience of irrigation area water scheduling personnel are learned and interpreted, so that the scheduling experience is turned into explicit knowledge. At the same time, scheduling scenes are simulated and the machine learning model is used to obtain predicted scheduling flow gradient values, forming an irrigation area flow scheduling graph database in Neo4j that mainly contains the scheduling experience and the predicted scheduling flows. The irrigation area flow scheduling knowledge question-answering system is built on this Neo4j graph database, which turns irrigation area water scheduling experience into knowledge and gives irrigation area managers a question-answering system for looking up scheduling experience and recommended flows.
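To make the interpretation step concrete, the following sketch computes exact Shapley values for a small stand-in model and derives the importance ranking and the direction of influence of each characteristic variable. It rests on several assumptions: `toy_model`, its coefficients, the input `x` and the all-zero `baseline` are hypothetical, and "absent" features are replaced by baseline values, which is a simplification of how the SHAP method averages over background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input."""
    names = list(x)
    m = len(names)

    def value(subset):
        # Features in the subset keep their actual value, the rest use the baseline.
        mixed = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for j in names:
        others = [k for k in names if k != j]
        total = 0.0
        for size in range(m):
            # Weight |S|! (M - |S| - 1)! / M! for subsets S of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi


def toy_model(features):
    # Hypothetical stand-in for the trained scheduling flow model.
    return (5.0
            - 0.1 * features["rainfall"]
            + 0.05 * features["temperature"]
            + 1.0 * features["irrigation_period"])


x = {"rainfall": 20.0, "temperature": 30.0, "irrigation_period": 1.0}
baseline = {"rainfall": 0.0, "temperature": 0.0, "irrigation_period": 0.0}

phi = shapley_values(toy_model, x, baseline)
# Importance ranking by absolute Shapley value; the sign gives the influence direction.
ranking = sorted(phi, key=lambda k: abs(phi[k]), reverse=True)
directions = {k: ("increases flow" if v > 0 else "decreases flow") for k, v in phi.items()}
print(ranking)
print(directions)
```

For this toy model rainfall ranks first with a negative contribution, which is the kind of statement that could be stored as scheduling experience. Exact enumeration grows exponentially with the number of features, so it is only practical for small feature sets.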

Claims (7)

1. A construction and use method of a flow scheduling knowledge-graph question-answering system for an irrigation area, characterized by comprising the following steps:
S1, acquiring a data set of irrigation area flow scheduling characteristic quantities, and classifying the data set according to different scheduling scenes to obtain a plurality of different types of flow scheduling characteristic data;
S2, constructing a scheduling scene machine learning model, and inputting the different types of flow scheduling characteristic data into the scheduling scene machine learning model for training and parameter tuning to obtain a first machine learning model;
S3, interpreting the first machine learning model by using the SHAP model interpretation method to obtain an optimal characteristic value combination and scheduling experience data;
the step S3 includes the following steps:
S31, interpreting the first machine learning model by using the SHAP model interpretation method to obtain the importance ranking of the characteristic values and the influence directions of the different characteristic values on the scheduling flow;
the step S31 includes the following steps:
S311, calculating the Shapley values of the characteristic variables by using the SHAP model;
S312, calculating the SHAP value g(z′) of the first machine learning model from the Shapley values of the characteristic variables;
S313, obtaining the importance ranking of the characteristic values and the influence directions of the different characteristic values on the scheduling flow from the SHAP value g(z′) of the first machine learning model;
S32, selecting the top-ranked characteristic values according to the importance ranking of the characteristic values, and removing redundancy among the characteristic value combinations of the different scheduling scenes to obtain the optimal characteristic value combination;
S33, analyzing the influence directions of the different characteristic values on the scheduling flow to obtain the scheduling experience data;
S4, inputting the optimal characteristic value combination into the first machine learning model for training and parameter tuning to obtain a second machine learning model;
S5, predicting on the different types of flow scheduling characteristic data by using the second machine learning model to obtain the scheduling flow gradient corresponding to each type of flow scheduling characteristic data;
S6, constructing an irrigation area flow scheduling knowledge question-answering system based on the scheduling experience data, the different types of flow scheduling characteristic data and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data;
and S7, constructing question templates for irrigation area flow scheduling, matching a question set against the question templates by probability using a naive Bayes classifier, and answering questions through the irrigation area flow scheduling knowledge question-answering system in combination with the HanLP word segmenter, thereby completing the use of the irrigation area flow scheduling knowledge question-answering system.
2. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 1, wherein constructing the scheduling scene machine learning model comprises the following steps:
A1, obtaining the irrigation area water scheduling target flow;
A2, constructing a nonlinear regression mapping between the irrigation area water scheduling target flow and the different types of flow scheduling characteristic data;
A3, obtaining a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree according to the nonlinear regression mapping;
and A4, constructing a decision tree forest network based on the optimal decision trees and their corresponding predicted values, thereby completing construction of the scheduling scene machine learning model.
3. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 2, wherein the step A3 comprises the following steps:
B1, randomly extracting m characteristic variables from the different types of flow scheduling characteristic data according to the nonlinear regression mapping, wherein m represents the number of extracted characteristic variables;
B2, selecting s characteristic variables from the m characteristic variables as decision tree nodes, wherein s represents the number of selected characteristic variables and s is smaller than m;
B3, splitting the decision tree nodes recursively at the point where the sum of the mean square errors of adjacent decision tree nodes is minimal, to obtain an optimal decision tree;
B4, taking the mean value of the leaf nodes of each optimal decision tree as the corresponding predicted value;
and B5, repeating the steps B1 to B4 to obtain a plurality of optimal decision trees and the predicted value corresponding to each optimal decision tree.
4. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 1, wherein the Shapley value of a characteristic variable is calculated as follows:
$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl[f(S \cup \{j\}) - f(S)\bigr]$$
wherein φ_j represents the Shapley value of the j-th characteristic variable, f(·) represents the nonlinear regression mapping, N represents the characteristic variable sample set, M represents the dimension of the characteristic variable sample set, S represents a characteristic variable sample subset extracted from the characteristic variable sample set N, |S| represents the dimension of the characteristic variable sample subset S, f(S ∪ {j}) represents the average predicted value for the sample subset S after the j-th characteristic variable is merged into it, and f(S) represents the predicted value of the sample subset S.
5. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 1, wherein the SHAP value g(z′) of the first machine learning model is expressed as follows:
$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j, \qquad z' \in \{0', 1'\}^{M}, \qquad \phi_j \in \mathbb{R}$$
wherein z′ represents the joint vector, z′_j = 0′ indicates that the j-th characteristic variable is not located in the decision path of the joint vector, z′_j = 1′ indicates that the j-th characteristic variable is located in the decision path of the joint vector, φ_0 represents the decision parameter, and R represents the real numbers.
6. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 1, wherein the step S6 comprises the following steps:
S61, taking the gates, the scheduling flows, the scheduling experience data, the different types of flow scheduling characteristic data and the scheduling flow gradients corresponding to the different types of flow scheduling characteristic data as entities, and storing the entities in the graph database Neo4j;
S62, constructing the irrigation area flow scheduling knowledge question-answering system based on the graph database Neo4j.
7. The irrigation area flow scheduling knowledge-graph question-answering system construction and use method according to claim 6, wherein the step S7 comprises the following steps:
S71, constructing question templates for irrigation area flow scheduling;
S72, matching the question set against the question templates by probability using a naive Bayes classifier;
S73, obtaining question data through the irrigation area flow scheduling knowledge question-answering system, and matching the question data against the question templates through the HanLP word segmenter to obtain the corresponding question in the question set;
and S74, searching the graph database Neo4j according to the corresponding question, and returning the search result of the graph database Neo4j through the irrigation area flow scheduling knowledge question-answering system, thereby completing the use of the irrigation area flow scheduling knowledge question-answering system.
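As an illustration of the tree-building steps A1 to A4 and B1 to B5 recited in claims 2 and 3, the sketch below grows a small regression forest in plain Python. It is a simplified reading, not the claimed implementation: the synthetic data, the depth limit, the tree count and the function names are hypothetical, and the split criterion is taken to be the summed squared error of the two child nodes, which is one common interpretation of minimizing the node error sum.

```python
import random
from statistics import mean


def sse(ys):
    # Sum of squared errors of a node around its mean.
    mu = mean(ys)
    return sum((y - mu) ** 2 for y in ys)


def best_split(rows, ys, features):
    # B3: pick the (feature, threshold) pair minimizing the children's error sum.
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, ys) if r[f] < t]
            right = [y for r, y in zip(rows, ys) if r[f] >= t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, ys, features, s, rng, depth=0, max_depth=4):
    if depth >= max_depth or len(set(ys)) == 1:
        return mean(ys)  # B4: a leaf stores the mean target value
    # B2: consider s of the m extracted features at this node.
    split = best_split(rows, ys, rng.sample(features, s))
    if split is None:
        return mean(ys)
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, ys) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] >= t]
    return (
        f, t,
        build_tree([r for r, _ in left], [y for _, y in left],
                   features, s, rng, depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right],
                   features, s, rng, depth + 1, max_depth),
    )


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def build_forest(rows, ys, n_trees, m, s, seed=0):
    rng = random.Random(seed)
    all_features = list(rows[0])
    forest = []
    for _ in range(n_trees):  # B5: repeat B1 to B4 for each tree
        features = rng.sample(all_features, m)  # B1: extract m features
        forest.append(build_tree(rows, ys, features, s, rng))
    return forest


def predict_forest(forest, row):
    # A4: the forest prediction is the mean of the tree predictions.
    return mean(predict_tree(tree, row) for tree in forest)


# Synthetic, hypothetical data: flow falls with rainfall, rises with temperature.
ROWS = [
    {"rainfall": float(r), "temperature": float(t), "irrigation_period": float(p)}
    for r in (0, 10, 20, 30, 40) for t in (10, 20, 30) for p in (0, 1)
]
FLOWS = [5.0 - 0.1 * row["rainfall"] + 0.05 * row["temperature"]
         + row["irrigation_period"] for row in ROWS]

forest = build_forest(ROWS, FLOWS, n_trees=20, m=3, s=2, seed=0)
dry = {"rainfall": 0.0, "temperature": 20.0, "irrigation_period": 1.0}
wet = {"rainfall": 40.0, "temperature": 20.0, "irrigation_period": 1.0}
print(predict_forest(forest, dry), predict_forest(forest, wet))
```

Calling `predict_forest` on a grid of simulated scenes, as done for `dry` and `wet` here, is one way to obtain the kind of per-scene predicted flow values that the description stores as scheduling flow gradients.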

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210409530.6A CN114780742B (en) 2022-04-19 2022-04-19 Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area

Publications (2)

Publication Number Publication Date
CN114780742A CN114780742A (en) 2022-07-22
CN114780742B true CN114780742B (en) 2023-02-24

Family

ID=82431536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210409530.6A Active CN114780742B (en) 2022-04-19 2022-04-19 Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area

Country Status (1)

Country Link
CN (1) CN114780742B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235219A (en) * 2023-09-15 2023-12-15 宁波市水利水电规划设计研究院有限公司 Reservoir knowledge intelligent question-answering system based on flood prevention demands

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552820A (en) * 2020-04-30 2020-08-18 江河瑞通(北京)技术有限公司 Water engineering scheduling data processing method and device
CN112508442A (en) * 2020-12-18 2021-03-16 湖南大学 Transient stability evaluation method and system based on automation and interpretable machine learning
CN112581172A (en) * 2020-12-18 2021-03-30 四川中电启明星信息技术有限公司 Multi-model fusion electricity sales quantity prediction method based on empirical mode decomposition
CN113918512A (en) * 2021-10-22 2022-01-11 国家电网公司华中分部 Power grid operation rule knowledge graph construction system and method
CN113919886A (en) * 2021-11-11 2022-01-11 重庆邮电大学 Data characteristic combination pricing method and system based on Shapley value, and electronic equipment
CN114116915A (en) * 2021-10-28 2022-03-01 天津大学 Hydraulic engineering intelligent map system based on three-dimensional digital platform

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4170520A4 (en) * 2020-06-17 2023-11-29 The 4th Paradigm Technology Co., Ltd Method and device for constructing knowledge graph, computer device, and storage medium
CN112613720B (en) * 2020-12-17 2023-03-24 湖北工业大学 Reservoir irrigation optimal scheduling method considering multiple uncertainties
CN112668773A (en) * 2020-12-24 2021-04-16 北京百度网讯科技有限公司 Method and device for predicting warehousing traffic and electronic equipment
CN113377966B (en) * 2021-08-11 2021-11-19 长江水利委员会水文局 Water conservancy project scheduling regulation reasoning method based on knowledge graph
CN113918725A (en) * 2021-08-31 2022-01-11 南京中禹智慧水利研究院有限公司 Construction method of knowledge graph in water affairs field
CN114048900A (en) * 2021-11-07 2022-02-15 天津大学 Irrigated area reservoir dispatching management system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
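Application of data warehouse in the construction of the Dujiangyan irrigation area data center; 宋海瑞 et al.; 《计算机工程》 (Computer Engineering); 20070505 (No. 09); full text *
Discussion on informatization construction of irrigation areas; 张泽良 et al.; 《山西水利》 (Shanxi Water Resources); 20031230 (No. 06); full text *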

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant