CN112434129A - Method and system for generating professional corpus in power grid dispatching field - Google Patents
- Publication number
- CN112434129A (application number CN202011314046.2A)
- Authority
- CN
- China
- Prior art keywords
- corpus
- scheduling
- professional
- generating
- regulation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Business, Economics & Management (AREA)
- Economics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Primary Health Care (AREA)
- Animal Behavior & Ethology (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Life Sciences & Earth Sciences (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Water Supply & Treatment (AREA)
- Public Health (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for generating a professional corpus in the field of power grid dispatching. The method comprises two steps. First, regulation and control knowledge is extracted and the extracted knowledge is fused to generate a scheduling professional entity corpus. Second, a scheduling professional event corpus is generated according to the scheduling professional entity corpus and the overall business operation intention.

The resulting corpus combines a general corpus with a scheduling professional corpus and contains the proper nouns of the regulation and control field. It can therefore effectively support speech recognition and voice interaction for regulation and control. The accuracy of extracting scheduling professional ontology knowledge entities from regulation and control text exceeds 95%. This supports construction of the professional corpus well and is far better than existing word segmentation tools. Scheduling professional knowledge entities are extracted from structured and unstructured data in the regulation and control field to form the professional corpus.
Description
Technical Field
The invention belongs to the technical field of power grid dispatching operation, and particularly relates to a method and a system for generating a professional corpus in the field of power grid dispatching.
Background
The power grid structure is growing more complex, and the volume of regulation and control business keeps increasing. As a result, the efficiency of querying power grid information and of carrying out business operations urgently needs to improve. Research on a voice assistant is therefore of great significance. Such an assistant would help dispatchers query power grid data, operate regulation and control services and look up graphic displays.

An intelligent power grid regulation and control assistant provides three functions: intelligent voice interaction, semantic understanding and dialogue management. Speech recognition and semantic understanding must be trained on a scheduling professional corpus to improve their accuracy in application. The material needed for this corpus exists in the regulation and control field in both structured and unstructured form. Extracting the scheduling professional entities from these data and building the corresponding corpus is therefore important for speech recognition and semantic understanding in regulation and control.

However, existing word segmentation tools cannot extract the proper nouns of the power system. Existing technology therefore cannot support the construction of a professional corpus for power grid regulation and control.
Disclosure of Invention
The invention aims to provide a method and a system for generating a professional corpus in the field of power grid dispatching, which can generate a corpus required by power grid dispatching.
To achieve this purpose, the invention provides the following technical solutions:

In a first aspect, a method for generating a professional corpus in the power grid dispatching field is provided, including:
- extracting regulation and control knowledge, and fusing the extracted knowledge to generate a scheduling professional entity corpus; and
- generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
With reference to the first aspect, further, extracting the regulation and control knowledge includes three steps:
- establishing a regulation and control entity recognition model;
- establishing a regulation and control knowledge relation extraction model; and
- extracting the regulation and control knowledge from text with these two models.
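A minimal Python sketch of how the two models could be chained is shown here. It makes two assumptions. The entity recognizer returns (text, type) pairs, and the relation model labels each entity pair, where the label "no_relation" means no knowledge is extracted. `extract_knowledge`, `toy_recognize`, `toy_classify` and the lexicon are hypothetical stand-ins. They are not the trained BiLSTM-CRF and TextCNN models the invention describes.

```python
from itertools import combinations
from typing import Callable

Entity = tuple[str, str]       # (surface text, entity type)
Triple = tuple[str, str, str]  # (head, relation, tail)


def extract_knowledge(
    sentence: str,
    recognize: Callable[[str], list[Entity]],
    classify: Callable[[str, Entity, Entity], str],
) -> list[Triple]:
    """Run the entity recognition model, then the relation model on every entity pair."""
    triples = []
    for head, tail in combinations(recognize(sentence), 2):
        relation = classify(sentence, head, tail)
        if relation != "no_relation":
            triples.append((head[0], relation, tail[0]))
    return triples


# Hypothetical toy stand-ins for the two trained models.
LEXICON = {"Line 2211": "LINE", "Xinan substation": "STATION"}


def toy_recognize(sentence: str) -> list[Entity]:
    return [(term, kind) for term, kind in LEXICON.items() if term in sentence]


def toy_classify(sentence: str, head: Entity, tail: Entity) -> str:
    return "connected_to" if (head[1], tail[1]) == ("LINE", "STATION") else "no_relation"


triples = extract_knowledge("Line 2211 leaves Xinan substation", toy_recognize, toy_classify)
```

Keeping the two models behind plain callables lets either one be swapped, for example for a trained BiLSTM-CRF, without touching the pairing logic.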
With reference to the first aspect, further, fusing the extracted regulation and control knowledge to generate the scheduling professional entity corpus includes:
- computing the similarity between the various expressions of each regulation and control term with a text similarity calculation method;
- mapping the different expressions of the same term to one another; and
- fusing and mapping the regulation and control knowledge accordingly to generate the scheduling professional entity corpus.
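The sketch below makes the similarity step concrete by scoring term variants with Okapi BM25, one of the algorithms the detailed description names. Several details are assumptions for illustration: the character-bigram tokenization, the parameters `k1=1.5` and `b=0.75`, and the example terms. A production system might use BERT embeddings instead.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> list[str]:
    """Split a term into overlapping character bigrams (also works for Chinese terms without spaces)."""
    s = text.replace(" ", "").lower()
    return [s[i:i + 2] for i in range(len(s) - 1)] or [s]


class BM25:
    """Okapi BM25 over a small collection of term variants."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [char_bigrams(d) for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(tok for d in self.docs for tok in set(d))
        # The "+1" inside the log keeps idf positive even for bigrams found in every variant.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, index: int) -> float:
        doc = self.docs[index]
        tf = Counter(doc)
        length_norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        total = 0.0
        for tok in char_bigrams(query):
            if tok in tf:
                total += self.idf[tok] * tf[tok] * (self.k1 + 1) / (tf[tok] + length_norm)
        return total


candidates = ["No.1 main transformer", "#1 main transformer", "Line 220kV Xin-An"]
bm25 = BM25(candidates)
query = "1# main transformer"
scores = [bm25.score(query, i) for i in range(len(candidates))]
best = candidates[scores.index(max(scores))]
```

Here the query "1# main transformer" scores highest against "#1 main transformer". That variant shares the same bigrams as "No.1 main transformer" but is shorter, so length normalization penalizes it less. The unrelated line name scores far lower.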
With reference to the first aspect, further, generating the scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention includes:
- extracting verbs and nouns from the scheduling professional entity corpus through deep learning, and combining them in sequence to generate sentences that describe the required dispatching operations; and
- filling the set slots according to rules, in combination with the business operation intention, to generate the required dispatching operation statements.
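Slot filling can be sketched as template substitution. The intent names, template wording and slot names below are hypothetical. The description only states that slots hold the names of operation objects and that rules fix how slots connect to the other phrases.

```python
import re

# Hypothetical intent templates; {name} marks a slot filled with an entity from the corpus.
TEMPLATES = {
    "switch_to_cold_standby": "switch {device} at {station} from running to cold standby",
    "query_load": "query the current load of {device}",
}


def fill_slots(intent: str, slots: dict[str, str]) -> str:
    """Fill the template for an intent, refusing to emit a statement with unfilled slots."""
    template = TEMPLATES[intent]
    required = re.findall(r"\{(\w+)\}", template)
    missing = [name for name in required if name not in slots]
    if missing:
        raise ValueError(f"intent {intent!r} is missing slots: {missing}")
    return template.format(**{name: slots[name] for name in required})


statement = fill_slots(
    "switch_to_cold_standby", {"device": "Line 2211", "station": "Xinan substation"}
)
```

Raising on a missing slot, rather than emitting a partial instruction, seems the safer choice for dispatching commands.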
With reference to the first aspect, further, the extracted regulation and control knowledge takes the form of structured data, semi-structured data and unstructured data.
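For the structured case, the detailed description says rows from the relational databases of the regulation and control system are classified by application scenario and mapped into a knowledge graph. A minimal sketch follows. It assumes each row carries an entity key column and a scenario column; the names `device` and `scenario` and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def rows_to_graph(rows: list[dict], key_column: str, scenario_column: str) -> dict[str, list[Triple]]:
    """Group relational rows by application scenario and emit (entity, attribute, value) triples."""
    graph: dict[str, list[Triple]] = defaultdict(list)
    for row in rows:
        head, scenario = row[key_column], row[scenario_column]
        for column, value in row.items():
            if column in (key_column, scenario_column) or value in (None, ""):
                continue  # skip the key, the scenario label and empty cells
            graph[scenario].append(Triple(head, column, str(value)))
    return dict(graph)


rows = [
    {"device": "1# main transformer", "scenario": "equipment", "station": "Xinan", "voltage_kv": 220},
    {"device": "Line 2211", "scenario": "topology", "from_station": "Xinan", "to_station": "Dongshan"},
]
graph = rows_to_graph(rows, key_column="device", scenario_column="scenario")
```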
With reference to the first aspect, further, the regulation and control entity recognition model is built with a bidirectional long short-term memory network-conditional random field (BiLSTM-CRF) model.
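The CRF layer of a BiLSTM-CRF model picks the best label sequence using Viterbi decoding. The decoding combines per-token emission scores, produced by the BiLSTM, with learned tag-transition scores. The sketch below shows only that decoding step plus conversion from BIO tags to entities. The tag set, scores and tokens are made-up values, and the BiLSTM and training are omitted.

```python
def viterbi(emissions: list[list[float]], transitions: list[list[float]], tags: list[str]) -> list[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][j]: score of tag j at token t (in BiLSTM-CRF these come from the BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j (learned by the CRF layer).
    """
    n = len(tags)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        # For each current tag j, remember the best previous tag.
        prev = [max(range(n), key=lambda i: score[i] + transitions[i][j]) for j in range(n)]
        score = [score[prev[j]] + transitions[prev[j]][j] + emit[j] for j in range(n)]
        backpointers.append(prev)
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for prev in reversed(backpointers):
        best = prev[best]
        path.append(best)
    return [tags[j] for j in reversed(path)]


def bio_to_entities(tokens: list[str], labels: list[str], sep: str = "") -> list[tuple[str, str]]:
    """Collect (text, type) spans from BIO labels; a stray I- tag without a matching B- is ignored."""
    entities, span, kind = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("I-") and span and label[2:] == kind:
            span.append(token)
            continue
        if span:
            entities.append((sep.join(span), kind))
        span, kind = ([token], label[2:]) if label.startswith("B-") else ([], None)
    if span:
        entities.append((sep.join(span), kind))
    return entities


tags = ["O", "B-DEV", "I-DEV"]
transitions = [
    [0.0, 0.0, -1e4],  # O -> I-DEV is forbidden in BIO tagging
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
tokens = ["open", "breaker", "2211"]
emissions = [[2.0, 0.0, 0.0], [0.0, 0.5, 1.0], [0.0, 0.0, 1.5]]
labels = viterbi(emissions, transitions, tags)
entities = bio_to_entities(tokens, labels, sep=" ")
```

With these scores, a per-token argmax would label the tokens O, I-DEV, I-DEV, which is not a valid BIO sequence. The large negative O to I-DEV transition makes Viterbi return O, B-DEV, I-DEV instead, so the entity "breaker 2211" is recovered.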
With reference to the first aspect, further, a convolutional neural network model is used to extract the regulation and control knowledge.
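A TextCNN relation classifier works in three stages:
- it applies convolution filters of several widths over the token embeddings;
- it max-pools each filter's output over time; and
- it feeds the pooled features to a linear layer with one output per relation.

The pure-Python forward pass below illustrates that structure only. The weights are random (seeded) rather than trained, there is one filter per width, and the relation labels are hypothetical, so the predicted label carries no meaning.

```python
import random


def conv_max_pool(embeddings: list[list[float]], filters: list[list[list[float]]]) -> list[float]:
    """Slide each filter (width x dim weights) over the token embeddings, apply ReLU, keep the max."""
    features = []
    for weights in filters:
        width = len(weights)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is also the result when the sentence is too short
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            activation = sum(
                w * x for w_row, x_row in zip(weights, window) for w, x in zip(w_row, x_row)
            )
            best = max(best, activation)
        features.append(best)
    return features


def relation_scores(embeddings, filters, out_weights, out_bias) -> list[float]:
    """Fully connected layer on the pooled features: one score per relation label."""
    features = conv_max_pool(embeddings, filters)
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(out_weights, out_bias)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
sentence = random_matrix(6, dim, rng)                              # stand-in for 6 token embeddings
filters = [random_matrix(width, dim, rng) for width in (2, 3, 4)]  # one filter per window width
relations = ["located_in", "connected_to", "no_relation"]
out_w = random_matrix(len(relations), len(filters), rng)
scores = relation_scores(sentence, filters, out_w, [0.0] * len(relations))
predicted = relations[scores.index(max(scores))]
```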
In a second aspect, a system for generating a professional corpus in the power grid dispatching field is provided, including:
- a scheduling professional entity corpus generating module, used for extracting regulation and control knowledge and fusing the extracted knowledge to generate a scheduling professional entity corpus; and
- a scheduling professional event corpus generating module, used for generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
The beneficial technical effects are as follows:
(1) A corpus combining a general corpus and a scheduling professional corpus is constructed. It contains the proper nouns of the regulation and control field and can effectively support speech recognition and voice interaction for regulation and control.
(2) The accuracy of extracting scheduling professional ontology knowledge entities from regulation and control text exceeds 95%. This supports construction of the professional corpus well, and the effect is far better than that of existing word segmentation tools.
(3) Scheduling professional knowledge entities are extracted from structured and unstructured data in the regulation and control field to form a professional corpus.
Drawings
Fig. 1 is a schematic diagram of a power grid dispatching knowledge extraction process in the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in Fig. 1, the invention provides a method for generating a professional corpus in the field of power grid dispatching, which comprises the following steps:

Step 1: extract the regulation and control knowledge, and fuse the extracted knowledge to generate a scheduling professional entity corpus.
The scheduling professional entity corpus is generated with a deep-learning-based technique for extracting and fusing power grid regulation and control knowledge. The technique has two parts: knowledge extraction and knowledge fusion.

For knowledge extraction, data in the regulation and control field exist mainly in structured, semi-structured and unstructured forms, and a corresponding method is applied to each form:
- Structured data come mainly from the relational databases of the regulation and control system. They are classified by application scenario and mapped into a regulation and control knowledge graph.
- Semi-structured data are the tables and lists in dispatching regulations, detailed rules and operation manuals. They are parsed with rule packages, which convert the regulation and control business data in the tables into usable business operation knowledge.
- Unstructured text, such as dispatching regulations and operation instructions, is analyzed with artificial intelligence techniques. A regulation and control entity recognition model is built on a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF); the method is not limited to this model. A regulation and control knowledge relation extraction model is built on a text convolutional neural network (TextCNN). These models extract regulation and control ontology knowledge from the text and are then used to predict entities in other regulation and control texts.
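The rule-package parsing of semi-structured tables could look like the sketch below. The patterns, the knowledge kinds (`stability_limit`, `operation`) and the sample rows are hypothetical. Real rule packages would be written against the actual layout of each regulation or manual.

```python
import re

# Hypothetical rule package: (pattern, knowledge kind). Each pattern reads one kind of table cell.
RULE_PACKAGE = [
    (re.compile(r"^(?P<device>.+?) limit (?P<value>\d+(?:\.\d+)?) ?MW$"), "stability_limit"),
    (re.compile(r"^switch (?P<device>.+?) to (?P<state>running|hot standby|cold standby)$"), "operation"),
]


def parse_table(header: list[str], rows: list[list[str]]) -> list[dict[str, str]]:
    """Convert table cells from regulations or operation manuals into knowledge records."""
    knowledge = []
    for row in rows:
        for column, cell in zip(header, row):
            for pattern, kind in RULE_PACKAGE:
                match = pattern.match(cell.strip())
                if match:
                    knowledge.append({"kind": kind, "column": column, **match.groupdict()})
                    break  # first matching rule wins
    return knowledge


records = parse_table(
    ["section", "content"],
    [["4.2", "Line 2211 limit 850 MW"], ["5.1", "switch 1# main transformer to cold standby"]],
)
```

Cells that match no rule, such as the section numbers here, are simply skipped.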
For knowledge fusion, text similarity methods such as BM25 and BERT-based similarity compute the similarity between the various expressions of each regulation and control term; the method is not limited to these algorithms. Different expressions of the same term are mapped to one another, and the regulation and control data are fused and mapped accordingly. This expands the corpus content at the semantic level. The similarity algorithm also merges regulation and control ontology knowledge that expresses the same meaning, and it eliminates knowledge with ambiguous content or redundant structure. To improve the accuracy of ontology knowledge fusion, normalization and rule-based corrections are added to the similarity algorithm.
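The fusion step, including normalization and rule-based corrections, can be sketched as union-find clustering of term variants. To stay dependency-free, this sketch substitutes Python's `difflib.SequenceMatcher` ratio for BM25 or BERT. The normalization rules and the 0.9 threshold are assumptions.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical normalization rules for dispatching terms.
NORMALIZATION_RULES = [
    (re.compile(r"(?:\bno\.?\s*|#)(\d+)"), r"\1#"),  # "No.1", "#1" -> "1#"
    (re.compile(r"\bxfmr\b"), "transformer"),         # expand a common abbreviation
]


def normalize(term: str) -> str:
    text = unicodedata.normalize("NFKC", term).lower()  # full-width -> half-width, case-fold
    for pattern, replacement in NORMALIZATION_RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"\s+", "", text)


def fuse_terms(terms: list[str], threshold: float = 0.9) -> dict[str, str]:
    """Union-find over variants: pairs with normalized similarity >= threshold share one canonical form."""
    parent = list(range(len(terms)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    keys = [normalize(t) for t in terms]
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)  # root is always the earliest variant
    return {term: terms[find(i)] for i, term in enumerate(terms)}


fused = fuse_terms(["No.1 main transformer", "#1 main transformer", "1# main xfmr", "Line 2211"])
```

All three spellings of the transformer collapse to the same normalized key and map to "No.1 main transformer". "Line 2211" stays in a group of its own.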
Step 2: generate a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
This step mainly constructs the scheduling professional event corpus. Verbs and nouns of the scheduling professional entities are extracted with a deep learning network and combined in sequence to generate the required short or long dispatching operation sentences. The dispatcher then combines the overall business operation intention with the set slots; a slot holds the specific name of the operation object that the operation requires. The slots are filled by rules to generate the short or long operation sentences. Here a rule specifies the connection or positional relationship between a slot and the other phrases.
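The sequential combination of verbs and nouns can be sketched as follows. Three elements are hypothetical: the entity and verb tables, the constraints on which entity types each verb accepts, and the ", then" pattern used to join long sentences. The invention extracts these elements with a deep learning network rather than listing them by hand.

```python
from itertools import permutations

# Hypothetical entity corpus (entity names by type) and verb corpus (entity types each verb accepts).
ENTITIES = {"breaker": ["breaker 2211"], "line": ["Line 2211", "Line 2212"]}
VERBS = {"open": ["breaker"], "close": ["breaker"], "energize": ["line"]}


def short_sentences() -> list[str]:
    """Combine each operation verb with every entity of a type it can act on."""
    return [
        f"{verb} {entity}"
        for verb, types in VERBS.items()
        for entity_type in types
        for entity in ENTITIES[entity_type]
    ]


def long_sentences(shorts: list[str]) -> list[str]:
    """Chain two different short sentences into an ordered two-step instruction."""
    return [f"{first}, then {second}" for first, second in permutations(shorts, 2)]


shorts = short_sentences()
longs = long_sentences(shorts)
```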
Example 2
The invention also provides a system for generating a professional corpus in the field of power grid dispatching, which comprises:
- a scheduling professional entity corpus generating module, used for extracting regulation and control knowledge and fusing the extracted knowledge to generate a scheduling professional entity corpus; and
- a scheduling professional event corpus generating module, used for generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
The two corpora serve different purposes:
- The scheduling professional entity corpus supplies the various entity nouns and operation phrases required in the regulation and control field.
- The scheduling professional event corpus supplies event intention sentences for regulation and control operation services, which may be short or long dispatching instruction sentences.

Together they form the professional corpus of the dispatching field, which consists of the general vocabulary corpus and the power grid regulation and control professional corpus.
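Merging the general vocabulary corpus with the power grid regulation and control professional corpus might look like the minimal sketch below. The tag names and the tab-separated output format are assumptions, not a format specified by the invention.

```python
def build_corpus(general: list[str], professional: list[str]) -> dict[str, str]:
    """Merge general vocabulary with dispatching vocabulary; a word found in both keeps the domain tag."""
    corpus = {word: "general" for word in general}
    corpus.update({word: "dispatching" for word in professional})
    return corpus


def to_lexicon_lines(corpus: dict[str, str]) -> list[str]:
    """One 'word<TAB>tag' line per entry, sorted so the output is stable between runs."""
    return [f"{word}\t{tag}" for word, tag in sorted(corpus.items())]


corpus = build_corpus(["load", "query", "line"], ["Line 2211", "1# main transformer", "load"])
lines = to_lexicon_lines(corpus)
```

Letting the professional tag win on overlap keeps words such as "load" marked as domain vocabulary.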
The above description is only a preferred embodiment of the present invention. Those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.
Claims (8)
1. A method for generating a professional corpus in the field of power grid dispatching, characterized by comprising the following steps:
extracting regulation and control knowledge, and fusing the extracted knowledge to generate a scheduling professional entity corpus; and
generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
2. The system and method for generating a professional corpus in the power grid scheduling field according to claim 1, wherein extracting the regulation and control knowledge comprises: establishing a regulation and control entity recognition model, establishing a regulation and control knowledge relation extraction model, and extracting the regulation and control knowledge from text according to the regulation and control entity recognition model and the regulation and control knowledge relation extraction model.
3. The system and method for generating a professional corpus in the power grid scheduling field according to claim 1, wherein fusing the extracted regulation and control knowledge to generate the scheduling professional entity corpus comprises: computing, with a text similarity calculation method, the similarity between the various expressions of each regulation and control term; forming a mapping between the different expressions of the same term; and fusing and mapping the regulation and control knowledge to generate the scheduling professional entity corpus.
4. The system and method for generating a professional corpus in the power grid scheduling field according to claim 1, wherein generating the scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention comprises:
extracting verbs and nouns from the scheduling professional entity corpus through deep learning, and combining them in sequence to generate sentences describing the required dispatching operations; and
filling the set slots according to rules, in combination with the business operation intention, to generate the required dispatching operation statements.
5. The system and method for generating a professional corpus in the power grid scheduling field according to claim 1, wherein the extracted regulation and control knowledge takes the form of structured data, semi-structured data and unstructured data.
6. The system and method for generating a professional corpus in the power grid scheduling field according to claim 2, wherein the regulation and control entity recognition model is built with a bidirectional long short-term memory network-conditional random field model.
7. The system and method for generating a professional corpus in the power grid scheduling field according to claim 2, wherein a convolutional neural network model is used to extract the regulation and control knowledge.
8. A system for generating a professional corpus in the field of power dispatching, characterized by comprising:
a scheduling professional entity corpus generating module, configured to extract regulation and control knowledge and fuse the extracted knowledge to generate a scheduling professional entity corpus; and
a scheduling professional event corpus generating module, configured to generate a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011314046.2A CN112434129A (en) | 2020-11-20 | 2020-11-20 | Method and system for generating professional corpus in power grid dispatching field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011314046.2A CN112434129A (en) | 2020-11-20 | 2020-11-20 | Method and system for generating professional corpus in power grid dispatching field |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112434129A (en) | 2021-03-02 |
Family
ID=74693346
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011314046.2A Pending CN112434129A (en) | 2020-11-20 | 2020-11-20 | Method and system for generating professional corpus in power grid dispatching field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112434129A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112837148A (en) * | 2021-03-03 | 2021-05-25 | 中央财经大学 | Risk logical relationship quantitative analysis method fusing domain knowledge |
CN113689851A (en) * | 2021-07-27 | 2021-11-23 | 国家电网有限公司 | Scheduling professional language understanding system and method |
CN114186759A (en) * | 2022-02-16 | 2022-03-15 | 杭州杰牌传动科技有限公司 | Material scheduling control method and system based on reducer knowledge graph |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170169013A1 (en) * | 2015-12-11 | 2017-06-15 | Microsoft Technology Licensing, Llc | Personalizing Natural Language Understanding Systems |
CN110991179A (en) * | 2019-11-13 | 2020-04-10 | 国网山东省电力公司临沂供电公司 | Semantic analysis method based on electric power professional term |
CN111078847A (en) * | 2019-11-27 | 2020-04-28 | 中国南方电网有限责任公司 | Power consumer intention identification method and device, computer equipment and storage medium |
CN111831792A (en) * | 2020-07-03 | 2020-10-27 | 国网江苏省电力有限公司信息通信分公司 | Electric power knowledge base construction method and system |
CN111930774A (en) * | 2020-08-06 | 2020-11-13 | 全球能源互联网研究院有限公司 | Automatic construction method and system for power knowledge graph ontology |
2020
- 2020-11-20: application CN202011314046.2A filed in CN; CN112434129A active, pending
Non-Patent Citations (1)
Title |
---|
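Yu Jianming et al.: "Knowledge graph construction and application for the intelligent regulation and control field" (面向智能调控领域的知识图谱构建与应用), Power System Protection and Control (电力系统保护与控制), pages 29-36 * |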
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112837148A (en) * | 2021-03-03 | 2021-05-25 | 中央财经大学 | Risk logical relationship quantitative analysis method fusing domain knowledge |
CN112837148B (en) * | 2021-03-03 | 2023-06-23 | 中央财经大学 | Risk logic relationship quantitative analysis method integrating domain knowledge |
CN113689851A (en) * | 2021-07-27 | 2021-11-23 | 国家电网有限公司 | Scheduling professional language understanding system and method |
CN113689851B (en) * | 2021-07-27 | 2024-02-02 | 国家电网有限公司 | Scheduling professional language understanding system and method |
CN114186759A (en) * | 2022-02-16 | 2022-03-15 | 杭州杰牌传动科技有限公司 | Material scheduling control method and system based on reducer knowledge graph |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112434129A (en) | Method and system for generating professional corpus in power grid dispatching field | |
CN104361127B (en) | The multilingual quick constructive method of question and answer interface based on domain body and template logic | |
Schuler et al. | Broad-coverage parsing using human-like memory constraints | |
CN113987104B (en) | Generating type event extraction method based on ontology guidance | |
CN110277086A (en) | Phoneme synthesizing method, system and electronic equipment based on dispatching of power netwoks knowledge mapping | |
Jiang et al. | Natural language processing and its applications in machine translation: A diachronic review | |
Khan et al. | Extracting Spatial Information From Place Descriptions | |
CN111260338B (en) | Intelligent generation method, device and platform for substation operation ticket | |
CN112559760B (en) | CPS (cyber physical system) resource capacity knowledge graph construction method for text description | |
CN113869040A (en) | Voice recognition method for power grid dispatching | |
CN116521889A (en) | Deep learning-based power grid dispatching comprehensive decision determining method and system | |
Tang et al. | Tourism domain ontology construction from the unstructured text documents | |
CN103678607B (en) | A kind of construction method of Emotion tagging system | |
Bajwa et al. | A rule based system for speech language context understanding | |
Ren | Networked artificial intelligence English translation system based on an intelligent knowledge base and translation method thereof | |
CN115033705A (en) | Power grid regulation and control risk early warning information knowledge graph design method and system | |
Talita et al. | Challenges in building domain ontology for minority languages | |
CN102147731A (en) | Automatic functional requirement extraction system based on extended functional requirement description framework | |
CN112446203A (en) | Method for generating architecture transformation grindable standard clause structure | |
Hachey | Recognising clauses using symbolic and machine learning approaches | |
Özbal et al. | Evaluating the impact of syntax and semantics on emotion recognition from text | |
Hu et al. | Semantic sequence labeling model of power dispatching based on deep long short term memory network | |
Ali et al. | AI-Natural Language Processing (NLP) | |
Sharma et al. | Architecture and Types of Intelligent Agent and Uses of Various Technologies | |
Zhao | Design of Intelligent Proofreading System Based on Artificial Intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||