CN112434129A - Method and system for generating professional corpus in power grid dispatching field - Google Patents

Info

Publication number
CN112434129A
Authority
CN
China
Prior art keywords
corpus
scheduling
professional
generating
regulation
Prior art date
Legal status
Pending
Application number
CN202011314046.2A
Other languages
Chinese (zh)
Inventor
李洪波
海威
张越
高博
王晓光
乌日恒
单连飞
余建明
刘艳
张连超
乔咏田
Current Assignee
Inner Mongolia Power(group) Co ltd Power Dispatch Control Branch
Beijing Kedong Electric Power Control System Co Ltd
Original Assignee
Inner Mongolia Power(group) Co ltd Power Dispatch Control Branch
Beijing Kedong Electric Power Control System Co Ltd
Priority date
Filing date
Publication date
Application filed by Inner Mongolia Power(group) Co ltd Power Dispatch Control Branch and Beijing Kedong Electric Power Control System Co Ltd
Priority to CN202011314046.2A
Publication of CN112434129A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Primary Health Care (AREA)
  • Animal Behavior & Ethology (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for generating a professional corpus in the field of power grid dispatching, comprising: extracting regulation and control knowledge and fusing the extracted knowledge to generate a scheduling professional entity corpus; and generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention. The resulting corpus combines a general corpus with the scheduling professional corpus and contains the proper nouns of the regulation and control field, so it can effectively support speech recognition and regulation and control voice interaction. The accuracy of extracting scheduling professional ontology knowledge entities from regulation and control text exceeds 95%, which supports construction of the professional corpus well and is far better than existing word segmentation tools. Scheduling professional knowledge entities are extracted from both the structured and the unstructured data of the regulation and control field to form the professional corpus.

Description

Method and system for generating professional corpus in power grid dispatching field
Technical Field
The invention belongs to the technical field of power grid dispatching operation, and particularly relates to a method and a system for generating a professional corpus in the field of power grid dispatching.
Background
As the power grid structure grows more complex and the volume of regulation and control business keeps increasing, the efficiency of power grid information queries and business operations urgently needs to improve. Research on a voice assistant that helps dispatchers query power grid data, operate regulation and control services, and call up graphic displays is therefore of great significance. An intelligent power grid regulation and control assistant comprises three functions: intelligent voice interaction, semantic understanding, and dialogue management. Speech recognition and semantic understanding must be trained on a scheduling professional corpus to improve accuracy in application. The linguistic data needed for such a corpus exists in the regulation and control field in both structured and unstructured forms, so extracting the scheduling professional entities from this data and building the corresponding corpus is important for regulation and control speech recognition and semantic understanding.
However, existing word segmentation tools cannot extract the proper nouns used in power systems, so the prior art cannot support construction of a professional corpus for the field of power grid regulation and control.
Disclosure of Invention
The invention aims to provide a method and a system for generating a professional corpus in the field of power grid dispatching, which can generate a corpus required by power grid dispatching.
To achieve this purpose, the invention provides the following technical solutions:
In a first aspect, a method for generating a professional corpus in the field of power grid dispatching is provided, comprising:
extracting regulation and control knowledge, and fusing the extracted knowledge to generate a scheduling professional entity corpus; and
generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
Further, with reference to the first aspect, extracting the regulation and control knowledge includes establishing a regulation and control entity recognition model, establishing a regulation and control knowledge relation extraction model, and extracting the regulation and control knowledge from text according to the entity recognition model and the relation extraction model.
Further, with reference to the first aspect, fusing the extracted regulation and control knowledge to generate a scheduling professional entity corpus includes: computing the similarity between the various expressions of a regulation and control professional term with a text similarity calculation method, forming a mapping relation among the different expressions of the same professional term, and fusing and mapping the regulation and control knowledge to generate the scheduling professional entity corpus.
Further, with reference to the first aspect, generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention includes:
extracting verbs and nouns from the scheduling professional entity corpus through deep learning and combining them in sequence to generate the required scheduling operation sentences; and
combining the business operation intention with the set slots, and filling the slots according to rules to generate the required scheduling operation sentences.
Further, with reference to the first aspect, the data from which the regulation and control knowledge is extracted includes structured data, semi-structured data, and unstructured data.
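For the structured form, the description later states that relational database records are mapped into a knowledge graph. A minimal sketch of such a mapping is given below, assuming each row becomes a set of (subject, predicate, object) triples. The function name `rows_to_triples`, the `is_a` predicate, and the table and column names are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def rows_to_triples(table: str, key: str, rows: List[Dict[str, str]]) -> List[Triple]:
    """Map relational rows to (subject, predicate, object) triples.

    The value in the `key` column names the entity; every other non-empty
    column becomes one attribute triple. This layout is an assumption.
    """
    triples: List[Triple] = []
    for row in rows:
        subject = row[key]
        # Record the entity type, taken here from the table name.
        triples.append((subject, "is_a", table))
        for column, value in row.items():
            if column != key and value:
                triples.append((subject, column, value))
    return triples


# Hypothetical record from a regulation and control system database.
rows = [{"name": "Line A", "voltage_kv": "220", "from_station": "Station X"}]
for triple in rows_to_triples("transmission_line", "name", rows):
    print(triple)
```

Semi-structured and unstructured sources would need their own extractors (rule packages and trained models, respectively) that emit the same triple shape.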
Further, with reference to the first aspect, the regulation and control entity recognition model is built with a bidirectional long short-term memory network-conditional random field model.
Further, with reference to the first aspect, the regulation and control knowledge is extracted using a convolutional neural network model.
In a second aspect, a system for generating a professional corpus in a power grid dispatching field is provided, including:
the scheduling professional entity corpus generating module is used for extracting the control knowledge and fusing the extracted control knowledge to generate a scheduling professional entity corpus;
and the scheduling professional event corpus generating module is used for generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
The beneficial technical effects are as follows:
(1) A corpus combining a general corpus with the scheduling professional corpus is constructed; it contains the proper nouns of the regulation and control field and can effectively support speech recognition and regulation and control voice interaction.
(2) The accuracy of extracting scheduling professional ontology knowledge entities from regulation and control text exceeds 95%, which supports construction of the professional corpus well and is far better than existing word segmentation tools.
(3) Scheduling professional knowledge entities are extracted from the structured and unstructured data of the regulation and control field to form the professional corpus.
Drawings
Fig. 1 is a schematic diagram of a power grid dispatching knowledge extraction process in the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in Fig. 1, the invention provides a method for generating a professional corpus in the field of power grid dispatching, which comprises the following steps:
Step one: extract the regulation and control knowledge, and fuse the extracted knowledge to generate a scheduling professional entity corpus.
The scheduling professional entity corpus is generated with a power grid regulation and control knowledge extraction and fusion technique based on a deep learning framework, and the work proceeds on two fronts: knowledge extraction and knowledge fusion. For knowledge extraction, data in the regulation and control field exists mainly in structured, semi-structured, and unstructured forms, and a corresponding extraction method is used for each form. Structured data comes mainly from the relational databases of the regulation and control system; it is classified by application scenario and mapped into a regulation and control knowledge graph. Semi-structured data, such as the tables and lists in dispatching regulations, detailed rules, and operation instruction books, is parsed with rule packages, which convert the regulation and control business data in the tables into usable business operation knowledge. Unstructured text, such as dispatching regulations and operation instructions, is analyzed with artificial intelligence techniques. Specifically, a regulation and control entity recognition model is built on a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF), although the method is not limited to this model, and a regulation and control knowledge relation extraction model is built on a text convolutional neural network (TextCNN). These models extract the regulation and control ontology knowledge from the text and are then used to predict regulation and control entities in other texts.
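A sequence tagger such as BiLSTM-CRF usually emits one label per token, and those labels still have to be turned into entity strings before they can enter a corpus. The sketch below shows that post-processing step only, assuming the common BIO labelling scheme; the patent does not state which scheme is used, and the label names `ACTION` and `DEVICE` as well as the function `bio_to_entities` are hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from BIO-tagged tokens."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any.
        if current and current_type is not None:
            entities.append((" ".join(current), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            flush()
            current, current_type = [], None
    flush()
    return entities


# Hypothetical tagger output for one dispatching instruction.
tokens = ["open", "No.1", "main", "transformer", "breaker"]
tags = ["B-ACTION", "B-DEVICE", "I-DEVICE", "I-DEVICE", "I-DEVICE"]
print(bio_to_entities(tokens, tags))
```

For Chinese text the tokens would typically be single characters joined without spaces, so the join string would need to change accordingly.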
For knowledge fusion, a text similarity calculation method, such as BM25 or BERT (the method is not limited to these algorithms), computes the similarity between the various expressions of a regulation and control professional term. Different expressions of the same professional term are linked by a mapping relation, the regulation and control data is fused and mapped accordingly, and the corpus content is thereby expanded at the semantic level. The text similarity algorithm also merges regulation and control ontology knowledge that expresses the same meaning and eliminates knowledge that is ambiguous in content or redundant in structure. To improve the accuracy of this fusion, normalization and rule-based corrections are added to the similarity algorithm.
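The mapping of variant expressions to one canonical term can be illustrated with a small BM25 scorer. This is a sketch under stated assumptions, not the patented implementation: it uses whitespace tokens, the widely used BM25 formula with a smoothed IDF, and a hypothetical acceptance threshold of 0.5; the names `BM25` and `map_to_canonical` are illustrative only.

```python
import math
from collections import Counter
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    # Whitespace tokens are an assumption; Chinese text would need
    # character or word segmentation instead.
    return text.lower().split()


class BM25:
    def __init__(self, docs: List[List[str]], k1: float = 1.5, b: float = 0.75):
        self.docs, self.k1, self.b = docs, k1, b
        self.avgdl = sum(len(d) for d in docs) / len(docs)
        self.tf = [Counter(d) for d in docs]
        df = Counter(t for d in docs for t in set(d))
        n = len(docs)
        # Smoothed IDF that stays positive for every term.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}

    def score(self, query: List[str], i: int) -> float:
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in query:
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total


def map_to_canonical(variant: str, canonical: List[str], threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching canonical term, or None below the threshold."""
    index = BM25([tokenize(c) for c in canonical])
    query = tokenize(variant)
    scores = [index.score(query, i) for i in range(len(canonical))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return canonical[best] if scores[best] >= threshold else None


# Hypothetical canonical terms and one variant expression.
terms = ["main transformer", "circuit breaker", "bus tie switch"]
print(map_to_canonical("No.1 main transformer", terms))
```

A lexical scorer like this only catches variants that share tokens with the canonical term; embedding-based similarity, which the description also names, would be needed for paraphrases with no word overlap.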
Step two: generate a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
This step constructs the scheduling professional event corpus. Verbs and nouns of the scheduling professional entities, extracted with the deep learning framework network, are combined in sequence to generate the required scheduling operation phrases or sentences. In addition, the dispatcher combines the overall business operation intention with the set slots (a slot holds the specific name of the object to be operated on), and the slots are filled according to rules (a rule specifies how a slot connects to, or is positioned relative to, the other phrases) to generate operation phrases or sentences.
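The slot-filling step can be pictured as expanding a sentence template over every combination of slot values. The sketch below assumes that a rule is expressed as a format string whose placeholders are the slots; the template text, slot names, and the function `fill_template` are hypothetical.

```python
from itertools import product
from typing import Dict, List


def fill_template(template: str, slot_values: Dict[str, List[str]]) -> List[str]:
    """Generate one sentence per combination of slot values.

    The template fixes the position of each slot relative to the other
    words, which plays the role of the filling rule.
    """
    names = list(slot_values)
    sentences = []
    for combo in product(*(slot_values[name] for name in names)):
        sentences.append(template.format(**dict(zip(names, combo))))
    return sentences


# Hypothetical intention "switch operation" with three slots.
template = "{action} the {device} at {station}"
slots = {
    "action": ["open", "close"],
    "device": ["No.1 main transformer breaker"],
    "station": ["Station A"],
}
for sentence in fill_template(template, slots):
    print(sentence)
```

The number of generated sentences is the product of the slot list sizes, so in practice the value lists would likely be restricted to combinations that are valid for the given operation intention.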
Example 2
The invention also provides a system for generating a professional corpus in the field of power grid dispatching, which comprises:
the scheduling professional entity corpus generating module is used for extracting the control knowledge and fusing the extracted control knowledge to generate a scheduling professional entity corpus;
and the scheduling professional event corpus generating module is used for generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
The scheduling professional entity corpus provides the various entity nouns and operation phrases needed in the regulation and control field, and the scheduling professional event corpus provides event intention sentences for the field's operation services, which may be short or long dispatching instructions. Together they yield the final dispatching-field professional corpus, which combines a general vocabulary corpus with the power grid regulation and control professional corpus.
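How the three sources come together in the final corpus is not specified further. One plausible reading, sketched below, is a simple merge that removes duplicates and records where each entry came from; the function `build_corpus` and the `source` labels are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def build_corpus(
    general_vocab: Iterable[str],
    entity_corpus: Iterable[str],
    event_corpus: Iterable[str],
) -> List[Dict[str, str]]:
    """Merge the three sources, keeping the first occurrence of each entry."""
    seen = set()
    merged: List[Dict[str, str]] = []
    sources = (
        ("general", general_vocab),
        ("entity", entity_corpus),
        ("event", event_corpus),
    )
    for source, entries in sources:
        for entry in entries:
            text = entry.strip()
            if text and text not in seen:
                seen.add(text)
                merged.append({"text": text, "source": source})
    return merged


# Hypothetical inputs from the two modules plus a general vocabulary.
corpus = build_corpus(
    ["open", "close"],
    ["main transformer", "open"],
    ["open the main transformer"],
)
for item in corpus:
    print(item)
```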
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A method for generating a professional corpus in the field of power grid dispatching is characterized by comprising the following steps:
extracting the control knowledge, and fusing the extracted control knowledge to generate a scheduling professional entity corpus;
and generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
2. The system and method for generating a professional corpus in the field of power grid dispatching according to claim 1, wherein extracting the regulation and control knowledge comprises: establishing a regulation and control entity recognition model, establishing a regulation and control knowledge relation extraction model, and extracting the regulation and control knowledge from text according to the regulation and control entity recognition model and the regulation and control knowledge relation extraction model.
3. The system and method for generating a professional corpus in the field of power grid dispatching according to claim 1, wherein fusing the extracted regulation and control knowledge to generate a scheduling professional entity corpus comprises: computing the similarity between the various expressions of a regulation and control professional term with a text similarity calculation method, forming a mapping relation among the different expressions of the same professional term, and fusing and mapping the regulation and control knowledge to generate the scheduling professional entity corpus.
4. The system and method for generating a professional corpus in the field of power grid dispatching according to claim 1, wherein generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention comprises:
extracting verbs and nouns from the scheduling professional entity corpus through deep learning and combining them in sequence to generate the required scheduling operation sentences; and
combining the business operation intention with the set slots, and filling the slots according to rules to generate the required scheduling operation sentences.
5. The system and method for generating a professional corpus in the field of power grid dispatching according to claim 1, wherein the data from which the regulation and control knowledge is extracted comprises structured data, semi-structured data, and unstructured data.
6. The system and method for generating a professional corpus in the field of power grid dispatching according to claim 2, wherein the regulation and control entity recognition model is built with a bidirectional long short-term memory network-conditional random field model.
7. The system and method for generating a professional corpus in the field of power grid dispatching according to claim 2, wherein the regulation and control knowledge is extracted using a convolutional neural network model.
8. A system for generating a professional corpus in the field of power grid dispatching, characterized by comprising:
the scheduling professional entity corpus generating module is used for extracting the control knowledge and fusing the extracted control knowledge to generate a scheduling professional entity corpus;
and the scheduling professional event corpus generating module is used for generating a scheduling professional event corpus according to the scheduling professional entity corpus and the overall business operation intention.
CN202011314046.2A 2020-11-20 2020-11-20 Method and system for generating professional corpus in power grid dispatching field Pending CN112434129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011314046.2A CN112434129A (en) 2020-11-20 2020-11-20 Method and system for generating professional corpus in power grid dispatching field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011314046.2A CN112434129A (en) 2020-11-20 2020-11-20 Method and system for generating professional corpus in power grid dispatching field

Publications (1)

Publication Number Publication Date
CN112434129A true CN112434129A (en) 2021-03-02

Family

ID=74693346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011314046.2A Pending CN112434129A (en) 2020-11-20 2020-11-20 Method and system for generating professional corpus in power grid dispatching field

Country Status (1)

Country Link
CN (1) CN112434129A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170169013A1 (en) * 2015-12-11 2017-06-15 Microsoft Technology Licensing, Llc Personalizing Natural Language Understanding Systems
CN110991179A (en) * 2019-11-13 2020-04-10 国网山东省电力公司临沂供电公司 Semantic analysis method based on electric power professional term
CN111078847A (en) * 2019-11-27 2020-04-28 中国南方电网有限责任公司 Power consumer intention identification method and device, computer equipment and storage medium
CN111831792A (en) * 2020-07-03 2020-10-27 国网江苏省电力有限公司信息通信分公司 Electric power knowledge base construction method and system
CN111930774A (en) * 2020-08-06 2020-11-13 全球能源互联网研究院有限公司 Automatic construction method and system for power knowledge graph ontology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
余建明等: "面向智能调控领域的知识图谱构建与应用", 电力系统保护与控制, pages 29 - 36 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837148A (en) * 2021-03-03 2021-05-25 中央财经大学 Risk logical relationship quantitative analysis method fusing domain knowledge
CN112837148B (en) * 2021-03-03 2023-06-23 中央财经大学 Risk logic relationship quantitative analysis method integrating domain knowledge
CN113689851A (en) * 2021-07-27 2021-11-23 国家电网有限公司 Scheduling professional language understanding system and method
CN113689851B (en) * 2021-07-27 2024-02-02 国家电网有限公司 Scheduling professional language understanding system and method
CN114186759A (en) * 2022-02-16 2022-03-15 杭州杰牌传动科技有限公司 Material scheduling control method and system based on reducer knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination