CN116610818A - Construction method and system of power transmission and transformation project knowledge base - Google Patents

Publication number
CN116610818A
Authority
CN
China
Prior art keywords
power transmission
data
knowledge
transformation
entity
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310652097.3A
Other languages
Chinese (zh)
Inventor
刘洋
刘士进
王俊
密兴峰
胡瑞通
冯敏
张文
时旭
郝鹏海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Information and Communication Technology Co
Original Assignee
Nari Information and Communication Technology Co
Application filed by Nari Information and Communication Technology Co
Priority to CN202310652097.3A
Publication of CN116610818A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the technical field of electric power and relates to a method and a system for constructing a power transmission and transformation project knowledge base. The method comprises the following steps: acquiring all data of the power transmission and transformation project, and collecting and storing the data; preprocessing the acquired data by filtering out service data that does not conform to the specification and then performing word segmentation and vectorization; classifying the text into the six major specialties of power transmission and transformation engineering by using a capsule network; outputting the tag sequence and the entity relations in a structured form by using a CRF algorithm to generate a power transmission and transformation engineering business flow field table; and comparing the entities and entity relations obtained by the named entity recognition module with the power transmission and transformation engineering dictionary to determine the specific flow and scene corresponding to the unstructured text uploaded by the user. This addresses the problems of heavy information loss during data extraction and the difficulty of recording and processing knowledge at a fine granularity.

Description

Construction method and system of power transmission and transformation project knowledge base
Technical Field
The application belongs to the technical field of electric power, and particularly relates to a construction method and a construction system of a power transmission and transformation project knowledge base.
Background
Knowledge base construction is a popular research direction in artificial intelligence. A knowledge base is a database dedicated to storing facts, concepts, rules and similar information, and it provides important support for artificial intelligence applications such as machine learning, natural language processing and search engines.
Knowledge base construction involves a number of data processing and semantic analysis techniques. Data is first collected from sources such as the internet, databases and literature. The data is then cleaned and preprocessed to remove duplicates, errors and useless information, and converted into a structured knowledge representation of entities, attributes and relationships by using semantic analysis and knowledge extraction techniques. Finally, knowledge fusion integrates knowledge from different sources into a complete knowledge base, over which knowledge reasoning is performed.
Knowledge base construction draws on techniques from several fields, such as natural language processing, information retrieval, machine learning and the semantic web. Natural language processing is one of the core technologies: it converts natural language text into a structured knowledge representation, thereby realizing knowledge extraction and knowledge representation. Most knowledge base construction methods use a crawler program to acquire knowledge files from a designated website, parse them, store the structured result in a relational database, and display the data in a system for staff to query and learn from. This approach has significant limitations. First, crawling knowledge files from a designated website may pose intellectual property risks. Second, because the documents in a power transmission and transformation project are mainly unstructured, extracting and mining knowledge from text with traditional structured-data methods loses much of the information and converts information into knowledge incompletely; moreover, even when the documents are available, the text still needs to be preprocessed.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description of the application and in the title of the application, which may not be used to limit the scope of the application.
The present application has been made in view of the above-described problems.
Therefore, the technical problems solved by the application are as follows: extracting and mining knowledge from text with traditional structured-data methods loses much of the extracted information and converts information into knowledge insufficiently; and, where knowledge content and formats are numerous and complex, existing knowledge base construction makes the subsequent fine-grained recording and processing of knowledge difficult.
In order to solve the technical problems, the application provides the following technical scheme: a construction method of a power transmission and transformation project knowledge base comprises the following steps:
acquiring all data of the power transmission and transformation project, and collecting and storing the data;
preprocessing the acquired data by filtering out service data that does not conform to the specification and then performing word segmentation and vectorization;
classifying the text into the six major specialties of power transmission and transformation engineering by using a capsule network;
outputting the tag sequence and the entity relations in a structured form by using a CRF algorithm to generate a power transmission and transformation engineering business flow field table;
and comparing the entities and entity relations obtained by the named entity recognition module with the power transmission and transformation engineering dictionary, and determining the specific flow and scene corresponding to the unstructured text uploaded by the user.
As a preferable scheme of the construction method of the power transmission and transformation project knowledge base, the data acquisition comprises acquiring data uploaded by users, wherein the data is divided by source into explicit knowledge and implicit knowledge;
explicit knowledge consists of the various documents uploaded by users and is divided by structure into structured and unstructured data;
when the file is structured data, the collector transmits it to the database monitoring module, which uses database monitoring technology to collect service data changes in real time; the module monitors database change data and, once new change data is received, records it and sends it to the knowledge extraction module;
when the document is unstructured data, the unstructured data collector collects and processes it, using the interfaces provided by Apache POI and POI-TL to read and write office documents in different formats, so that the knowledge contained in the unstructured data is obtained;
for implicit knowledge, fine-grained recording of the content of the knowledge system is realized through paragraph label marking of multiple documents and scenario-based extension of the label features of the corresponding knowledge system;
and the primary screening module filters and preliminarily screens the entity set obtained after word segmentation: each change record is judged against a preset rule set, discarded if it does not satisfy the rules in the rule set, and otherwise submitted to the sending module after simple conversion.
As a preferable scheme of the construction method of the power transmission and transformation project knowledge base, the preprocessing comprises the following steps:
according to the standard specifications established by the power transmission and transformation engineering platform, filtering out service data that does not meet the specifications, including incomplete, erroneous and duplicate data, and sending an alarm of the corresponding level to the service system side for the filtered-out data; then segmenting the text into words and mapping each word to a unique vector in a high-dimensional space by using the word2vec algorithm.
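The filtering and alarm step can be illustrated with a minimal sketch. The required field names, the alarm levels and the record layout below are hypothetical, since the source does not specify the platform's standard specification; the word2vec vectorization step is omitted.

```python
from dataclasses import dataclass, field

# Hypothetical specification: every record must carry these non-empty fields.
REQUIRED_FIELDS = ("project_id", "doc_type", "text")


@dataclass
class FilterResult:
    kept: list = field(default_factory=list)
    alarms: list = field(default_factory=list)  # (level, reason, record) tuples


def filter_records(records):
    """Drop incomplete and duplicate records and raise an alarm for each drop."""
    result = FilterResult()
    seen = set()
    for rec in records:
        missing = [name for name in REQUIRED_FIELDS if not rec.get(name)]
        if missing:
            # Assumed alarm level for incomplete data.
            result.alarms.append(("warning", f"incomplete: missing {missing}", rec))
            continue
        key = (rec["project_id"], rec["text"])
        if key in seen:
            # Assumed alarm level for repeated data.
            result.alarms.append(("info", "duplicate", rec))
            continue
        seen.add(key)
        result.kept.append(rec)
    return result
```

In this sketch the alarms are only collected in a list; a real deployment would forward them to the service system side through whatever channel the platform provides.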
As a preferable scheme of the construction method of the power transmission and transformation project knowledge base, the application comprises the following steps: the classification includes:
classifying the preprocessed, segmented text by using a capsule network into the six major specialties of power transmission and transformation engineering (project, technology, quality, progress, technological process and safety), wherein the direction of a capsule output vector represents the class of the text and the length of the vector represents the probability of that class, and the relation between lower-layer and upper-layer capsules is obtained through a dynamic routing algorithm, so that part-whole relationships are captured;
the low-layer capsules are iteratively updated through the dynamic routing algorithm to obtain the corresponding weight matrix; the weight matrix and the input of the low-layer capsules jointly determine the output of the high-layer capsules, the high-layer representation is obtained through a compression function, and the final capsule representation is obtained by concatenating all the high-layer capsules;
the capsule network extracts deep features by iterating dynamic routing, with the following steps:
first, the fusion matrix is converted into capsules $u_i$ by a nonlinear activation function, and each capsule $u_i$ is converted into a prediction vector $\hat{u}_{j|i}$ by using a weight matrix $W_{ij}$; the number of iterations is defined, and $c_{ij}$ is defined as the coupling coefficient; the weights are adjusted over multiple iterations, and the coupling coefficients are updated with the softmax function so that the coefficients between the input layer and the output layer sum to 1, as shown in the following formula:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})}$$
wherein $b_{ij}$ is the weight, whose initial value is set to 0, and $u_i$ represents an initial capsule;
the prediction vectors $\hat{u}_{j|i}$ are weighted by the coupling coefficients $c_{ij}$ and summed to give the output $s_j$, expressed as:
$$s_j = \sum_{i} c_{ij}\, \hat{u}_{j|i}$$
wherein $s_j$ represents the input of a high-layer capsule;
the direction of the capsule vector represents the internal spatial structure, and the modulus of the capsule vector represents the importance of the feature; in order not to lose the spatial features, the vector $v_j$ is obtained by normalizing with the squash compression function, which compresses the modulus without changing the direction of the output vector, expressed as:
$$v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \cdot \frac{s_j}{\lVert s_j \rVert}$$
finally, $b_{ij}$ is updated according to the correlation between the output vector and the prediction vector, expressed as:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$$
wherein, when the direction of the prediction vector and the direction of the output capsule tend to be consistent, the similarity is higher and the corresponding coupling coefficient $c_{ij}$ increases, which increases the weight of semantic information, mines more hidden feature information, and improves text classification accuracy.
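The routing iteration described above can be sketched in plain Python. This is an illustrative sketch, not the applicant's implementation: it assumes the prediction vectors have already been computed from the weight matrices, and the function names and the default of three iterations are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def squash(s):
    """Compress the modulus of s into [0, 1) without changing its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is the prediction vector from low-layer capsule i
    to high-layer capsule j. Returns the high-layer outputs v and the
    coupling coefficients c used in the last iteration."""
    n_in, n_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing weights, initialised to 0
    v, c = [], []
    for _ in range(iterations):
        # Coupling coefficients: softmax over the high-layer capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            # Weighted sum of prediction vectors, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement update: dot product of prediction and output vectors.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v, c
```

When the low-layer capsules agree on one high-layer capsule, the coupling coefficients towards that capsule grow with each iteration, which is the behaviour the last step of the description refers to.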
As a preferable scheme of the construction method of the power transmission and transformation project knowledge base, the classifying further comprises performing specialty classification through a capsule layer, a dynamic routing layer and a classification layer;
the capsule layer learns semantic features of different parts of the text;
the dynamic routing layer calculates the relations between different capsules by using a dynamic routing algorithm, which captures the context information in the text and makes the relations between capsules more accurate and stable;
and the classification layer takes the capsule features obtained by dynamic routing as input and performs the final specialty classification of the power transmission and transformation engineering text.
As a preferable scheme of the construction method of the power transmission and transformation project knowledge base, outputting the tag sequence and the entity relations comprises:
predicting the output tag sequence by learning the conditional probability distribution between the input sequence and the output tags, and outputting the identified and extracted entities and entity relations in a structured form;
and taking the classified, specialty-specific unstructured text as input and outputting the tag sequence and the entity relations in a structured form by using a CRF algorithm, to obtain the power transmission and transformation engineering business flow field table.
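As an illustration of how a linear-chain CRF produces a tag sequence, the sketch below runs Viterbi decoding over per-position emission scores and tag-to-tag transition scores, then groups the tags into structured entities. The BIO tag scheme, the `EQUIP` entity type and all scores are hypothetical; the source does not describe the tag set or the trained parameters.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[t][k]: score of tag k at position t.
    transitions[p][k]: score of moving from tag p to tag k.
    """
    n_tags = len(tags)
    score = list(emissions[0])
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_tags):
            best_p = max(range(n_tags), key=lambda p: score[p] + transitions[p][k])
            new_score.append(score[best_p] + transitions[best_p][k] + emissions[t][k])
            pointers.append(best_p)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[k] for k in path]


def to_entities(tokens, labels):
    """Group BIO labels into structured entity records.

    Tokens are joined without spaces, as is natural for Chinese text.
    """
    entities, current = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            current = {"type": label[2:], "text": token}
            entities.append(current)
        elif label.startswith("I-") and current and current["type"] == label[2:]:
            current["text"] += token
        else:
            current = None
    return entities
```

A large negative transition score from `O` to `I-EQUIP` is one common way to keep the decoder from starting an entity in the middle; the entity records returned by `to_entities` are the kind of structured rows a field table could be built from.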
As a preferable scheme of the construction method of the power transmission and transformation project knowledge base, the application comprises the following steps: the comparison includes:
after the power transmission and transformation engineering dictionary has been summarized and organized by scene and flow, the entities and entity relations obtained by the named entity recognition module are compared with the dictionary to obtain the specific flow and scene corresponding to the unstructured text uploaded by a user, which completes the knowledge system construction of the power transmission and transformation engineering platform.
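A minimal sketch of the dictionary comparison, assuming the dictionary maps (entity, relation, entity) triples to a flow and a scene and that the document is assigned the pair matched most often. The dictionary entries and the voting rule are hypothetical; the source does not state how matches are aggregated.

```python
from collections import Counter

# Hypothetical dictionary entries organised by flow and scene.
ENGINEERING_DICTIONARY = {
    ("transformer", "installed_at", "substation"):
        {"flow": "equipment installation", "scene": "substation construction"},
    ("transformer", "tested_by", "commissioning team"):
        {"flow": "equipment installation", "scene": "substation construction"},
    ("tower", "erected_on", "foundation"):
        {"flow": "tower erection", "scene": "line construction"},
}


def match_flow_and_scene(triples, dictionary=ENGINEERING_DICTIONARY):
    """Return the (flow, scene) pair supported by the most matched triples,
    or None when no triple is found in the dictionary."""
    votes = Counter()
    for triple in triples:
        entry = dictionary.get(triple)
        if entry is not None:
            votes[(entry["flow"], entry["scene"])] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Exact-match lookup is the simplest possible comparison; fuzzy or synonym-aware matching would need additional assumptions about the dictionary that the source does not give.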
The application also aims to provide a construction system of the power transmission and transformation project knowledge base. Using multi-source heterogeneous collector technology, the system collects the incremental and full sets of unstructured documents uploaded by users on the power transmission and transformation project platform, performs knowledge extraction and named entity recognition to convert the unstructured documents into a power transmission and transformation project business field table, and compares the result with the power transmission and transformation project dictionary to obtain the scenes and flows corresponding to the documents. This completes the construction of the unstructured text knowledge system and addresses the problems of heavy information loss during data extraction and the difficulty of recording knowledge at a fine granularity.
In order to solve the technical problems, the application provides the following technical scheme: the system for constructing the power transmission and transformation project knowledge base comprises an explicit knowledge collector, an implicit knowledge collector, a primary screening module, a storage module, a named entity identification module, a capital construction professional classification module, a preprocessing module, a power transmission and transformation project dictionary comparison module and a knowledge system construction module;
the explicit knowledge collector is used for acquiring structured and unstructured data uploaded by a user;
the implicit knowledge collector is used for realizing fine-grained recording of the content of the knowledge system through paragraph label marking of multiple documents and scenario-based extension of the label features of the corresponding knowledge system;
the primary screening module is used for filtering and primary screening the entity set subjected to word segmentation;
the storage module is used for storing the acquired data;
the named entity recognition module is used for outputting the tag sequence and the entity relation in a structured form to obtain a power transmission and transformation engineering business flow field table;
the capital construction specialty classification module is used for classifying text into the six major specialties of power transmission and transformation engineering (project, technology, quality, progress, technological process and safety) by using a capsule network;
the preprocessing module is used for preprocessing the acquired data, and comprises the steps of filtering data which does not meet the specification, word segmentation and vectorization;
the power transmission and transformation engineering dictionary comparison module is used for comparing the entity and entity relationship obtained by the named entity identification module with the power transmission and transformation engineering dictionary;
the knowledge system construction module is used for completing knowledge system construction of the power transmission and transformation engineering platform.
A computer device comprising a memory and a processor, said memory storing a computer program, characterized in that said processor, when executing said computer program, implements the steps of a method of constructing a knowledge base of power transmission and transformation engineering projects.
A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of a method of constructing a knowledge base of power transmission and transformation engineering projects.
The application has the following beneficial effects. The application provides a construction method of a power transmission and transformation project knowledge base which, based on multi-source heterogeneous data acquisition technology, records the knowledge in the power transmission and transformation project at a fine granularity, summarizes and organizes scattered knowledge and converts it into structured data, and combines it with business association labels to form a power transmission and transformation project knowledge standard base with a unified standard system and specification that supports decision making and management. The preprocessed text is classified into the six major specialties of power transmission and transformation engineering by using a capsule network: semantic features of different parts of the text are learned through the capsule layer, and the relations among the different features are calculated by a dynamic routing algorithm, so that context information in the text is captured better. An intelligent power transmission and transformation project knowledge sharing system is built on a Spring Cloud micro-service architecture; it mainly provides users with one-stop intelligent retrieval of the latest power transmission and transformation project knowledge, knowledge association recommendation, intelligent question answering, knowledge forums and similar functions, and helps users quickly acquire knowledge in the power transmission and transformation project field.
The system helps managers better understand the condition of the whole power transmission and transformation project, make more accurate decisions, and find and solve problems in time, which improves project quality, realizes lean management and supports the sustainable development of the electric power enterprise. Constructing the knowledge sharing system promotes the digital, intelligent and informatized development of the power industry and improves its overall level and competitiveness; data analysis and mining optimize the allocation of social resources and reduce waste and redundancy, thereby improving the utilization efficiency of social resources.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a flowchart of a method for constructing a knowledge base of power transmission and transformation project according to an embodiment of the present application;
FIG. 2 is a block diagram of a system for constructing a knowledge base of power transmission and transformation project according to an embodiment of the present application;
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present application can be understood in detail, a more particular description of the application, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the application. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present application have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the application. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present application, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present application and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1, for one embodiment of the present application, a method for constructing a knowledge base of power transmission and transformation project is provided, including:
acquiring all data of the power transmission and transformation project, and collecting and storing the data;
preprocessing the acquired data by filtering out service data that does not conform to the specification and then performing word segmentation and vectorization;
classifying the text into the six major specialties of power transmission and transformation engineering by using a capsule network;
outputting the tag sequence and the entity relations in a structured form by using a CRF algorithm to generate a power transmission and transformation engineering business flow field table;
and comparing the entities and entity relations obtained by the named entity recognition module with the power transmission and transformation engineering dictionary, and determining the specific flow and scene corresponding to the unstructured text uploaded by the user.
S1: acquiring all data of the power transmission and transformation project, and collecting and storing the data;
further, the data acquisition comprises acquiring data uploaded by users, wherein the data is divided by source into explicit knowledge and implicit knowledge;
explicit knowledge consists of the various documents uploaded by users and is divided by structure into structured and unstructured data;
when the file is structured data, the collector transmits it to the database monitoring module, which uses database monitoring technology to collect service data changes in real time; the module monitors database change data and, once new change data is received, records it and sends it to the knowledge extraction module;
when the document is unstructured data, the unstructured data collector collects and processes it, using the interfaces provided by Apache POI and POI-TL to read and write office documents in different formats, so that the knowledge contained in the unstructured data is obtained;
for implicit knowledge, fine-grained recording of the content of the knowledge system is realized through paragraph label marking of multiple documents and scenario-based extension of the label features of the corresponding knowledge system;
it should be noted that the collection of implicit knowledge in the power transmission and transformation project mainly comprises online word segmentation, meaning extraction and content labeling of the knowledge content, and the marking of paragraph-level knowledge with feature attributes such as labels.
The primary screening module filters and preliminarily screens the entity set obtained after word segmentation: each change record is judged against a preset rule set, discarded if it does not satisfy the rules in the rule set, and otherwise submitted to the sending module after simple conversion.
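The rule-set screening of change records can be sketched as follows. The record layout (`table`, `op`, `row`), the watched table names and the three rules are hypothetical, as is the form of the "simple conversion"; the source only states that a record must satisfy the preset rule set to be forwarded.

```python
# Hypothetical set of tables whose changes are of interest.
WATCHED_TABLES = {"project_document", "quality_record"}

# Hypothetical preset rule set: each rule is a predicate over a change record.
RULES = [
    lambda change: change["table"] in WATCHED_TABLES,
    lambda change: change["op"] in {"insert", "update"},
    lambda change: bool(change["row"]),
]


def screen_changes(changes, rules=RULES):
    """Keep only change records that satisfy every rule, and convert the
    survivors into the shape handed to the sending module."""
    forwarded = []
    for change in changes:
        if all(rule(change) for rule in rules):
            # Simple conversion: normalise the operation name, rename the payload.
            forwarded.append({
                "table": change["table"],
                "op": change["op"].upper(),
                "data": change["row"],
            })
    return forwarded
```

Expressing the rules as a list of predicates keeps the rule set configurable without touching the screening loop.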
S2: filtering service data which does not accord with the specification, performing word segmentation and vectorization, and preprocessing acquired data;
still further, the preprocessing includes:
according to the standard specifications established by the power transmission and transformation engineering platform, service data that does not meet the specifications, incomplete data, erroneous data and repeated data are filtered out, and an alarm of the corresponding level is sent to the service system side for the filtered-out data; the text is then segmented, and each word is mapped to a unique vector in a high-dimensional space by the word2vec algorithm.
It should be noted that, because of the particular nature of the Chinese language, Chinese text classification must pay special attention to word segmentation; a Chinese word segmentation technique is therefore adopted in the preprocessing, and the classification effect is improved by adjusting the segmentation granularity (the system adopts the word granularity).
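The filtering step can be sketched as below. This is an illustration under stated assumptions: `REQUIRED_FIELDS` stands in for the platform specification, the alarm is reduced to a message string, and `segment` is a whitespace placeholder for a real Chinese word segmenter. Training word2vec vectors is not shown.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("doc_id", "text")  # hypothetical stand-in for the platform specification


def clean(records: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Drop incomplete and repeated records; return kept records and alarm messages."""
    kept: List[Dict[str, str]] = []
    alarms: List[str] = []
    seen = set()
    for rec in records:
        if any(not rec.get(field) for field in REQUIRED_FIELDS):
            alarms.append(f"incomplete record: {rec.get('doc_id', '<no id>')}")
            continue
        if rec["text"] in seen:
            alarms.append(f"repeated record: {rec['doc_id']}")
            continue
        seen.add(rec["text"])
        kept.append(rec)
    return kept, alarms


def segment(text: str) -> List[str]:
    """Whitespace segmentation, a placeholder for a Chinese word segmenter."""
    return text.split()


raw = [
    {"doc_id": "d1", "text": "tower foundation acceptance"},
    {"doc_id": "d2", "text": ""},                              # incomplete
    {"doc_id": "d3", "text": "tower foundation acceptance"},   # repeat of d1
]
kept, alarms = clean(raw)
tokens = [segment(rec["text"]) for rec in kept]
```

The token lists produced here are what a word-vector model would consume in the vectorization step.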
S3: classifying the text into the six major specialties of power transmission and transformation engineering by using a capsule network;
still further, the classifying includes:
the preprocessed, segmented text is classified by a capsule network into the six major specialties of power transmission and transformation engineering: project, technology, quality, progress, technological process and safety; the direction of a capsule's output vector represents the class of the text and the length of the vector represents the probability of that class; the relation between upper and lower capsules is obtained through a dynamic routing algorithm, so that the relation between the local parts and the whole is captured;
it should be noted that, unlike conventional convolutional neural networks, capsule networks improve the generalization ability and robustness of the model by introducing capsule layers in place of convolutional and pooling layers. A capsule network can effectively extract the semantic information implied in the context and has stronger expressive power than a traditional classifier; it consists of low-layer capsules and high-layer capsules representing the different categories.
The low-layer capsules are repeatedly updated by a dynamic routing algorithm to obtain the corresponding weight matrix; the weight matrix and the input of the low-layer capsules jointly determine the output of the high-layer capsules, a compression function turns the weighted result into the high-layer representation, and the final capsule representation is obtained by concatenating all the high-layer capsules;
the capsule network extracts deep features by iterating dynamic routing, with the following steps:
first, a nonlinear activation function converts the fusion matrix into capsules $u_i$, and a weight matrix $W_{ij}$ converts each capsule $u_i$ into a prediction vector $\hat{u}_{j|i} = W_{ij} u_i$; the number of iterations is defined, and $c_{ij}$ is defined as the coupling coefficient; the weights are adjusted over multiple iterations and updated with the softmax function, which ensures that the coefficients between the input layer and the output layer sum to 1, as shown in the following formula:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})}$$
wherein $b_{ij}$ is the weight, with its initial value set to 0, and $u_i$ represents an initial capsule;
the prediction vectors $\hat{u}_{j|i}$ are weighted by the coupling coefficients $c_{ij}$ and summed to give the output $s_j$, expressed as:
$$s_j = \sum_{i} c_{ij} \, \hat{u}_{j|i}$$
wherein $s_j$ represents the input of a high-layer capsule;
the direction of the capsule vector represents the internal spatial structure and the modulus of the capsule vector represents the importance of the feature; in order not to lose the spatial features, the squash compression function is used for normalization to obtain the vector $v_j$, which compresses the modulus without changing the direction of the output vector, expressed as:
$$v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \cdot \frac{s_j}{\lVert s_j \rVert}$$
finally, $b_{ij}$ is updated according to the correlation between the output vector and the prediction vector, expressed as:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$$
wherein, when the direction of the prediction vector and the direction of the output capsule tend to be consistent, the similarity is higher and the corresponding coupling coefficient $c_{ij}$ is increased; the weight of semantic information thereby increases, more hidden feature information is mined, and text classification accuracy is improved.
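The routing loop described above can be sketched in plain Python. This is a minimal illustration of standard dynamic routing between capsules, assuming the prediction vectors have already been computed from the low-layer capsules; the function names, the toy input and the default of three iterations are assumptions, not values from the source.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax; the result sums to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def squash(s: Vector) -> Vector:
    """Compress the length of s into [0, 1) without changing its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def route(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is the prediction of low-layer capsule i for high-layer capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # weights b_ij start at 0
    v: List[Vector] = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients c_ij
        v = []
        for j in range(n_out):
            # s_j: sum of prediction vectors weighted by c_ij
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += dot(u_hat[i][j], v[j])  # agreement raises b_ij
    return v


# Toy input: 2 low-layer capsules, 2 high-layer capsules, 2-dimensional vectors.
u_hat = [[[1.0, 0.0], [0.0, 0.1]],
         [[0.9, 0.1], [0.1, 0.0]]]
v = route(u_hat)
lengths = [math.sqrt(dot(vec, vec)) for vec in v]
```

In the toy input both low-layer capsules agree on the first high-layer capsule, so its output vector ends up longer than the second one.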
Still further, the classifying further includes: performing professional classification through a capsule layer, a dynamic routing layer and a classification layer;
the capsule layer learns semantic features of different parts in the text;
it should be noted that the capsule layer is made up of a plurality of capsules, each of which can learn the semantic features of a different part of the text, e.g. one capsule can learn the features of one word in the text and another capsule can learn the features of one phrase in the text.
The dynamic routing layer uses a dynamic routing algorithm to compute the relations between different capsules and to capture the context information in the text, which makes the relations between capsules more accurate and stable;
and the classification layer receives the capsule features produced by dynamic routing and performs the final specialty classification of the power transmission and transformation engineering text.
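Because the length of a capsule's output vector represents the probability of its class, the final decision can be sketched as picking the longest high-layer capsule. The capsule values below are invented for illustration, and the category order is an assumption.

```python
import math
from typing import List

# The six specialties named in the text; the order here is an assumption.
SPECIALTIES = ["project", "technology", "quality", "progress",
               "technological process", "safety"]


def capsule_lengths(capsules: List[List[float]]) -> List[float]:
    """Euclidean length of each high-layer capsule vector."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in capsules]


def classify(capsules: List[List[float]]) -> str:
    """Return the specialty whose capsule has the greatest length."""
    lengths = capsule_lengths(capsules)
    best = max(range(len(lengths)), key=lengths.__getitem__)
    return SPECIALTIES[best]


# One illustrative output capsule per specialty (made-up numbers).
capsules = [[0.1, 0.0], [0.0, 0.2], [0.6, 0.5],
            [0.1, 0.1], [0.0, 0.05], [0.2, 0.1]]
label = classify(capsules)
```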
S4: outputting the label sequence and the entity relation in a structured form by using a CRF algorithm to generate a power transmission and transformation engineering business flow field table;
still further, outputting the tag sequence and entity relations includes:
predicting the output tag sequence by learning the conditional probability distribution between the input sequence and the output tags, and outputting the identified and extracted entities and entity relations in a structured form;
the classified, specialty-specific unstructured text is taken as input, and the CRF algorithm outputs the tag sequence and entity relations in a structured form to obtain the power transmission and transformation engineering business flow field table.
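How a linear-chain CRF turns scores into a tag sequence can be sketched with Viterbi decoding. This is an illustration only: the BIO tag set, the emission and transition scores and the sample tokens are all hypothetical, and training the CRF is not shown.

```python
from typing import Dict, List

TAGS = ["O", "B-ENT", "I-ENT"]  # hypothetical BIO tag set


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] scores `tag` at position t;
    transitions[prev][tag] scores moving from `prev` to `tag`.
    """
    best = [dict(emissions[0])]        # best[t][tag]: best score of a path ending in tag
    back: List[Dict[str, str]] = []    # back[t-1][tag]: previous tag on that path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[p][tag])
            scores[tag] = best[-1][prev] + transitions[prev][tag] + emissions[t][tag]
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


def extract_entities(tokens: List[str], tags: List[str]) -> List[str]:
    """Group B-ENT/I-ENT runs into entity strings (the structured output)."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "I-ENT" and current:
            current.append(token)
            continue
        if current:
            entities.append(" ".join(current))
        current = [token] if tag == "B-ENT" else []
    if current:
        entities.append(" ".join(current))
    return entities


tokens = ["tower", "foundation", "accepted"]
emissions = [
    {"O": 0.1, "B-ENT": 2.0, "I-ENT": 0.0},
    {"O": 0.2, "B-ENT": 0.5, "I-ENT": 1.5},
    {"O": 2.0, "B-ENT": 0.1, "I-ENT": 0.3},
]
transitions = {
    "O": {"O": 0.5, "B-ENT": 0.5, "I-ENT": -5.0},
    "B-ENT": {"O": 0.0, "B-ENT": -1.0, "I-ENT": 1.0},
    "I-ENT": {"O": 0.5, "B-ENT": -1.0, "I-ENT": 0.5},
}
tags = viterbi(emissions, transitions)
entities = extract_entities(tokens, tags)
```

The strongly negative score for moving from `O` to `I-ENT` is what keeps an entity from starting without a `B-ENT` tag.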
S5: comparing the entities and entity relations obtained by the named entity recognition module with the power transmission and transformation engineering dictionary, and determining the specific flow and scene corresponding to the unstructured text uploaded by the user.
Still further, the comparison includes:
after the power transmission and transformation engineering dictionary has been summarized and organized by scene and flow, the entities and entity relations obtained by the named entity recognition module are compared with it to obtain the specific flow and scene corresponding to the unstructured text uploaded by the user, which completes the construction of the knowledge system of the power transmission and transformation engineering platform.
It should be noted that the dictionary comparator is based mainly on the power transmission and transformation engineering dictionary of the power transmission and transformation engineering platform, which specifies the noun abbreviations, noun explanations, English abbreviations, data types and standardized business scenes for whole-process management and specialty function management of the power grid projects involved in the digital application and construction of power transmission and transformation engineering.
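The dictionary comparison can be sketched as a lookup that picks the flow and scene whose terms overlap most with the recognized entities. The dictionary contents and the overlap-count criterion are hypothetical; the source does not state how the comparison is scored.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical dictionary organised by (flow, scene); the entries are invented.
ENGINEERING_DICTIONARY: Dict[Tuple[str, str], Set[str]] = {
    ("quality acceptance", "foundation works"):
        {"tower foundation", "acceptance record", "concrete strength"},
    ("safety inspection", "line stringing"):
        {"safety belt", "stringing", "inspection record"},
}


def match_flow_and_scene(entities: List[str]) -> Optional[Tuple[str, str]]:
    """Return the (flow, scene) sharing the most terms with the entities, or None."""
    found = set(entities)
    best_key: Optional[Tuple[str, str]] = None
    best_overlap = 0
    for key, terms in ENGINEERING_DICTIONARY.items():
        overlap = len(found & terms)
        if overlap > best_overlap:
            best_key, best_overlap = key, overlap
    return best_key


matched = match_flow_and_scene(["tower foundation", "concrete strength"])
unmatched = match_flow_and_scene(["unrelated term"])
```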
Example 2
A second embodiment of the application, which differs from the previous embodiment, is:
the functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only memory (ROM), a random access memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Example 3
Referring to fig. 2, a third embodiment of the present application provides a system for constructing a knowledge base of a power transmission and transformation project, including: an explicit knowledge collector, an implicit knowledge collector, a primary screening module, a storage module, a named entity recognition module, a capital construction specialty classification module, a preprocessing module, a power transmission and transformation engineering dictionary comparison module and a knowledge system construction module;
the explicit knowledge collector is used for acquiring structured and unstructured data uploaded by a user;
the implicit knowledge collector is used for realizing the fine recording of the content of the knowledge system through the knowledge system corresponding to the paragraph label marking and the scenerized expansion label characteristic of the multi-document;
the primary screening module is used for filtering and primary screening the entity set subjected to word segmentation;
the storage module is used for storing the acquired data;
the named entity recognition module is used for outputting the tag sequence and the entity relation in a structured form to obtain a power transmission and transformation engineering business flow field table;
the capital construction specialty classification module is used for classifying text, by using a capsule network, into the six major specialties of power transmission and transformation engineering: project, technology, quality, progress, technological process and safety;
the preprocessing module is used for preprocessing the acquired data, and comprises the steps of filtering data which does not meet the specification, word segmentation and vectorization;
the power transmission and transformation engineering dictionary comparison module is used for comparing the entity and entity relationship obtained by the named entity identification module with the power transmission and transformation engineering dictionary;
the knowledge system construction module is used for completing knowledge system construction of the power transmission and transformation engineering platform.
Example 4
In order to verify the beneficial effects of the application, they are demonstrated through economic benefit calculation and experiments.
The following gives a specific example of building knowledge in a power transmission and transformation project knowledge base from user-uploaded documents:
a knowledge base of a power transmission and transformation project is constructed, taking the documents uploaded by users of a certain power transmission and transformation project as an example.
(1) Data acquisition:
Explicit knowledge: 500 documents uploaded by users are collected, of which 400 are unstructured data and 100 are structured data.
Implicit knowledge: paragraph labels and extended labels are marked on the uploaded documents with reference to the relevant specifications and standards, giving the knowledge content corresponding to each paragraph.
(2) Preprocessing:
Filtering non-conforming data: 6 unstructured documents with format errors are found and filtered out; 5 structured documents with incomplete information are filtered out.
Word segmentation: the remaining 489 documents are segmented, giving a lexicon of 30000 words; 10000 of these words are converted to word vectors, and the remaining words are entered into the knowledge base for its construction.
Word vectors: the word2vec word vector model is selected, with the word vector dimension set to 200, the learning rate to 0.025 and the number of iterations to 5, to obtain the word vectors of the 10000 words in the lexicon.
(3) Classification:
setting 6 categories, namely project, technology, quality, progress, technological process and safety.
The capsule network parameter settings are shown in the following table:
TABLE 1
Parameter               Value
Number of iterations    20
Batch size              64
Optimizer               Adam
Learning rate           5e-4
Dropout rate            0.5
Word vector dimension   300
Hidden layer units      256
The experimental environment is shown in the following table:
TABLE 2
Item                       Specification
Processor                  Intel Core i7-9700
Processor main frequency   3.0 GHz
Memory                     16 GB
Software environment       Python 3.6
Deep learning framework    TensorFlow 1.14.0
After 20 iterations, the 489 documents are classified as follows: 219 project, 125 technology, 58 quality, 36 progress, 28 technological process and 23 safety.
(4) CRF and comparison:
The entities and entity relations of each document are obtained and compared with the dictionary to obtain the flow and scene.
The business flow field table is compared with the project knowledge dictionary to obtain the specific flow and scene corresponding to each document, completing the construction of the knowledge system.
(5) Evaluation:
The true categories comprise 221 project, 127 technology, 60 quality, 38 progress, 30 technological process and 24 safety documents.
The accuracy is 99.1% for project, 98.4% for technology, 96.7% for quality, 94.7% for progress, 93.3% for technological process and 95.8% for safety.
The average accuracy is 96.7%, and the practical requirement is met.
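The per-category figures can be recomputed from the two sets of counts quoted above. The sketch below assumes each figure is the number of documents assigned to a category divided by the number truly in that category; the source does not state the formula, so this is an interpretation.

```python
CATEGORIES = ["project", "technology", "quality", "progress",
              "technological process", "safety"]
assigned = [219, 125, 58, 36, 28, 23]   # classification results quoted in the text
actual = [221, 127, 60, 38, 30, 24]     # true category counts quoted in the text

# Ratio of assigned to true count per category, as a percentage with one decimal.
ratios = {
    name: round(100.0 * a / t, 1)
    for name, a, t in zip(CATEGORIES, assigned, actual)
}

for name, value in ratios.items():
    print(f"{name}: {value}%")
```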
Therefore, the method can well construct a knowledge base of the power transmission and transformation project, has ideal classification effect and accuracy, and proves that the method has practical value.
It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered by the scope of the claims of the present application.

Claims (10)

1. The construction method of the power transmission and transformation project knowledge base is characterized by comprising the following steps of:
collecting all data of the power transmission and transformation project, and collecting and storing the data;
filtering service data which does not accord with the specification, performing word segmentation and vectorization, and preprocessing acquired data;
classifying the text into the six major specialties of power transmission and transformation engineering by using a capsule network;
outputting the label sequence and the entity relation in a structured form by using a CRF algorithm to generate a power transmission and transformation engineering business flow field table;
and comparing the entity and entity relation obtained by the named entity recognition module with the power transmission and transformation engineering dictionary, and determining a specific flow and a scene corresponding to the unstructured text uploaded by the user.
2. The method for constructing the knowledge base of the power transmission and transformation project according to claim 1, which is characterized in that: the data acquisition comprises acquiring data uploaded by users, the data being divided by source into explicit and implicit knowledge;
explicit knowledge consists of the various documents uploaded by users and is divided by structure into structured and unstructured data;
when the file is structured data, the collector transmits the structured data to the database monitoring module, the database monitoring technology is used for collecting service data changes in real time, monitoring is carried out on database change data, and once new change data is received, the new change data is recorded and sent to the knowledge extraction module;
when the document is unstructured data, the unstructured data collector collects and processes it, using the interfaces provided by Apache POI and POI-TL to read and write office documents in different formats, thereby obtaining the knowledge contained in the unstructured data;
for implicit knowledge, the detailed recording of the content of the knowledge system is realized through the paragraph label marking of the multiple documents and the knowledge system corresponding to the scene expansion label characteristics;
and the entity set produced by word segmentation is filtered and pre-screened by a primary screening module: each change record is checked against a preset rule set, discarded if it does not satisfy the rules, and passed to the sending module after a simple conversion if it does.
3. The method for constructing the knowledge base of the power transmission and transformation project according to claim 2, which is characterized in that: the pretreatment comprises the following steps:
according to the standard specifications established by the power transmission and transformation engineering platform, service data that does not meet the specifications, incomplete data, erroneous data and repeated data are filtered out, and an alarm of the corresponding level is sent to the service system side for the filtered-out data; the text is then segmented, and each word is mapped to a unique vector in a high-dimensional space by the word2vec algorithm.
4. A method for constructing a knowledge base of power transmission and transformation project according to claim 3, characterized in that: the classification includes:
the preprocessed, segmented text is classified by a capsule network into the six major specialties of power transmission and transformation engineering: project, technology, quality, progress, technological process and safety; the direction of a capsule's output vector represents the class of the text and the length of the vector represents the probability of that class; the relation between upper and lower capsules is obtained through a dynamic routing algorithm, so that the relation between the local parts and the whole is captured;
the low-layer capsules are repeatedly updated by a dynamic routing algorithm to obtain the corresponding weight matrix; the weight matrix and the input of the low-layer capsules jointly determine the output of the high-layer capsules, a compression function turns the weighted result into the high-layer representation, and the final capsule representation is obtained by concatenating all the high-layer capsules;
the capsule network extracts deep features by iterating dynamic routing, with the following steps:
first, a nonlinear activation function converts the fusion matrix into capsules $u_i$, and a weight matrix $W_{ij}$ converts each capsule $u_i$ into a prediction vector $\hat{u}_{j|i} = W_{ij} u_i$; the number of iterations is defined, and $c_{ij}$ is defined as the coupling coefficient; the weights are adjusted over multiple iterations and updated with the softmax function, which ensures that the coefficients between the input layer and the output layer sum to 1, as shown in the following formula:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})}$$
wherein $b_{ij}$ is the weight, with its initial value set to 0, and $u_i$ represents an initial capsule;
the prediction vectors $\hat{u}_{j|i}$ are weighted by the coupling coefficients $c_{ij}$ and summed to give the output $s_j$, expressed as:
$$s_j = \sum_{i} c_{ij} \, \hat{u}_{j|i}$$
wherein $s_j$ represents the input of a high-layer capsule;
the direction of the capsule vector represents the internal spatial structure and the modulus of the capsule vector represents the importance of the feature; in order not to lose the spatial features, the squash compression function is used for normalization to obtain the vector $v_j$, which compresses the modulus without changing the direction of the output vector, expressed as:
$$v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \cdot \frac{s_j}{\lVert s_j \rVert}$$
finally, $b_{ij}$ is updated according to the correlation between the output vector and the prediction vector, expressed as:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$$
wherein, when the direction of the prediction vector and the direction of the output capsule tend to be consistent, the similarity is higher and the corresponding coupling coefficient $c_{ij}$ is increased; the weight of semantic information thereby increases, more hidden feature information is mined, and text classification accuracy is improved.
5. The method for constructing a knowledge base of power transmission and transformation project according to claim 4, wherein the classifying further comprises: performing professional classification through a capsule layer, a dynamic routing layer and a classification layer;
the capsule layer learns semantic features of different parts in the text;
the dynamic routing layer uses a dynamic routing algorithm to compute the relations between different capsules and to capture the context information in the text, which makes the relations between capsules more accurate and stable;
and the classification layer receives the capsule features produced by dynamic routing and performs the final specialty classification of the power transmission and transformation engineering text.
6. The method for constructing a knowledge base of power transmission and transformation project according to claim 5, wherein outputting the tag sequence and entity relations comprises:
predicting the output tag sequence by learning the conditional probability distribution between the input sequence and the output tags, and outputting the identified and extracted entities and entity relations in a structured form;
the classified, specialty-specific unstructured text is taken as input, and the CRF algorithm outputs the tag sequence and entity relations in a structured form to obtain the power transmission and transformation engineering business flow field table.
7. The method for constructing the knowledge base of the power transmission and transformation project according to claim 6, which is characterized in that: the comparison includes:
after the power transmission and transformation engineering dictionary has been summarized and organized by scene and flow, the entities and entity relations obtained by the named entity recognition module are compared with it to obtain the specific flow and scene corresponding to the unstructured text uploaded by the user, which completes the construction of the knowledge system of the power transmission and transformation engineering platform.
8. A system employing a method for constructing a knowledge base of power transmission and transformation project according to any one of claims 1 to 7, characterized in that: the system comprises an explicit knowledge collector, an implicit knowledge collector, a preliminary screening module, a storage module, a named entity identification module, a capital construction professional classification module, a preprocessing module, a power transmission and transformation engineering dictionary comparison module and a knowledge system construction module;
the explicit knowledge collector is used for acquiring structured and unstructured data uploaded by a user;
the implicit knowledge collector is used for realizing the fine recording of the content of the knowledge system through the knowledge system corresponding to the paragraph label marking and the scenerized expansion label characteristic of the multi-document;
the primary screening module is used for filtering and primary screening the entity set subjected to word segmentation;
the storage module is used for storing the acquired data;
the named entity recognition module is used for outputting the tag sequence and the entity relation in a structured form to obtain a power transmission and transformation engineering business flow field table;
the capital construction specialty classification module is used for classifying text, by using a capsule network, into the six major specialties of power transmission and transformation engineering: project, technology, quality, progress, technological process and safety;
the preprocessing module is used for preprocessing the acquired data, and comprises the steps of filtering data which does not meet the specification, word segmentation and vectorization;
the power transmission and transformation engineering dictionary comparison module is used for comparing the entity and entity relationship obtained by the named entity identification module with the power transmission and transformation engineering dictionary;
the knowledge system construction module is used for completing knowledge system construction of the power transmission and transformation engineering platform.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program implementing the steps of the method of any of claims 1 to 7 when executed by a processor.
CN202310652097.3A 2023-06-05 2023-06-05 Construction method and system of power transmission and transformation project knowledge base Pending CN116610818A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310652097.3A CN116610818A (en) 2023-06-05 2023-06-05 Construction method and system of power transmission and transformation project knowledge base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310652097.3A CN116610818A (en) 2023-06-05 2023-06-05 Construction method and system of power transmission and transformation project knowledge base

Publications (1)

Publication Number Publication Date
CN116610818A true CN116610818A (en) 2023-08-18

Family

ID=87676344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310652097.3A Pending CN116610818A (en) 2023-06-05 2023-06-05 Construction method and system of power transmission and transformation project knowledge base

Country Status (1)

Country Link
CN (1) CN116610818A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076991A (en) * 2023-10-16 2023-11-17 云境商务智能研究院南京有限公司 Power consumption abnormality monitoring method and device for pollution control equipment and computer equipment
CN117076991B (en) * 2023-10-16 2024-01-02 云境商务智能研究院南京有限公司 Power consumption abnormality monitoring method and device for pollution control equipment and computer equipment
CN117151117A (en) * 2023-10-30 2023-12-01 国网浙江省电力有限公司营销服务中心 Automatic identification method, device and medium for power grid lightweight unstructured document content
CN117151117B (en) * 2023-10-30 2024-03-01 国网浙江省电力有限公司营销服务中心 Automatic identification method, device and medium for power grid lightweight unstructured document content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination