CN116610818A - Construction method and system of power transmission and transformation project knowledge base - Google Patents
- Publication number
- CN116610818A CN202310652097.3A
- Authority
- CN
- China
- Prior art keywords
- power transmission
- data
- knowledge
- transformation
- entity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The application belongs to the technical field of electric power and discloses a method and a system for constructing a knowledge base for power transmission and transformation projects. The method comprises the following steps: collecting and storing all data of the power transmission and transformation project; preprocessing the collected data by filtering out service data that does not conform to the specification and then performing word segmentation and vectorization; classifying the text into the six major specialties of power transmission and transformation projects by using a capsule network; outputting the tag sequence and the entity relationships in structured form by using a CRF algorithm to generate a power transmission and transformation engineering business flow field table; and comparing the entities and entity relationships obtained by the named entity recognition module with the power transmission and transformation engineering dictionary to determine the specific flow and scene corresponding to the unstructured text uploaded by the user. This addresses the heavy information loss during data extraction and the difficulty of recording and processing knowledge at a fine-grained level.
Description
Technical Field
The application belongs to the technical field of electric power, and particularly relates to a method and a system for constructing a knowledge base for power transmission and transformation projects.
Background
Knowledge base construction is a popular research direction in the field of artificial intelligence. A knowledge base is a database dedicated to storing facts, concepts, rules and similar information, and it provides important support for applications such as machine learning, natural language processing and search engines.
Knowledge base construction involves a number of data processing and semantic analysis techniques. Data is first collected from various sources such as the internet, databases and literature. The data is then cleaned and preprocessed to remove duplicates, errors and useless information. Next, semantic analysis and knowledge extraction techniques convert the data into a structured knowledge representation comprising entities, attributes and relationships. Finally, knowledge fusion integrates knowledge from different sources into a complete knowledge base, and reasoning techniques are applied to infer further knowledge.
Knowledge base construction draws on techniques from several fields, such as natural language processing, information retrieval, machine learning and the semantic web. Among them, natural language processing is one of the core technologies: it converts natural language text into a structured knowledge representation and thereby realizes knowledge extraction and knowledge representation. Most knowledge base construction methods use a crawler program to obtain relevant knowledge files from designated websites, parse them, store the structured result in a relational database, and display the data in a system for staff to query and learn from. This approach has significant limitations. First, using a crawler to obtain knowledge files from a given website may pose intellectual property risks. Second, the document files in power transmission and transformation projects are mainly unstructured, so extracting and mining the knowledge in the text with traditional structured-data methods loses much of the information and converts information into knowledge incompletely; even when the document files are available, the text still needs to be preprocessed.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description of the application and in the title of the application, which may not be used to limit the scope of the application.
The present application has been made in view of the above-described problems.
Therefore, the technical problems solved by the application are as follows: extracting and mining the knowledge in text with traditional structured-data methods loses much of the extracted information and converts information into knowledge insufficiently, and existing knowledge base construction makes subsequent fine-grained recording and processing of knowledge difficult when the knowledge content and formats are numerous and complex.
In order to solve the technical problems, the application provides the following technical scheme: a construction method of a power transmission and transformation project knowledge base, comprising the following steps:
collecting and storing all data of the power transmission and transformation project;
preprocessing the collected data by filtering out service data that does not conform to the specification and performing word segmentation and vectorization;
classifying the text into the six major specialties of power transmission and transformation projects by using a capsule network;
outputting the tag sequence and the entity relationships in structured form by using a CRF algorithm to generate a power transmission and transformation engineering business flow field table;
and comparing the entities and entity relationships obtained by the named entity recognition module with the power transmission and transformation engineering dictionary to determine the specific flow and scene corresponding to the unstructured text uploaded by the user.
In a preferred scheme of the construction method of the power transmission and transformation project knowledge base, the data collection comprises acquiring data uploaded by users, the data being divided by source into explicit knowledge and implicit knowledge;
explicit knowledge consists of the various documents uploaded by users and is divided by structure into structured data and unstructured data;
when the file is structured data, the collector passes it to the database monitoring module, which uses database monitoring technology to capture service data changes in real time; the module listens for database change data and, once new change data is received, records it and sends it to the knowledge extraction module;
when the file is unstructured data, it is collected and processed by the unstructured data collector, which uses the interfaces provided by Apache POI and POI-TL to read and write office documents in different formats and thereby obtain the knowledge contained in the unstructured data;
for implicit knowledge, fine-grained recording of the knowledge system content is achieved through paragraph label marking of multiple documents and through the knowledge system corresponding to scenario-based extended label features;
the primary screening module filters and preliminarily screens the entity set obtained after word segmentation: each change record is judged against a preset rule set, discarded if it does not satisfy the rules in the rule set, and otherwise handed to the sending module after simple conversion.
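The primary screening step described above can be sketched as a list of predicates applied to each change record. This is only an illustrative sketch: the `ChangeRecord` fields, the three example rules and the shape of the converted message are hypothetical, and the sketch assumes a record must satisfy every rule to pass, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChangeRecord:
    """A change record from the database monitoring module (hypothetical fields)."""
    table: str
    operation: str  # "INSERT", "UPDATE" or "DELETE"
    payload: dict


# A rule is any predicate over a change record.
Rule = Callable[[ChangeRecord], bool]

# Hypothetical preset rule set; a real platform would load its own rules.
RULES: List[Rule] = [
    lambda r: r.operation in {"INSERT", "UPDATE"},       # ignore deletions
    lambda r: bool(r.payload.get("doc_id")),             # must reference a document
    lambda r: len(r.payload.get("entities", [])) > 0,    # must carry segmented entities
]


def primary_screen(record: ChangeRecord, rules: List[Rule] = RULES) -> Optional[dict]:
    """Return a converted message if the record passes every rule, else None."""
    if not all(rule(record) for rule in rules):
        return None  # the record is discarded
    # "Simple conversion": flatten the record into a message for the sending module.
    return {"source": record.table, "op": record.operation, **record.payload}
```

Keeping the rules as plain predicates means a rule can be added or removed without touching the screening function itself.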
In a preferred scheme of the construction method of the power transmission and transformation project knowledge base, the preprocessing comprises the following steps:
according to the standard specifications established by the power transmission and transformation engineering platform, service data that does not meet the specifications, incomplete data, erroneous data and duplicate data are filtered out; for the non-conforming data that is cleaned out, an alarm of the corresponding level is sent to the service system side; the text is then segmented into words, and each word is mapped to a unique vector in a high-dimensional space by the word2vec algorithm.
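A minimal sketch of this preprocessing stage follows. The required fields, the alarm levels and the whitespace tokenizer are assumptions made for illustration; a real pipeline would use a Chinese word segmenter and vectors exported from a trained word2vec model, for which the plain `embeddings` dictionary below is only a stand-in.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("doc_id", "title", "text")  # hypothetical platform specification

Alarm = Tuple[str, object, str]  # (level, doc_id, reason)


def clean(records: List[dict]) -> Tuple[List[dict], List[Alarm]]:
    """Drop incomplete, erroneous and duplicate records and collect alarms."""
    kept: List[dict] = []
    alarms: List[Alarm] = []
    seen = set()
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
        if missing:
            alarms.append(("WARNING", rec.get("doc_id"), "incomplete: " + ",".join(missing)))
            continue
        if not isinstance(rec["text"], str):
            alarms.append(("ERROR", rec["doc_id"], "text is not a string"))
            continue
        key = (rec["title"], rec["text"])
        if key in seen:
            alarms.append(("INFO", rec["doc_id"], "duplicate content"))
            continue
        seen.add(key)
        kept.append(rec)
    return kept, alarms


def segment(text: str) -> List[str]:
    """Stand-in tokenizer: splits on whitespace instead of real word segmentation."""
    return text.split()


def vectorize(tokens: List[str], embeddings: Dict[str, List[float]]) -> List[List[float]]:
    """Look up each token's vector, skipping out-of-vocabulary tokens."""
    return [embeddings[t] for t in tokens if t in embeddings]
```

The alarms are returned rather than sent, so the caller decides how to deliver them to the service system side.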
In a preferred scheme of the construction method of the power transmission and transformation project knowledge base, the classification comprises:
the preprocessed, segmented text is classified by a capsule network into the six major specialties of power transmission and transformation engineering: project, technology, quality, progress, technological process and safety; the direction of a capsule's output vector represents the class of the text and the length of the vector represents the probability of that class; the relation between upper-layer and lower-layer capsules is obtained by the dynamic routing algorithm, which captures the relation between the parts and the whole;
the low-layer capsules are updated iteratively by the dynamic routing algorithm to obtain the corresponding weight matrix; the weight matrix and the input of the low-layer capsules jointly determine the output of the high-layer capsules, the high-layer representation is obtained through a compression function, and the final capsule representation is obtained by concatenating all high-layer capsules;
the capsule network extracts deep features by repeatedly iterating the dynamic routing, in the following steps:
first, the fusion matrix is converted into capsules $u_i$ by a nonlinear activation function, a weight matrix $W_{ij}$ converts each capsule $u_i$ into a prediction vector $\hat{u}_{j|i}$, and the number of iterations is defined; $c_{ij}$ is defined as the coupling coefficient, which is adjusted over multiple iterations and updated with the softmax function so that the coefficients between the input layer and the output layer sum to 1, as shown in the following formulas:
$$\hat{u}_{j|i} = W_{ij}\, u_i$$
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})}$$
wherein $b_{ij}$ is the weight, with its initial value set to 0, and $u_i$ represents an initial capsule;
the prediction vectors $\hat{u}_{j|i}$ are weighted by the coupling coefficients $c_{ij}$ and summed to give the output $s_j$, expressed as:
$$s_j = \sum_{i} c_{ij}\, \hat{u}_{j|i}$$
wherein $s_j$ represents the input of a high-layer capsule;
the direction of the capsule vector represents the internal spatial structure and the modulus of the capsule vector represents the importance of the feature; in order not to lose the spatial features, the squash compression function is used for normalization to obtain the vector $v_j$, which compresses the modulus without changing the direction of the output vector, expressed as:
$$v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2}\, \frac{s_j}{\lVert s_j \rVert}$$
finally, $b_{ij}$ is updated according to the correlation between the output vector and the prediction vector, expressed as:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$$
wherein, when the direction of the prediction vector and the direction of the output capsule tend to be consistent, the similarity is higher and the corresponding coupling coefficient $c_{ij}$ is increased, so that the weight of the semantic information is increased, more hidden feature information is mined, and the text classification accuracy is improved.
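The routing steps above (softmax coupling coefficients, weighted sum, squash, and the agreement update of the weights) can be sketched in plain Python as follows. This is a sketch of the commonly used routing-by-agreement procedure under the assumption that the prediction vectors have already been computed from the weight matrices; the function names and the default of three iterations are illustrative and are not taken from the application.

```python
import math
from typing import List

Vector = List[float]


def length(v: Vector) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def squash(s: Vector) -> Vector:
    """Compress the modulus into [0, 1) while keeping the direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is the prediction of low-layer capsule i for high-layer capsule j."""
    n_low, n_high, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_high for _ in range(n_low)]  # routing weights, initial value 0
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: for each low-layer capsule they sum to 1.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_high):
            # Weighted sum of the prediction vectors gives the high-layer input s_j.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_low)) for d in range(dim)]
            v.append(squash(s_j))
        # Agreement update: a larger dot product raises the corresponding weight.
        for i in range(n_low):
            for j in range(n_high):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v
```

With the output capsules in hand, the predicted class is the index of the longest vector, for example `max(range(len(v)), key=lambda j: length(v[j]))`.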
In a preferred scheme of the construction method of the power transmission and transformation project knowledge base, the classification further comprises: performing specialty classification through a capsule layer, a dynamic routing layer and a classification layer;
the capsule layer learns the semantic features of different parts of the text;
the dynamic routing layer uses the dynamic routing algorithm to calculate the relations between different capsules and to capture the context information in the text, which makes the relations between capsules more accurate and stable;
and the classification layer receives the capsule features produced by the dynamic routing layer and performs the final specialty classification of the power transmission and transformation engineering text.
In a preferred scheme of the construction method of the power transmission and transformation project knowledge base, outputting the tag sequence and the entity relationships comprises:
predicting the output tag sequence by learning the conditional probability distribution between the input sequence and the output tags, and outputting the identified and extracted entities and entity relationships in structured form;
and, taking the classified unstructured text of each specialty as input, outputting the tag sequence and the entity relationships in structured form by using the CRF algorithm to obtain the power transmission and transformation engineering business flow field table.
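To make the tag-sequence output concrete, the sketch below shows the decoding step of a linear-chain CRF (Viterbi search) followed by the collection of entities from the decoded tags. The BIO tag scheme, the `EQUIP` entity type and the hand-set emission and transition scores are all assumptions for illustration; in practice the scores would come from a trained CRF, and entity relationships would need a further extraction step that is not shown here.

```python
from typing import Dict, List, Optional, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under emission + transition scores."""
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Best previous tag for the current tag (missing transitions score 0).
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from a BIO tag sequence."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)
        else:
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities
```

The resulting list of typed entities is the kind of structured output that could populate rows of a business flow field table.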
In a preferred scheme of the construction method of the power transmission and transformation project knowledge base, the comparison comprises:
after the power transmission and transformation engineering dictionary has been summarized and organized by scene and flow, the entities and entity relationships obtained by the named entity recognition module are compared with the dictionary to obtain the specific flow and scene corresponding to the unstructured text uploaded by the user, which completes the knowledge system construction of the power transmission and transformation engineering platform.
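One simple way to picture the dictionary comparison is as a set-overlap lookup, sketched below. The dictionary entries, the scene and flow names, and the choice of scoring by the number of shared terms are all hypothetical; the application does not specify how the comparison is scored.

```python
from typing import Dict, List, Optional, Set, Tuple

SceneFlow = Tuple[str, str]  # (scene, flow)

# Hypothetical dictionary organized by scene and flow.
DICTIONARY: Dict[SceneFlow, Set[str]] = {
    ("foundation construction", "concrete pouring acceptance"):
        {"concrete", "pouring record", "foundation pit"},
    ("tower erection", "tower assembly inspection"):
        {"tower crane", "bolt torque", "tower section"},
}


def match_scene_and_flow(entities: List[str],
                         dictionary: Dict[SceneFlow, Set[str]] = DICTIONARY
                         ) -> Optional[SceneFlow]:
    """Return the (scene, flow) whose terms overlap most with the entities, or None."""
    found = set(entities)
    best_key: Optional[SceneFlow] = None
    best_overlap = 0
    for key, terms in dictionary.items():
        overlap = len(found & terms)
        if overlap > best_overlap:
            best_key, best_overlap = key, overlap
    return best_key
```

Returning `None` when nothing overlaps leaves room for a manual review path for documents that the dictionary does not yet cover.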
The application also aims to provide a construction system of the power transmission and transformation project knowledge base. Using multi-source heterogeneous collector technology, the system collects the incremental and full sets of unstructured documents uploaded by users on the power transmission and transformation engineering platform, performs knowledge extraction and named entity recognition, converts the unstructured documents into a power transmission and transformation engineering business field table, and compares the result with the power transmission and transformation engineering dictionary to obtain the scenes and flows corresponding to the documents. This completes the construction of the unstructured text knowledge system and addresses the heavy information loss during data extraction and the difficulty of recording knowledge at a fine-grained level.
In order to solve the technical problems, the application provides the following technical scheme: a system for constructing the power transmission and transformation project knowledge base, comprising an explicit knowledge collector, an implicit knowledge collector, a primary screening module, a storage module, a named entity recognition module, a capital construction specialty classification module, a preprocessing module, a power transmission and transformation engineering dictionary comparison module and a knowledge system construction module;
the explicit knowledge collector is used for acquiring the structured and unstructured data uploaded by users;
the implicit knowledge collector is used for achieving fine-grained recording of the knowledge system content through paragraph label marking of multiple documents and through the knowledge system corresponding to scenario-based extended label features;
the primary screening module is used for filtering and preliminarily screening the entity set obtained after word segmentation;
the storage module is used for storing the collected data;
the named entity recognition module is used for outputting the tag sequence and the entity relationships in structured form to obtain the power transmission and transformation engineering business flow field table;
the capital construction specialty classification module is used for classifying text, by means of a capsule network, into the six major specialties of power transmission and transformation engineering: project, technology, quality, progress, technological process and safety;
the preprocessing module is used for preprocessing the collected data, including filtering out data that does not meet the specification, word segmentation and vectorization;
the power transmission and transformation engineering dictionary comparison module is used for comparing the entities and entity relationships obtained by the named entity recognition module with the power transmission and transformation engineering dictionary;
the knowledge system construction module is used for completing the knowledge system construction of the power transmission and transformation engineering platform.
A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method for constructing a power transmission and transformation project knowledge base.
A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for constructing a power transmission and transformation project knowledge base.
The application has the following beneficial effects. The application provides a construction method of a power transmission and transformation project knowledge base which, based on multi-source heterogeneous data collection technology, records the knowledge in power transmission and transformation projects at a fine-grained level, summarizes and organizes scattered knowledge, converts it into structured data, and combines it with business association labels to form a standard knowledge base for power transmission and transformation projects with a unified standard system and specification that supports decision making and management. The preprocessed text is classified into the six major specialties of power transmission and transformation projects by a capsule network: the capsule layer learns the semantic features of different parts of the text, and the dynamic routing algorithm calculates the relations among the different features, so that the context information in the text is captured better. An intelligent power transmission and transformation project knowledge sharing system is built on a SpringCloud micro-service architecture. It mainly provides users with one-stop intelligent retrieval of the latest power transmission and transformation project knowledge, associated knowledge recommendation, intelligent question answering, knowledge forums and similar functions, and helps users quickly acquire knowledge in the power transmission and transformation project field.
The system helps managers better understand the condition of the whole power transmission and transformation project and thus make more accurate decisions; it helps them find and solve problems in time, improves project quality, realizes lean management, and supports the sustainable development of electric power enterprises. Constructing the power transmission and transformation project knowledge sharing system promotes the digital, intelligent and information-based development of the power industry and improves its overall level and competitiveness; data analysis and mining optimize the allocation of social resources and reduce waste and redundancy, which improves the efficiency with which social resources are used.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a flowchart of a method for constructing a knowledge base of power transmission and transformation project according to an embodiment of the present application;
FIG. 2 is a block diagram of a system for constructing a knowledge base of power transmission and transformation project according to an embodiment of the present application;
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present application can be understood in detail, a more particular description of the application, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the application. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present application have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the application. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present application, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present application and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to Fig. 1, one embodiment of the present application provides a method for constructing a knowledge base for power transmission and transformation projects, comprising:
collecting and storing all data of the power transmission and transformation project;
preprocessing the acquired data by filtering out service data that does not conform to the specification and then performing word segmentation and vectorization;
classifying the text into the six major specialties of power transmission and transformation projects using a capsule network;
outputting the tag sequence and the entity relations in structured form using a CRF algorithm, to generate a power transmission and transformation engineering business flow field table;
and comparing the entities and entity relations obtained by the named entity recognition module with the power transmission and transformation engineering dictionary, to determine the specific flow and scene corresponding to the unstructured text uploaded by the user.
S1: collecting and storing all data of the power transmission and transformation project;
further, data acquisition comprises acquiring data uploaded by users, the data being divided by source into explicit knowledge and implicit knowledge;
explicit knowledge consists of the various documents uploaded by users and is divided by structure into structured and unstructured data;
when the file is structured data, the collector passes it to the database monitoring module, which uses database monitoring technology to capture service data changes in real time; whenever new change data is received, it is recorded and sent to the knowledge extraction module;
when the document is unstructured data, the collector gathers and processes it, using the interfaces provided by Apache POI and POI-TL to read and write office documents in different formats and thereby obtain the knowledge contained in the unstructured data;
for implicit knowledge, the content of the knowledge system is recorded in fine detail by marking paragraph labels across multiple documents and mapping scene-based extended label features to the corresponding knowledge system;
it should be noted that the collection of implicit knowledge in power transmission and transformation projects mainly comprises online word segmentation, meaning extraction and content labeling of the knowledge content, and tagging paragraph-level knowledge with feature attributes such as labels.
The preliminary screening module filters and preliminarily screens the entity set obtained after word segmentation: each change record is judged against a preset rule set, discarded if it does not satisfy the rules in the rule set, and, if it does satisfy them, handed to the sending module after a simple conversion.
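The rule-based preliminary screening described above can be sketched as follows. This is only an illustration under stated assumptions: the `ChangeRecord` fields, the two example rules and the shape of the converted payload are hypothetical, and the text does not say whether a record must satisfy every rule or only one (the sketch assumes every rule).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ChangeRecord:
    """Hypothetical shape of a change record captured by database monitoring."""
    table: str
    operation: str          # "insert", "update" or "delete"
    row: Dict[str, str]


# A rule returns True when the record is acceptable.
Rule = Callable[[ChangeRecord], bool]

# Hypothetical rule set; a real platform would define its own rules.
RULES: List[Rule] = [
    lambda r: r.operation in {"insert", "update"},
    lambda r: bool(r.row.get("project_id")),
]


def screen(record: ChangeRecord, rules: List[Rule] = RULES) -> Optional[Dict[str, str]]:
    """Return the converted payload if every rule passes, otherwise None (discard)."""
    if not all(rule(record) for rule in rules):
        return None
    # "Simple conversion": flatten the record into the payload for the sending module.
    return {"table": record.table, "op": record.operation, **record.row}
```

A record for which `screen` returns `None` is dropped; any other return value is what would be handed to the sending module.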
S2: preprocessing the acquired data by filtering out service data that does not conform to the specification and then performing word segmentation and vectorization;
still further, the preprocessing includes:
according to the standard specifications established by the power transmission and transformation engineering platform, filtering out non-conforming service data, incomplete data, erroneous data and duplicate data, and sending an alarm of the corresponding level to the service system side for the data that was cleaned out; then segmenting the text and mapping each word to a unique vector in a high-dimensional space using the word2vec algorithm.
It should be noted that, because of the particular nature of the Chinese language, Chinese text classification must pay special attention to word segmentation; a Chinese word segmentation technique is therefore used in preprocessing, and the classification effect is improved by adjusting the segmentation granularity (the system uses word-level granularity).
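A minimal sketch of the cleaning step, assuming hypothetical record fields (`doc_id`, `text`) and hypothetical alarm levels; none of these names come from the application. Word segmentation and word2vec training are not implemented here; `build_vocab` only assigns each token the index under which a trained word2vec model would store its vector.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("doc_id", "text")  # hypothetical required fields


def clean(records: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split records into kept records and alarm messages for the rejected ones."""
    kept: List[Dict[str, str]] = []
    alarms: List[str] = []
    seen = set()
    for rec in records:
        doc_id = rec.get("doc_id") or "<missing>"
        if any(not rec.get(field) for field in REQUIRED_FIELDS):
            alarms.append(f"WARN {doc_id}: incomplete record")
        elif rec["text"] in seen:
            alarms.append(f"INFO {doc_id}: duplicate record")
        else:
            seen.add(rec["text"])
            kept.append(rec)
    return kept, alarms


def build_vocab(tokenised: List[List[str]]) -> Dict[str, int]:
    """Give each distinct token a stable index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in tokenised:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab
```

In a real pipeline the token lists would come from a Chinese word segmenter rather than from whitespace splitting.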
S3: classifying the text into the six major specialties of power transmission and transformation projects using a capsule network;
still further, the classifying includes:
classifying the preprocessed, segmented text into the six major specialties of power transmission and transformation engineering (project, technology, quality, progress, technological process and safety) using a capsule network, wherein the direction of a capsule's output vector represents the class of the text and the length of the vector represents the probability of that class; the relation between lower-layer and upper-layer capsules is obtained through a dynamic routing algorithm, thereby capturing part-whole relations;
it should be noted that, unlike conventional convolutional neural networks, capsule networks improve the generalization ability and robustness of the model by introducing capsule layers in place of convolutional and pooling layers. A capsule network is used to classify and predict the text by the six major specialties of power transmission and transformation engineering; it can effectively extract the semantic information implied in the context and has stronger expressive power than a traditional classifier. It consists of low-layer capsules and high-layer capsules that represent the different categories.
The low-layer capsules are repeatedly updated by the dynamic routing algorithm to obtain the corresponding weight matrix; the weight matrix and the inputs of the low-layer capsules jointly determine the outputs of the high-layer capsules, the high-layer representation is obtained by passing the weighted result through a compression function, and the final capsule representation is obtained by concatenating all the high-layer capsules;
the capsule network extracts deep features by iterating dynamic routing, in the following steps:
first, a non-linear activation function converts the fusion matrix into capsules $u_i$, and the weight matrix $W_{ij}$ converts each capsule $u_i$ into a prediction vector $\hat{u}_{j|i} = W_{ij} u_i$; the number of iterations is defined, and $c_{ij}$ is defined as the coupling coefficient, which is adjusted over multiple iterations and updated with the softmax function so that the coefficients between the input layer and the output layer sum to 1, as shown in the following formula:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})}$$
wherein $b_{ij}$ is the weight, with its initial value set to 0, and $u_i$ denotes an initial capsule;
the prediction vectors $\hat{u}_{j|i}$ are weighted by the coupling coefficients $c_{ij}$ and summed to give the output $s_j$, expressed as:
$$s_j = \sum_{i} c_{ij}\, \hat{u}_{j|i}$$
wherein $s_j$ denotes the input of a high-layer capsule;
the direction of a capsule vector represents the internal spatial structure and its modulus represents the importance of the feature; so as not to lose the spatial features, the squash compression function is used for normalization to obtain the vector $v_j$, which compresses the modulus without changing the direction of the output vector, expressed as:
$$v_j = \frac{\lVert s_j \rVert^{2}}{1 + \lVert s_j \rVert^{2}} \cdot \frac{s_j}{\lVert s_j \rVert}$$
finally, $b_{ij}$ is updated according to the correlation between the output vector and the prediction vector, expressed as:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$$
wherein, when the direction of the prediction vector and the direction of the output capsule tend to agree, the similarity is higher and the corresponding coupling coefficient $c_{ij}$ increases, so that the weight of semantic information is increased, more hidden feature information is mined, and text classification accuracy is improved.
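The routing iteration described above can be sketched in plain Python. This is a toy illustration, not the implementation used in the application: it assumes the prediction vectors have already been computed from the low-layer capsules, and the function names and the default of three iterations are illustrative choices.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax; the returned values sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def squash(s: Vector) -> Vector:
    """Compress the norm of s into [0, 1) while keeping its direction."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]


def route(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is the prediction of low-layer capsule i for high-layer capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]        # routing weights b_ij, initialised to 0
    v: List[Vector] = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]             # coupling coefficients c_ij
        v = []
        for j in range(n_out):
            # Weighted sum of predictions gives the input s_j of high-layer capsule j.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        for i in range(n_in):
            for j in range(n_out):
                # Agreement (dot product) between prediction and output raises b_ij.
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v
```

When several low-layer capsules agree on the same direction for one high-layer capsule, its output vector grows longer over the iterations, which is the behavior the text relies on for classification.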
Still further, the classifying further includes: performing specialty classification through a capsule layer, a dynamic routing layer and a classification layer;
the capsule layer learns the semantic features of different parts of the text;
it should be noted that the capsule layer is made up of a plurality of capsules, each of which can learn the semantic features of a different part of the text; for example, one capsule can learn the features of a word in the text and another can learn the features of a phrase.
The dynamic routing layer uses the dynamic routing algorithm to compute the relations between different capsules and capture the context information in the text, making those relations more accurate and stable;
and the classification layer takes the capsule features produced by dynamic routing as input and performs the final specialty classification of the power transmission and transformation engineering text.
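Because the length of each output capsule stands for the probability of its class, the last step can be sketched as picking the longest capsule. The sketch below assumes one output capsule per specialty, in the order the six specialties are listed in the text; the function names are illustrative.

```python
import math
from typing import Dict, List

# The six specialties, in the order given in the text.
CATEGORIES = ["project", "technology", "quality", "progress", "technological process", "safety"]


def capsule_lengths(output_capsules: List[List[float]]) -> Dict[str, float]:
    """Map each category to the Euclidean length of its output capsule."""
    lengths = [math.sqrt(sum(x * x for x in v)) for v in output_capsules]
    return dict(zip(CATEGORIES, lengths))


def predict(output_capsules: List[List[float]]) -> str:
    """Return the category whose capsule has the greatest length."""
    scores = capsule_lengths(output_capsules)
    return max(scores, key=scores.get)
```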
S4: outputting the tag sequence and the entity relations in structured form using a CRF algorithm, to generate a power transmission and transformation engineering business flow field table;
still further, outputting the tag sequence and entity relations includes:
predicting the output tag sequence by learning the conditional probability distribution between the input sequence and the output tags, and outputting the identified and extracted entities and entity relations in structured form;
and taking the classified, specialty-specific unstructured text as input and using the CRF algorithm to output the tag sequence and entity relations in structured form, obtaining the power transmission and transformation engineering business flow field table.
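How a linear-chain CRF turns scores into a tag sequence, and how that sequence becomes structured entities, can be sketched as below. The emission and transition scores would be learned in practice; here they are plain inputs, and the BIO tagging scheme and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence; missing scores default to 0."""
    score = {t: emissions[0].get(t, 0.0) for t in tags}
    back: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: score[p] + transitions.get((p, cur), 0.0))
            new_score[cur] = score[prev] + transitions.get((prev, cur), 0.0) + emit.get(cur, 0.0)
            pointers[cur] = prev
        score = new_score
        back.append(pointers)
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])     # follow back-pointers to the start
    return path[::-1]


def bio_to_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from BIO labels."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    kind: Optional[str] = None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                entities.append((" ".join(current), kind))
            current, kind = [tok], lab[2:]
        elif lab.startswith("I-") and current and lab[2:] == kind:
            current.append(tok)
        else:
            if current:
                entities.append((" ".join(current), kind))
            current, kind = [], None
    if current:
        entities.append((" ".join(current), kind))
    return entities
```

The list returned by `bio_to_entities` is the kind of structured output that could populate rows of a business flow field table.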
S5: comparing the entities and entity relations obtained by the named entity recognition module with the power transmission and transformation engineering dictionary, to determine the specific flow and scene corresponding to the unstructured text uploaded by the user.
Still further, the comparison includes:
after the power transmission and transformation engineering dictionary has been summarized and organized by scene and flow, the entities and entity relations obtained by the named entity recognition module are compared with it to obtain the specific flow and scene corresponding to the unstructured text uploaded by the user, completing the knowledge system construction of the power transmission and transformation engineering platform.
It should be noted that the dictionary comparator relies mainly on the power transmission and transformation engineering dictionary of the power transmission and transformation engineering platform, which specifies the noun abbreviations, term definitions, English abbreviations, data types and standardized service scenes for the whole-process management and specialty function management of power grid projects involved in the digital application and construction of power transmission and transformation engineering.
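One simple way to realize the comparison is to pick the flow and scene whose dictionary terms overlap most with the extracted entities. The sketch below assumes this overlap criterion, which the application does not specify; the dictionary entries and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical dictionary: (flow, scene) -> terms defined for that scene.
DICTIONARY: Dict[Tuple[str, str], Set[str]] = {
    ("quality management", "acceptance inspection"): {"acceptance report", "defect list", "main transformer"},
    ("safety management", "risk control"): {"work permit", "risk level", "safety briefing"},
}


def match_scene(entities: List[str],
                dictionary: Dict[Tuple[str, str], Set[str]] = DICTIONARY
                ) -> Optional[Tuple[str, str]]:
    """Return the (flow, scene) with the largest term overlap, or None if nothing matches."""
    found = {e.lower() for e in entities}
    best: Optional[Tuple[str, str]] = None
    best_overlap = 0
    for key, terms in dictionary.items():
        overlap = len(found & terms)
        if overlap > best_overlap:
            best, best_overlap = key, overlap
    return best
```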
Example 2
A second embodiment of the application, which differs from the previous embodiment, is:
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Example 3
Referring to Fig. 2, a third embodiment of the present application provides a system for constructing a knowledge base for power transmission and transformation projects, including: an explicit knowledge collector, an implicit knowledge collector, a preliminary screening module, a storage module, a named entity recognition module, a capital construction specialty classification module, a preprocessing module, a power transmission and transformation engineering dictionary comparison module and a knowledge system construction module;
the explicit knowledge collector is used for acquiring the structured and unstructured data uploaded by users;
the implicit knowledge collector is used for recording the content of the knowledge system in fine detail by marking paragraph labels across multiple documents and mapping scene-based extended label features to the corresponding knowledge system;
the preliminary screening module is used for filtering and preliminarily screening the entity set obtained after word segmentation;
the storage module is used for storing the acquired data;
the named entity recognition module is used for outputting the tag sequence and entity relations in structured form to obtain the power transmission and transformation engineering business flow field table;
the capital construction specialty classification module is used for classifying text into the six major specialties of power transmission and transformation engineering (project, technology, quality, progress, technological process and safety) using a capsule network;
the preprocessing module is used for preprocessing the acquired data, including filtering out non-conforming data, word segmentation and vectorization;
the power transmission and transformation engineering dictionary comparison module is used for comparing the entities and entity relations obtained by the named entity recognition module with the power transmission and transformation engineering dictionary;
the knowledge system construction module is used for completing the knowledge system construction of the power transmission and transformation engineering platform.
Example 4
To verify the beneficial effects of the application, they are demonstrated through economic benefit calculation and experiments.
The following section shows how specific knowledge in the knowledge base of a power transmission and transformation project is obtained from documents uploaded by users:
A knowledge base of a power transmission and transformation project is constructed, taking the documents uploaded by users of a certain power transmission and transformation project as an example.
(1) Data acquisition:
Explicit knowledge: 500 documents uploaded by users are collected, of which 400 are unstructured data and 100 are structured data.
Paragraph labels and extended labels are marked on the uploaded documents with reference to the relevant specifications and standards, to obtain the knowledge content corresponding to each paragraph.
(2) Preprocessing:
Filtering non-conforming data: 6 unstructured documents with format errors are found and filtered out; 5 structured documents with incomplete information are filtered out.
Word segmentation: the remaining 489 documents are segmented to obtain a lexicon of 30,000 words; 10,000 words of the lexicon are converted to word vectors, and the remaining words are listed in the knowledge base for construction.
Word vectors: the word2vec word vector model is selected, with the word vector dimension set to 200, the learning rate to 0.025 and the number of iterations to 5, to obtain word vectors for the 10,000 words in the lexicon.
(3) Classification:
setting 6 categories, namely project, technology, quality, progress, technological process and safety.
The capsule network parameter settings are shown in the following table:
Table 1
Parameter | Value |
---|---|
Number of iterations | 20 |
Batch size | 64 |
Optimizer | Adam |
Learning rate | 5e-4 |
Dropout rate | 0.5 |
Word vector dimension | 300 |
Hidden layer units | 256 |
The experimental environment is shown in the following table:
Table 2
Item | Specification |
---|---|
Processor | Intel Core i7-9700 |
Processor clock frequency | 3.0GHz |
Memory | 16GB |
Software environment | Python 3.6 |
Deep learning framework | TensorFlow 1.14.0 |
After 20 iterations, the classification results for the 489 documents are: project 219, technology 125, quality 58, progress 36, technological process 28, and safety 23.
(4) CRF and comparison:
The entities and entity relations of each document are obtained and compared with the dictionary to obtain the flow and scene.
The business flow field table is compared with the project knowledge dictionary to obtain the specific flow and scene corresponding to each document, completing the knowledge system construction.
(5) Evaluation:
The true categories are: project 221, technology 127, quality 60, progress 38, technological process 30, and safety 24.
The accuracy per category is: project 99.1%, technology 98.4%, quality 96.7%, progress 94.7%, technological process 93.3% (28 of 30), and safety 95.8%.
The average accuracy is 96.7%, which meets practical requirements.
The method can therefore construct a knowledge base for power transmission and transformation projects well, with good classification performance and accuracy, which demonstrates its practical value.
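The per-category figures in the evaluation can be recomputed from the two sets of counts. The sketch below assumes each figure is the number of documents assigned to a category divided by the true number in that category; this reading is inferred from the numbers and is not stated in the text, and the category keys follow the six categories listed in step (3).

```python
from typing import Dict

# Counts taken from the classification results and the true categories above.
PREDICTED: Dict[str, int] = {
    "project": 219, "technology": 125, "quality": 58,
    "progress": 36, "technological process": 28, "safety": 23,
}
ACTUAL: Dict[str, int] = {
    "project": 221, "technology": 127, "quality": 60,
    "progress": 38, "technological process": 30, "safety": 24,
}


def per_category_ratio(predicted: Dict[str, int], actual: Dict[str, int]) -> Dict[str, float]:
    """Percentage of each true category count that was assigned to that category."""
    return {name: round(100.0 * predicted[name] / actual[name], 1) for name in actual}


if __name__ == "__main__":
    for name, pct in per_category_ratio(PREDICTED, ACTUAL).items():
        print(f"{name}: {pct}%")
```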
It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered by the scope of the claims of the present application.
Claims (10)
1. A method for constructing a power transmission and transformation project knowledge base, characterized by comprising the following steps:
collecting and storing all data of the power transmission and transformation project;
preprocessing the acquired data by filtering out service data that does not conform to the specification and then performing word segmentation and vectorization;
classifying the text into the six major specialties of power transmission and transformation projects using a capsule network;
outputting the tag sequence and the entity relations in structured form using a CRF algorithm, to generate a power transmission and transformation engineering business flow field table;
and comparing the entities and entity relations obtained by the named entity recognition module with the power transmission and transformation engineering dictionary, to determine the specific flow and scene corresponding to the unstructured text uploaded by the user.
2. The method for constructing a power transmission and transformation project knowledge base according to claim 1, characterized in that: data acquisition comprises acquiring data uploaded by users, the data being divided by source into explicit knowledge and implicit knowledge;
explicit knowledge consists of the various documents uploaded by users and is divided by structure into structured and unstructured data;
when the file is structured data, the collector passes it to the database monitoring module, which uses database monitoring technology to capture service data changes in real time; whenever new change data is received, it is recorded and sent to the knowledge extraction module;
when the document is unstructured data, the collector gathers and processes it, using the interfaces provided by Apache POI and POI-TL to read and write office documents in different formats and thereby obtain the knowledge contained in the unstructured data;
for implicit knowledge, the content of the knowledge system is recorded in fine detail by marking paragraph labels across multiple documents and mapping scene-based extended label features to the corresponding knowledge system;
and the preliminary screening module filters and preliminarily screens the entity set obtained after word segmentation: each change record is judged against a preset rule set, discarded if it does not satisfy the rules in the rule set, and, if it does satisfy them, handed to the sending module after a simple conversion.
3. The method for constructing a power transmission and transformation project knowledge base according to claim 2, characterized in that the preprocessing comprises:
according to the standard specifications established by the power transmission and transformation engineering platform, filtering out non-conforming service data, incomplete data, erroneous data and duplicate data, and sending an alarm of the corresponding level to the service system side for the data that was cleaned out; then segmenting the text and mapping each word to a unique vector in a high-dimensional space using the word2vec algorithm.
4. The method for constructing a power transmission and transformation project knowledge base according to claim 3, characterized in that the classification comprises:
classifying the preprocessed, segmented text into the six major specialties of power transmission and transformation engineering (project, technology, quality, progress, technological process and safety) using a capsule network, wherein the direction of a capsule's output vector represents the class of the text and the length of the vector represents the probability of that class; the relation between lower-layer and upper-layer capsules is obtained through a dynamic routing algorithm, thereby capturing part-whole relations;
the low-layer capsules are repeatedly updated by the dynamic routing algorithm to obtain the corresponding weight matrix; the weight matrix and the inputs of the low-layer capsules jointly determine the outputs of the high-layer capsules, the high-layer representation is obtained by passing the weighted result through a compression function, and the final capsule representation is obtained by concatenating all the high-layer capsules;
the capsule network extracts deep features by iterating dynamic routing, in the following steps:
first, a non-linear activation function converts the fusion matrix into capsules $u_i$, and the weight matrix $W_{ij}$ converts each capsule $u_i$ into a prediction vector $\hat{u}_{j|i} = W_{ij} u_i$; the number of iterations is defined, and $c_{ij}$ is defined as the coupling coefficient, which is adjusted over multiple iterations and updated with the softmax function so that the coefficients between the input layer and the output layer sum to 1, as shown in the following formula:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})}$$
wherein $b_{ij}$ is the weight, with its initial value set to 0, and $u_i$ denotes an initial capsule;
the prediction vectors $\hat{u}_{j|i}$ are weighted by the coupling coefficients $c_{ij}$ and summed to give the output $s_j$, expressed as:
$$s_j = \sum_{i} c_{ij}\, \hat{u}_{j|i}$$
wherein $s_j$ denotes the input of a high-layer capsule;
the direction of a capsule vector represents the internal spatial structure and its modulus represents the importance of the feature; so as not to lose the spatial features, the squash compression function is used for normalization to obtain the vector $v_j$, which compresses the modulus without changing the direction of the output vector, expressed as:
$$v_j = \frac{\lVert s_j \rVert^{2}}{1 + \lVert s_j \rVert^{2}} \cdot \frac{s_j}{\lVert s_j \rVert}$$
finally, $b_{ij}$ is updated according to the correlation between the output vector and the prediction vector, expressed as:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$$
wherein, when the direction of the prediction vector and the direction of the output capsule tend to agree, the similarity is higher and the corresponding coupling coefficient $c_{ij}$ increases, so that the weight of semantic information is increased, more hidden feature information is mined, and text classification accuracy is improved.
5. The method for constructing a power transmission and transformation project knowledge base according to claim 4, wherein the classifying further comprises: performing specialty classification through a capsule layer, a dynamic routing layer and a classification layer;
the capsule layer learns the semantic features of different parts of the text;
the dynamic routing layer uses the dynamic routing algorithm to compute the relations between different capsules and capture the context information in the text, making those relations more accurate and stable;
and the classification layer takes the capsule features produced by dynamic routing as input and performs the final specialty classification of the power transmission and transformation engineering text.
6. The method for constructing a power transmission and transformation project knowledge base according to claim 5, wherein outputting the tag sequence and entity relations comprises:
predicting the output tag sequence by learning the conditional probability distribution between the input sequence and the output tags, and outputting the identified and extracted entities and entity relations in structured form;
and taking the classified, specialty-specific unstructured text as input and using the CRF algorithm to output the tag sequence and entity relations in structured form, obtaining the power transmission and transformation engineering business flow field table.
7. The method for constructing a power transmission and transformation project knowledge base according to claim 6, characterized in that the comparison comprises:
after the power transmission and transformation engineering dictionary has been summarized and organized by scene and flow, the entities and entity relations obtained by the named entity recognition module are compared with it to obtain the specific flow and scene corresponding to the unstructured text uploaded by the user, completing the knowledge system construction of the power transmission and transformation engineering platform.
8. A system employing the method for constructing a power transmission and transformation project knowledge base according to any one of claims 1 to 7, characterized in that: the system comprises an explicit knowledge collector, an implicit knowledge collector, a preliminary screening module, a storage module, a named entity recognition module, a capital construction specialty classification module, a preprocessing module, a power transmission and transformation engineering dictionary comparison module and a knowledge system construction module;
the explicit knowledge collector is used for acquiring the structured and unstructured data uploaded by users;
the implicit knowledge collector is used for recording the content of the knowledge system in fine detail by marking paragraph labels across multiple documents and mapping scene-based extended label features to the corresponding knowledge system;
the preliminary screening module is used for filtering and preliminarily screening the entity set obtained after word segmentation;
the storage module is used for storing the acquired data;
the named entity recognition module is used for outputting the tag sequence and entity relations in structured form to obtain the power transmission and transformation engineering business flow field table;
the capital construction specialty classification module is used for classifying text into the six major specialties of power transmission and transformation engineering (project, technology, quality, progress, technological process and safety) using a capsule network;
the preprocessing module is used for preprocessing the acquired data, including filtering out non-conforming data, word segmentation and vectorization;
the power transmission and transformation engineering dictionary comparison module is used for comparing the entities and entity relations obtained by the named entity recognition module with the power transmission and transformation engineering dictionary;
the knowledge system construction module is used for completing the knowledge system construction of the power transmission and transformation engineering platform.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program implementing the steps of the method of any of claims 1 to 7 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310652097.3A CN116610818A (en) | 2023-06-05 | 2023-06-05 | Construction method and system of power transmission and transformation project knowledge base |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116610818A true CN116610818A (en) | 2023-08-18 |
Family
ID=87676344
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310652097.3A Pending CN116610818A (en) | 2023-06-05 | 2023-06-05 | Construction method and system of power transmission and transformation project knowledge base |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116610818A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117076991A (en) * | 2023-10-16 | 2023-11-17 | 云境商务智能研究院南京有限公司 | Power consumption abnormality monitoring method and device for pollution control equipment and computer equipment |
CN117076991B (en) * | 2023-10-16 | 2024-01-02 | 云境商务智能研究院南京有限公司 | Power consumption abnormality monitoring method and device for pollution control equipment and computer equipment |
CN117151117A (en) * | 2023-10-30 | 2023-12-01 | 国网浙江省电力有限公司营销服务中心 | Automatic identification method, device and medium for power grid lightweight unstructured document content |
CN117151117B (en) * | 2023-10-30 | 2024-03-01 | 国网浙江省电力有限公司营销服务中心 | Automatic identification method, device and medium for power grid lightweight unstructured document content |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110717047B (en) | | Web service classification method based on graph convolution neural network |
CN106445988A (en) | | Intelligent big data processing method and system |
CN106447066A (en) | | Big data feature extraction method and device |
CN116610818A (en) | | Construction method and system of power transmission and transformation project knowledge base |
CN115878904A (en) | | Intellectual property personalized recommendation method, system and medium based on deep learning |
CN113961685A (en) | | Information extraction method and device |
CN111158641B (en) | | Automatic recognition method for transaction function points based on semantic analysis and text mining |
CN113392209B (en) | | Text clustering method based on artificial intelligence, related equipment and storage medium |
CN112559684A (en) | | Keyword extraction and information retrieval method |
CN114048354B (en) | | Test question retrieval method, device and medium based on multi-element characterization and metric learning |
CN111061939B (en) | | Scientific research academic news keyword matching recommendation method based on deep learning |
CN115952292B (en) | | Multi-label classification method, apparatus and computer readable medium |
CN116257759A (en) | | Structured data intelligent classification grading system of deep neural network model |
CN114676346A (en) | | News event processing method and device, computer equipment and storage medium |
CN118035440A (en) | | Enterprise associated archive management target knowledge feature recommendation method |
CN114637846A (en) | | Video data processing method, video data processing device, computer equipment and storage medium |
CN113051886A (en) | | Test question duplicate checking method and device, storage medium and equipment |
CN111782964B (en) | | Recommendation method of community posts |
CN113641788A (en) | | Unsupervised long-short shadow evaluation fine-grained viewpoint mining method |
CN113177164A (en) | | Multi-platform collaborative new media content monitoring and management system based on big data |
CN112270185A (en) | | Text representation method based on topic model |
CN111291182A (en) | | Hotspot event discovery method, device, equipment and storage medium |
TU | | Online Text Retrieval Method Based on Convolution Neural Network. |
CN110609961A (en) | | Collaborative filtering recommendation method based on word embedding |
CN113220855B (en) | | Computer technology field development trend analysis method based on IT technical question-answering website |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||