CN115510245B - Unstructured data-oriented domain knowledge extraction method - Google Patents


Info

Publication number: CN115510245B
Authority: CN (China)
Prior art keywords: entity, knowledge, extraction model, model, relationship
Legal status: Active (granted)
Application number: CN202211259591.5A
Other languages: Chinese (zh)
Other versions: CN115510245A
Inventors: 王儒, 孙延劭, 华益威, 魏竹琴, 王国新
Current Assignee: Beijing Institute of Technology BIT
Original Assignee: Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology BIT
Priority/filing date: 2022-10-14 (priority to CN202211259591.5A)
Publication of CN115510245A: 2022-12-23
Application granted; publication of CN115510245B: 2024-05-14

Classifications

    • G06F16/367 Ontology (creation of semantic tools; information retrieval of unstructured textual data)
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles (querying)
    • G06F16/355 Class or cluster creation or modification (clustering; classification)
    • G06F40/216 Parsing using statistical methods (natural language analysis)
    • G06F40/295 Named entity recognition (phrasal analysis)
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods (neural networks)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y02P90/30 Computing systems specially adapted for manufacturing


Abstract

The invention discloses a domain knowledge extraction method for unstructured data, comprising the following steps: establishing an entity extraction model based on a bidirectional long short-term memory (BiLSTM) neural network and a conditional random field (CRF), establishing a relation extraction model based on an attention mechanism, and training the two models separately; applying the trained entity extraction model to the unstructured data to be processed to obtain domain entities, which are stored in tabular form as a domain entity table; applying the trained relation extraction model to extract relations and, on the basis of the domain entity table, obtaining an entity-relation table; carrying out knowledge fusion based on semantic similarity over all extracted entities and relations to obtain a fused entity-relation table, and establishing a knowledge graph in a neo4j graph database. The method addresses the problems that domain knowledge acquisition in the prior art is mainly manual, management efficiency is low, and domain knowledge systems are incomplete, and it realizes knowledge extraction from unstructured data.

Description

Unstructured data-oriented domain knowledge extraction method
Technical Field
The invention belongs to the technical field of knowledge extraction, and particularly relates to a domain knowledge extraction method for unstructured data.
Background
Domain knowledge is highly specialized, carried by diverse media, and organized in complex knowledge systems. Against the background of intelligent manufacturing, product development and manufacturing depend ever more urgently on domain knowledge, and establishing a complete system for acquiring, managing, and sharing domain knowledge can effectively improve the efficiency of product development; the domain knowledge graph is the key to achieving this goal. A knowledge graph is essentially a large-scale semantic network that describes concepts and events in the real world in terms of entities and represents their interrelationships. Its core is the triplet composed of entities, attributes, and relations. Structurally, a knowledge graph can be divided into a schema layer and a data layer: the schema layer consists of concept ontologies and relations and describes the structure of the knowledge graph, while the data layer is the instantiated knowledge graph constructed from concrete data under the guidance of the schema layer.
The domain knowledge graph is an important means of managing domain knowledge and relations, allowing the various kinds of knowledge in a domain to be managed uniformly, so the construction process of the knowledge graph matters. The data sources for construction must first be identified; they are divided into structured, semi-structured, and unstructured data. Extraction from structured and semi-structured data is mature, while extraction from unstructured data is still developing. In practical applications, knowledge graph construction is still mostly manual, and automatic construction mainly handles structured and semi-structured data. The field therefore needs an automatic knowledge extraction method for unstructured data, which would help manage multi-source, heterogeneous knowledge in complex domains and facilitate design and decision-making in those domains.
Extracting knowledge from unstructured data can be decomposed into two parts: entity extraction and relation extraction.
In entity extraction, the development of natural language processing (NLP) has produced a variety of deep-learning-based entity recognition algorithms. The recurrent neural network (RNN) is a class of neural networks for processing sequence data and is suited to unstructured data consisting mainly of text. On this basis, the long short-term memory network (LSTM) was developed to avoid the gradient explosion problem, the bidirectional network (BiLSTM) was developed to exploit context in both directions, and a conditional random field (CRF) defining the loss function was added to further improve extraction accuracy.
In relation extraction, there are pipeline and end-to-end (end2end) methods. The former uses an entity extractor to identify the entities in each sentence, then pairs the extracted entities and feeds each pair, together with the original sentence, to a relation classifier that identifies the relation between the two entities. The latter, also called end-to-end relation extraction, processes each sentence directly and extracts triples. With the development of deep learning, relation extraction models based on convolutional neural networks (CNN) and on attention mechanisms have emerged.
However, the entity extraction and relation extraction methods above are currently used mostly in the general knowledge domain. General knowledge is broad in coverage and large in volume, so general-domain knowledge graphs are usually constructed bottom-up, extracting entities and relations from massive data. Domain knowledge differs: it emphasizes expertise and therefore requires a stricter structure. A domain knowledge graph must be constructed top-down: its schema layer is designed first, and the schema layer determines which information counts as domain knowledge. In practice, however, domain knowledge graph construction is still mostly manual, management efficiency is low, the data processed are mainly structured and semi-structured, and a systematic method for extracting knowledge from unstructured data is still lacking.
Disclosure of Invention
In view of the above, the invention provides a domain knowledge extraction method for unstructured data, which addresses the problems that existing domain knowledge acquisition is mainly manual, management efficiency is low, and domain knowledge systems are incomplete, and which realizes knowledge extraction from unstructured data.
The invention is realized by the following technical scheme:
The unstructured data are data whose structure is irregular or incomplete, that have no predefined data model, and that are inconvenient to represent in a two-dimensional logical table of a database;
The extraction method comprises the following specific steps:
Step S1, organizing the domain knowledge concept entities and relations to establish the schema layer of the domain knowledge graph;
Step S2, preprocessing unstructured data to obtain manually annotated text data;
Step S3, establishing an entity extraction model based on a bidirectional long short-term memory neural network and a conditional random field, establishing a relation extraction model based on an attention mechanism, and training the entity extraction model and the relation extraction model with their respective data sets;
Step S4, extracting the unstructured data to be processed with the trained entity extraction model to obtain domain entities, which are stored in tabular form as a domain entity table; extracting relations with the trained relation extraction model and, on the basis of the domain entity table, obtaining an entity-relation table in which entities and relations correspond one to one;
and carrying out knowledge fusion based on semantic similarity over all extracted entities and relations to obtain a fused entity-relation table, and establishing a knowledge graph in a neo4j graph database according to the entity-relation table.
Further, the specific steps of step S1 are as follows:
Step S1-1, organizing the multi-scenario domain knowledge concepts and relations according to the purpose of knowledge extraction;
and Step S1-2, defining a knowledge structure according to the domain knowledge concept entities and relations, and establishing the schema layer of the domain knowledge graph.
Further, the specific steps of step S2 are as follows:
S2-1, parsing the unstructured data into txt files with a text parsing tool;
S2-2, segmenting the text files with the Jieba word segmentation tool;
S2-3, removing stop words from the segmented text (see the sketch following step S2-4);
and S2-4, manually labeling the text data based on the BIO or BIOES labeling scheme.
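A minimal sketch of steps S2-1 to S2-3 follows, assuming the document has already been parsed to a txt file and that a plain-text stopword list is available; the file names are illustrative, not part of the patent. Jieba is the real Chinese segmentation library named above.

```python
# Minimal sketch of the preprocessing pipeline (steps S2-1 to S2-3).
# Assumes the unstructured source was already parsed to paper.txt by a
# text-parsing tool; paper.txt and stopwords.txt are illustrative names.
import jieba

def preprocess(txt_path: str, stopword_path: str) -> list[str]:
    with open(txt_path, encoding="utf-8") as f:
        text = f.read()
    with open(stopword_path, encoding="utf-8") as f:
        stopwords = {line.strip() for line in f if line.strip()}
    # Step S2-2: segment the Chinese text with the Jieba tokenizer.
    tokens = jieba.lcut(text)
    # Step S2-3: drop stopwords and whitespace-only tokens.
    return [t for t in tokens if t.strip() and t not in stopwords]

tokens = preprocess("paper.txt", "stopwords.txt")
```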
Further, the specific steps of step S3 are as follows:
S3-1, forming a training set and a test set for training the entity extraction model and the relation extraction model from the manually annotated data;
S3-2, establishing the entity extraction model based on a bidirectional long short-term memory neural network and a conditional random field, and training it with the corresponding data set; establishing the relation extraction model based on an attention mechanism, and training it with the corresponding data set;
S3-3, evaluating the training effect of the entity extraction model by precision, recall, and F1 value; and evaluating the training effect of the relation extraction model by accuracy.
Further, in step S3-2, when the entity extraction model is established: the output dimension of the BiLSTM layer of the bidirectional long short-term memory network equals the number of label types; for each input w_i the network outputs a probability value P_ij for label j, yielding the network output P, i.e. a labeling probability value for every label at every input. The conditional random field CRF computes the labeling probability under conditional constraints: let y be a predicted label sequence, x the text input sequence, and y' range over the candidate label sequences; then
P(y|x) = exp(Score(x, y)) / Σ_{y'} exp(Score(x, y'))
wherein P(y|x) is the probability of the output P after the conditional random field constraint; the Score may be computed by
Score(x, y) = Σ_i ψ_i(x, y)
wherein ψ_i(x, y) is a feature vector;
When training the entity extraction model, the objective is to maximize the probability P(y|x), obtained through the log likelihood
log P(y|x) = Score(x, y) - log Σ_{y'} exp(Score(x, y'))
The loss function is defined as -log(P(y|x)) and is optimized by an optimization algorithm to train the entity extraction model BiLSTM-CRF.
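The loss above can be computed exactly without enumerating all label sequences by running the forward algorithm in log space. The PyTorch sketch below is an illustration under the common assumption that Score(x, y) decomposes into per-position emission scores (the BiLSTM output P) plus label-transition scores; it is not the patent's own code.

```python
# Sketch of the CRF negative log-likelihood -log P(y|x).
import torch

def crf_nll(emissions, transitions, tags):
    # emissions: (T, K) scores P_ij from the BiLSTM layer
    # transitions: (K, K) learned score of moving from label a to label b
    # tags: (T,) gold label indices y
    T, K = emissions.shape
    # Score(x, y): emissions along the gold path plus its transitions.
    score = emissions[0, tags[0]]
    for t in range(1, T):
        score = score + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # log Σ_{y'} exp(Score(x, y')) via the forward recursion in log space.
    alpha = emissions[0]                                   # (K,)
    for t in range(1, T):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[t]
    log_z = torch.logsumexp(alpha, dim=0)
    return log_z - score                                   # = -log P(y|x)
```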
Further, in step S3-2,
When the relation extraction model is established, the vector form of the text is first output through the BiLSTM layer of the bidirectional long short-term memory network BiLSTM, and the relations are then classified through an attention mechanism layer to obtain the relations between entities, establishing the relation extraction model;
when the relation extraction model is trained, its input takes sentences as the unit: given a sentence S containing T characters, S = {x_1, x_2, ..., x_T}, where x_i denotes each character, the output of the BiLSTM layer is H = {h_1, h_2, ..., h_T}, and the matrix parameter to be trained is w ∈ R^{d_w}, where d_w denotes the dimension of the word embedding, satisfying:
M = tanh(H)
α = softmax(w^T M)
r = H α^T
wherein α is the attention weight coefficient and r is the weighted sum of the BiLSTM outputs H;
finally, a characterization vector h* = tanh(r) is generated through a nonlinear function;
the characterization vector h* is mapped to the class vector through a fully connected network, and for the input sentence S the predicted probability of each relation class is output through softmax, p(y|S) = softmax(W h* + b), with the predicted label obtained by argmax, ŷ = argmax_y p(y|S),
wherein W and b are a parameter matrix and a bias, respectively;
the negative log likelihood defines the loss function as
J(θ) = -Σ_{i=1}^{m} t_i log(y_i) + λ ||θ||_F^2
wherein t ∈ R^m is the one-hot representation of the true relation, y ∈ R^m is the estimated probability of each relation class output through softmax, λ is the regularization hyper-parameter, and θ denotes the model parameters of the relation extraction model;
the loss function J(θ) is optimized by an optimization algorithm to train the relation extraction model.
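J(θ) transcribes directly into code; in the sketch below the small epsilon guarding the logarithm is an added numerical-stability assumption, not part of the patent.

```python
import torch

def relation_loss(y, t, params, lam=1e-4):
    # y: softmax output over m relation classes; t: one-hot target (both (m,))
    nll = -(t * torch.log(y + 1e-12)).sum()       # cross-entropy term
    l2 = sum((p ** 2).sum() for p in params)      # squared-norm penalty on theta
    return nll + lam * l2
```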
Further, in step S4, the specific method of knowledge fusion based on semantic similarity calculation is as follows:
(1) Semantic similarity calculation: computing the similarity between concepts, attributes, and structural relations in the domain knowledge through the Jaccard similarity coefficient and classifying it, providing the basis for semantic space model fusion;
(2) Semantic space model fusion: carrying out fusion operations on domain knowledge of different similarities according to the fusion operation rules, eliminating redundancy among similar items and conflicts among contradictory ones;
(3) Entity linking: linking the newly added domain knowledge with the existing graph using a graph-based joint linking model, computing the compatibility and dependence among entities, disambiguating the newly added knowledge according to the results, and merging it into the knowledge graph.
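As an illustration of step (1): the Jaccard coefficient of two knowledge items is the size of the intersection of their feature sets divided by the size of the union. The sketch below compares entity names at character granularity, which is one plausible choice; the threshold and the granularity are assumptions, not values fixed by the patent.

```python
# Hedged sketch of Jaccard-based merge-candidate detection.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def merge_candidates(entities, threshold=0.6):
    """Pair up entities whose character-set similarity exceeds a threshold."""
    pairs = []
    for i, e1 in enumerate(entities):
        for e2 in entities[i + 1:]:
            if jaccard(e1, e2) >= threshold:
                pairs.append((e1, e2))
    return pairs

# "气缸盖" (cylinder head) and "缸盖" share 2 of 3 characters -> 0.67 >= 0.6.
print(merge_candidates(["气缸盖", "缸盖", "气门座"]))
```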
The beneficial effects are that:
(1) The invention provides a domain knowledge extraction method for unstructured data, involving knowledge modeling and natural language processing techniques. The method first organizes domain knowledge concepts and relations and establishes the schema layer of the domain knowledge graph; it then preprocesses the unstructured data, creates training and test sets by manually annotating the data, trains the deep-learning-based named entity recognition model BiLSTM-CRF, evaluates the training effect by precision, recall, F1 value, and similar indicators, and trains the attention-based relation extraction model. During knowledge extraction, the trained models extract entities from unstructured data and extract relations with the attention-based relation extraction model, forming an entity-relation table; knowledge fusion based on semantic similarity is carried out over all extracted entities and relations, and the resulting knowledge graph is stored in the neo4j graph database. Domain knowledge is highly specialized, carried by diverse media, and organized in complex systems; the method suits the needs of product development and manufacturing for such knowledge, and establishing a complete system for acquiring, managing, and sharing domain knowledge can effectively improve the efficiency of product development.
(2) The invention establishes an entity extraction model based on a bidirectional long short-term memory neural network (BiLSTM) and a conditional random field (CRF) to realize entity extraction from unstructured data, and a relation extraction model based on an attention mechanism to realize relation extraction from unstructured data; the combination of the two models finally extracts the entities and relations in unstructured data automatically, and training on large data sets can achieve a high extraction accuracy.
(3) Building the entity extraction model on a bidirectional long short-term memory neural network (BiLSTM) and a conditional random field (CRF) avoids the gradient explosion problem that can occur in a traditional recurrent neural network (RNN) while also improving training speed.
(4) The invention performs knowledge fusion based on semantic similarity, merging knowledge whose semantics are identical or highly similar across all extracted entities and relations; the approach is simple and reliable.
Drawings
Fig. 1 is a schematic flow chart of an implementation of a domain knowledge extraction method for unstructured data.
FIG. 2 is a schematic diagram of a BiLSTM model structure.
Fig. 3 is a schematic diagram of the long short-term memory neural network model based on an attention mechanism.
FIG. 4 is a schematic diagram of a semantic similarity calculation process.
FIG. 5 is a schematic diagram of a semantic space model fusion process.
Fig. 6 is a schematic diagram of the process knowledge schema layer established in example 2.
FIG. 7 is a schematic diagram of BIO labeling.
Detailed Description
The invention will now be described in detail by way of example with reference to the accompanying drawings.
Example 1:
The embodiment provides a domain knowledge extraction method for unstructured data, where the unstructured data are data whose structure is irregular or incomplete, that have no predefined data model, and that are inconvenient to represent in a two-dimensional logical table of a database, consisting mainly of text data.
The extraction method comprises the following specific steps:
Step S1, constructing the schema layer:
Step S1-1, organizing domain concepts and relations: according to the purpose of knowledge extraction, organizing the multi-scenario domain knowledge concepts and relations;
Step S1-2, constructing the schema layer of the domain knowledge graph: defining a knowledge structure according to the domain knowledge concept entities and relations, and establishing the schema layer of the domain knowledge graph;
Step S2, carrying out data preprocessing on unstructured data:
Step S2-1, parsing the file into txt: parsing the unstructured data into txt files with a text parsing tool;
Step S2-2, word segmentation: utilizing Jieba word segmentation tools to segment the text file;
Step S2-3, removing stop words: removing stop words from the segmented text;
Step S2-4, manual labeling: manually labeling the text data based on a BIO labeling method;
step S3, performing model training:
Step S3-1, training set and test set: forming a training set and a testing set for training the entity extraction model and the relation extraction model according to the manually marked data;
Step S3-2, training an entity extraction model: establishing an entity extraction model based on a bidirectional long and short term memory neural network (BiLSTM) and a Conditional Random Field (CRF), and training the model by using a corresponding data set;
step S3-3, entity extraction model evaluation: evaluating the training effect of the entity extraction model according to the accuracy rate, the recall rate and the F1 value;
Step S3-4, training a relation extraction model: establishing a relation extraction model based on an attention mechanism, and training the model by utilizing a corresponding data set;
Step S3-5, evaluating a relation extraction model: evaluating the training effect of the relation extraction model according to the accuracy rate;
wherein the order of steps S3-4 and S3-5 may be exchanged with that of steps S3-2 and S3-3;
Step S4, building a domain knowledge graph:
Step S4-1, extracting the domain entity: extracting unstructured data to be extracted by using a trained entity extraction model to obtain a domain entity;
Step S4-2, field entity table: according to the domain entity extracted by the entity extraction model, storing the domain entity in a form of a table as a domain entity table;
Step S4-3, entity-relation table: extracting relations with the trained relation extraction model and, on the basis of the domain entity table, obtaining an entity-relation table in which entities and relations correspond one to one;
Step S4-4, knowledge fusion: according to all the extracted entities and relations, carrying out knowledge fusion based on semantic similarity, and combining knowledge with the same or highly similar semantics;
step S4-5, knowledge graph: and establishing a knowledge graph in the neo4j graph database according to the entity-relation table after knowledge fusion.
Example 2:
In this embodiment, building on embodiment 1, process knowledge is extracted from papers related to diesel engine processes, i.e. the unstructured data are diesel engine process papers; the implementation flow of the extraction method is shown in fig. 1. The specific implementation steps are as follows:
Step S1, a mode layer is constructed:
Step S1-1, organizing process concepts and relations: according to the purpose of knowledge extraction, organizing the multi-scenario process knowledge concepts and relations. Diesel engine process knowledge can be organized along three dimensions: process ontology, workpiece ontology, and equipment ontology. The process ontology divides into machining, assembly, and casting; the workpiece ontology covers the component structures and parts of the diesel engine; and the equipment ontology covers the various kinds of equipment used in processing;
Step S1-2, building the schema layer of the process knowledge graph: defining the knowledge structure according to the process knowledge concept entities and relations, and establishing the schema layer of the process knowledge graph;
in this embodiment, the specific method for establishing the schema layer of the process knowledge graph is as follows:
(1) Defining the application scenario of the process knowledge graph and determining the process knowledge concept ontologies;
(2) Determining the relations between the process knowledge concept ontologies: in the diesel process knowledge, the relation between the process ontology and the workpiece ontology is "acts on", the relation between the equipment ontology and the process ontology is "implements", and the relation between the equipment ontology and the workpiece ontology is "processes", as shown in fig. 6;
Step S2, carrying out data preprocessing on unstructured data:
Step S2-1, parsing the file into txt: parsing the unstructured data into txt files with a text parsing tool;
Step S2-2, word segmentation: utilizing Jieba word segmentation tools to segment the text file;
Step S2-3, removing stop words: removing stop words from the segmented text;
Step S2-4, manual labeling: manually labeling the text data based on the BIO labeling scheme, where process entities are labeled B-TEC and I-TEC, workpiece entities B-WOR and I-WOR, equipment entities B-EQU and I-EQU, and all other characters O; partial labeling results are shown in Table 1;
Table 1. Partial entity labeling results
The BIO labeling scheme can be replaced by the BIOES scheme, in which B marks the beginning of an entity, I its inside, E its end, S a single-character entity, and O everything else; the labeling scheme is not unique, and different schemes can be chosen for different entity extraction requirements without affecting model training.
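The following fragment illustrates what character-level BIO tags look like under the label set above, using the hypothetical snippet 气缸盖压力测试 ("cylinder head pressure test"), where 气缸盖 (cylinder head) is a workpiece entity and 压力测试 (pressure test) a process entity:

```python
# Illustrative character-level BIO tags (hypothetical example sentence).
chars = ["气", "缸", "盖", "压", "力", "测", "试"]
tags  = ["B-WOR", "I-WOR", "I-WOR", "B-TEC", "I-TEC", "I-TEC", "I-TEC"]
for c, t in zip(chars, tags):
    print(f"{c}\t{t}")
```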
Step S3, performing model training:
Step S3-1, training set and test set: forming a training set and a testing set for training the entity extraction model according to the manually marked data;
Step S3-2, training the entity extraction model: establishing the entity extraction model based on a bidirectional long short-term memory neural network (BiLSTM) and a conditional random field (CRF), and training the model with the data set;
this embodiment builds the entity extraction model from a bidirectional long short-term memory neural network (BiLSTM) and a conditional random field (CRF), which avoids the gradient explosion problem that can occur in a traditional recurrent neural network (RNN) and improves training speed; the model is built and trained as follows:
In LSTM, memory cells are connected to each other in place of the recurrent units of an ordinary RNN; besides the recurrent connections between memory cells, there is also recurrence inside each memory cell. The input of each memory cell is controlled by an input gate: if the input gate allows it, the input value is accumulated into the cell state, whose weight is controlled by a forget gate, while an output gate controls whether the output is emitted;
(1) The input gate is updated by:
i_t = σ_g(W_i x_t + U_i h_{t-1} + b_i)
wherein i_t is the input gate at time t, W_i is its input weight matrix, U_i its recurrent weight matrix, and b_i its bias; the sigmoid activation function σ_g squashes W_i x_t + U_i h_{t-1} + b_i so that the output i_t lies between 0 and 1; x_t is the input variable, i.e. each character in a sentence, and h_{t-1} is the hidden state of the LSTM at time t-1;
(2) The forget gate is updated by:
f_t = σ_g(W_f x_t + U_f h_{t-1} + b_f)
wherein f_t is the forget gate at time t, W_f is its input weight matrix, U_f its recurrent weight matrix, and b_f its bias; the sigmoid activation function σ_g squashes W_f x_t + U_f h_{t-1} + b_f so that the output f_t lies between 0 and 1;
(3) The output gate is updated by:
o_t = σ_g(W_o x_t + U_o h_{t-1} + b_o)
wherein o_t is the output gate at time t, W_o is its input weight matrix, U_o its recurrent weight matrix, and b_o its bias; the sigmoid activation function σ_g squashes W_o x_t + U_o h_{t-1} + b_o so that the output o_t lies between 0 and 1;
(4) The memory cell c_t is refreshed by:
c_t = f_t c_{t-1} + i_t σ_h(W_c x_t + U_c h_{t-1} + b_c)
wherein c_t is the memory cell at time t, c_{t-1} the memory cell at time t-1, W_c the input weight matrix, U_c the recurrent weight matrix of the memory cell, b_c the bias, and σ_h the tanh activation function. It can be seen that the forget gate f_t determines how much of the previous memory cell is carried over, and the input gate i_t determines how much of the current input enters the memory cell.
The hidden state h_t of the LSTM is determined jointly by the output gate and the memory cell:
h_t = o_t σ_h(c_t)
wherein h_t is the hidden state of the LSTM at time t;
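The four update equations transcribe directly into code. The following NumPy sketch is purely didactic (random weights and illustrative dimensions, not a trained model), showing one time step of the cell described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b hold per-gate parameters keyed by gate name (i, f, o, c).
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    c = f * c_prev + i * np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    h = o * np.tanh(c)                                     # hidden state
    return h, c

d_in, d_h = 4, 8
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(d_h, d_in)) for k in "ifoc"}
U = {k: rng.normal(size=(d_h, d_h)) for k in "ifoc"}
b = {k: np.zeros(d_h) for k in "ifoc"}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```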
Although the LSTM solves the long-range dependence problem through its memory cells, it is a forward propagation algorithm: the output of a state can be computed only from the states before it. In named entity recognition, however, the input is the word vector of a text sentence, and a named entity has semantic dependences on the words around it; recognizing an entity is often influenced not only by the preceding words but also by the following ones, and a unidirectional long short-term memory network cannot use the content after the current moment. A bidirectional long short-term memory neural network model (BiLSTM) is therefore adopted for named entity recognition; the model structure is shown in figs. 2-3.
The bidirectional long short-term memory network consists of an input layer, a forward hidden layer, a backward hidden layer, and an output layer. The input layer feeds in the sequence data; the forward hidden layer computes forward features and memorizes information before the current moment, while the backward hidden layer computes backward features and memorizes information after it. Splicing the outputs of the forward and backward hidden layers yields the bidirectional LSTM, i.e. the BiLSTM network.
Finally, the output is fed to a softmax layer that predicts the classification labels of the named entities. For a named entity recognition task with k labels, label = {label_1, label_2, ..., label_k}, and an input sequence of length n, w = {w_1, w_2, ..., w_n}, BiLSTM yields a score P_{t,j} for each label j at each input w_t; the scores for the n characters of the whole sequence form the matrix P, and the larger a score, the closer the corresponding label is to the true label.
In Chinese named entity recognition, an entity is usually a combination of several Chinese characters, which are labeled with the BIO scheme in the same way as the training data: B marks the beginning character of a named entity, I marks its middle and ending characters, and O marks non-entity characters. In the labeling example of FIG. 7, entities are divided into the Workpiece, Equipment, and Technic categories; B-Workpiece marks the beginning character of the workpiece entity "engine block" (发动机缸体), i.e. "发", and I-Workpiece marks its middle and ending characters. The same applies to Equipment and Technic entities.
It can be seen that for the input of chinese sequences, there is a certain constraint on the label to be output:
(1) The starting tag of an entity must be "B-"; "I-" must follow "B-" (or another "I-"), and "O" cannot occur immediately before "I-";
(2) The label type of an entity needs to be kept consistent, e.g. "B-Workpiece" followed by "I-Workpiece", but not "I-Equipment";
BiLSTM does not impose these constraints, so this embodiment employs a conditional random field (CRF) to further constrain the network output and achieve higher accuracy. The conditional random field is a probabilistic graphical model; such models divide into directed graphical models, including the Bayesian network and the hidden Markov model, and undirected graphical models, including the conditional random field.
The conditional random field (CRF), now widely applied in natural language processing, is a conditional probability distribution model that introduces feature functions on the basis of the hidden Markov model (HMM).
The transition matrix in the CRF takes into account the association between output labels at successive moments, so this embodiment places a CRF layer after the BiLSTM layer: the BiLSTM layer extracts features from the context and predicts entity types for the input text, while the CRF layer scores the current output state and further constrains the output, improving prediction accuracy.
The output dimension of the BiLSTM layer equals the number of label types; for each input w_i the network outputs a probability value P_ij for label j, yielding the network output P, i.e. a labeling probability value for every label at every input. The CRF computes the labeling probability under conditional constraints: let y be a predicted label sequence, x the text input sequence, and y' range over the candidate label sequences; then
P(y|x) = exp(Score(x, y)) / Σ_{y'} exp(Score(x, y'))
wherein P(y|x) is the probability of the output P after the conditional random field constraint; the Score may be computed by
Score(x, y) = Σ_i ψ_i(x, y)
wherein ψ_i(x, y) is a feature vector. The goal of training the model is therefore to maximize the probability P(y|x), obtained through the log likelihood
log P(y|x) = Score(x, y) - log Σ_{y'} exp(Score(x, y'))
The loss function is defined as -log(P(y|x)) and is optimized by an optimization algorithm to train the entity extraction model BiLSTM-CRF.
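One plausible assembly of this BiLSTM-CRF extractor in PyTorch is sketched below; the hyperparameters are illustrative, and the loss reuses the crf_nll helper sketched in the disclosure above rather than any code from the patent itself.

```python
import torch
import torch.nn as nn

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_labels, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, num_labels)   # emission scores P
        # Label-transition scores used by the CRF constraint.
        self.transitions = nn.Parameter(torch.zeros(num_labels, num_labels))

    def emissions(self, token_ids):
        out, _ = self.bilstm(self.emb(token_ids))
        return self.proj(out)                           # (batch, T, K)

model = BiLSTMCRF(vocab_size=5000, num_labels=7)        # B/I for 3 types + O
emis = model.emissions(torch.randint(0, 5000, (1, 20)))
loss = crf_nll(emis[0], model.transitions, torch.zeros(20, dtype=torch.long))
loss.backward()                                         # train with any optimizer
```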
Step S3-3, entity extraction model evaluation: evaluating the training effect of the entity extraction model by precision, recall, and F1 value, where, with TP, FP, and FN the numbers of true positives, false positives, and false negatives,
P = TP / (TP + FP), R = TP / (TP + FN), F1 = 2PR / (P + R)
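Computed over predicted and gold entity spans, these metrics reduce to a few lines; the span encoding (label, start, end) below is an illustrative choice, not fixed by the patent.

```python
def prf1(pred_spans: set, gold_spans: set):
    # Entity-level precision, recall, and F1 from span sets.
    tp = len(pred_spans & gold_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(prf1({("WOR", 0, 3), ("TEC", 3, 7)}, {("WOR", 0, 3)}))
# -> (0.5, 1.0, 0.666...)
```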
Step S3-4, training a relation extraction model: establishing a relation extraction model based on an attention mechanism, and training the model by utilizing a corresponding data set;
The embodiment adopts a relationship extraction model established based on an attention mechanism, and the specific method for establishing and training the model is as follows:
Because the LSTM weights its output information equally at every point in time, while relation classification needs to emphasize the importance of particular outputs to the classification, an attention mechanism, which is essentially a weighted summation, is introduced.
The relation extraction model is trained to process the unstructured data to be extracted and obtain the relations between entities. The model first outputs the vector form of the text through the BiLSTM layer and then classifies the relations through the attention mechanism layer, yielding the relations between the entities.
(1) Input and word embedding layer: the model input is a sample in sentence units. The word embedding layer characterizes the input sentence: given a sentence S containing T characters, S = {x_1, x_2, ..., x_T}, where x_i denotes each character.
(2) BiLSTM: biLSTM is identical in structure to step S3-2, and the LSTM unit can be represented by the following formula:
ct=itgt+ftct-1
ht=ottanh(ct)
the output of the model includes forward direction And backward/>Two results by stitching/>As the final BiLSTM output.
(3) Attention structure: as noted above, the attention mechanism is essentially a weighted summation that lets the model emphasize the outputs most relevant to the classification.
The model input takes sentences as the unit; the output through the BiLSTM layer is H = {h_1, h_2, ..., h_T}, and the matrix parameter to be trained is w ∈ R^{d_w}, where R denotes the set of real numbers and d_w the dimension of the word embedding, satisfying:
M = tanh(H)
α = softmax(w^T M)
r = H α^T
wherein M is an intermediate quantity with no standalone meaning, α is the attention weight coefficient, and r is the weighted sum of the LSTM outputs H; finally, a characterization vector h* = tanh(r) is generated through a nonlinear function.
(4) Loss function: the characterization vector h* is mapped to the class vector through a fully connected network, and for the input sentence S the predicted probability of each relation class is output through softmax, p(y|S) = softmax(W h* + b), with the predicted label obtained by argmax, ŷ = argmax_y p(y|S),
where W and b are the parameter matrix and the bias, respectively.
The negative log likelihood defines the loss function J(θ) as
J(θ) = -Σ_{i=1}^{m} t_i log(y_i) + λ ||θ||_F^2
where t ∈ R^m is the one-hot representation of the true relation, y ∈ R^m is the estimated probability of each relation class output through softmax, λ is the regularization hyper-parameter, θ denotes the model parameters of the relation extraction model, including W and b, and ||·||_F is the Frobenius norm;
the loss function J(θ) is optimized by an optimization algorithm to train the relation extraction model.
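The attention layer and classifier above map onto a compact PyTorch module; the sketch below follows the formulas M = tanh(H), α = softmax(w^T M), r = H α^T, h* = tanh(r) with illustrative dimensions, and is an assumption-level sketch rather than the patent's code.

```python
import torch
import torch.nn as nn

class AttentionRelationHead(nn.Module):
    def __init__(self, d_w, num_relations):
        super().__init__()
        self.w = nn.Parameter(torch.randn(d_w))        # attention vector w
        self.fc = nn.Linear(d_w, num_relations)        # W and b of the classifier

    def forward(self, H):                              # H: (d_w, T) BiLSTM output
        M = torch.tanh(H)
        alpha = torch.softmax(self.w @ M, dim=-1)      # (T,) attention weights
        r = H @ alpha                                  # weighted sum of columns
        h_star = torch.tanh(r)                         # characterization vector h*
        return torch.softmax(self.fc(h_star), dim=-1)  # p(y|S)

head = AttentionRelationHead(d_w=256, num_relations=3)
p = head(torch.randn(256, 30))        # a 30-character sentence
pred = p.argmax()                     # predicted relation label
```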
Step S3-5, evaluating a relation extraction model: evaluating the training effect of the relation extraction model according to the accuracy rate;
Wherein the order of steps S3-4 and S3-5 may be exchanged with that of steps S3-2 and S3-3.
S4, building a process knowledge graph:
step S4-1, extracting process entities: extracting unstructured data to be extracted by using a trained entity extraction model to obtain a process entity;
Step S4-2, process entity table: the process entities extracted by the entity extraction model are stored in tabular form as a process entity table, part of which is shown in Table 2:
TABLE 2. Partial process entity table
ID  | Name            | Label
001 | Cylinder head   | Workpiece
002 | Injector sheath | Workpiece
003 | Valve seat      | Workpiece
004 | Pressure test   | Process
Step S4-3, entity-relation table: extracting relations with the trained relation extraction model and, on the basis of the process entity table, obtaining an entity-relation table with one-to-one correspondence, part of which is shown in Table 3;
TABLE 3. Partial entity-relation table
Start_Name                      | Relation   | End_Name
Pressure test                   | Acts on    | Cylinder head
Overall cleaning                | Acts on    | Cylinder head
Press-mounting assembly station | Implements | Sheath assembly
Step S4-4, knowledge fusion: according to all the extracted entities and relations, carrying out knowledge fusion with a method based on semantic similarity calculation (see figs. 4-5), merging knowledge whose semantics are identical or highly similar; the semantic similarity method can be replaced by other methods, such as the inner product method, the cosine method, or the Dice coefficient method.
In this embodiment, the specific method for performing knowledge fusion by using the semantic similarity calculation method is as follows:
(1) Semantic similarity calculation: calculating the similarity among concepts, attributes and structural relations in the process knowledge through Jaccard similarity coefficients, classifying, and providing a basis for semantic space model fusion;
(2) Semantic space model fusion: according to the fusion operation rule, carrying out fusion operation on domain knowledge with different similarities, and eliminating similar redundancy or conflict contradiction between the domain knowledge;
(3) Entity linking: and linking the newly added domain knowledge with the existing map by using a joint link model based on the map, calculating the compatibility and the dependence among the entities, disambiguating the newly added knowledge according to the calculation result, and merging the newly added knowledge into the knowledge map.
Step S4-5, knowledge graph: establishing the knowledge graph in the neo4j graph database according to the entity-relation table after knowledge fusion. Once the process knowledge graph is constructed, process designers can use it to design processes and upload new knowledge on that basis, realizing the updating and sharing of knowledge.
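Loading the fused entity-relation table into neo4j can be done with the official Python driver; the sketch below uses the Table 3 rows as sample data, and the URI, credentials, and the Entity/REL schema names are placeholder assumptions rather than values given by the patent.

```python
from neo4j import GraphDatabase

rows = [
    ("Pressure test", "Acts on", "Cylinder head"),
    ("Overall cleaning", "Acts on", "Cylinder head"),
]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for start, rel, end in rows:
        # MERGE makes the load idempotent: nodes and edges are created once.
        session.run(
            "MERGE (a:Entity {name: $start}) "
            "MERGE (b:Entity {name: $end}) "
            "MERGE (a)-[:REL {type: $rel}]->(b)",
            start=start, end=end, rel=rel,
        )
driver.close()
```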
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (4)

1. A domain knowledge extraction method for unstructured data, wherein the unstructured data are data whose structure is irregular or incomplete, that have no predefined data model, and that are inconvenient to represent in a two-dimensional logical table of a database;
the extraction method is characterized by comprising the following specific steps of:
Step S1, organizing the domain knowledge concept entities and relations to establish the schema layer of the domain knowledge graph;
Step S2, preprocessing unstructured data to obtain manually annotated text data;
Step S3, establishing an entity extraction model based on a bidirectional long short-term memory neural network and a conditional random field, establishing a relation extraction model based on an attention mechanism, and training the entity extraction model and the relation extraction model with their respective data sets;
Step S4, extracting the unstructured data to be processed with the trained entity extraction model to obtain domain entities, which are stored in tabular form as a domain entity table; extracting relations with the trained relation extraction model and, on the basis of the domain entity table, obtaining an entity-relation table in which entities and relations correspond one to one;
carrying out knowledge fusion based on semantic similarity over all extracted entities and relations to obtain a fused entity-relation table, and establishing a knowledge graph in a neo4j graph database according to the entity-relation table;
The specific steps of step S3 are as follows:
S3-1, forming a training set and a test set for training the entity extraction model and the relation extraction model from the manually annotated data;
S3-2, establishing the entity extraction model based on a bidirectional long short-term memory neural network and a conditional random field, and training it with the corresponding data set; establishing the relation extraction model based on an attention mechanism, and training it with the corresponding data set;
S3-3, evaluating the training effect of the entity extraction model by precision, recall, and F1 value; evaluating the training effect of the relation extraction model by accuracy;
In step S3-2, when the entity extraction model is established: the output dimension of the BiLSTM layer of the bidirectional long short-term memory network equals the number of label types; for each input w_i the network outputs a probability value P_ij for label j, yielding the network output P, i.e. a labeling probability value for every label at every input; the conditional random field CRF computes the labeling probability under conditional constraints: let y be a predicted label sequence, x the text input sequence, and y' range over the candidate label sequences; then
P(y|x) = exp(Score(x, y)) / Σ_{y'} exp(Score(x, y'))
wherein P(y|x) is the probability of the output P after the conditional random field constraint; the Score may be computed by
Score(x, y) = Σ_i ψ_i(x, y)
wherein ψ_i(x, y) is a feature vector;
when training the entity extraction model, the objective is to maximize the probability P(y|x), obtained through the log likelihood
log P(y|x) = Score(x, y) - log Σ_{y'} exp(Score(x, y'))
the loss function is defined as -log(P(y|x)) and is optimized by an optimization algorithm to train the entity extraction model BiLSTM-CRF;
In step S3-2,
when the relation extraction model is established, the vector form of the text is first output through the BiLSTM layer of the bidirectional long short-term memory network BiLSTM, and the relations are then classified through an attention mechanism layer to obtain the relations between entities, establishing the relation extraction model;
when the relation extraction model is trained, its input takes sentences as the unit: given a sentence S containing T characters, S = {x_1, x_2, ..., x_T}, where x_i denotes each character, the output of the BiLSTM layer is H = {h_1, h_2, ..., h_T}, and the matrix parameter to be trained is w ∈ R^{d_w}, where d_w denotes the dimension of the word embedding, satisfying:
M = tanh(H)
α = softmax(w^T M)
r = H α^T
wherein α is the attention weight coefficient and r is the weighted sum of the BiLSTM outputs H;
finally, a characterization vector h* = tanh(r) is generated through a nonlinear function;
the characterization vector h* is mapped onto the class vector through a fully connected network, and for the input sentence S the predicted probability of each relation class is output through softmax, p(y|S) = softmax(W h* + b), with the predicted label obtained by argmax, ŷ = argmax_y p(y|S),
wherein W and b are a parameter matrix and a bias, respectively;
the negative log likelihood defines the loss function as
J(θ) = -Σ_{i=1}^{m} t_i log(y_i) + λ ||θ||_F^2
wherein t ∈ R^m is the one-hot representation of the true relation, y ∈ R^m is the estimated probability of each relation class output through softmax, λ is the regularization hyper-parameter, and θ denotes the model parameters of the relation extraction model;
the loss function J(θ) is optimized by an optimization algorithm to train the relation extraction model.
2. The method for domain knowledge extraction for unstructured data according to claim 1, wherein the specific steps of step S1 are as follows:
Step S1-1, organizing the multi-scenario domain knowledge concepts and relations according to the purpose of knowledge extraction;
and Step S1-2, defining a knowledge structure according to the domain knowledge concept entities and relations, and establishing the schema layer of the domain knowledge graph.
3. The method for domain knowledge extraction for unstructured data according to claim 1, wherein the specific steps of step S2 are as follows:
S2-1, analyzing unstructured data into txt files by using a text analysis tool;
S2-2, utilizing Jieba word segmentation tools to segment the text file;
s2-3, removing stop word processing is carried out on the text after word segmentation;
and S2-4, manually labeling the text data based on the BIO labeling method or BIOES labeling method.
4. A method for domain knowledge extraction for unstructured data according to any of claims 1-3, wherein in step S4, the specific method for knowledge fusion by using a method based on semantic similarity calculation is as follows:
(1) Semantic similarity calculation: computing the similarity between concepts, attributes, and structural relations in the domain knowledge through the Jaccard similarity coefficient and classifying it, providing the basis for semantic space model fusion;
(2) Semantic space model fusion: according to the fusion operation rule, carrying out fusion operation on domain knowledge with different similarities, and eliminating similar redundancy or conflict contradiction between the domain knowledge;
(3) Entity linking: and linking the newly added domain knowledge with the existing map by using a joint link model based on the map, calculating the compatibility and the dependence among the entities, disambiguating the newly added knowledge according to the calculation result, and merging the newly added knowledge into the knowledge map.
CN202211259591.5A (priority 2022-10-14, filed 2022-10-14) Unstructured data-oriented domain knowledge extraction method, Active, granted as CN115510245B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211259591.5A CN115510245B (en) 2022-10-14 2022-10-14 Unstructured data-oriented domain knowledge extraction method


Publications (2)

Publication Number | Publication Date
CN115510245A | 2022-12-23
CN115510245B | 2024-05-14






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant