CN108182295B - Enterprise knowledge graph attribute extraction method and system - Google Patents


Info

Publication number
CN108182295B
Authority
CN
China
Prior art keywords
event
entity
attribute
dimensional matrix
neural network
Prior art date
Legal status
Active
Application number
CN201810136568.4A
Other languages
Chinese (zh)
Other versions
CN108182295A (en)
Inventor
孙世通
刘德彬
严开
陈玮
Current Assignee
China Telecom Yijin Technology Co.,Ltd.
Chongqing Yucun Technology Co ltd
Original Assignee
Chongqing Socialcredits Big Data Technology Co ltd
Chongqing Telecommunication System Integration Co ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Socialcredits Big Data Technology Co ltd and Chongqing Telecommunication System Integration Co ltd
Priority to CN201810136568.4A
Publication of CN108182295A
Application granted
Publication of CN108182295B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks


Abstract

The invention provides an enterprise knowledge graph attribute extraction method comprising the following steps: defining entity categories and event categories; defining an attribute structure for each type of entity; preparing and marking the corpus; extracting entity attributes; and fusing entity attributes. The invention combines experts' knowledge of entity attributes in specific fields with machine learning to extract and classify text content objectively and efficiently, applied to Chinese corpora of full enterprise data, and can identify various target attributes from a small number of labels. It solves the problems of extracting node entity attributes in a knowledge graph and fusing multi-source attributes.

Description

Enterprise knowledge graph attribute extraction method and system
Technical Field
The invention relates to an information processing method and system, and in particular to an enterprise knowledge graph attribute extraction method and system.
Background
A knowledge graph is a semantic network built on a graph data structure; its basic units are nodes and edges. In an enterprise knowledge graph, nodes represent event entities and enterprise entities, and edges characterize the relationships between entities. Focusing on a single enterprise within the full graph reveals its basic information, its development history formed by chaining event nodes together, and the enterprise clusters associated with it at each layer (the associations include, but are not limited to, equity investment, cooperation, upstream/downstream, and subordination).
Applied to enterprise information and enterprise-risk discovery, the core value of the knowledge graph is that it organically links enterprise information of all categories, helping risk models identify hidden associated risks, group risks, and the like. Structuring node data faces two major problems: 1) extracting different attributes from different data sources, and 2) reasonably fusing attributes from different sources for the same entity.
Technically, constructing such an enterprise knowledge graph requires overcoming the following difficulties:
entity attribute extraction, multi-source attribute fusion, and establishing relationships between different entities.
The prior art uses attribute extraction and fusion based on industry experience rules and dictionaries, or based on supervised learning and pattern matching.
The rule-and-dictionary approach has drawbacks: determining the industry attributes of entities in different sectors requires professionally qualified industry experts, yet relying on manual work cannot resolve low labeling efficiency and inconsistent labeling standards. A dictionary built on a unified standard can recognize relations whose head word is a verb, but relations expressed through nouns or colloquial wording are easily misjudged. Moreover, this method cannot effectively handle out-of-vocabulary words.
The supervised-learning-and-pattern-matching approach builds a classifier on manually labeled corpora, but its main bottleneck is that it needs many labels and places high demands on data quality.
In the prior art, enterprise knowledge graph attribute extraction is mainly based on text data, which imposes restrictions when images, audio/video, and text appear together and cross-source processing is required. The modeling process also does not consider extracting entities and relations at different levels and granularities.
Prior-art attribute extraction relies on manual labeling of target texts, which is inefficient, costly, and unable to process massive text quickly.
Prior-art attribute extraction also cannot realize correlation analysis and reasoning between texts, nor end-to-end adaptive learning and relationship establishment.
Disclosure of Invention
The invention provides a method for extracting enterprise knowledge graph attributes efficiently, automatically and accurately, which comprises the following steps:
defining entity types, event types and entity attribute structures of training samples;
preparing and marking a training sample corpus;
training an entity attribute extraction model;
inputting the target text into an entity attribute extraction model to obtain target text entity attributes;
and performing entity attribute fusion on the target text.
Further, the entity category, the event category and the entity attribute structure defining the training sample comprise,
defining entity categories as enterprise factors or/and personal factors;
defining event categories as one or more of official documents, court announcements, tenders, equity, strategies, personnel, finance, debt, products, marketing, branding, accidents;
defining the fields of the attributes as a plurality of or one of type fields, time fields, mark fields and body fields;
the preparation and marking of the training sample corpus comprises marking the event category and the entity attribute structure of each text of the training sample library.
Further, the training of the entity attribute extraction model comprises the following steps:
S1: tag by character; input an N×K character-vector matrix to a first bidirectional long short-term memory (BiLSTM) recurrent neural network to obtain an N×T tag-class probability distribution matrix for each character, where N is the batch size, K is the character-embedding vector length, and T is the number of character tag classes; the position of the maximum value corresponds to the current character's label, and the character embedding of each character is obtained;
s2: determining training sample subject information;
S3: define the event vector according to the following formula, where eventEmbedding is the event vector, w_j is the vector of the jth word in a sentence, and n denotes the sentences within distance n before and after the subject;
eventEmbedding = Σ_j w_j  (summed over the word vectors of the sentences within distance n of the subject)
and according to the event labels, take the N×K event-vector matrix as the initial input of a second bidirectional long short-term memory (BiLSTM) recurrent neural network to obtain an N×L tag-class probability distribution matrix, where N is the batch size, K is the word-embedding vector length, and L is the number of event tag classes; the position of the maximum value corresponds to the current event's label.
The bayesian network is defined as:
P(A,B,C,D) = P(D|A,B) * P(C|A) * P(B|A) * P(A)
A is the probability that the text describes an event of a given kind,
B is the probability that event extraction succeeds,
C is the probability that time information is contained,
D is the probability that domain-specific vocabulary is contained,
where the value of B is determined by whether the label output via the N×L tag-class probability distribution matrix matches the training-sample annotation: B is 1 if they match and 0 otherwise,
obtain a first N×L matrix from the second BiLSTM, input it into the Bayesian network, perform feature fusion between the second N×L matrix output by the Bayesian network and the first N×L matrix, and feed the feature-fusion result back to the second BiLSTM;
S4: define the loss function as the mean squared error between the output of each time node of the BiLSTM and the training-sample annotations, and repeat step S3 until the loss function converges.
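Under stated assumptions, the Bayesian-network factorization defined in step S3 can be sketched numerically; the conditional probability tables below are illustrative placeholders, not values from the patent:

```python
# Hedged sketch of evaluating the factorization
# P(A,B,C,D) = P(D|A,B) * P(C|A) * P(B|A) * P(A).
# All probability values are illustrative assumptions.

p_a = {1: 0.6, 0: 0.4}                     # P(A): text describes the event kind
p_b_given_a = {1: {1: 0.8, 0: 0.2},        # P(B|A): event extraction succeeds
               0: {1: 0.1, 0: 0.9}}
p_c_given_a = {1: {1: 0.7, 0: 0.3},        # P(C|A): time information present
               0: {1: 0.2, 0: 0.8}}
p_d_given_ab = {(1, 1): {1: 0.9, 0: 0.1},  # P(D|A,B): domain vocabulary present
                (1, 0): {1: 0.5, 0: 0.5},
                (0, 1): {1: 0.3, 0: 0.7},
                (0, 0): {1: 0.1, 0: 0.9}}

def joint(a, b, c, d):
    """Joint probability under the factorization P(D|A,B)P(C|A)P(B|A)P(A)."""
    return p_d_given_ab[(a, b)][d] * p_c_given_a[a][c] * p_b_given_a[a][b] * p_a[a]

# A valid factorization sums to 1 over all 16 assignments of (A, B, C, D).
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
```

The factored form needs only four small tables instead of a full 16-entry joint table, which is the usual motivation for a Bayesian-network decomposition.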
Further, the entity attribute extraction model includes,
obtain a first N×L matrix from the forward hidden layer of the second BiLSTM, input it into the Bayesian network, perform feature fusion between the second N×L matrix output by the Bayesian network and the first N×L matrix, and use the feature-fusion result as the input to the backward hidden layer of the second BiLSTM;
alternatively,
obtain a first N×L matrix from the output layer of the second BiLSTM, input it into the Bayesian network, perform feature fusion between the second N×L matrix output by the Bayesian network and the first N×L matrix, and use the feature-fusion result as the input to the input layer of the second BiLSTM.
Further, performing entity attribute fusion on the target text comprises the following steps:
A. select a basic structure of event-entity data as the base value according to its similarity to the structure template;
B. traverse the candidate-set events, matching attributes pairwise by depth-first traversal of the tree structure;
C. when comparing two events, follow these rules:
if a node attribute value is missing in the basic structure, supplement it directly;
if the corresponding node attribute values conflict, and the quality evaluation function judges the candidate-set value to be better, replace the base's non-null value;
if the base attribute is in list format, add the candidate set's unique, non-repeated elements to the base's list;
D. repeat steps B and C until the attributes can no longer be improved.
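The fusion steps A through D can be sketched as follows; the quality evaluation function is not specified in the patent, so the one below is a placeholder assumption, and the dict-shaped event structures are illustrative:

```python
# Hedged sketch of fusion steps A-D: start from a base event structure, then
# traverse candidate events depth-first, filling missing values, replacing
# conflicting values when a (placeholder) quality score prefers the candidate,
# and merging list-valued attributes without duplicates.

def quality(value):
    """Placeholder quality-evaluation function (assumption: longer string
    representations carry more information); not specified in the patent."""
    return len(str(value))

def fuse(base, candidate):
    """One pass of step C over one candidate event; returns True if changed."""
    changed = False
    for key, cand_val in candidate.items():
        base_val = base.get(key)
        if isinstance(base_val, dict) and isinstance(cand_val, dict):
            changed |= fuse(base_val, cand_val)        # depth-first recursion
        elif base_val is None:                          # missing: supplement directly
            base[key] = cand_val
            changed = True
        elif isinstance(base_val, list):                # list format: add unique elements
            new = [v for v in cand_val if v not in base_val]
            base_val.extend(new)
            changed |= bool(new)
        elif base_val != cand_val and quality(cand_val) > quality(base_val):
            base[key] = cand_val                        # conflict: better candidate wins
            changed = True
    return changed

def fuse_all(base, candidates):
    # Step D: repeat steps B and C until no attribute can be improved further.
    while any(fuse(base, c) for c in candidates):
        pass
    return base

base = {"type": "accident", "time": None, "mark": ["safety"], "body": {"loc": None}}
cands = [{"time": "2018-02-09", "mark": ["safety", "fire"]},
         {"body": {"loc": "Chongqing"}}]
fuse_all(base, cands)
```

The loop terminates because each rule either fills a hole once, adds a list element once, or strictly increases the quality score of a field.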
In order to ensure the implementation of the method, the invention also provides an enterprise knowledge graph attribute extraction system, which comprises the following units:
the defining unit is used for defining entity types, event types and entity attribute structures of the training samples;
the marking unit is used for preparing and marking the training sample corpus;
the training unit is used for training the entity attribute extraction model;
the entity attribute extraction unit is used for inputting the target text into the entity attribute extraction model to obtain the entity attribute of the target text;
and the attribute fusion unit is used for executing entity attribute fusion on the target text.
Further, the definition unit defines entity category, event category and entity attribute structure of the training sample including,
defining entity categories as enterprise factors or/and personal factors;
defining event categories as one or more of official documents, court announcements, tenders, equity, strategies, personnel, finance, debt, products, marketing, branding, accidents;
defining the fields of the attributes as a plurality of or one of type fields, time fields, mark fields and body fields;
the training sample corpus preparation and marking comprises labeling the event category and the entity attribute structure of each text of the training sample library.
Further, the training unit trains the entity attribute extraction model by adopting the following steps:
S1: tag by character; input an N×K character-vector matrix to a first bidirectional long short-term memory (BiLSTM) recurrent neural network to obtain an N×T tag-class probability distribution matrix for each character, where N is the batch size, K is the character-embedding vector length, and T is the number of character tag classes; the position of the maximum value corresponds to the current character's label, and the character embedding of each character is obtained;
s2: determining training sample subject information;
S3: define the event vector according to the following formula, where eventEmbedding is the event vector, w_j is the vector of the jth word in a sentence, and n denotes the sentences within distance n before and after the subject;
eventEmbedding = Σ_j w_j  (summed over the word vectors of the sentences within distance n of the subject)
and according to the event labels, take the N×K event-vector matrix as the initial input of a second bidirectional long short-term memory (BiLSTM) recurrent neural network to obtain an N×L tag-class probability distribution matrix, where N is the batch size, K is the word-embedding vector length, and L is the number of event tag classes; the position of the maximum value corresponds to the current event's label.
The bayesian network is defined as:
P(A,B,C,D) = P(D|A,B) * P(C|A) * P(B|A) * P(A)
A is the probability that the text describes an event of a given kind,
B is the probability that event extraction succeeds,
C is the probability that time information is contained,
D is the probability that domain-specific vocabulary is contained,
where the value of B is determined by whether the label output via the N×L tag-class probability distribution matrix matches the training-sample annotation: B is 1 if they match and 0 otherwise,
obtain a first N×L matrix from the second BiLSTM, input it into the Bayesian network, perform feature fusion between the second N×L matrix output by the Bayesian network and the first N×L matrix, and feed the feature-fusion result back to the second BiLSTM;
S4: define the loss function as the mean squared error between the output of each time node of the BiLSTM and the training-sample annotations, and repeat step S3 until the loss function converges.
Further, the entity attribute extraction model includes,
obtain a first N×L matrix from the forward hidden layer of the second BiLSTM, input it into the Bayesian network, perform feature fusion between the second N×L matrix output by the Bayesian network and the first N×L matrix, and use the feature-fusion result as the input to the backward hidden layer of the second BiLSTM;
alternatively,
obtain a first N×L matrix from the output layer of the second BiLSTM, input it into the Bayesian network, perform feature fusion between the second N×L matrix output by the Bayesian network and the first N×L matrix, and use the feature-fusion result as the input to the input layer of the second BiLSTM.
further, the attribute fusion unit performs entity attribute fusion on the target text by adopting the following steps:
A. select a basic structure of event-entity data as the base value according to its similarity to the structure template;
B. traverse the candidate-set events, matching attributes pairwise by depth-first traversal of the tree structure;
C. when comparing two events, follow these rules:
if a node attribute value is missing in the basic structure, supplement it directly;
if the corresponding node attribute values conflict, and the quality evaluation function judges the candidate-set value to be better, replace the base's non-null value;
if the base attribute is in list format, add the candidate set's unique, non-repeated elements to the base's list;
D. repeat steps B and C until the attributes can no longer be improved.
The invention has the following beneficial effects:
1. The method acquires knowledge from multi-source heterogeneous data and reduces the algorithm model's dependence on labels.
2. It realizes entity attribute extraction, multi-source attribute fusion, and establishment of relationships between different entities.
3. It combines experts' knowledge of entity attributes in specific fields with machine learning's objectivity and efficiency in extracting and classifying text content, applied to Chinese corpora of full enterprise data; various target attributes can be identified from a small number of labels.
4. After the attribute extraction model is trained on sample data, entity attribute extraction and knowledge graph construction are automated over massive target-text data, improving efficiency and reducing labor cost.
5. The invention combines the advantages of Bayesian networks and LSTM to propose a Bayesian recurrent neural network. With the Bayesian network feeding back into the BiLSTM recurrent neural network, the BiLSTM is used transversely to capture long-time, long-range spatio-temporal correlations between entities, while the Bayesian network is used longitudinally for correlation analysis and reasoning. Feeding the Bayesian network's inference results back to update the BiLSTM realizes end-to-end adaptive learning and relationship establishment.
Drawings
FIG. 1 is a flowchart of an enterprise knowledge graph attribute extraction method according to an embodiment of the present invention.
FIG. 2 is a block diagram of an enterprise knowledge graph attribute extraction system according to an embodiment of the present invention.
FIG. 3 is a diagram of a prior-art long short-term memory network.
FIG. 4 is a diagram of a prior-art BiLSTM neural network model.
Fig. 5 is a schematic diagram of a bayesian recurrent neural network model according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a bayesian network according to an embodiment of the present invention.
FIG. 7 is a diagram of a prior-art LSTM memory module.
FIG. 8 is a schematic diagram of feature fusion according to an embodiment of the present invention.
FIG. 9 is a schematic diagram of feature fusion according to an embodiment of the present invention.
Detailed Description
One of the invention's ideas for solving the problems described in the Background is as follows: adopt a Bayesian recurrent neural network as the entity attribute extraction model to extract enterprise knowledge graph attributes. Stacking the Bayesian network as a layer on top of the BiLSTM recurrent neural network lets the BiLSTM capture long-time, long-range spatio-temporal correlations between entities transversely, while the Bayesian network performs correlation analysis and reasoning longitudinally. Feeding the Bayesian network's inference results back to update the BiLSTM realizes end-to-end adaptive learning and relationship establishment, yielding an accurate and efficient entity attribute extraction model and automating entity attribute extraction.
As shown in FIG. 1, the method for extracting enterprise knowledge graph attributes of the invention comprises the following steps:
defining entity types, event types and entity attribute structures of training samples;
preparing and marking a training sample corpus;
training an entity attribute extraction model;
inputting the target text into an entity attribute extraction model to obtain target text entity attributes;
and performing entity attribute fusion on the target text.
Wherein, in the step of defining the entity category and the event category,
the entity category may be business or personal.
The event category can be official documents, court announcements, tenders, equity, strategy, personnel, finance, debt, products, marketing, branding, accidents, etc.
For each type of entity, a standardized attribute structure is defined. Taking the accident event as an example, in an embodiment of the present invention the attribute structure of the event is defined as follows:
(The accident-event attribute structure is given as a figure in the original publication.)
Taking the equity event as an example, the attribute structure of the event defined in an embodiment of the present invention is:
(The equity-event attribute structure is given as a figure in the original publication.)
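Since the attribute structures appear only as figures in the original publication, the following is a hypothetical illustration of an accident-event structure built from the four defined field kinds (type, time, mark, body); all field names and values are assumptions:

```python
# Hypothetical sketch of an event attribute structure using the four field
# kinds defined earlier (type, time, mark, body). The concrete keys and
# values below are illustrative assumptions, not taken from the patent.

accident_event = {
    "type": "ACCIDENT",             # type field: the event category label
    "time": "2018-02-09",           # time field: when the event occurred
    "mark": ["production safety"],  # mark field: tags attached to the event
    "body": {                       # body field: the entities involved
        "enterprise": "Example Co., Ltd.",
        "description": "A production safety accident occurred at the plant.",
    },
}
```

A tree of nested dictionaries like this is also the shape the later fusion steps assume, since they traverse node attributes depth-first.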
in the steps of preparing and marking the corpus, the word notation specification and meaning in one embodiment of the present invention are as follows:
B-ORG stands for entity start bit tag
I-ORG stands for entity composition tag
X represents placeholders such as punctuation
O represents other characters
After corpus marking is finished, subsequent programs can understand the meaning of the entities in the text, which facilitates machine processing.
According to the above specifications, marking of each character of the training text is completed in one embodiment of the invention.
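A minimal sketch of the character-tagging specification above; the sentence and the entity mention are illustrative:

```python
# Sketch of the B-ORG / I-ORG / X / O character-tagging scheme: B-ORG marks
# the first character of an entity, I-ORG its remaining characters, X a
# punctuation placeholder, and O any other character. The sentence and the
# enterprise name are illustrative examples, not from the patent.

sentence = "重庆社信大数据公司发布公告。"
# Assume the first 9 characters form the enterprise-entity mention.
tags = ["B-ORG"] + ["I-ORG"] * 8 + ["O"] * 4 + ["X"]

def extract_entities(chars, labels):
    """Recover entity spans from a B/I/X/O character tag sequence."""
    entities, current = [], []
    for ch, tag in zip(chars, labels):
        if tag == "B-ORG":
            if current:
                entities.append("".join(current))
            current = [ch]
        elif tag == "I-ORG" and current:
            current.append(ch)
        else:
            if current:
                entities.append("".join(current))
                current = []
    if current:
        entities.append("".join(current))
    return entities
```

Tagging per character (rather than per word) sidesteps Chinese word segmentation, which is consistent with the character-level S1 step described later.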
The event tag specification and meaning in one embodiment of the invention is as follows:
JUDGE represents official documents;
NOTESE represents court announcements;
COURT represents court-session announcements;
BID represents tenders;
STOCK represents equity;
STRATGY represents strategy;
HR represents personnel;
FINANCE represents finance;
DEBT represents debt;
PROD represents products;
MARKER represents marketing;
BRAND represents brand;
ACCIDENT represents accidents;
it should be noted that the labels and specifications of the events can be flexibly selected according to specific items, and are not limited to the events listed by the present invention.
The event labels are expressed in English to facilitate subsequent procedures to process the text.
And according to the specifications, marking each text of the training texts.
In one embodiment of the invention, the marking of the training text is performed manually, and the marking result is used as a reference for model training in the subsequent steps.
The following describes the steps of training the entity attribute extraction model with reference to the embodiment,
Given the problems (described in the Background) that current mainstream methods face in entity attribute extraction, deep neural networks are suited to addressing these difficulties. The invention provides an end-to-end semi-supervised and unsupervised method for the attribute extraction problem of event entities centered on enterprises, acquiring knowledge from multi-source heterogeneous data and reducing the algorithm model's dependence on labels.
The long short-term memory network (LSTM) is a special recurrent neural network for learning long-term dependencies in sequential data. Since its introduction it has been widely used in handwriting recognition, speech recognition, machine translation, and so on, with remarkable results. It can retain information over long spans and is notably effective in text semantic analysis. Unrolling the LSTM along the time dimension yields a chain LSTM network that can model entities of indeterminate length and the relationships between them, further characterizing each entity's features. The LSTM memory module is shown in FIG. 7.
The LSTM cell can be characterized by the following equations:
i_t = g(W_xi · x_t + W_hi · h_{t-1} + b_i)
f_t = g(W_xf · x_t + W_hf · h_{t-1} + b_f)
o_t = g(W_xo · x_t + W_ho · h_{t-1} + b_o)
The input transformation can be characterized by:
c_in_t = tanh(W_xc · x_t + W_hc · h_{t-1} + b_c_in)
The state update can be characterized by:
c_t = f_t · c_{t-1} + i_t · c_in_t
h_t = o_t · tanh(c_t)
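The cell equations above can be exercised numerically. The sketch below uses numpy with randomly initialized weights purely for illustration, taking g to be the logistic sigmoid; sizes are assumptions:

```python
# Numerical sketch of the LSTM cell equations: gates i_t, f_t, o_t, input
# transform c_in_t, state update c_t, and output h_t. Weights are random
# and sizes illustrative; this is not a trained model.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W holds the eight weight matrices (W_x*, W_h*); b the four biases.
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])      # input gate
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])      # forget gate
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])      # output gate
    c_in = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c_in"])  # input transform
    c_t = f_t * c_prev + i_t * c_in                               # state update
    h_t = o_t * np.tanh(c_t)                                      # hidden output
    return h_t, c_t

rng = np.random.default_rng(0)
K, H = 4, 3  # illustrative embedding and hidden sizes
W = {k: rng.standard_normal((H, K if k.startswith("x") else H)) * 0.1
     for k in ("xi", "hi", "xf", "hf", "xo", "ho", "xc", "hc")}
b = {k: np.zeros(H) for k in ("i", "f", "o", "c_in")}
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, K)):  # run five time steps
    h, c = lstm_step(x, h, c, W, b)
```

Because o_t lies in (0, 1) and tanh(c_t) in (-1, 1), every component of h_t stays strictly inside (-1, 1) regardless of the input sequence.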
the Bidirectional long-short term memory network (BilSTM) comprises a forward hidden layer and a backward hidden layer, can acquire the long-time long-range associated dependency relationship of a context, captures the characteristics of a contextual entity, acquires the space-time correlation among more entities, can eliminate the influence of noise such as interference entities on a neural network model from two directions, greatly assists in mining the long-time dependency relationship, and extracts the high-level semantic characteristics which are vital to information extraction, entity relationship identification and the like. The advantages of LSTM and its variants over bayesian networks are the ability to capture long sequence relationships between entities, but their reasoning ability and interpretability is poor. The BilSTM neural network model is shown in FIG. 4.
Bayesian networks (BN), also known as belief networks, are probabilistic graphical models. They simulate the uncertainty of causal relationships in human reasoning to establish relationships and perform inference, with good knowledge representation and the ability to handle uncertain knowledge. A Bayesian network can encode and interpret knowledge probabilistically and has been widely used in many fields, including computational intelligence, medical diagnosis, and information retrieval. Its strength is strong reasoning ability; its weaknesses are poor modeling of long sequences and the inability to capture indirect relationships between entities.
The invention combines the advantages of the Bayesian network and the BiLSTM to propose the Bayesian recurrent neural network. Stacking the Bayesian network as a layer on top of the BiLSTM lets the BiLSTM capture long-time, long-range spatio-temporal correlations between entities transversely, while the Bayesian network performs correlation analysis and reasoning longitudinally. Feeding the Bayesian network's inference results back to update the BiLSTM realizes end-to-end adaptive learning and relationship establishment.
Fig. 5 shows a bayesian recurrent neural network model according to an embodiment of the present invention.
The embodiment of the invention trains an entity attribute extraction model by adopting the following steps:
S1: tag by character; input the character-vector matrix (N×K) to the BiLSTM to obtain each character's tag-class probability distribution (an N×4 matrix), where N is the length of each batch, K is the embedding-vector length, and 4 is the number of character tag classes; the position of the maximum value corresponds to the current character's label. The character embedding of each character is obtained at the same time.
Embedding can be viewed as a mathematical mapping y = f(x). More precisely, f is injective (for each y in the range there is at most one x in the domain with f(x) = y) and the structure is preserved before and after the mapping. Applied to word embedding, this means finding a function or mapping that generates a new spatial representation, mapping the information a word expresses as a one-hot vector in space X onto a multi-dimensional dense vector in space Y.
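The one-hot-to-dense mapping described above can be sketched as follows; sizes are illustrative:

```python
# Sketch of the embedding idea: a one-hot row vector x selects one row of an
# embedding matrix E, mapping the sparse one-hot space X onto a dense
# K-dimensional space Y. Vocabulary and embedding sizes are illustrative.
import numpy as np

vocab_size, K = 6, 4
rng = np.random.default_rng(42)
E = rng.standard_normal((vocab_size, K))  # the (learned) embedding table

word_id = 2
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

# The one-hot matrix product and a direct row lookup give the same vector,
# which is why embedding layers are implemented as table lookups.
dense_from_product = one_hot @ E
dense_from_lookup = E[word_id]
```

The mapping is injective as long as the rows of E are distinct, matching the definition above.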
Batch Size refers to the size of each batch. In an embodiment of the present invention, the parameters may be updated by one of three methods:
(1) Batch Gradient Descent: the loss function is computed by traversing the entire data set and the parameters are updated once per pass, so the resulting direction points more accurately toward the extremum.
(2) Stochastic Gradient Descent: the loss function is computed and the parameters are updated once for each sample, which has the advantage of speed.
(3) Mini-batch Gradient Descent: a compromise between the previous two methods; the sample data are divided into several batches, and the loss function and parameter updates are computed batch by batch, giving a stable descent direction.

S2 obtains subject candidates for the event from the text according to the result of the sequence annotation;
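The three parameter-update methods above can be compared on a toy least-squares problem; the code below is only a sketch (the learning rate, epoch count, and noiseless data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 2.0 * x                               # noiseless target: the true weight is 2

def grad(w, xs, ys):
    # gradient of the mean-square loss on a subset of the data
    return 2.0 * np.mean(xs * (xs * w - ys))

def train(w, batch_size, lr=0.05, epochs=100):
    for _ in range(epochs):
        for i in range(0, len(x), batch_size):
            w -= lr * grad(w, x[i:i + batch_size], y[i:i + batch_size])
    return w

w_batch = train(0.0, batch_size=len(x))   # (1) full batch: one update per pass
w_sgd   = train(0.0, batch_size=1)        # (2) stochastic: one update per sample
w_mini  = train(0.0, batch_size=10)       # (3) mini-batch: the compromise
```

All three converge to the same weight here; they differ in update frequency and in the stability of the descent direction.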
S2: determine the subject by syntactic and part-of-speech analysis (dependency syntactic analysis; being common general knowledge to those skilled in the art, it is not expanded here);
S3: define an event vector according to the following formula, wherein eventEmbedding is the event vector, wj denotes the vector of the j-th word in a sentence, and n denotes the sentences within distance n before and after the subject;
[Formula image in the original: eventEmbedding defined in terms of the word vectors wj within the window n around the subject]
Through the above steps, the event vector matrix of the text can be obtained from the label-class probability distribution of each word in the training text or the target text.
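A hedged sketch of the event-vector computation: the patent's formula appears only as an image, so the averaging below is one plausible reading, and all names are illustrative:

```python
import numpy as np

def event_embedding(word_vectors, subject_idx, n):
    # average the word vectors w_j inside the window of size n around the subject
    lo = max(0, subject_idx - n)
    hi = min(len(word_vectors), subject_idx + n + 1)
    return word_vectors[lo:hi].mean(axis=0)

vecs = np.arange(12.0).reshape(6, 2)             # six "words" with 2-dim vectors
emb = event_embedding(vecs, subject_idx=2, n=1)  # covers words 1, 2 and 3
```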
According to the event labels, the event vector matrix (N x K) is input into a BiLSTM to obtain the labeled event probability distribution (an N x L matrix) of each event in the training sample, where N is the length of each batch, K is the embedding vector length, and L is the number of event-label categories (not described in detail again later); the position of the maximum value corresponds to the label of the current event.
The position of the maximum value corresponds to the label of the current event; that is, the event with the maximum probability in the probability distribution is taken as the result of entity attribute extraction.
In an embodiment of the present invention, the event label refers to a text set labeled as the same event type in the training sample.
In an embodiment of the present invention, as shown in fig. 6, a DAG (Directed Acyclic Graph) of the Bayesian network is defined according to the actual dependency relationships, and the joint probability that a text describes a certain class of events is defined as:
P(A,B,C,D) = P(D|A,B) * P(C|A) * P(B|A) * P(A)
A is the probability that the text describes a certain class of events,
B is the probability that the event extraction is successful,
C is the probability of containing time information,
D is the probability of containing domain-specific vocabulary,
The value of B (the probability of successful extraction) is obtained by checking, over all events in the corpus, whether the computed label is the same as the annotation of the training sample: if they are the same, B is assigned 1; otherwise, B is assigned 0.
If the event label output by the second bidirectional long short-term memory (BiLSTM) recurrent neural network is the same as the manually annotated event label, the event extraction is successful; otherwise, it is unsuccessful.
In an embodiment of the present invention, a training sample is input into the BiLSTM to obtain the event-class distribution of the sample. Suppose the sample's most probable event class is "accident", i.e., the sample is extracted as an accident event: if the sample is annotated as an accident, the event extraction succeeds and B = 1; if not, the extraction fails and B = 0.
In an embodiment of the present invention, the probability that an accident event contains domain-specific vocabulary is the number of samples containing domain-specific vocabulary among all samples manually annotated as accidents in the sample library, divided by the total number of samples manually annotated as accidents.
In an embodiment of the present invention, the probability that an accident event contains time information is the number of samples containing time information among all samples manually annotated as accident events in the sample library, divided by the total number of samples manually annotated as accident events.
The matrix output by the Bayesian network is a probability distribution matrix of whether the text describes a certain event or not;
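A hedged numeric sketch of the DAG factorization P(A,B,C,D) = P(D|A,B) * P(C|A) * P(B|A) * P(A); all conditional probabilities below are made up for illustration only:

```python
# Illustrative conditional probability tables for the four binary variables.
p_A = 0.3                                  # text describes this event class
p_B_given_A = {True: 0.9, False: 0.1}      # extraction succeeds
p_C_given_A = {True: 0.7, False: 0.2}      # contains time information
p_D_given_AB = {(True, True): 0.8, (True, False): 0.4,
                (False, True): 0.3, (False, False): 0.05}

def joint(a, b, c, d):
    # P(A,B,C,D) = P(D|A,B) * P(C|A) * P(B|A) * P(A)
    pa = p_A if a else 1 - p_A
    pb = p_B_given_A[a] if b else 1 - p_B_given_A[a]
    pc = p_C_given_A[a] if c else 1 - p_C_given_A[a]
    pd = p_D_given_AB[(a, b)] if d else 1 - p_D_given_AB[(a, b)]
    return pd * pc * pb * pa

# the joint over all 16 assignments must sum to 1
total = sum(joint(a, b, c, d)
            for a in (True, False) for b in (True, False)
            for c in (True, False) for d in (True, False))
```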
A first N x L dimensional matrix is acquired from the second bidirectional long short-term memory recurrent neural network and input into the Bayesian network; feature fusion is performed on the second N x L dimensional matrix output by the Bayesian network and the first N x L dimensional matrix, and the feature fusion result is fed back to the second bidirectional long short-term memory recurrent neural network;
specifically, the above process may include two embodiments,
The first embodiment: a first N x L dimensional matrix is acquired from the forward hidden layer of the second bidirectional long short-term memory recurrent neural network and input into the Bayesian network; feature fusion is performed on the second N x L dimensional matrix output by the Bayesian network and the first N x L dimensional matrix, and the feature fusion result is used as the input to the backward hidden layer of the second bidirectional long short-term memory recurrent neural network;
Specifically, the first embodiment includes the following.
As shown in fig. 8, a first N x L dimensional matrix is acquired from the forward hidden layer of the second bidirectional long short-term memory recurrent neural network at time t and input into the Bayesian network; feature fusion is performed on the second N x L dimensional matrix output by the Bayesian network and the first N x L dimensional matrix, and the feature fusion result is used as the input to the backward hidden layer of the second bidirectional long short-term memory recurrent neural network at time t;
It will be appreciated by those skilled in the art that in the present invention, time t refers to position t in the input sequence; the recurrent neural network has one input Xt at each time step.
In other embodiments, a first N x L dimensional matrix is acquired from the second bidirectional long short-term memory recurrent neural network at time t1 and input into the Bayesian network; feature fusion is performed on the second N x L dimensional matrix output by the Bayesian network and the first N x L dimensional matrix, and the feature fusion result is used as an input to the second bidirectional long short-term memory recurrent neural network at time t2, where t1 and t2 are different positions in the input sequence;
Second embodiment: as shown in fig. 9, a first N x L dimensional matrix is acquired from the output layer of the second bidirectional long short-term memory recurrent neural network and input into the Bayesian network; feature fusion is performed on the second N x L dimensional matrix output by the Bayesian network and the first N x L dimensional matrix, and the feature fusion result is used as the input of the input layer of the second bidirectional long short-term memory recurrent neural network;
In the invention, the Bayesian network is stacked as a network layer on top of the BiLSTM recurrent neural network, so that the BiLSTM captures long-time, long-range temporal and spatial correlations between entities in the transverse direction, while the Bayesian network performs correlation analysis and reasoning in the longitudinal direction. Meanwhile, the inference result of the Bayesian network is fed back to update the BiLSTM, thereby realizing end-to-end adaptive learning and relationship establishment.
It should be noted that taking the arithmetic mean of the bidirectional long short-term memory recurrent neural network output matrix and the Bayesian network output matrix is only one way of matrix feature fusion, and the invention is not limited to this; the matrix feature fusion may also use the geometric mean, quadratic mean (root mean square, RMS), harmonic mean, weighted mean, and the like.
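The fusion means listed above can be sketched elementwise; the matrices below are illustrative 2 x 2 stand-ins for the N x L outputs:

```python
import numpy as np

bilstm_out = np.array([[0.2, 0.8], [0.6, 0.4]])   # first N x L matrix (BiLSTM)
bayes_out  = np.array([[0.4, 0.6], [0.8, 0.2]])   # second N x L matrix (Bayesian net)

arithmetic = (bilstm_out + bayes_out) / 2
geometric  = np.sqrt(bilstm_out * bayes_out)
rms        = np.sqrt((bilstm_out**2 + bayes_out**2) / 2)          # quadratic mean
harmonic   = 2 * bilstm_out * bayes_out / (bilstm_out + bayes_out)
weighted   = 0.7 * bilstm_out + 0.3 * bayes_out                   # illustrative weights
```

For positive entries the usual mean inequalities hold elementwise (harmonic <= geometric <= arithmetic <= quadratic), so the choice of fusion mean shifts the fused features systematically.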
S4: define the loss function as the mean square error between the output of each time node of the BiLSTM and the label, and iterate the model until the loss function converges, i.e., repeat step S3 until the loss function converges.
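The S4 loss can be sketched directly; the array values below are illustrative:

```python
import numpy as np

def mse_loss(outputs, labels):
    # mean square error between the network outputs at each time node and the labels
    outputs, labels = np.asarray(outputs, float), np.asarray(labels, float)
    return float(np.mean((outputs - labels) ** 2))

loss = mse_loss([1.0, 2.0, 3.0], [1.0, 4.0, 3.0])   # errors: 0, 2, 0
```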
The following describes the step of performing entity attribute fusion on the target text with reference to an embodiment.
The target text is input into the entity attribute extraction model to obtain the entity attributes of the target text; the subjects and attribute structures of all target texts are obtained, along with the distribution over the event categories to which each target text belongs:
Distribution=[p1,p2,…,pL]
However, events obtained from different data sources may describe the same underlying event while their attribute extraction results are missing or conflicting. Therefore, the invention introduces a fusion strategy to solve this problem on the basis of event extraction.
The invention defines the category similarity of two events, which can be characterized by the similarity of their event distributions (cosine similarity, etc.). When many events are extracted, pairwise traversal of the events causes a large computational overhead. Therefore, an event candidate set is obtained first, and the set of events to be fused is selected from this candidate set.
The basic rules for selecting the candidate set are as follows:
The subjects of the events are the same
The event class distributions have high similarity (cosine similarity)
The events are close in time
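The class-distribution similarity used in the second rule can be sketched with cosine similarity; the distributions below are illustrative:

```python
import numpy as np

def cosine_similarity(p, q):
    # cosine of the angle between two event-class distributions
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

d1 = [0.7, 0.2, 0.1]   # event-class distribution of event 1
d2 = [0.6, 0.3, 0.1]   # a similar event
d3 = [0.1, 0.1, 0.8]   # a dissimilar event
```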
For the event candidate set, complementary fusion of attributes must also be realized; this step mainly relies on the matching degree of attributes such as time, subject, and category to achieve entity alignment of the same event. The attribute fusion steps are as follows:
A. Select the event entity's basic data structure as the base according to its similarity with the structure template;
B. Traverse the candidate set events, matching attributes pairwise by depth-first traversal of the tree structure;
C. When two events are compared, the following rules are followed:
if a node attribute value is missing in the basic structure, supplement it directly;
if the attribute values of corresponding nodes conflict in the basic structure and the candidate set's attribute value obtained by the quality evaluation function is better, replace the base's value even if it is non-null;
if the base attribute is in list format, add the candidate set's unique, non-duplicated elements to the base's list;
D. Repeat B-C until the attributes can no longer be improved.
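A hedged sketch of fusion rules A-D; the quality evaluation function and field names below are hypothetical placeholders, not the patent's actual implementation:

```python
def fuse(base, candidate, quality):
    """Merge candidate attributes into base per the three comparison rules."""
    for key, cand_val in candidate.items():
        base_val = base.get(key)
        if base_val is None:                      # missing: supplement directly
            base[key] = cand_val
        elif isinstance(base_val, list):          # list: add unique new elements
            base[key] = base_val + [v for v in cand_val if v not in base_val]
        elif base_val != cand_val:                # conflict: keep the better value
            if quality(key, cand_val) > quality(key, base_val):
                base[key] = cand_val
    return base

base = {"subject": "ACME", "time": None, "tags": ["fire"]}
cand = {"subject": "ACME", "time": "2017-05-08 00:00:00", "tags": ["fire", "accident"]}
# toy quality function: prefer the longer (more specific) value
fused = fuse(base, cand, quality=lambda k, v: len(str(v)))
```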
In one embodiment of the invention, two events are extracted from two target texts. The structure template in this embodiment is:
[JSON structure image in the original: the structure template]
In the present embodiment, the basic structure is:
[JSON structure image in the original: the basic structure]
In this embodiment, the attribute fields are eventType, tags, subject, and time;
In another embodiment of the present invention, a plurality of target texts yield two events after passing through the attribute extraction model:
event 1:
[JSON structure image in the original: event 1]
event 2:
[JSON structure image in the original: event 2]
Because the two events have the same subject and the same time, i.e., the same structure template, event 3 is obtained by fusing event 1 and event 2:
[JSON structure image in the original: event 3, the fusion result]
In another embodiment of the present invention, two events are obtained after a plurality of target texts pass through an attribute extraction model:
event 4:
[JSON structure image in the original: event 4]
event 5:
[JSON structure image in the original: event 5]
In this embodiment, the two events have the same basic structure but conflicting time attributes; since the quality evaluation function judges the time attribute value of event 5 to be better, the time attribute of event 4 is replaced with time 2017-05-08 00:00:00.
Finally, it should be noted that the above examples are only intended to illustrate the technical solution of the present invention, not to limit it. While the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (8)

1. An enterprise knowledge graph attribute extraction method is characterized by comprising the following steps:
defining entity types, event types and entity attribute structures of training samples;
preparing and marking a training sample corpus;
training an entity attribute extraction model; the method for training the entity attribute extraction model comprises the following steps:
S1: marking by character, inputting an N x K dimensional character vector matrix into a first bidirectional long short-term memory recurrent neural network to obtain an N x T dimensional label-class probability distribution matrix for each character, wherein N is the batch size value, K is the character embedding vector length, T is the number of character label classes, and the position of the maximum value corresponds to the label of the current character; and obtaining the character embedding data of each character;
s2: determining training sample subject information;
S3: defining an event vector according to the following formula, wherein eventEmbedding is the event vector, wj denotes the vector of the j-th word in a sentence, and n denotes the sentences within distance n before and after the subject;
[Formula image in the original: definition of eventEmbedding]
according to the event labels, taking an N x K dimensional event vector matrix as the initial input of a second bidirectional long short-term memory recurrent neural network, wherein N is the batch size value, K is the word embedding vector length, and L is the number of event label categories, the position of the maximum value corresponding to the label of the current event;
the bayesian network is defined as:
P(A,B,C,D) = P(D|A,B) * P(C|A) * P(B|A) * P(A),
a is the probability of whether the text describes an event of some kind,
b is the probability of the event extraction being successful,
c is the probability of containing time information,
d is the probability of containing the vocabulary of the specific field,
wherein the value of B is determined by whether the label output from the N x L dimensional label-class probability distribution matrix is the same as the annotation of the training sample; if they are the same, the value of B is 1, and if not, the value of B is 0,
acquiring a first N x L dimensional matrix from a second bidirectional long-short-term memory recurrent neural network, inputting the first N x L dimensional matrix into a Bayesian network, performing feature fusion on the second N x L dimensional matrix and the first N x L dimensional matrix output by the Bayesian network, and feeding a feature fusion result back to the second bidirectional long-short-term memory recurrent neural network;
S4: defining the loss function as the mean square error between the output of each time node of the bidirectional long short-term memory recurrent neural network and the annotation data of the training sample, and repeating step S3 until the loss function converges;
inputting the target text into an entity attribute extraction model to obtain target text entity attributes;
and performing entity attribute fusion on the target text.
2. The method of extracting enterprise knowledge-graph attributes of claim 1,
the entity category, the event category and the entity attribute structure of the defined training sample comprise,
defining entity categories as enterprise factors or/and personal factors;
defining event categories as one or more of official documents, court announcements, tenders, equity, strategies, personnel, finance, debt, products, marketing, branding, accidents;
defining the fields of the attributes as a plurality of or one of type fields, time fields, mark fields and body fields;
the preparation and marking of the training sample corpus comprises marking the event category and the entity attribute structure of each text of the training sample library.
3. The method of extracting enterprise knowledge-graph attributes of claim 1,
an entity attribute extraction model, comprising,
acquiring a first N x L dimensional matrix from the forward hidden layer of the second bidirectional long short-term memory recurrent neural network, inputting the first N x L dimensional matrix into the Bayesian network, performing feature fusion on the second N x L dimensional matrix output by the Bayesian network and the first N x L dimensional matrix, and taking the feature fusion result as the input to the backward hidden layer of the second bidirectional long short-term memory recurrent neural network;
alternatively,
and acquiring a first N x L dimensional matrix from the second bidirectional long-short-term memory recurrent neural network output layer, inputting the first N x L dimensional matrix into the Bayesian network, performing feature fusion on the second N x L dimensional matrix output by the Bayesian network and the first N x L dimensional matrix, and taking a feature fusion result as the input of the second bidirectional long-short-term memory recurrent neural network input layer.
4. The method of extracting enterprise knowledge-graph attributes of claim 1,
performing entity attribute fusion on the target text comprises the following steps:
1) selecting a basic structure of the event entity data as a base value according to the similarity with the structure template;
2) traversing the candidate set events, and matching attributes according to the depth priority of the tree structure;
3) when two events are compared, the following rules are followed:
if the node attribute value in the basic structure is missing, directly supplementing;
if the attribute values of corresponding nodes conflict in the basic structure and the candidate set's attribute value obtained by the quality evaluation function is better, replace the base's value even if it is non-null;
if the base attribute is in a list format, adding unique non-repetitive elements in the candidate set to the table of the base;
4) and repeating the step 2) and the step 3) until the attribute can not be improved continuously.
5. An enterprise knowledge graph attribute extraction system is characterized by comprising the following units:
the defining unit is used for defining entity types, event types and entity attribute structures of the training samples;
the marking unit is used for training the sample corpus preparation and marking;
the training unit is used for training the entity attribute extraction model; the training unit trains the entity attribute extraction model by adopting the following steps:
S1: marking by character, inputting an N x K dimensional character vector matrix into a first bidirectional long short-term memory recurrent neural network to obtain an N x T dimensional label-class probability distribution matrix for each character, wherein N is the batch size value, K is the character embedding vector length, T is the number of character label classes, and the position of the maximum value corresponds to the label of the current character; and obtaining the character embedding data of each character;
s2: determining training sample subject information;
S3: defining an event vector according to the following formula, wherein eventEmbedding is the event vector, wj denotes the vector of the j-th word in a sentence, and n denotes the sentences within distance n before and after the subject;
[Formula image in the original: definition of eventEmbedding]
according to the event labels, taking an N x K dimensional event vector matrix as the initial input of a second bidirectional long short-term memory recurrent neural network, wherein N is the batch size value, K is the word embedding vector length, and L is the number of event label categories, the position of the maximum value corresponding to the label of the current event;
the bayesian network is defined as:
P(A,B,C,D) = P(D|A,B) * P(C|A) * P(B|A) * P(A),
a is the probability of whether the text describes an event of some kind,
b is the probability of the event extraction being successful,
c is the probability of containing time information,
d is the probability of containing the vocabulary of the specific field,
wherein the value of B is determined by whether the label output from the N x L dimensional label-class probability distribution matrix is the same as the annotation of the training sample; if they are the same, the value of B is 1, and if not, the value of B is 0,
acquiring a first N x L dimensional matrix from a second bidirectional long-short-term memory recurrent neural network, inputting the first N x L dimensional matrix into a Bayesian network, performing feature fusion on the second N x L dimensional matrix and the first N x L dimensional matrix output by the Bayesian network, and feeding a feature fusion result back to the second bidirectional long-short-term memory recurrent neural network;
S4: defining the loss function as the mean square error between the output of each time node of the bidirectional long short-term memory recurrent neural network and the annotation data of the training sample, and repeating step S3 until the loss function converges;
the entity attribute extraction unit is used for inputting the target text into the entity attribute extraction model to obtain the entity attribute of the target text;
and the attribute fusion unit is used for executing entity attribute fusion on the target text.
6. The enterprise knowledge-graph attribute extraction system of claim 5,
the definition unit defines entity category, event category and entity attribute structure of the training sample,
defining entity categories as enterprise factors or/and personal factors;
defining event categories as one or more of official documents, court announcements, tenders, equity, strategies, personnel, finance, debt, products, marketing, branding, accidents;
defining the fields of the attributes as a plurality of or one of type fields, time fields, mark fields and body fields;
the training sample corpus preparation and marking comprises labeling the event category and the entity attribute structure of each text of the training sample library.
7. The enterprise knowledge-graph attribute extraction system of claim 5,
an entity attribute extraction model, comprising,
acquiring a first N x L dimensional matrix from the forward hidden layer of the second bidirectional long short-term memory recurrent neural network, inputting the first N x L dimensional matrix into the Bayesian network, performing feature fusion on the second N x L dimensional matrix output by the Bayesian network and the first N x L dimensional matrix, and taking the feature fusion result as the input to the backward hidden layer of the second bidirectional long short-term memory recurrent neural network;
alternatively,
and acquiring a first N x L dimensional matrix from the second bidirectional long-short-term memory recurrent neural network output layer, inputting the first N x L dimensional matrix into the Bayesian network, performing feature fusion on the second N x L dimensional matrix output by the Bayesian network and the first N x L dimensional matrix, and taking a feature fusion result as the input of the second bidirectional long-short-term memory recurrent neural network input layer.
8. The enterprise knowledge-graph attribute extraction system of claim 5,
the attribute fusion unit performs entity attribute fusion on the target text by adopting the following steps:
1) selecting a basic structure of the event entity data as a base value according to the similarity with the structure template;
2) traversing the candidate set events, and matching attributes in pairs according to the depth priority of the tree structure;
3) when two events are compared, the following rules are followed:
if the node attribute value in the basic structure is missing, directly supplementing;
if the attribute values of corresponding nodes conflict in the basic structure and the candidate set's attribute value obtained by the quality evaluation function is better, replace the base's value even if it is non-null;
if the base attribute is in a list format, adding unique non-repetitive elements in the candidate set to the table of the base;
4) and repeating the step 2) and the step 3) until the attribute can not be improved continuously.
CN201810136568.4A 2018-02-09 2018-02-09 Enterprise knowledge graph attribute extraction method and system Active CN108182295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810136568.4A CN108182295B (en) 2018-02-09 2018-02-09 Enterprise knowledge graph attribute extraction method and system

Publications (2)

Publication Number Publication Date
CN108182295A CN108182295A (en) 2018-06-19
CN108182295B true CN108182295B (en) 2021-09-10

Family

ID=62552761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810136568.4A Active CN108182295B (en) 2018-02-09 2018-02-09 Enterprise knowledge graph attribute extraction method and system

Country Status (1)

Country Link
CN (1) CN108182295B (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920556B (en) * 2018-06-20 2021-11-19 华东师范大学 Expert recommending method based on discipline knowledge graph
CN108920656A (en) * 2018-07-03 2018-11-30 龙马智芯(珠海横琴)科技有限公司 Document properties description content extracting method and device
CN110019841A (en) * 2018-07-24 2019-07-16 南京涌亿思信息技术有限公司 Construct data analysing method, the apparatus and system of debtor's knowledge mapping
CN110858353B (en) * 2018-08-17 2023-05-05 阿里巴巴集团控股有限公司 Method and system for obtaining case judge result
CN109189943B (en) * 2018-09-19 2021-06-04 中国电子科技集团公司信息科学研究院 Method for extracting capability knowledge and constructing capability knowledge map
CN109446337B (en) * 2018-09-19 2020-10-13 中国信息通信研究院 Knowledge graph construction method and device
CN109446523B (en) * 2018-10-23 2023-04-25 重庆誉存大数据科技有限公司 Entity attribute extraction model based on BiLSTM and conditional random field
CN109508385B (en) * 2018-11-06 2023-05-19 云南大学 Character relation analysis method in webpage news data based on Bayesian network
CN109471929B (en) * 2018-11-06 2021-08-17 湖南云智迅联科技发展有限公司 Method for semantic search of equipment maintenance records based on map matching
CN109657918B (en) * 2018-11-19 2023-07-18 平安科技(深圳)有限公司 Risk early warning method and device for associated evaluation object and computer equipment
CN109767758B (en) * 2019-01-11 2021-06-08 中山大学 Vehicle-mounted voice analysis method, system, storage medium and device
CN111523315B (en) * 2019-01-16 2023-04-18 阿里巴巴集团控股有限公司 Data processing method, text recognition device and computer equipment
CN110210840A (en) * 2019-06-14 2019-09-06 言图科技有限公司 A kind of method and system for realizing business administration based on instant chat
CN110297904B (en) * 2019-06-17 2022-10-04 北京百度网讯科技有限公司 Event name generation method and device, electronic equipment and storage medium
CN110245244A (en) * 2019-06-20 2019-09-17 贵州电网有限责任公司 A kind of organizational affiliation knowledge mapping construction method based on mass text data
CN110399487B (en) * 2019-07-01 2021-09-28 广州多益网络股份有限公司 Text classification method and device, electronic equipment and storage medium
CN110516077A (en) * 2019-08-20 2019-11-29 北京中亦安图科技股份有限公司 Knowledge mapping construction method and device towards enterprise's market conditions
CN111475641B (en) * 2019-08-26 2021-05-14 北京国双科技有限公司 Data extraction method and device, storage medium and equipment
CN110516120A (en) * 2019-08-27 2019-11-29 北京明略软件系统有限公司 Information processing method and device, storage medium, electronic device
CN111105041B (en) * 2019-12-02 2022-12-23 成都四方伟业软件股份有限公司 Machine learning method and device for intelligent data collision
CN111382843B (en) * 2020-03-06 2023-10-20 浙江网商银行股份有限公司 Method and device for establishing enterprise upstream and downstream relationship identification model and mining relationship
CN111400504B (en) * 2020-03-12 2023-04-07 支付宝(杭州)信息技术有限公司 Method and device for identifying enterprise key people
CN111967761B (en) * 2020-08-14 2024-04-02 国网数字科技控股有限公司 Knowledge graph-based monitoring and early warning method and device and electronic equipment
CN112101034B (en) * 2020-09-09 2024-02-27 沈阳东软智能医疗科技研究院有限公司 Method and device for judging attribute of medical entity and related product
CN116097242A (en) * 2020-09-10 2023-05-09 西门子(中国)有限公司 Knowledge graph construction method and device
CN112000718B (en) * 2020-10-28 2021-05-18 成都数联铭品科技有限公司 Attribute layout-based knowledge graph display method, system, medium and equipment
CN112417104B (en) * 2020-12-04 2022-11-11 山西大学 Machine reading understanding multi-hop inference model and method with enhanced syntactic relation
CN112199961B (en) * 2020-12-07 2021-04-02 浙江万维空间信息技术有限公司 Knowledge graph acquisition method based on deep learning
CN112383575B (en) * 2021-01-18 2021-05-04 北京晶未科技有限公司 Method, electronic device and electronic equipment for information security
CN113326371B (en) * 2021-04-30 2023-12-29 南京大学 Event extraction method integrating pre-training language model and anti-noise interference remote supervision information
CN113468342B (en) * 2021-07-22 2023-12-05 北京京东振世信息技术有限公司 Knowledge graph-based data model construction method, device, equipment and medium
CN114741569B (en) * 2022-06-09 2022-09-13 杭州欧若数网科技有限公司 Method and device for supporting composite data types in graph database

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440287A (en) * 2013-08-14 2013-12-11 Guangdong University of Technology Web question-answering retrieval system based on product information structuring
CN105335378A (en) * 2014-06-25 2016-02-17 Fujitsu Ltd. Multi-data-source information processing device and method, and server
CN106250412A (en) * 2016-07-22 2016-12-21 Zhejiang University Knowledge graph construction method based on multi-source entity fusion
CN106528528A (en) * 2016-10-18 2017-03-22 Harbin Institute of Technology Shenzhen Graduate School Text sentiment analysis method and device
CN107220237A (en) * 2017-05-24 2017-09-29 Nanjing University Method for enterprise entity relation extraction based on convolutional neural networks
WO2017185887A1 (en) * 2016-04-29 2017-11-02 Boe Technology Group Co., Ltd. Apparatus and method for analyzing natural language medical text and generating medical knowledge graph representing natural language medical text
CN107633093A (en) * 2017-10-10 2018-01-26 Nantong University Construction and query method of a power-supply decision knowledge graph
CN107665252A (en) * 2017-09-27 2018-02-06 Shenzhen Securities Information Co., Ltd. Method and device for knowledge graph construction


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN"; Tao Chen et al.; Expert Systems With Applications; 2017-04-15; Vol. 72, pp. 221-230 *
"Medical Knowledge Graph Construction Techniques and Research Progress"; Yuan Kaiqi et al.; Application Research of Computers; 2017-08-18; Vol. 35, No. 7, pp. 1929-1936 *
"Attribute and Attribute-Value Extraction for Chinese Online Encyclopedias"; Jia Zhen et al.; Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition); 2013-11-11; Vol. 50, No. 1, pp. 41-47 *
"Open Entity Attribute Extraction from Unstructured Text"; Zeng Daojian et al.; Journal of Jiangxi Normal University (Natural Science Edition); 2013-05-15; Section 3 (attribute value extraction from unstructured text), Fig. 1 *

Also Published As

Publication number Publication date
CN108182295A (en) 2018-06-19

Similar Documents

Publication Publication Date Title
CN108182295B (en) Enterprise knowledge graph attribute extraction method and system
CN108984683B (en) Method, system, equipment and storage medium for extracting structured data
CN107330032B (en) Implicit discourse relation analysis method based on recurrent neural network
CN111488734B (en) Emotional feature representation learning system and method based on global interaction and syntactic dependency
CN109902298B (en) Domain knowledge modeling and knowledge level estimation method in self-adaptive learning system
CN113177124B (en) Method and system for constructing knowledge graph in vertical field
CN111488931B (en) Article quality evaluation method, article recommendation method and corresponding devices
CN111783394A (en) Training method of event extraction model, event extraction method, system and equipment
CN106778878B (en) Character relation classification method and device
CN111522965A (en) Question-answering method and system for entity relationship extraction based on transfer learning
CN112559734B (en) Brief report generating method, brief report generating device, electronic equipment and computer readable storage medium
CN109948160B (en) Short text classification method and device
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
CN113468887A (en) Student information relation extraction method and system based on boundary and segment classification
CN111666766A (en) Data processing method, device and equipment
CN111582506A (en) Multi-label learning method based on global and local label relation
CN112749556B (en) Multi-language model training method and device, storage medium and electronic equipment
CN114756687A (en) Self-learning entity relationship combined extraction-based steel production line equipment diagnosis method
CN114756681A (en) Evaluation text fine-grained suggestion mining method based on multi-attention fusion
CN115391570A (en) Method and device for constructing emotion knowledge graph based on aspects
CN114564563A (en) End-to-end entity relationship joint extraction method and system based on relationship decomposition
CN115659947A (en) Multiple-choice question answering method and system based on machine reading comprehension and text summarization
CN114840685A (en) Emergency plan knowledge graph construction method
CN115129807A (en) Fine-grained classification method and system for social media topic comments based on self-attention
CN114372454A (en) Text information extraction method, model training method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20191111

Address after: 400042 No. 51, Daping Main Street, Yuzhong District, Chongqing

Applicant after: CHONGQING TELECOMMUNICATION SYSTEM INTEGRATION CO.,LTD.

Applicant after: CHONGQING SOCIALCREDITS BIG DATA TECHNOLOGY CO.,LTD.

Address before: 401121 18th floor, Kylin Block C, Building 2, No. 53 Huangshan Avenue, Yubei District, Chongqing

Applicant before: CHONGQING SOCIALCREDITS BIG DATA TECHNOLOGY CO.,LTD.

GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Sun Shitong

Inventor after: Liu Debin

Inventor after: Yan Kai

Inventor after: Chen Wei

Inventor after: Yang Chen

Inventor before: Sun Shitong

Inventor before: Liu Debin

Inventor before: Yan Kai

Inventor before: Chen Wei

CP03 Change of name, title or address

Address after: No.51, Daping Main Street, Yuzhong District, Chongqing 400042

Patentee after: Zhongdian Zhi'an Technology Co.,Ltd.

Country or region after: China

Patentee after: Chongqing Yucun Technology Co.,Ltd.

Address before: No.51, Daping Main Street, Yuzhong District, Chongqing 400042

Patentee before: CHONGQING TELECOMMUNICATION SYSTEM INTEGRATION CO.,LTD.

Country or region before: China

Patentee before: CHONGQING SOCIALCREDITS BIG DATA TECHNOLOGY CO.,LTD.

TR01 Transfer of patent right

Effective date of registration: 20240409

Address after: 401120 Tower B, No. 10 Datagu West Road, Xiantao Street, Yubei District, Chongqing

Patentee after: China Telecom Yijin Technology Co.,Ltd.

Country or region after: China

Patentee after: Chongqing Yucun Technology Co.,Ltd.

Address before: No.51, Daping Main Street, Yuzhong District, Chongqing 400042

Patentee before: Zhongdian Zhi'an Technology Co.,Ltd.

Country or region before: China

Patentee before: Chongqing Yucun Technology Co.,Ltd.