CN110750642A - CNN-based Chinese relation classification method and system

Info

Publication number
CN110750642A
Authority
CN
China
Prior art keywords
matrix
word
sentence
cnn
vector matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910928313.6A
Other languages
Chinese (zh)
Inventor
王德庆
张辉
田润琦
郝瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Beijing University of Aeronautics and Astronautics
Original Assignee
Beijing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Aeronautics and Astronautics
Priority to CN201910928313.6A
Publication of CN110750642A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a CNN-based Chinese relation classification method and system. The method comprises the following steps: splicing the word vector matrix of a sentence, its entity distance vector matrix, and the word vector matrix weighted by an attention mechanism to obtain the input matrix of the CNN; inputting the input matrix into the convolutional layer of the CNN to obtain the feature vector of the sentence; and inputting the feature vector of the sentence into the fully connected layer of the CNN to obtain the probability of each relation type. Building on the current mainstream model, the method adds an attention mechanism based on the semantic dependency path, so that the key words expressing a specific relation in a sentence receive more attention and the classification effect is greatly improved.

Description

CNN-based Chinese relation classification method and system
Technical Field
The invention relates to a CNN-based Chinese relation classification method and a system for realizing the method.
Background
With the wide application of technologies such as big data and artificial intelligence, knowledge graphs have become a popular research direction in computer science. Building a knowledge graph can noticeably improve the accuracy of many intelligent systems. For example, a scientific search engine can implement a functional module in which, when a user inputs an entity word, the words that have a specific relation with it are displayed in the form of a graph. Inputting "influenza", for instance, returns information on medicines for treating colds, such as "Qingkailing injection".
Relation classification is a key technology for realizing such a functional module, and research on it is of great significance to natural language processing.
With the Internet increasingly developed, unstructured text is the most readily available resource in the age of big data, but how to acquire knowledge from unstructured text is a question well worth studying. Relation classification extracts the semantic relation between two entities in a sentence, so that information in unstructured text can be represented as structured relation triples. For example, from the sentence "Coca-Cola is a carbonated beverage" it can be extracted that a descriptive relation holds between Coca-Cola and carbonated beverage. Given a relation classification model with very high accuracy, a large amount of structured knowledge could be obtained from the large amount of unstructured text on the Internet, so that knowledge graphs can be constructed and the level of intelligence in fields such as medical care and news can be improved.
Entity relation extraction has developed from traditional machine learning methods to deep neural networks. With the rapid growth of computing power, deep learning has become the mainstream technology in artificial intelligence and offers a new approach to relation classification. A neural network model can learn high-level semantic features; the process reaches an optimal solution purely through mathematical operations, which reduces the dependence on manual design. However, human participation is still needed to design the features to be extracted, such as grammar, word vectors, and other features that express sentence semantics, in order to guide the neural network model in learning high-level semantic features.
At present, technologies such as voice assistants and chatbots are becoming increasingly popular, so research on relation classification has good prospects and strong demand.
Disclosure of Invention
Aiming at the defects of the prior art, the primary technical problem to be solved by the invention is to provide a CNN-based Chinese relation classification method.
Another technical problem to be solved by the present invention is to provide a CNN-based Chinese relation classification system.
In order to achieve the purpose, the invention adopts the following technical scheme:
According to a first aspect of the embodiments of the present invention, a CNN-based Chinese relation classification method is provided, which includes the following steps:
splicing the word vector matrix and the entity distance vector matrix of the sentence and the word vector matrix weighted by the attention mechanism to obtain an input matrix of the CNN;
inputting the input matrix into the CNN convolution layer to obtain the characteristic vector of the sentence;
and inputting the feature vector of the sentence into a full connection layer of the CNN to obtain the probability of each relation type.
Preferably, the word vector matrix and the entity distance vector matrix of the sentence and the word vector matrix weighted by the attention mechanism are spliced to obtain the input matrix of the CNN, and the method comprises the following steps:
carrying out word segmentation on the sentences marked with the entities to obtain word sequences; converting the word sequence into a word vector matrix;
calculating entity distance vectors PF1 and PF2 of the distance from each word to two entities in the sentence, and splicing the obtained entity distance vectors to form an entity distance vector matrix;
obtaining a word vector matrix with an attention adding mechanism according to the word vector matrix and the SDP weight sequence; and splicing the word vector matrix and the entity distance vector matrix of the sentence and the word vector matrix weighted by the attention mechanism to obtain the input matrix of the CNN.
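The splicing step above can be sketched as follows. This is only a minimal illustration, assuming that "splicing" means concatenating the matrices column-wise so that each word row carries all three kinds of features; the dimensions (sentence length 100, word vector dimension 300, distance vector dimension 50) are the values given later in the description, and the random values and the per-word weight `h` are placeholders rather than trained parameters.

```python
import numpy as np

n, d_w, d_p = 100, 300, 50          # sentence length, word-vector dim, distance-vector dim

rng = np.random.default_rng(0)
WF = rng.normal(size=(n, d_w))      # word vector matrix (placeholder values)
PF1 = rng.normal(size=(n, d_p))     # distance vectors relative to entity 1
PF2 = rng.normal(size=(n, d_p))     # distance vectors relative to entity 2
PF = np.concatenate([PF1, PF2], axis=1)   # entity distance vector matrix, n x 100

h = rng.uniform(size=n)             # placeholder attention weight for each word
ABWF = h[:, None] * WF              # attention-weighted word vectors (row i scaled by h[i])

# Assumed layout: one row per word, features of WF, PF and ABWF side by side.
cnn_input = np.concatenate([WF, PF, ABWF], axis=1)
print(cnn_input.shape)
```

Under these assumptions each word is represented by 300 + 100 + 300 = 700 values.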
Preferably, the word vector matrix with the attention mechanism is obtained according to the word vector matrix and the SDP weight sequence; the method comprises the following steps:
extracting words on a semantic dependency key path in a sentence according to semantic dependency analysis, and giving word weights to the words on the semantic dependency key path to obtain an SDP weight sequence;
and weighting the word vector matrix according to the SDP weight sequence to obtain a weighted word vector matrix.
Preferably, the word vector matrix is weighted according to the SDP weight sequence to obtain a weighted word vector matrix, including the steps of:
establishing a correlation matrix of the correlation degree between each word in the sentence and the relationship type;
introducing an SDP weight sequence to obtain final weight output;
and weighting the word vector matrix through the final weight output to obtain a weighted word vector matrix.
Preferably, the association matrix of the association degree between each word in the sentence and the relation type is established using the following formula:
$T = W_f U_1 W_c$
wherein $W_f$ is the word vector matrix; $U_1$ is a weight matrix to be learned; $T$ is the association matrix of words and relation types; and $W_c$ is the vector matrix representing the relation types.
Preferably, after the final weight output is obtained, the influence of noise is removed through a scaling factor, using the following formula:
$h = W_{sdp} + \mu R$
wherein $\mu$ is the scaling factor; $h$ is the final weight of each word; $R$ is the weight of each word in the sentence; and $W_{sdp}$ is the SDP weight of each word in the sentence.
Preferably, weighting the word vector matrix through the final weight output means weighting the word vector matrix WF with the diagonal matrix of the final weights of the words to obtain the weighted word vector matrix.
Preferably, inputting the input matrix into the convolutional layer of the CNN to obtain the feature vector of the sentence comprises the following steps:
passing the input matrix through a filter with a convolution window of 1 to perform channel filtering on the input data, retaining only the data of effective channels;
inputting the filtered input matrix into 4 convolution windows of different sizes for convolution, wherein the number of convolution kernels decreases as the window size increases, and activating the output data;
for the multiple values on each channel, performing max pooling to retain the most important feature value of each channel;
and obtaining the feature vector of the input matrix by splicing the feature values.
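The convolution steps above can be sketched as follows. This is a rough illustration under several assumptions that the text does not fix: the window-1 filter is modelled as a learned linear mix of the input channels, the activation is taken to be ReLU, and the channel count `d_mid` and the kernel counts per window are hypothetical values chosen only so that they decrease as the window grows. The window sizes 2, 3, 4, and 5 come from the detailed description.

```python
import numpy as np

def conv1d_over_words(x, kernels):
    """x: (n, d) sentence matrix; kernels: (k, window, d). Returns (n - window + 1, k)."""
    k, window, _ = kernels.shape
    n = x.shape[0]
    out = np.empty((n - window + 1, k))
    for i in range(n - window + 1):
        # Each kernel is applied to `window` consecutive word rows.
        out[i] = np.tensordot(kernels, x[i:i + window], axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)
n, d_in, d_mid = 100, 700, 64            # d_mid: hypothetical number of retained channels
x = rng.normal(size=(n, d_in))           # placeholder CNN input matrix

# Window-1 filter: mixes the input channels of each word (assumed interpretation).
w_1x1 = rng.normal(size=(d_in, d_mid)) * 0.01
x = x @ w_1x1

# Four windows; kernel counts shrink as the window grows (counts are illustrative).
kernel_counts = {2: 128, 3: 96, 4: 64, 5: 32}
pooled = []
for window, count in kernel_counts.items():
    kernels = rng.normal(size=(count, window, d_mid)) * 0.01
    feature_map = np.maximum(conv1d_over_words(x, kernels), 0.0)  # ReLU (assumed activation)
    pooled.append(feature_map.max(axis=0))                        # max pooling per channel

sentence_vector = np.concatenate(pooled)  # spliced feature values
print(sentence_vector.shape)
```

With these illustrative kernel counts the sentence feature vector has 128 + 96 + 64 + 32 = 320 entries, one per convolution kernel.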
Preferably, the feature vectors of the sentences are input into a full connection layer of the CNN to obtain the probability of each relation type; the method comprises the following steps:
inputting the feature vector of the sentence into a first hidden layer of the full-connection layer, and filtering N% of nodes with equal probability;
inputting the filtered nodes into a second hidden layer of the full-connection layer, and performing matrix multiplication on the output of the second hidden layer and a relation type vector matrix to obtain a score of each relation type;
and carrying out classification regression calculation on the score of each relationship type to obtain the probability of each relationship type.
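The fully connected steps above can be sketched as follows. This is a minimal illustration, assuming that the "filtering of N% of nodes with equal probability" corresponds to dropout and that the "classification regression calculation" is a softmax, as the softmax layer in the detailed description suggests. The layer sizes `feat_dim` and `hidden1`, the drop rate, and the ReLU activations are hypothetical; the relation type dimension 50 and the 14 relation types come from the description.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)
feat_dim, hidden1, d_c, num_classes = 320, 200, 50, 14   # feat_dim, hidden1: illustrative
drop_rate = 0.5                                          # the "N%" of the text; value assumed

sentence_vector = rng.normal(size=feat_dim)              # placeholder sentence feature vector
W1 = rng.normal(size=(feat_dim, hidden1)) * 0.05
W2 = rng.normal(size=(hidden1, d_c)) * 0.05
class_matrix = rng.normal(size=(d_c, num_classes))       # relation type vector matrix

# First hidden layer, then drop each node with the same probability (training time).
h1 = np.maximum(sentence_vector @ W1, 0.0)
keep_mask = rng.random(hidden1) >= drop_rate
h1 = h1 * keep_mask / (1.0 - drop_rate)                  # inverted-dropout rescaling

# Second hidden layer; its output is multiplied by the relation type matrix.
h2 = np.maximum(h1 @ W2, 0.0)
scores = h2 @ class_matrix                               # one score per relation type
probs = softmax(scores)                                  # probability of each relation type
print(probs.shape)
```

At inference time the dropout mask would normally be omitted; it is shown here only to make the node-filtering step concrete.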
According to a second aspect of the embodiments of the present invention, there is provided a CNN-based Chinese relation classification system, including a processor and a memory; the memory having stored thereon a computer program operable on the processor, the computer program when executed by the processor implementing the steps of:
splicing the word vector matrix and the entity distance vector matrix of the sentence and the word vector matrix weighted by the attention mechanism to obtain an input matrix of the CNN;
inputting the input matrix into the CNN convolution layer to obtain the characteristic vector of the sentence;
and inputting the feature vector of the sentence into a full connection layer of the CNN to obtain the probability of each relation type.
According to the CNN-based Chinese relation classification method provided by the invention, on the basis of the current mainstream model, an attention mechanism based on a semantic dependence path is added, so that key words expressing specific relations in sentences can be paid more attention, and the classification effect is greatly improved. In addition, two hidden layers are adopted in the full-connection layer, the dependence of the model on a single node is reduced, and the condition of overfitting of the model is weakened or avoided.
Drawings
FIG. 1 is a flow chart of a CNN-based Chinese relationship classification method according to the present invention;
FIG. 2 is a diagram illustrating semantic dependency analysis according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating SDP weights in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a window of 3 convolution kernels in accordance with an embodiment of the present invention;
FIG. 5 is a schematic view of a convolutional layer structure in another embodiment of the present invention;
FIG. 6 is a schematic diagram of a fully connected layer structure in accordance with an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the CNN-based Chinese relation classification system provided in the present invention.
Detailed Description
The technical contents of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.
The invention uses a convolutional neural network to implement the relation classification model and thereby Chinese relation classification. A traditional convolutional neural network model gives equal weight to each word in the sentence, so its attention is relatively dispersed. Building on the current mainstream model, the invention adds an attention mechanism based on the semantic dependency path, so that the key words expressing a specific relation in a sentence receive more attention and the model effect is improved.
As shown in FIG. 1, in the CNN-based Chinese relation classification method provided by the present invention, a sentence with two marked entities is input to the data processing part. Through data preprocessing, the sentence is converted into a word vector matrix WF, an entity distance vector matrix PF, and the semantic dependency path (SDP) weights of the sentence, which serve as the input of the model. In the input layer of the model, WF, PF, SDP, and a model parameter, the class vector matrix (class matrix), are used to generate a word vector matrix with attention weights added (attention-based WF), denoted ABWF. WF, PF, and ABWF are then spliced together to form the final input. In the convolutional layer, 4 convolution windows of different sizes (2, 3, 4, and 5) are used for convolution; after convolution, max pooling is applied, finally yielding the feature vector of the sentence. The fully connected layer uses two hidden layers for calculation; then, in the softmax layer, the same class matrix is used as the weight, and the probability of each class is output. This process is described in detail below.
S1, the word vector matrix WF and the entity distance vector matrix PF of the sentence are spliced with the word vector matrix ABWF weighted by the attention mechanism to obtain the input matrix of the CNN.
Before describing the CNN-based Chinese relation classification method provided by the present invention in detail, the relevant principles and techniques of the convolutional neural network are introduced.
Convolutional neural networks (CNNs) are widely used deep neural networks, often applied in visual image processing. A convolutional neural network generally consists of 3 parts: the convolutional layer, the fully connected layer, and the output layer. Compared with a traditional neural network, the input data passes through the convolutional layer before entering the fully connected layer.
What does the convolutional layer do? Suppose a picture is input. A picture can contain millions of pixels, and such a huge amount of data is clearly unsuitable for direct processing by the fully connected layer. The role of the convolutional layer is therefore to compress the input data into a low-dimensional feature vector through repeated convolution and pooling, which reduces the computation of the fully connected layer and also removes noise.
The main operations of the convolutional layer are convolution and pooling. In convolution, convolution kernels with different weights scan the two-dimensional pixel matrix of the input image, and the weighted sum of the values covered by the kernel at each position becomes one value of the output. After convolution, each output value contains information about the pixel at that position and about the surrounding pixels, so local information is extracted. Pooling is then applied; with max pooling, for example, the maximum value in a larger region is taken as its representative, so the information is condensed further.
Through the convolutional layer, the input information at each position is progressively condensed, and a feature vector containing the feature information of the whole input is extracted.
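The two operations just described can be illustrated on a toy example. The sketch below is not part of the patented model; the 5 x 5 "image", the 2 x 2 kernel, and the pooling size are hypothetical values chosen so that the arithmetic is easy to follow by hand.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, no kernel flip, as is usual in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            # Weighted sum of the pixels currently covered by the kernel.
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

def max_pool(feature_map, size):
    """Keep the largest value of each non-overlapping size x size region."""
    return [[max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

image = [[1, 2, 0, 1, 3],
         [0, 1, 2, 3, 1],
         [1, 0, 1, 2, 2],
         [2, 1, 0, 1, 0],
         [1, 3, 2, 0, 1]]
kernel = [[1, 0],
          [0, 1]]

feature_map = convolve2d_valid(image, kernel)   # 4 x 4 map of local sums
pooled = max_pool(feature_map, 2)               # 2 x 2 map of regional maxima
print(feature_map)
print(pooled)   # [[4, 5], [5, 2]]
```

Each output value of the convolution combines a pixel with its diagonal neighbour, and pooling then reduces the 4 x 4 feature map to 4 representative values.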
In the CNN-based Chinese relation classification method provided by the invention, the model adopts a convolutional neural network and the convolved objects are words, so that repeated convolution over the words highlights some important words and generates the feature vector of the sentence. Finally, the feature vector of the sentence is input to the fully connected layer to complete the relation classification task for the sentence.
In the embodiment provided by the invention, before the sentence is input into the CNN, preprocessing is carried out, namely, a word vector matrix WF, an entity distance vector matrix PF and a word vector matrix ABWF weighted by an attention mechanism are spliced to obtain an input matrix of the CNN; the method specifically comprises the following steps:
S11, performing word segmentation on the sentence marked with the entities to obtain a word sequence, and converting the word sequence into a word vector matrix WF.
Because annotated Chinese corpora are currently scarce and no suitable resources are available online, an English corpus can be used for initial model building. In the embodiment provided by the invention, the English corpus used for initial model building is SemEval-2010 Task 8. Many papers use this corpus to test model performance, so using it makes comparison with other models convenient. The CNN-based Chinese relation classification method provided by the invention constructs the model with a convolutional neural network, adds an attention mechanism on top of an earlier CNN model for relation classification, and uses knowledge from the semantic dependency path (SDP) as an input feature to improve model performance.
When a Chinese corpus is used, it requires two parts: a thesaurus and paper abstract texts. In the embodiment of the invention, the keywords in the scientific search engine are used as part of the thesaurus; however, there are too many keywords and many of them are meaningless, so the keywords in the scientific search thesaurus need to be filtered. The keyword library adopted can be obtained from "Entity relation classification research and application based on the Attention mechanism", published by Lining in 2017 at Beijing University of Aeronautics and Astronautics. The thesaurus contains 478,990 entity words. However, it is not comprehensive, and entity words for medicines and diseases need to be supplemented.
Therefore, in the embodiment provided by the invention, disease names are supplemented from published literature coded according to the ICD-10 international disease classification standard, adding 10,324 disease entity words. In addition, the "complete list of medicine names" lexicon from the Sogou lexicon library is used to supplement medicine entity words, adding 35,989 in total. After supplementation, the final thesaurus size is 518,346.
Because no annotated Chinese corpus exists, regular expressions are used for machine matching to obtain an annotated corpus usable for training. The corpus source is the paper library of the scientific search engine, which contains about 4000 thousand paper abstracts, part of which is used to form the Chinese training and test sets.
Many open-source Chinese word vector models exist, and the choice is not limited here. In this embodiment, the training corpus of the Chinese word vector model comes from the Chinese Wikipedia, and the word embeddings are trained with gensim. The final Chinese word vector model contains 641,372 words.
The model accepts sentences with labeled entities, e.g., "<Entity>Oil field of mud bay</Entity> is located at <Entity>Shangan Ning basin</Entity> in the treasure area of Yanan city in the southeast east". The jieba word segmenter is used to segment the sentence; the word segmentation matching rules of the existing jieba segmenter are modified to some extent, and support for some special characters is added. The word sequence corresponding to the sentence, $S = (w_1, w_2, \ldots, w_n)$, is thus obtained, in which the two entities are labeled $e_1 = w_s$ and $e_2 = w_t$. Next, a pre-trained word vector matrix $Emb_{word}$ is used to convert the word sequence into the word vector matrix WF. Here:
wherein $d_w$ is the dimension of the word vector, n is the length of the sentence, and R is the weight of each word in the sentence. Here the word vector dimension is set to $d_w = 300$ and the sentence length to n = 100; when a sentence has fewer than 100 words, it is padded with zero vectors. The word vector matrix $Emb_{word}$ used to convert the word sequence into the word vector matrix WF can be obtained by any existing training method and is not particularly limited in the embodiment provided by the invention.
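The conversion of a word sequence into WF can be sketched as follows. This is a minimal illustration: the four-entry vocabulary and the random embedding table `emb_word` are hypothetical stand-ins for a pre-trained $Emb_{word}$, and the handling of unknown words and of sentences longer than 100 words is an assumption, since the text only specifies zero padding for shorter sentences.

```python
import numpy as np

d_w, n = 300, 100                      # word-vector dimension and fixed sentence length

# Hypothetical toy vocabulary; a real Emb_word would come from pre-trained vectors.
vocab = {"<unk>": 0, "aspirin": 1, "treats": 2, "arthritis": 3}
rng = np.random.default_rng(0)
emb_word = rng.normal(size=(len(vocab), d_w))

def to_word_matrix(words):
    """Look up each word's vector and zero-pad (or truncate) to n rows."""
    wf = np.zeros((n, d_w))
    for i, word in enumerate(words[:n]):
        wf[i] = emb_word[vocab.get(word, vocab["<unk>"])]
    return wf

WF = to_word_matrix(["aspirin", "treats", "arthritis"])
print(WF.shape)
```

Rows 0 to 2 hold the looked-up vectors and the remaining 97 rows stay zero, which matches the zero-vector padding described above.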
S12, calculating entity distance vectors PF1 (first entity distance vector) and PF2 (second entity distance vector) of the distance from each word to two entities in the sentence, and splicing the obtained entity distance vectors to form an entity distance vector matrix PF.
Specifically, the distance between each word and the entity words is used to represent position-related semantic information. Take the following sentence as an example:
Figure BDA0002219528550000081
For the word "treatment", its distance from the entity word "aspirin" is 3 and its distance from the entity word "arthritis" is -1. For an entity word itself, the distance to itself is 0. Thus for the word "treatment", the entity distance pair is $[p_1, p_2] = [3, -1]$. In the model, the maximum length of a sentence is set to 100 words, so distances range from -99 to 99, giving 199 distinct values. The entity distance vector matrix of entity 1 is therefore set as
Figure BDA0002219528550000082
and the entity distance vector matrix of entity 2 is set as
Figure BDA0002219528550000084
wherein $d_p$ is the distance vector dimension, set to 50, and l is the length of the entity distance vector matrix. The entity distance vectors PF1 and PF2 corresponding to each word can thus be obtained, and finally PF1 and PF2 are spliced together to form PF:
PF = [PF1, PF2] (2)
wherein the dimension after splicing PF1 and PF2 is 100, and n is the sentence length of 100.
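The distance computation can be sketched as follows. The token sequence below is hypothetical (the original example sentence is only available as an image) and was chosen so that it reproduces the distances 3 and -1 quoted above; the mapping from a signed distance to a row of a 199-row distance embedding table is likewise an assumed convention.

```python
def relative_distances(num_words, entity_index):
    """Signed distance of every word position to the entity position."""
    return [i - entity_index for i in range(num_words)]

MAX_LEN = 100                       # maximum sentence length from the text
NUM_DISTANCES = 2 * MAX_LEN - 1     # distances -99..99 give 199 distinct values

def distance_to_row(distance):
    """Map a signed distance to a row index of a 199-row distance embedding table."""
    return distance + (MAX_LEN - 1)

# Hypothetical token sequence chosen to match the distances quoted in the text.
words = ["aspirin", "can", "effectively", "treat", "arthritis"]
e1, e2 = 0, 4                       # positions of the two entity words

p1 = relative_distances(len(words), e1)   # distances to entity 1
p2 = relative_distances(len(words), e2)   # distances to entity 2
pair_for_treat = [p1[3], p2[3]]
print(pair_for_treat)               # [3, -1]
```

Looking up `distance_to_row(p1[i])` and `distance_to_row(p2[i])` in two embedding tables of dimension 50 would give the vectors PF1 and PF2 for word i, which are then spliced into a 100-dimensional row of PF.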
S13, obtaining a word vector matrix ABWF added with an attention mechanism according to the word vector matrix and the SDP weight sequence; the method specifically comprises the following steps:
S131, extracting words on the semantic dependency critical path in the sentence according to semantic dependency analysis, and assigning word weights to the words on the semantic dependency critical path to obtain an SDP weight sequence.
First, a concept is introduced: semantic dependency analysis (SDP). SDP analyzes the semantic associations between the components of a sentence and presents them in the form of a dependency structure. Using semantic dependencies to capture the semantics of a sentence has the advantage that the meaning of each word need not be abstracted; instead, each word is described by the semantic role it plays in the sentence. As shown in FIG. 2, 3 sentences describe the same thing, "Zhang San eats an apple". Each sentence has a Root node, which generally connects to the main predicate of the sentence; for example, although the three sentences in FIG. 2 differ in syntax, the Root node is connected to the word "eat" in each. The semantic role of each word in the sentence is then expanded in turn from the root node.
With semantic dependency analysis, any two entities selected from the sentence, such as "Zhang San" and "apple" in FIG. 2, can always be connected by a path, because a root node exists. More importantly, the words on this path, such as "eat", are the key words expressing the relation between the two entities. In the relation classification task, the words on this path should be given higher weights so that they receive more emphasis. In the embodiment provided by the invention, the words on the semantic dependency critical path in the sentence are extracted and assigned word weights. For example, words on the semantic dependency critical path are assigned a weight of 0.8, the other words in the sentence a weight of 0.3, and the padding words a weight of 0, as shown in FIG. 3.
Available tools include the following: for semantic analysis of Chinese, the Chinese natural language processing tool LTP from Harbin Institute of Technology can be used, and for semantic analysis of English, spaCy can be used. A sequence of SDP weights can then be generated, where n is the sentence length and SDP represents the weight of each word:
$SDP \in \mathbb{R}^n$ (3)
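The weight assignment can be sketched as follows. The dependency tree here is written by hand as a list of head indices (a hypothetical parse of a four-token sentence) instead of being produced by LTP or spaCy, and treating the two entity words themselves as part of the path is an assumption; the weights 0.8, 0.3, and 0 and the length 100 are the values given in the text.

```python
def ancestors(index, heads):
    """Indices from a word up to the root (inclusive), following head links."""
    chain = [index]
    while heads[chain[-1]] != -1:
        chain.append(heads[chain[-1]])
    return chain

def dependency_path(e1, e2, heads):
    """Set of word indices on the tree path connecting e1 and e2."""
    up1, up2 = ancestors(e1, heads), ancestors(e2, heads)
    common = next(i for i in up1 if i in up2)        # lowest common ancestor
    return set(up1[:up1.index(common) + 1]) | set(up2[:up2.index(common) + 1])

def sdp_weights(num_words, path, max_len=100, on_path=0.8, off_path=0.3):
    """0.8 for words on the path, 0.3 for other words, 0 for padding positions."""
    weights = [on_path if i in path else off_path for i in range(num_words)]
    return weights + [0.0] * (max_len - num_words)

# Hypothetical parse: "eats" is the root; "Zhang San" and "apple" depend on it.
tokens = ["Zhang San", "eats", "an", "apple"]
heads = [1, -1, 3, 1]               # head index of each token, -1 marks the root

path = dependency_path(0, 3, heads)
weights = sdp_weights(len(tokens), path)
print(sorted(path), weights[:5])    # [0, 1, 3] [0.8, 0.8, 0.3, 0.8, 0.0]
```

The resulting list has length 100 and plays the role of the SDP weight sequence in equation (3).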
S132, weighting the word vector matrix WF according to the SDP weight sequence to obtain a weighted word vector matrix ABWF, that is, the word vector matrix with the attention mechanism added.
S1321, establishing an association matrix of the association degree between each word in the sentence and the relation type;
S1322, introducing the SDP weight sequence to obtain the final weight output;
S1323, weighting the word vector matrix WF through the final weight output to obtain the weighted word vector matrix ABWF, that is, the word vector matrix with the attention mechanism added.
Specifically, ABCNN (Attention-Based Convolutional Neural Network) proposed for the first time a method of implementing an attention mechanism in a CNN model. The corresponding problem scenario is "deciding whether two sentences have the same semantics". In a CNN model without the attention mechanism, the two sentences are processed by the model separately, and while the first sentence is processed the model has no knowledge of the second. Since the final comparison is whether the two sentences have the same semantics, such a model does not make full use of the information available.
In ABCNN, an additional layer is added to the original input layer; this layer holds the weights obtained by the attention mechanism.
How is this layer obtained? Suppose this layer is G, and that the first sentence has 5 words and the second sentence has 7 words. The inner product of two word vectors gives the degree of similarity of the two words, from which the matrix $A \in \mathbb{R}^{5 \times 7}$ is obtained. Using:
$G = W_0 U A^T$ (4)
wherein $U \in \mathbb{R}^{d \times d}$, $G \in \mathbb{R}^{d \times 5}$, and d is the dimension of the word vector; G can be obtained by learning U, and G represents the attention feature map. When a word is closely associated with a word in the other sentence, some of the weights in the corresponding column of matrix A are large. When calculated with this formula, the resulting matrix G gives a larger weight to the word vector of that word, so more attention is focused on the words closely related to the second sentence. This is the basic principle of the attention mechanism in CNNs.
In the embodiments provided by the present invention, certain changes are made to the scheme in ABCNN.
In ABCNN, attention is paid to the association between two sentences, whereas in the embodiments provided by the present invention attention is paid to the association between a sentence and a relationship type. For this purpose, the relationship type needs to be vectorized, and the relationship type is expressed as a vector matrix
Figure BDA0002219528550000101
Wherein d iscIs the dimension of the type vector, with a value of 50, c is the number of types, with a value of 14. WcIs a weight matrix that needs to be learned, and the feature vectors of the relationship types are also continuously adjusted in the learning.
First, a correlation matrix T of the degree of correlation between each word in the sentence and each relationship type needs to be established, where WF is denoted $W_f$:
$T = W_f U_1 W_c$ (5)
where $U_1 \in \mathbb{R}^{d_w \times d_c}$ is the weight matrix to be learned, $T \in \mathbb{R}^{n \times c}$ is the correlation matrix of words and relationship types, and $T_{i,j}$ indicates the degree of correlation of the i-th word with the j-th relationship type. In the same way, the SDP weights are then introduced. The SDP weight sequence is denoted $W_{sdp}$, and the final weight output is obtained as shown in equation (6):
$R = W_{sdp} U_2 T^{T}$ (6)
where $U_2 \in \mathbb{R}^{n \times c}$ is the weight matrix to be learned and $R \in \mathbb{R}^{n}$ is the weight of each word in the sentence. The role of $W_{sdp}$ is as follows: $W_{sdp}$ holds the weight of each word in the sentence, so the object it represents is the sentence. The operation $W_{sdp} U_2 T^{T}$ establishes the correlation between the sentence and its words, and the resulting degree of correlation between the sentence and each word is the weight sequence of the words.
The weights in R incorporate the information of the relationship feature vectors and the SDP semantic information, but they carry a certain amount of noise on function words. It is therefore desirable for these weights to act as an adjustment on top of the base weights, so a scaling factor μ is added:
$h = W_{sdp} + \mu R$ (7)
Finally, $h \in \mathbb{R}^{n}$ is the final weight of each word.
By weighting the word vector matrix WF with the diagonal matrix of h, a weighted word vector matrix ABWF, i.e. a word vector matrix with an attention mechanism added, can be obtained.
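As a rough illustration of equations (5) to (7) and the diagonal weighting, the following NumPy sketch traces the shapes involved. All learned matrices and the SDP weights are random stand-ins, the values d_w = 300 and μ = 0.1 are assumptions not stated in this section, and $W_{sdp}$ is treated as a length-n vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_w, d_c, c = 100, 300, 50, 14   # d_w = 300 is an assumed value
mu = 0.1                            # scaling factor (assumed value)

W_f = rng.normal(size=(n, d_w))            # word vector matrix WF
U_1 = rng.normal(size=(d_w, d_c)) * 0.01   # learned weights (random stand-in)
W_c = rng.normal(size=(d_c, c)) * 0.01     # relationship type vector matrix (stand-in)
U_2 = rng.normal(size=(n, c)) * 0.01       # learned weights (random stand-in)
W_sdp = rng.uniform(size=n)                # SDP weight sequence (random stand-in)

T = W_f @ U_1 @ W_c        # equation (5): word / relationship-type correlation, (n, c)
R = W_sdp @ U_2 @ T.T      # equation (6): weight of each word, (n,)
h = W_sdp + mu * R         # equation (7): final weight of each word, (n,)
ABWF = np.diag(h) @ W_f    # scale each word vector by its final weight, (n, d_w)

print(T.shape, R.shape, h.shape, ABWF.shape)
```

Multiplying by the diagonal matrix of h is equivalent to scaling row i of WF by h[i], so in practice `h[:, None] * W_f` gives the same result without building an n × n matrix.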
S14, the word vector matrix WF and the entity distance vector matrix PF of the sentence are spliced with the word vector matrix ABWF weighted by the attention mechanism to obtain the input matrix of the CNN.
The final input is obtained by splicing the word vector matrix WF, the entity distance vector matrix PF, and the word vector matrix ABWF weighted by the attention mechanism. The final input matrix is
$M \in \mathbb{R}^{n \times d_m}$
where $d_m = d_w \times 2 + d_p$, with a value of 700, and n is the sentence length, 100.
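The dimension arithmetic of the splicing step can be checked with a short NumPy sketch. The split d_w = 300 and d_p = 100 is an assumption chosen only so that 2 × d_w + d_p = 700; the section states the total but not the individual values, and the matrices are zero-filled placeholders.

```python
import numpy as np

n, d_w, d_p = 100, 300, 100    # assumed split satisfying 2 * d_w + d_p = 700

WF = np.zeros((n, d_w))        # word vector matrix (placeholder values)
PF = np.zeros((n, d_p))        # entity distance vector matrix (placeholder values)
ABWF = np.zeros((n, d_w))      # attention-weighted word vector matrix (placeholder)

# Splice along the feature axis so that every word keeps one row.
M = np.concatenate([WF, PF, ABWF], axis=1)

print(M.shape)
```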
S2, the input matrix is input into the CNN convolution layer to obtain the sentence characteristic vector.
The convolutional layer is the most important layer in a convolutional neural network and has a significant influence on the performance of the model. The convolutional layer is described below with an example.
In one embodiment provided by the invention, the final input matrix M produced by the preceding steps is $M \in \mathbb{R}^{100 \times 700}$: the input is 100 words, and the feature vector of each word has 700 dimensions, which may be understood as 700 channels. The convolution window size is set to 3 and the number of convolution kernels is 50. As shown in fig. 4, when the convolution window covers the three words "being", "eating" and "one", a dot product is taken between the matrix formed by the three words and a convolution kernel, producing one numerical value. Since there are 50 convolution kernels in total, the 3 words are convolved into a 50-dimensional vector, which can also be understood as 50 channels. The 100 × 700 input matrix is thus condensed by convolution into a 100 × 50 matrix P. As shown in fig. 4, each row of P carries the information of 3 rows of M, such as "being", "eating" and "one". The convolution kernels continually correct their weights during learning so as to extract useful information from the input. Then, through max pooling, only the maximum value of each channel is kept as its feature value, finally giving the feature vector $f \in \mathbb{R}^{50}$ of the input matrix.
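The window-3 convolution and max pooling described above can be sketched in NumPy as follows. The kernels are random stand-ins for learned weights, and zero padding at the sentence boundaries is an assumption made here so that the output keeps 100 rows as in the text; the helper name `conv1d_same` is hypothetical.

```python
import numpy as np

def conv1d_same(M, kernels):
    """Slide a window over the rows of M, zero-padding so the output keeps n rows."""
    n, d = M.shape
    k, _, out = kernels.shape                 # (window, in_channels, out_channels)
    left = (k - 1) // 2
    padded = np.pad(M, ((left, k - 1 - left), (0, 0)))
    # Each output row is the dot product of one k-row window with every kernel.
    windows = np.stack([padded[i:i + k].ravel() for i in range(n)])
    return windows @ kernels.reshape(k * d, out)

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 700))                  # 100 words, 700 channels
kernels = rng.normal(size=(3, 700, 50)) * 0.01   # 50 kernels with window size 3

P = conv1d_same(M, kernels)   # (100, 50): one 50-dimensional vector per position
f = P.max(axis=0)             # max pooling over positions: (50,)

print(P.shape, f.shape)
```

Max pooling over the position axis keeps, for each of the 50 channels, only the strongest response anywhere in the sentence.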
In the embodiment provided by the present invention, the input matrix is input to the convolutional layer of CNN to obtain the feature vector of the sentence, and the following steps may also be adopted:
S21, the input matrix passes through a filter with a convolution window of 1, which performs channel filtering on the input data and retains only the data of the effective channels.
S22, the filtered input matrix is input into convolution kernels with window sizes of 2, 3, 4 and 5 respectively, the number of convolution kernels decreasing as the window size increases, and the output data is activated.
S23, for the multiple values on each channel, max pooling is applied, retaining the most important feature value of each channel.
S24, the feature values are spliced to obtain an output with L channels, giving the feature vector $f_m \in \mathbb{R}^{500}$ of the input matrix M, where L is the total number of convolution kernels summed over the window sizes.
Specifically, most of the literature uses convolution kernels with a window of 3, which suggests that convolving over 3 words is generally regarded as the most appropriate strategy. In an English sentence, roughly every 3 words tend to contain one main component of the sentence, so a convolution kernel with a window of 3 can extract each main component of the sentence well and use it as a feature value. However, a reading of abstracts from scientific search shows that they are very long and contain a large number of modifiers, function words, conjunctions and the like. The main components of a sentence are far apart, with many irrelevant words inserted between them, so a convolution kernel with a window of 3 may not suit the language of the search domain.
Based on the data characteristics of this field analyzed above, the embodiment provided by the present invention sets up convolution kernels with convolution windows of different sizes and lets the model itself select which kernel size to use. Fig. 5 shows the design of the Inception module.
Specifically, in the convolutional layer, the convolution windows come in four sizes: 2, 3, 4 and 5. Before entering a convolution window of any of these sizes, the input first passes through a filter with a convolution window of 1, whose function is to perform channel filtering on the input data and retain only the data of the effective channels; the number of window-1 filters before each convolution window is uniformly set to 400. The numbers of convolution kernels with window sizes 2, 3, 4 and 5 are set to 200, 150, 100 and 50 respectively, decreasing in sequence, and in the embodiment provided by the invention the ReLU activation function is used to activate the output. Then, for the multiple values on each channel, max pooling is applied, retaining the most important feature value of each channel. Finally, splicing yields an output with 500 channels, giving the feature vector $f_m \in \mathbb{R}^{500}$ of the input matrix M.
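The multi-window design can be sketched in NumPy as below. This is only an illustration under stated assumptions: all weights are random stand-ins, no padding is used, ReLU is applied only after the window-2 to window-5 convolutions, and the helper name `conv1d_valid` is hypothetical.

```python
import numpy as np

def conv1d_valid(X, kernels):
    """1-D convolution over the rows of X without padding."""
    n, d = X.shape
    k, _, out = kernels.shape                 # (window, in_channels, out_channels)
    windows = np.stack([X[i:i + k].ravel() for i in range(n - k + 1)])
    return windows @ kernels.reshape(k * d, out)

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 700))               # input matrix: 100 words, 700 channels
branches = {2: 200, 3: 150, 4: 100, 5: 50}    # window size -> number of kernels

features = []
for window, n_kernels in branches.items():
    filt = rng.normal(size=(1, 700, 400)) * 0.01             # 400 window-1 filters
    kern = rng.normal(size=(window, 400, n_kernels)) * 0.01  # branch kernels
    reduced = conv1d_valid(M, filt)                          # channel filtering: (100, 400)
    activated = np.maximum(conv1d_valid(reduced, kern), 0.0) # ReLU activation
    features.append(activated.max(axis=0))                   # max pooling per channel

f_m = np.concatenate(features)                # 200 + 150 + 100 + 50 = 500 channels

print(f_m.shape)
```

Because max pooling collapses the position axis, each branch contributes exactly as many values as it has kernels regardless of its window size, which is why the spliced vector has 500 entries.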
S3, the feature vector of the sentence is input into the fully connected layer of the CNN to obtain the probability of each relationship type. This comprises the following steps:
S31, the feature vector of the sentence is input into the first hidden layer of the fully connected layer, and N% of the nodes are filtered out with equal probability, where N can be set as required.
S32, the filtered nodes are input into the second hidden layer of the fully connected layer, and the output of the second hidden layer is matrix-multiplied with the relationship type vector matrix to obtain the score of each relationship type.
S33, softmax classification is applied to the score of each relationship type to obtain the probability of each relationship type.
Specifically, fig. 6 shows the structure of the fully connected layers of the model. Two hidden layers are used; the output of the last hidden layer is matrix-multiplied with the relationship type vector matrix, and the result is passed through the softmax layer to finally obtain the probability of each relationship type.
The sentence feature vector $f_m \in \mathbb{R}^{500}$ obtained previously is the input to the fully connected layer. To reduce overfitting, the model uses dropout layers with a dropout value of 0.25. For example, for 100 input nodes, the dropout layer deletes each node with a probability of 25%, so that when there are many input nodes only about 75% of them participate in subsequent operations. Because any node may be deleted in each pass, the model's dependence on any single node is reduced, which weakens or avoids overfitting.
A single hidden layer cannot provide enough nonlinear operations, so two hidden layers are used, allowing the model to perform more nonlinear operations and improving its performance. In the embodiment provided by the invention, the first hidden layer has 200 hidden units; its output enters the second hidden layer through dropout with a probability of 0.25, and the second hidden layer has 50 hidden units. The 50 hidden units generate a 50-dimensional feature vector that participates in subsequent operations.
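The two hidden layers with dropout can be sketched as follows. The weights are random stand-ins, and two details are assumptions not stated in the text: the hidden layers use ReLU, and the surviving nodes are rescaled by 1 / (1 - rate) (so-called inverted dropout, a common convention).

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Zero each node with probability `rate`; rescale survivors (assumed convention)."""
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
f_m = rng.normal(size=500)                            # sentence feature vector (stand-in)
W1, b1 = rng.normal(size=(500, 200)) * 0.05, np.zeros(200)
W2, b2 = rng.normal(size=(200, 50)) * 0.05, np.zeros(50)

h1 = np.maximum(f_m @ W1 + b1, 0.0)   # first hidden layer, 200 units (ReLU assumed)
h1 = dropout(h1, 0.25, rng)           # each node dropped with probability 0.25
h2 = np.maximum(h1 @ W2 + b2, 0.0)    # second hidden layer, 50 units

print(h2.shape)
```

At inference time the same function would be called with `training=False`, so that all nodes participate.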
In the attention mechanism, the model uses the relationship type vector matrix $W_c$. This matrix must be learned in order to obtain suitable relationship type feature vectors, and it is at this layer that $W_c$ is learned. The output of the last hidden layer is the feature vector of the sentence, denoted h (here h is this feature vector, not the word weights of equation (7)):
$s = h W_c$ (9)
Matrix multiplication gives $s \in \mathbb{R}^{c}$, where c is the number of relationship types, 14, and s holds the score of each relationship type. Finally, the softmax calculation is applied to s:
$p_i = \dfrac{e^{s_i}}{\sum_{j=1}^{c} e^{s_j}}$
where $p_i$ is the probability of relationship type i and $s_i$ is the score of category i. The softmax formula exponentiates the scores and normalizes them, giving a probability value for each category.
Unlike max, which selects only the largest value so that smaller values can never be chosen, softmax gives larger values a higher probability while still giving smaller values a nonzero probability. Moreover, when softmax is combined with the cross-entropy loss function, the resulting derivative is very simple; this ease of differentiation is one of the advantages of softmax.
Finally, $p \in \mathbb{R}^{c}$ is obtained as the output of the model, where c is the number of relationship types, 14.
The model is trained by back propagation with the cross-entropy loss function, updating the model parameters.
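The scoring, softmax and cross-entropy steps can be sketched in NumPy as follows. The hidden-layer output and $W_c$ are random stand-ins, and the true class index 3 is an arbitrary choice for illustration. The last line shows the simple derivative mentioned above: the gradient of the cross-entropy loss with respect to the scores is p minus the one-hot label.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())    # subtracting the max improves numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
d_c, c = 50, 14
h = rng.normal(size=d_c)                 # output of the last hidden layer (stand-in)
W_c = rng.normal(size=(d_c, c)) * 0.1    # relationship type vector matrix (stand-in)

s = h @ W_c          # equation (9): score of each relationship type, (14,)
p = softmax(s)       # probability of each relationship type

y = np.zeros(c)
y[3] = 1.0           # one-hot label; class 3 chosen arbitrarily
loss = -np.log(p[3]) # cross-entropy loss for this example
grad_s = p - y       # gradient of the loss with respect to the scores

print(p.sum(), loss)
```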
The invention also provides a CNN-based Chinese relation classification system. As shown in fig. 7, the system includes a processor 72 and a memory 71 storing instructions executable by the processor 72.
The processor 72 may be a general purpose processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
The memory 71 is used for storing the program codes and transmitting the program codes to the CPU. Memory 71 may include volatile memory, such as Random Access Memory (RAM); the memory 71 may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory 71 may also comprise a combination of memories of the kind described above.
Specifically, the CNN-based Chinese relationship classification system provided by the embodiment of the present invention includes a processor 72 and a memory 71; the memory 71 has stored thereon a computer program operable on the processor 72, which when executed by the processor 72 performs the steps of:
splicing the word vector matrix and the entity distance vector matrix of the sentence and the word vector matrix weighted by the attention mechanism to obtain an input matrix of the CNN;
inputting the input matrix into the CNN convolution layer to obtain the characteristic vector of the sentence;
and inputting the feature vector of the sentence into a full connection layer of the CNN to obtain the probability of each relation type.
When the word vector matrix of the sentence, the entity distance vector matrix, and the word vector matrix weighted by the attention mechanism are spliced to obtain the input matrix of the CNN, the computer program, when executed by the processor 72, implements the following steps:
carrying out word segmentation on the sentences marked with the entities to obtain word sequences; converting the word sequence into a word vector matrix;
calculating entity distance vectors PF1 and PF2 of the distance from each word to two entities in the sentence, and splicing the obtained entity distance vectors to form an entity distance vector matrix;
obtaining a word vector matrix with the attention mechanism added according to the word vector matrix and the SDP weight sequence; and splicing the word vector matrix and the entity distance vector matrix of the sentence and the word vector matrix weighted by the attention mechanism to obtain the input matrix of the CNN.
Wherein, when the word vector matrix with the attention mechanism is obtained according to the word vector matrix and the SDP weight sequence, the computer program is executed by the processor 72 to implement the following steps;
extracting words on a semantic dependency key path in a sentence according to semantic dependency analysis, and giving word weights to the words on the semantic dependency key path to obtain an SDP weight sequence;
and weighting the word vector matrix according to the SDP weight sequence to obtain a weighted word vector matrix.
Wherein, when the word vector matrix is weighted according to the SDP weight sequence to obtain a weighted word vector matrix, the computer program is executed by the processor 72 to implement the following steps;
establishing a correlation matrix of the correlation degree between each word in the sentence and the relationship type;
introducing an SDP weight sequence to obtain final weight output;
and weighting the word vector matrix through the final weight output to obtain a weighted word vector matrix.
Wherein, in establishing a relevance matrix of the degree of relevance between each word in the sentence and the relationship type, the computer program is executed by the processor 72 to implement the following steps;
the following formula is adopted:
$T = W_f U_1 W_c$
where $W_f$ is the word vector matrix; $U_1$ is a weight matrix to be learned; T is the correlation matrix of words and relationship types; and $W_c$ is the vector matrix representing the relationship types.
Wherein the computer program realizes the following steps when being executed by the processor 72;
after the final weight output is obtained, the influence of noise is removed through a scaling factor, and the following formula is adopted:
$h = W_{sdp} + \mu R$;
where μ is a scaling factor; h is the final weight of each word; R is the weight of each word in the sentence; and $W_{sdp}$ is the SDP weight of each word in the sentence.
Wherein the computer program realizes the following steps when being executed by the processor 72;
the weighting of the word vector matrix by the final weight output is to weight the word vector matrix WF using the diagonal matrix of the final weight of each word to obtain a weighted word vector matrix.
Wherein, when the input matrix is input to the convolutional layer of CNN to obtain the feature vector of the sentence, the computer program is executed by the processor 72 to implement the following steps;
the input matrix is subjected to channel filtering on input data through a filter with a convolution window of 1, and only data of an effective channel is reserved;
inputting the filtered input matrix into 4 convolution windows with different sizes respectively for convolution, wherein the number of convolution kernels decreases in sequence as the window size increases, and the output data is activated;
for a plurality of data on each channel, performing pooling by adopting a maximum pooling scheme, and reserving the most important characteristic value of each channel;
and obtaining the eigenvector of the input matrix through eigenvalue splicing.
Wherein, when the feature vector of the sentence is input into the fully connected layer of the CNN to obtain the probability of each relationship type, the computer program is executed by the processor 72 to implement the following steps;
inputting the feature vector of the sentence into a first hidden layer of the full-connection layer, and filtering N% of nodes with equal probability;
inputting the filtered nodes into a second hidden layer of the full-connection layer, and performing matrix multiplication on the output of the second hidden layer and a relation type vector matrix to obtain a score of each relation type;
and carrying out classification regression calculation on the score of each relationship type to obtain the probability of each relationship type.
The embodiment of the invention also provides a computer readable storage medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may reside as discrete components in a communication device.
The CNN-based Chinese relationship classification method and system provided by the present invention have been explained in detail above. Any obvious modification made to the invention by those skilled in the art without departing from the true spirit of the invention would constitute a violation of the patent rights of the invention and would carry the corresponding legal responsibility.

Claims (10)

1. A CNN-based Chinese relation classification method is characterized by comprising the following steps:
splicing the word vector matrix and the entity distance vector matrix of the sentence and the word vector matrix weighted by the attention mechanism to obtain an input matrix of the CNN;
inputting the input matrix into the CNN convolution layer to obtain the characteristic vector of the sentence;
and inputting the feature vector of the sentence into a full connection layer of the CNN to obtain the probability of each relation type.
2. The CNN-based chinese relationship classification method of claim 1, wherein the word vector matrix of the sentence, the entity distance vector matrix, and the word vector matrix weighted by the attention mechanism are concatenated to obtain the CNN input matrix, comprising the steps of:
carrying out word segmentation on the sentences marked with the entities to obtain word sequences; converting the word sequence into a word vector matrix;
calculating entity distance vectors PF1 and PF2 of the distances from each word in the sentence to the two entities respectively, and splicing the obtained entity distance vectors to form an entity distance vector matrix;
obtaining a word vector matrix with the attention mechanism added according to the word vector matrix and the SDP weight sequence; and splicing the word vector matrix and the entity distance vector matrix of the sentence and the word vector matrix weighted by the attention mechanism to obtain the input matrix of the CNN.
3. The CNN-based Chinese relationship classification method according to claim 2, wherein the word vector matrix with the attention mechanism added is obtained according to the word vector matrix and the SDP weight sequence; the method comprises the following steps:
extracting words on a semantic dependency key path in a sentence according to semantic dependency analysis, and giving word weights to the words on the semantic dependency key path to obtain an SDP weight sequence;
and weighting the word vector matrix according to the SDP weight sequence to obtain a weighted word vector matrix.
4. The CNN-based chinese relationship classification method of claim 3, wherein the word vector matrix is weighted according to the SDP weight sequence to obtain a weighted word vector matrix, comprising the steps of:
establishing a correlation matrix of the correlation degree between each word in the sentence and the relationship type;
introducing an SDP weight sequence to obtain final weight output;
and weighting the word vector matrix through the final weight output to obtain a weighted word vector matrix.
5. The CNN-based Chinese relationship classification method according to claim 4, wherein the correlation matrix T of the degree of correlation between each word in the sentence and the relationship type is established by using the following formula, where WF is denoted $W_f$:
$T = W_f U_1 W_c$
where $W_f$ is the word vector matrix; $U_1$ is a weight matrix to be learned; T is the correlation matrix of words and relationship types; and $W_c$ is the vector matrix representing the relationship types.
6. The CNN-based chinese relationship classification method of claim 4, characterized by:
establishing a correlation matrix of the correlation degree between each word in the sentence and the relationship type;
the sequence of SDP weights is introduced and,
after the final weight output is obtained, the influence of noise is removed through a scaling factor, and the following formula is adopted:
$h = W_{sdp} + \mu R$;
where μ is a scaling factor; h is the final weight of each word; R is the weight of each word in the sentence; and $W_{sdp}$ is the SDP weight of each word in the sentence.
7. The CNN-based chinese relationship classification method of claim 6, wherein:
the weighting of the word vector matrix by the final weight output is to weight the word vector matrix WF using the diagonal matrix of the final weight of each word to obtain a weighted word vector matrix.
8. The CNN-based chinese relationship classification method of claim 1, wherein the input matrix is input to the CNN convolutional layer to obtain a feature vector of a sentence; the method comprises the following steps:
the input matrix is subjected to channel filtering on input data through a filter with a convolution window of 1, and only data of an effective channel is reserved;
inputting the filtered input matrix into 4 convolution windows with different sizes respectively for convolution, wherein the number of convolution kernels decreases in sequence as the window size increases, and the output data is activated;
for a plurality of data on each channel, performing pooling by adopting a maximum pooling scheme, and reserving the most important characteristic value of each channel;
and obtaining the eigenvector of the input matrix through eigenvalue splicing.
9. The CNN-based chinese relationship classification method of claim 1, wherein the feature vectors of sentences are input into the fully-connected layer of CNN to obtain the probability of each relationship type; the method comprises the following steps:
inputting the feature vector of the sentence into a first hidden layer of the full-connection layer, and filtering N% of nodes with equal probability;
inputting the filtered nodes into a second hidden layer of the full-connection layer, and performing matrix multiplication on the output of the second hidden layer and a relation type vector matrix to obtain a score of each relation type;
and carrying out classification regression calculation on the score of each relationship type to obtain the probability of each relationship type.
10. A CNN-based Chinese relation classification system is characterized by comprising a processor and a memory; the memory having stored thereon a computer program operable on the processor, the computer program when executed by the processor implementing the steps of:
splicing the word vector matrix and the entity distance vector matrix of the sentence and the word vector matrix weighted by the attention mechanism to obtain an input matrix of the CNN;
inputting the input matrix into the CNN convolution layer to obtain the characteristic vector of the sentence;
and inputting the feature vector of the sentence into a full connection layer of the CNN to obtain the probability of each relation type.
CN201910928313.6A 2019-09-28 2019-09-28 CNN-based Chinese relation classification method and system Pending CN110750642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910928313.6A CN110750642A (en) 2019-09-28 2019-09-28 CNN-based Chinese relation classification method and system

Publications (1)

Publication Number Publication Date
CN110750642A true CN110750642A (en) 2020-02-04

Family

ID=69277312



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination