CN110750642A

CN110750642A - CNN-based Chinese relation classification method and system

Info

Publication number: CN110750642A
Application number: CN201910928313.6A
Authority: CN
Inventors: 王德庆; 张辉; 田润琦; 郝瑞
Original assignee: Beijing University of Aeronautics and Astronautics
Current assignee: Beihang University; Beijing University of Aeronautics and Astronautics
Priority date: 2019-09-28
Filing date: 2019-09-28
Publication date: 2020-02-04

Abstract

The invention discloses a CNN-based Chinese relation classification method and system. The method comprises the following steps: splicing the word vector matrix and the entity distance vector matrix of a sentence with the attention-weighted word vector matrix to obtain the input matrix of the CNN; inputting the input matrix into the convolutional layer of the CNN to obtain the feature vector of the sentence; and inputting the feature vector of the sentence into the fully connected layer of the CNN to obtain the probability of each relation type. Building on the current mainstream model, the method adds an attention mechanism based on the semantic dependency path, so that the keywords expressing a specific relation in a sentence receive more attention, greatly improving classification performance.

Description

CNN-based Chinese relation classification method and system

Technical Field

The invention relates to a CNN-based Chinese relation classification method and a system for realizing the method.

Background

With the wide application of technologies such as big data and artificial intelligence, knowledge graphs have become a popular research direction in computer science. Building a knowledge graph can significantly improve the accuracy of many intelligent systems. For example, a scientific search engine may provide a function that, when a user enters an entity word, displays the words having a specific relation with it in the form of a graph; entering "influenza", for instance, can return medicines used to treat it, such as "Qingkailing injection".

Relation classification is the key technology for implementing such a function, and research on relation classification is of great significance to natural language processing.

With the growth of the Internet, unstructured text has become the most readily available resource of the big data era, and how to acquire knowledge from unstructured text is well worth researching. Relation classification extracts the semantic relation between two entities in a sentence, so that information in unstructured text can be represented as structured relation triples. For example, from the sentence "Coca-Cola is a carbonated beverage", one can extract that a descriptive relation holds between "Coca-Cola" and "carbonated beverage". A highly accurate relation classification model could therefore harvest a large amount of structured knowledge from the unstructured text on the Internet, enabling the construction of knowledge graphs and making fields such as medicine and news more intelligent.

Entity relation extraction has evolved from traditional machine learning methods to deep neural networks. With the rapid growth of computing power, deep learning has become the mainstream technology in artificial intelligence and has also brought new ideas to relation classification research. Neural network models can learn high-level semantic features, reaching an optimal solution purely through mathematical optimization, which reduces the reliance on human design. However, human effort is still needed to design the features to be extracted, such as grammatical features, word vectors, and other features that express sentence semantics, in order to guide the neural network toward learning high-level semantic features.

At present, technologies such as voice assistants and chatbots are increasingly popular, so research on relation classification has very good prospects and strong demand.

Disclosure of Invention

In view of the shortcomings of the prior art, the primary technical problem to be solved by the invention is to provide a CNN-based Chinese relation classification method.

Another technical problem to be solved by the present invention is to provide a CNN-based Chinese relation classification system.

To achieve the above objectives, the invention adopts the following technical solutions:

According to a first aspect of the embodiments of the present invention, a CNN-based Chinese relation classification method is provided, comprising the following steps:

splicing the word vector matrix and the entity distance vector matrix of a sentence with the attention-weighted word vector matrix to obtain the input matrix of the CNN;

inputting the input matrix into the convolutional layer of the CNN to obtain the feature vector of the sentence;

and inputting the feature vector of the sentence into the fully connected layer of the CNN to obtain the probability of each relation type.

Preferably, splicing the word vector matrix and the entity distance vector matrix of the sentence with the attention-weighted word vector matrix to obtain the input matrix of the CNN comprises the following steps:

performing word segmentation on the entity-marked sentence to obtain a word sequence, and converting the word sequence into a word vector matrix;

calculating the entity distance vectors PF1 and PF2, which give the distance from each word to the two entities in the sentence, and splicing them to form an entity distance vector matrix;

obtaining an attention-weighted word vector matrix from the word vector matrix and the SDP weight sequence; and splicing the word vector matrix and the entity distance vector matrix of the sentence with the attention-weighted word vector matrix to obtain the input matrix of the CNN.

Preferably, obtaining the attention-weighted word vector matrix from the word vector matrix and the SDP weight sequence comprises the following steps:

extracting the words on the semantic dependency critical path of the sentence by semantic dependency analysis, and assigning word weights to these words to obtain an SDP weight sequence;

and weighting the word vector matrix according to the SDP weight sequence to obtain a weighted word vector matrix.

Preferably, weighting the word vector matrix by the SDP weight sequence to obtain the weighted word vector matrix comprises the following steps:

establishing an association matrix of the degree of association between each word in the sentence and each relation type;

introducing the SDP weight sequence to obtain the final weight output;

and weighting the word vector matrix through the final weight output to obtain a weighted word vector matrix.

Preferably, the association matrix of the degree of association between each word in the sentence and each relation type is established with the following formula:

$T = W_f U_1 W_c$;

where $W_f$ is the word vector matrix, $U_1$ is a weight matrix to be learned, $T$ is the association matrix of words and relation types, and $W_c$ is the vector matrix representing the relation types.

Preferably, after the final weight output is obtained, the influence of noise is suppressed by a scaling factor, using the following formula:

$h = W_{sdp} + \mu R$;

where $\mu$ is the scaling factor, $h$ is the final weight of each word, $R$ is the weight of each word in the sentence given by the final weight output, and $W_{sdp}$ is the SDP weight of each word in the sentence.

Preferably, weighting the word vector matrix by the final weight output means weighting the word vector matrix WF with the diagonal matrix of the final per-word weights to obtain the weighted word vector matrix.

Preferably, inputting the input matrix into the convolutional layer of the CNN to obtain the feature vector of the sentence comprises the following steps:

passing the input matrix through a filter with convolution window 1 to filter the channels of the input data, retaining only the data of the effective channels;

inputting the filtered input matrix into convolutions with 4 different window sizes, where the number of convolution kernels decreases as the window size increases, and applying an activation function to the output;

for the multiple values on each channel, applying max pooling to retain the most important feature value of each channel;

and concatenating the feature values to obtain the feature vector of the input matrix.

Preferably, inputting the feature vector of the sentence into the fully connected layer of the CNN to obtain the probability of each relation type comprises the following steps:

inputting the feature vector of the sentence into the first hidden layer of the fully connected layer, and randomly dropping N% of the nodes with equal probability;

inputting the retained nodes into the second hidden layer of the fully connected layer, and multiplying the output of the second hidden layer by the relation type vector matrix to obtain a score for each relation type;

and applying softmax regression to the scores to obtain the probability of each relation type.
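The fully connected head described in these steps can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the activation function (ReLU here), the hidden-layer widths, and the use of inverted dropout to "drop N% of nodes" are assumptions, and the helper names (`matvec`, `classify`, etc.) are hypothetical. The relation type vector matrix `Wc` (shape d_c × c) serves as the output weights, so the second hidden layer is assumed to have width d_c.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(x, W):
    """Row vector x (length a) times matrix W (a x b, list of rows) -> length b."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def dropout(x, rate, rng):
    """Inverted dropout: drop each node with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(f, W1, W2, Wc, drop_rate=0.5, training=False, rng=None):
    """f: sentence feature vector; W1, W2: hidden-layer weights (hypothetical sizes);
    Wc: relation type vector matrix (d_c x c). Returns one probability per relation."""
    h1 = relu(matvec(f, W1))                 # first hidden layer
    if training:                             # drop N% of nodes only while training
        h1 = dropout(h1, drop_rate, rng or random.Random(0))
    h2 = relu(matvec(h1, W2))                # second hidden layer, width d_c
    scores = matvec(h2, Wc)                  # one score per relation type
    return softmax(scores)
```

At inference time (`training=False`) no nodes are dropped; inverted dropout rescales the surviving nodes during training so that no extra scaling is needed at inference.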

According to a second aspect of the embodiments of the present invention, a CNN-based Chinese relation classification system is provided, comprising a processor and a memory; the memory stores a computer program executable on the processor, and the computer program, when executed by the processor, implements the following steps:

In the CNN-based Chinese relation classification method provided by the invention, an attention mechanism based on the semantic dependency path is added to the current mainstream model, so that the keywords expressing a specific relation in a sentence receive more attention, greatly improving classification performance. In addition, the fully connected layer uses two hidden layers, which reduces the model's dependence on any single node and mitigates or avoids overfitting.

Drawings

FIG. 1 is a flow chart of the CNN-based Chinese relation classification method provided by the present invention;

FIG. 2 is a diagram illustrating semantic dependency analysis according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating SDP weights according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a convolution kernel with window size 3 according to an embodiment of the present invention;

FIG. 5 is a schematic view of the convolutional layer structure according to another embodiment of the present invention;

FIG. 6 is a schematic diagram of the fully connected layer structure according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of the CNN-based Chinese relation classification system provided by the present invention.

Detailed Description

The technical contents of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.

The invention uses a convolutional neural network to implement the relation classification model for Chinese relation classification. A conventional convolutional neural network model assigns equal weight to every word in the sentence, so its attention is relatively scattered. The invention adds an attention mechanism based on the semantic dependency path to the current mainstream model, so that the keywords expressing a specific relation in a sentence receive more attention, improving model performance.

As shown in FIG. 1, in the CNN-based Chinese relation classification method provided by the invention, a sentence with its two entities marked is fed to the data processing part, where preprocessing converts it into the word vector matrix WF, the entity distance vector matrix PF, and the semantic dependency path (SDP) weights of the sentence; these serve as the model input. In the input layer of the model, WF, PF, SDP, and a model parameter, the class vector matrix (class matrix), are used to generate an attention-weighted word vector matrix, Attention-Based WF, denoted ABWF. WF, PF, and ABWF are spliced together to form the final input. The convolutional layer convolves the input with 4 window sizes, namely 2, 3, 4, and 5, and then applies max pooling, finally producing the feature vector of the sentence. The fully connected layer computes through two hidden layers; then, in the softmax layer, the same class matrix is used as the weight, and the probability of each class is output. This process is described in detail below.

S1, the word vector matrix WF and the entity distance vector matrix PF of the sentence are spliced with the word vector matrix ABWF weighted by the attention mechanism to obtain the input matrix of the CNN.

Before describing the CNN-based chinese relationship classification method provided by the present invention in detail, the relevant principles and techniques of the convolutional neural network will be introduced.

Convolutional neural networks (CNNs) are a widely used type of deep neural network, commonly applied to visual image processing. A CNN generally consists of 3 parts: convolutional layers, a fully connected layer, and an output layer. Compared with a traditional neural network, the input data passes through the convolutional layers before it enters the fully connected layer.

What does the convolutional layer do? Suppose a picture is input: it may have millions of pixels, an amount of data that a fully connected layer clearly cannot process directly. The convolutional layer therefore compresses the input data into a low-dimensional feature vector through repeated convolution and pooling, which reduces the computation of the fully connected layer and also helps remove noise.

The main operations of the convolutional layer are convolution and pooling. In convolution, kernels with different weights are slid repeatedly over the two-dimensional pixel matrix of the input image; at each position, the kernel values are multiplied element-wise with the covered pixels and summed to give one output value. After convolution, each output value therefore contains information about its own pixel and the surrounding pixels, i.e., local information is extracted. Pooling follows; with max pooling, for example, the maximum value of a larger region is taken as its representative, which condenses the information further.
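As a minimal illustration of these two operations, the plain-Python sketch below slides a kernel over a small 2D matrix ("valid" positions only, no padding) and then applies non-overlapping 2 × 2 max pooling. As in most CNN libraries, the "convolution" is computed as a cross-correlation (the kernel is not flipped); the function names are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows); no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Element-wise product of the kernel with the kh x kw patch at (r, c), summed
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


def max_pool2d(fmap, size=2):
    """Non-overlapping size x size max pooling; leftover rows/columns are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]
```

For a 4 × 4 image holding 1 to 16 row by row and a 2 × 2 all-ones kernel, the feature map is 3 × 3 (each value the sum of one 2 × 2 patch), and pooling its top-left 2 × 2 block keeps 34.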

Through the convolutional layer, the input information at each position is progressively condensed, and a feature vector containing the feature information of the whole sentence is extracted.

In the CNN-based Chinese relation classification method provided by the invention, the model uses a convolutional neural network whose convolved objects are words, so repeated convolution over the words can highlight important words and generate the feature vector of the sentence. Finally, this feature vector is input to the fully connected layer to complete the relation classification of the sentence.

In the embodiment provided by the invention, the sentence is preprocessed before being input into the CNN: the word vector matrix WF, the entity distance vector matrix PF, and the attention-weighted word vector matrix ABWF are spliced to obtain the input matrix of the CNN. This specifically comprises the following steps:

S11, performing word segmentation on the entity-marked sentence to obtain a word sequence, and converting the word sequence into the word vector matrix WF.

Because annotated Chinese corpora are currently scarce and no related resources are available online, an English corpus can be used to build the initial model. In the embodiment provided by the invention, the English corpus used for the initial model is SemEval-2010 Task 8; this corpus has been used in many papers to evaluate model performance, so using it makes comparison with other models convenient. The method builds the model with a convolutional neural network, adds an attention mechanism to an early CNN relation classification model, and uses the knowledge carried by the semantic dependency path (SDP) as an input feature to improve performance.

When Chinese corpora are used, two resources are required: a lexicon and paper abstract texts. In the embodiment of the invention, keywords from the scientific search engine serve as part of the lexicon; however, these keywords are too numerous and include many meaningless words, so they must be filtered. The filtered keyword lexicon used here is the one produced in Lining's 2017 Beihang University work "Research and Application of Entity Relation Classification Based on the Attention Mechanism". It contains 478,990 entity words. However, this lexicon is not comprehensive, so entity words in the medicine and disease domains need to be supplemented.

Therefore, in the embodiment provided by the invention, disease names are supplemented from the published ICD-10 (International Classification of Diseases) coding documents, adding 10,324 disease entity words. In addition, the "drug names" lexicon from the Sogou lexicon library is used to supplement drug entity words, adding 35,989 in total. After supplementation, the final lexicon contains 518,346 words.

Because no annotated Chinese corpus exists, regular-expression matching is used to label sentences automatically and obtain an annotated corpus usable for training. The corpus source is the paper library of a scientific search engine, which contains about 4000 thousand paper abstracts; a portion of these abstracts is used to form the Chinese training and test sets.
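The document does not give the regular expressions used. One plausible way to mark candidate entity pairs, sketched below under that caveat, is to compile the lexicon into a single alternation pattern (longest entries first, so that a longer entity name wins over a shorter one starting at the same position) and wrap the first two distinct matches in the `<Entity>` tags used in the example input. Assigning the relation label is not shown, and all function names are hypothetical.

```python
import re


def build_entity_pattern(lexicon):
    """Compile lexicon words into one regex; longer words are tried first."""
    words = sorted(set(lexicon), key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in words))


def tag_two_entities(sentence, pattern):
    """Wrap the first two distinct lexicon matches in <Entity> tags.
    Returns None when the sentence contains fewer than two distinct entities."""
    matches, seen = [], set()
    for m in pattern.finditer(sentence):
        if m.group() not in seen:
            seen.add(m.group())
            matches.append(m)
        if len(matches) == 2:
            break
    if len(matches) < 2:
        return None
    out, last = [], 0
    for m in matches:
        out.append(sentence[last:m.start()])
        out.append("<Entity>" + m.group() + "</Entity>")
        last = m.end()
    out.append(sentence[last:])
    return "".join(out)
```

With a Chinese lexicon the same code applies unchanged, since the pattern matches substrings and does not rely on word boundaries.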

Many open-source Chinese word vector models exist, and the choice is not limited here. In this embodiment, the training corpus of the Chinese word vector model is Chinese Wikipedia, and the word embeddings are trained with gensim. The resulting model contains 641,372 words in total.

The model input is a sentence with its entities marked, e.g., "<Entity>Niwan Oilfield</Entity> is located in Baota District, Yan'an City, in the southeastern part of the <Entity>Shaan-Gan-Ning Basin</Entity>". The jieba segmenter is used for word segmentation; its existing matching rules are modified to some extent, and support for some special characters is added. This yields the word sequence $S = (w_1, w_2, \ldots, w_n)$ of the input sentence, in which the two entities are labeled $e_1 = w_s$ and $e_2 = w_t$. Next, a pre-trained word embedding matrix $Emb_{word}$ converts the word sequence into the word vector matrix $WF \in \mathbb{R}^{n \times d_w}$,

where $d_w$ is the word vector dimension, $n$ is the sentence length, and $\mathbb{R}$ denotes the real numbers. Here $d_w = 300$ and $n = 100$; sentences shorter than 100 words are padded with zero vectors. The embedding matrix $Emb_{word}$ used to convert the word sequence into WF can be obtained by any existing training method and is not specifically limited in the embodiments of the invention.
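A minimal sketch of the lookup and zero padding described above, assuming `emb` is a dictionary from word to a $d_w$-dimensional vector (standing in for $Emb_{word}$). How out-of-vocabulary words and sentences longer than n are handled is not stated in the text; mapping unknown words to zero vectors and truncating to n words are assumptions, and `build_wf` is a hypothetical name.

```python
def build_wf(words, emb, d_w, n=100):
    """Return the n x d_w word vector matrix WF as a list of rows."""
    zero = [0.0] * d_w
    # Look up each word (assumption: unknown words -> zero vector); truncate to n words
    rows = [list(emb.get(w, zero)) for w in words[:n]]
    # Zero-pad sentences shorter than n words
    rows += [[0.0] * d_w for _ in range(n - len(rows))]
    return rows
```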

S12, calculating the entity distance vectors PF1 (first entity distance vector) and PF2 (second entity distance vector), which give the distance from each word to the two entities in the sentence, and splicing them to form the entity distance vector matrix PF.

Specifically, the distance between each word and each entity word is used to represent positional semantic information. For example, in a sentence containing the entity words "aspirin" and "arthritis", the word "treatment" is at distance 3 from "aspirin" and -1 from "arthritis", and each entity word is at distance 0 from itself. The entity distance pair of "treatment" is therefore $[p_1, p_2] = [3, -1]$. Since the model limits sentences to 100 words, distances range from -99 to 99, giving 199 distinct values. Accordingly, entity 1 and entity 2 each have an entity distance embedding matrix in $\mathbb{R}^{l \times d_p}$,

where $d_p$ is the distance vector dimension, set to 50, and $l$ is the length of each entity distance embedding matrix (one row per possible distance, i.e., 199). Looking up each word's two distances yields its entity distance vectors PF1 and PF2, and finally PF1 and PF2 are spliced together to form PF.

$PF = [PF1, PF2]$ (2)

where $PF \in \mathbb{R}^{n \times 2d_p}$: the dimension after splicing PF1 and PF2 is $2d_p = 100$, and $n = 100$ is the sentence length.
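The sketch below computes the relative distances and splices the two looked-up distance vectors for each word, as in equation (2). It assumes the sign convention $p = i - s$ (word index minus entity index), which reproduces the example above ($[3, -1]$ for a word three positions after entity 1 and one position before entity 2), maps distances -99..99 to table rows 0..198, and zero-fills padding rows; the table contents and function names are hypothetical.

```python
def relative_positions(num_words, s, t):
    """(p1, p2) for each word index i: p1 = i - s, p2 = i - t (entities get 0)."""
    return [(i - s, i - t) for i in range(num_words)]


def build_pf(num_words, s, t, emb_p1, emb_p2, max_len=100):
    """Splice PF1 and PF2 row by row (PF = [PF1, PF2]), padded to max_len rows.
    emb_p1 / emb_p2: (2 * max_len - 1) x d_p distance tables for entity 1 and 2."""
    offset = max_len - 1            # distances -99..99 -> table rows 0..198
    d_p = len(emb_p1[0])
    pf = [emb_p1[p1 + offset] + emb_p2[p2 + offset]   # list concatenation = splice
          for p1, p2 in relative_positions(num_words, s, t)]
    pf += [[0.0] * (2 * d_p) for _ in range(max_len - num_words)]  # padding rows
    return pf
```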

S13, obtaining the attention-weighted word vector matrix ABWF from the word vector matrix and the SDP weight sequence, which specifically comprises the following steps:

S131, extracting the words on the semantic dependency critical path of the sentence by semantic dependency analysis, and assigning word weights to them to obtain the SDP weight sequence.

First, the concept of semantic dependency parsing (SDP) is introduced. SDP analyzes the semantic associations between the components of a sentence and presents them as a dependency structure. Its advantage for extracting sentence semantics is that it does not abstract the meaning of each word itself but describes each word by its semantic frame role in the sentence. As shown in FIG. 2, three sentences describe the same event, "Zhang San ate an apple". Each sentence has a Root node, which generally connects to the main predicate of the sentence; for example, although the three sentences in FIG. 2 have different syntax, in each of them the Root node is connected to the word "eat". Starting from the Root node, the semantic role of each word in the sentence is then expanded in turn.

With semantic dependency analysis, the two entities selected from a sentence, such as "Zhang San" and "apple" in FIG. 2, can always be connected by a path, because every word is attached, directly or indirectly, to the Root node. More importantly, the words on this path, such as "eat", are the key words expressing the relation between the two entities, so in relation classification they should receive higher weight. In the embodiment provided by the invention, the words on the semantic dependency critical path of the sentence are extracted and assigned word weights: for example, words on the path receive weight 0.8, the other words in the sentence receive 0.3, and the padding positions of the sentence receive 0, as shown in FIG. 3.

Usable tools: for Chinese semantic analysis, the LTP (Language Technology Platform) Chinese natural language processing toolkit from Harbin Institute of Technology can be used; for English, spaCy can be used. A sequence of SDP weights can then be generated, where n is the sentence length and each element of SDP is the weight of one word:

$SDP \in \mathbb{R}^{n}$ (3)
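A minimal sketch of building the SDP weight sequence, assuming the semantic dependency analysis has already been reduced to a tree given as a `head` list (`head[i]` is the index of word i's parent, -1 for the word attached to Root); real semantic dependency output, e.g. from LTP, may need converting to that form first. The path between the two entities runs up to their lowest common ancestor, and the weights 0.8 / 0.3 / 0 follow the example in the text. Function names are hypothetical.

```python
def ancestors(i, head):
    """Word i followed by its chain of parents up to the word attached to Root."""
    chain = [i]
    while head[i] != -1:
        i = head[i]
        chain.append(i)
    return chain


def dependency_path(s, t, head):
    """Word indices on the tree path from entity s to entity t (inclusive)."""
    up_s, up_t = ancestors(s, head), ancestors(t, head)
    on_t = set(up_t)
    lca = next(i for i in up_s if i in on_t)          # lowest common ancestor
    return up_s[:up_s.index(lca) + 1] + up_t[:up_t.index(lca)][::-1]


def sdp_weights(head, s, t, n=100, on_path=0.8, off_path=0.3):
    """SDP weight sequence: path words 0.8, other words 0.3, padding 0."""
    path = set(dependency_path(s, t, head))
    weights = [on_path if i in path else off_path for i in range(len(head))]
    return weights + [0.0] * (n - len(head))
```

For a parse of "Zhang San / ate / (aspect particle) / apple" in which every word hangs off "ate", the path from "Zhang San" (index 0) to "apple" (index 3) is `[0, 1, 3]`, so only the particle gets 0.3.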

S132, weighting the word vector matrix WF by the SDP weight sequence to obtain the weighted word vector matrix ABWF, i.e., the word vector matrix with the attention mechanism added. This comprises the following steps:

S1321, establishing an association matrix of the degree of association between each word in the sentence and each relation type;

S1322, introducing the SDP weight sequence to obtain the final weight output;

S1323, weighting the word vector matrix WF by the final weight output to obtain the weighted word vector matrix ABWF, i.e., the word vector matrix with the attention mechanism added.

Specifically, ABCNN (Attention-Based Convolutional Neural Network) was the first work to propose a way of implementing an attention mechanism in a CNN model. Its problem setting is determining whether two sentences have the same meaning. In a CNN model without attention, the two sentences are processed separately, and while the first sentence is being processed the model knows nothing of the second. Since the final goal is to compare the meanings of the two sentences, such a model does not fully use the information available.

In ABCNN, an extra layer is added on top of the original input layer; this layer holds the weights obtained by the attention mechanism.

How is this layer obtained? Call it G, and suppose the first sentence has 5 words and the second has 7. The inner product of two word vectors measures their similarity, which yields a matrix $A \in \mathbb{R}^{5 \times 7}$. Then:

$G = W_0 U A^{\top}$ (4)

where $U \in \mathbb{R}^{d \times d}$, $G \in \mathbb{R}^{d \times 5}$, and $d$ is the word vector dimension. G, the attention feature map, is obtained by learning U. When a word is closely associated with words of the other sentence, some of the weights in that word's column of $A^{\top}$ are large. Computed with this formula, the resulting matrix G then gives greater weight to that word's vector, so more attention falls on words closely related to the second sentence. This is the basic principle of the attention mechanism in CNNs.

In the embodiments provided by the present invention, the ABCNN scheme is modified as follows.

ABCNN attends to the association between two sentences, whereas the embodiments of the invention attend to the association between a sentence and the relation types. To this end, the relation types are vectorized and represented as a vector matrix $W_c \in \mathbb{R}^{d_c \times c}$,

where $d_c$ is the dimension of the type vectors, set to 50, and $c$ is the number of types, set to 14. $W_c$ is a weight matrix to be learned, so the feature vectors of the relation types are adjusted continuously during training.

First, an association matrix T of the degree of association between each word in the sentence and each relation type is established, where WF is denoted $W_f$:

$T = W_f U_1 W_c$ (5)

where $U_1 \in \mathbb{R}^{d_w \times d_c}$ is a weight matrix to be learned and $T \in \mathbb{R}^{n \times c}$ is the association matrix of words and relation types, with $T_{i,j}$ the degree of association between the $i$-th word and the $j$-th relation. In the same way, the SDP weights are then introduced; denoting the SDP weight sequence $W_{sdp}$, the final weight output is obtained as in equation (6):

$R = W_{sdp} U_2 T^{\top}$ (6)

where $U_2 \in \mathbb{R}^{n \times c}$ is a weight matrix to be learned and $R \in \mathbb{R}^{n}$ is the weight of each word in the sentence. The role of $W_{sdp}$ deserves explanation: $W_{sdp}$ holds the weight of each word in the sentence, so it represents the sentence as a whole. The product $W_{sdp} U_2 T^{\top}$ (with $W_{sdp}$ treated as a $1 \times n$ row vector) therefore relates the sentence to each of its words, and the resulting degree of association between the sentence and each word is the word weight sequence.

The weights in R incorporate both the relation feature vectors and the SDP semantic information, but there is some noise on the padding words. The weights should therefore vary around the base SDP weights, so a scaling factor $\mu$ is added:

$h = W_{sdp} + \mu R$ (7)

where $h \in \mathbb{R}^{n}$ is the final weight of each word.

Weighting the word vector matrix WF by the diagonal matrix of h, i.e., computing $ABWF = \mathrm{diag}(h)\, WF$, yields the weighted word vector matrix ABWF, i.e., the word vector matrix with the attention mechanism added.
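Equations (5) to (7) and the diagonal weighting can be sketched with plain nested lists as follows. Shapes follow the definitions above ($W_f$: n × d_w, $U_1$: d_w × d_c, $W_c$: d_c × c, $U_2$: n × c), with $W_{sdp}$ treated as a 1 × n row vector; the function names are hypothetical, and the training of $U_1$, $U_2$, and $W_c$ is not shown.

```python
def matmul(A, B):
    """Plain matrix product of nested lists (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def attention_weighted_wf(WF, W_sdp, U1, U2, Wc, mu):
    """Return (ABWF, h) for one sentence."""
    T = matmul(matmul(WF, U1), Wc)                      # eq. (5): n x c
    R = matmul(matmul([W_sdp], U2), transpose(T))[0]    # eq. (6): 1 x n -> length n
    h = [w + mu * r for w, r in zip(W_sdp, R)]          # eq. (7)
    # diag(h) @ WF: scale row i of WF by h_i
    ABWF = [[h_i * x for x in row] for h_i, row in zip(h, WF)]
    return ABWF, h
```

Because $\mathrm{diag}(h)$ only rescales rows, the sketch multiplies each word's vector by its weight instead of building the n × n diagonal matrix.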

S14, the word vector matrix WF and the entity distance vector matrix PF of the sentence are spliced with the word vector matrix ABWF weighted by the attention mechanism to obtain the input matrix of the CNN.

The final input is formed by splicing the word vector matrix WF, the entity distance vector matrix PF, and the attention-weighted word vector matrix ABWF. The final input matrix is $M \in \mathbb{R}^{n \times d_m}$,

where $d_m = 2d_w + 2d_p = 2 \times 300 + 2 \times 50 = 700$ (WF and ABWF contribute $d_w$ columns each, and PF contributes $2d_p$), and $n = 100$ is the sentence length.
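A quick shape check of the splicing step, as a sketch: each word's row of M is its WF row, followed by its PF row, followed by its ABWF row, so the row widths add up to $d_m$. The function name and the use of nested lists are illustrative assumptions.

```python
def build_input_matrix(WF, PF, ABWF):
    """Splice row by row: row i of M = [WF_i, PF_i, ABWF_i]."""
    if not (len(WF) == len(PF) == len(ABWF)):
        raise ValueError("WF, PF and ABWF must have the same number of rows (n)")
    return [wf + pf + ab for wf, pf, ab in zip(WF, PF, ABWF)]
```

With n = 100, 300-dimensional WF and ABWF rows, and 100-dimensional PF rows, every row of M has 300 + 100 + 300 = 700 entries.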

S2, inputting the input matrix into the convolutional layer of the CNN to obtain the feature vector of the sentence.

The convolutional layer is the most important layer of a convolutional neural network and strongly affects model performance; it is explained below with a concrete example.

Take the final input matrix M obtained above as an example. In one embodiment, $M \in \mathbb{R}^{100 \times 700}$: the input has 100 words, and each word has a 700-dimensional feature vector, whose 700 dimensions can be understood as 700 channels. Here the convolution window size is set to 3 and the number of convolution kernels to 50. As shown in FIG. 4, when a kernel reaches the three words "being", "eating", and "one", the 3 × 700 matrix formed by these words is multiplied element-wise with the kernel and summed to give one value; with 50 kernels in total, these 3 words are convolved into a 50-dimensional vector, which can also be understood as 50 channels. Convolution thus condenses the 100 × 700 input matrix into a 100 × 50 matrix P. As FIG. 4 shows, each row of P carries information from 3 rows of M, such as "being", "eating", and "one". During training, the kernels continually adjust their weights so as to extract useful information from the input. Max pooling then keeps only the maximum value of each channel as that channel's feature value, finally giving the feature vector $f \in \mathbb{R}^{50}$ of the input matrix.
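The single-window convolution and max pooling walked through above can be sketched as follows. To keep the output at n rows (100 × 700 condensed to 100 × 50, as stated), the sketch zero-pads the input at both ends ("same" padding); that padding choice, like the function names, is an assumption, and bias terms and the activation are omitted.

```python
def conv1d_same(M, kernels):
    """M: n x C input (rows = words, columns = channels); kernels: K kernels,
    each k x C. Returns the n x K feature map P using zero 'same' padding."""
    n, C = len(M), len(M[0])
    k = len(kernels[0])
    left = (k - 1) // 2
    zero_row = [0.0] * C
    padded = [zero_row] * left + M + [zero_row] * (k - 1 - left)
    P = []
    for i in range(n):
        window = padded[i:i + k]                      # k consecutive words
        P.append([sum(window[r][c] * ker[r][c] for r in range(k) for c in range(C))
                  for ker in kernels])                # one value per kernel (channel)
    return P


def max_over_time(P):
    """Max pooling: keep the largest value of each channel over all word positions."""
    return [max(col) for col in zip(*P)]
```

With M of shape 100 × 700 and 50 kernels of shape 3 × 700, `conv1d_same` returns a 100 × 50 matrix and `max_over_time` a 50-dimensional vector, matching the shapes in the text.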

Alternatively, in an embodiment provided by the invention, the input matrix may be fed into the convolutional layer of the CNN to obtain the feature vector of the sentence through the following steps:

S21: the input matrix passes through filters with a convolution window of 1, which filter the channels of the input data and retain only the data of the valid channels.

S22: the filtered input matrix is fed into convolution kernels with window sizes of 2, 3, 4 and 5, with fewer kernels as the window size increases, and the outputs are passed through an activation function.

S23: the multiple values on each channel are pooled by max pooling, retaining the most important feature value of each channel.

S24: the feature values are concatenated to obtain an output with L channels, giving the feature vector f_m ∈ R^500 of the input matrix M, where L is the total number of convolution kernels over all window sizes.

Specifically, most of the literature uses convolution kernels with a window of 3, which suggests that convolving over 3 words is generally regarded as the most appropriate strategy. In an English sentence, a main sentence component can typically be found within every 3 words, so a kernel with window 3 extracts each main component of the sentence well and uses it as a feature value. However, after reading abstracts from the scientific-search corpus, we found that they tend to be very long and contain many modifiers, function words, conjunctions and the like. The main components of a sentence are far apart, with many irrelevant words inserted between them, so a kernel with a convolution window of 3 may not suit the scientific-search domain.

Based on the above analysis of the characteristics of data in this domain, the embodiment of the invention sets up convolution kernels with several window sizes and lets the model itself learn which kernel sizes to rely on. Fig. 5 shows the design of this Inception-style module.

Specifically, the convolutional layer uses four convolution window sizes: 2, 3, 4 and 5. Before entering each of these windows, the data first passes through filters with a convolution window of 1, which filter the channels of the input data and retain only the data of the valid channels; the number of window-1 filters is uniformly set to 400 in front of each window. The numbers of convolution kernels for window sizes 2, 3, 4 and 5 are set to 200, 150, 100 and 50 respectively, decreasing as the window grows, and in the embodiment of the invention the outputs are activated with the ReLU activation function. The multiple values on each channel are then pooled by max pooling, retaining the most important feature value of each channel. Finally, concatenation yields an output with 500 channels (200 + 150 + 100 + 50), i.e., the feature vector f_m ∈ R^500 of the input matrix M.
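The multi-window module can be sketched the same way. This is a minimal sketch under stated assumptions, not the patented implementation: zero ("same") padding, no biases, no activation after the window-1 filters (the text only mentions ReLU after the window-k convolutions), and random weights in place of learned ones; the filter and kernel counts 400, 200, 150, 100 and 50 follow the text. The names conv1d_same and inception_features are hypothetical.

```python
import numpy as np

def conv1d_same(M, kernels):
    """Zero-padded 1-D convolution along the word axis (one output row per word)."""
    n, _ = M.shape
    _, k, _ = kernels.shape
    left = (k - 1) // 2
    padded = np.pad(M, ((left, k - 1 - left), (0, 0)))
    return np.stack([np.tensordot(kernels, padded[i:i + k], axes=([1, 2], [0, 1]))
                     for i in range(n)])

def inception_features(M, params):
    """params: {window size k: (W1, Wk)}, W1 of shape (400, 1, d) for the
    window-1 channel filters and Wk of shape (num_kernels, k, 400)."""
    pooled = []
    for k in sorted(params):
        W1, Wk = params[k]
        X = conv1d_same(M, W1)                    # window-1 channel filtering, shape (n, 400)
        Y = np.maximum(conv1d_same(X, Wk), 0.0)   # window-k convolution followed by ReLU
        pooled.append(Y.max(axis=0))              # max pooling: one value per channel
    return np.concatenate(pooled)                 # spliced feature vector f_m

rng = np.random.default_rng(0)
d, n_filters = 700, 400
kernel_counts = {2: 200, 3: 150, 4: 100, 5: 50}   # window size -> number of kernels
params = {k: (rng.standard_normal((n_filters, 1, d)) * 0.01,
              rng.standard_normal((m, k, n_filters)) * 0.01)
          for k, m in kernel_counts.items()}
M = rng.standard_normal((100, d))                 # placeholder input matrix
f_m = inception_features(M, params)               # shape (500,)
```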

S3: the feature vector of the sentence is fed into the fully connected layer of the CNN to obtain the probability of each relationship type, comprising the following steps:

S31: the feature vector of the sentence is fed into the first hidden layer of the fully connected layer, and N% of the nodes are dropped at random with equal probability, where N can be set as required.

S32: the remaining nodes are fed into the second hidden layer of the fully connected layer, and the output of the second hidden layer is multiplied by the relationship-type vector matrix to obtain the score of each relationship type.

S33: softmax classification is applied to the score of each relationship type to obtain the probability of each relationship type.

Specifically, Fig. 6 shows the structure of the fully connected layer of the model. It uses 2 hidden layers; the output of the last hidden layer is multiplied by the relationship-type vector matrix and passed through the softmax layer, finally yielding the probability of each relationship type.

The sentence feature vector f_m ∈ R^500 obtained above is the input to the fully connected layer. To reduce overfitting, the model uses dropout layers with a dropout rate of 0.25. For example, with 100 input nodes, each node is deleted with a probability of 25%, so that on average only about 75% of the nodes take part in subsequent operations. Because every node may be deleted in any pass, the model depends less on any single node, which weakens or avoids overfitting.
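The dropout behaviour described above can be sketched as follows. Scaling the surviving nodes by 1/(1 - p) is an assumption (the common "inverted dropout" variant); the text only specifies that each node is deleted with probability 0.25.

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Zero each node independently with probability p during training.

    Survivors are scaled by 1 / (1 - p) (inverted dropout, an assumption here)
    so the expected activation is unchanged; at inference time the input is
    returned untouched.
    """
    if not training or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(100_000)
y = dropout(x, 0.25, rng)
kept_fraction = np.count_nonzero(y) / y.size   # close to 0.75
```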

A single hidden layer cannot provide enough nonlinearity, so two hidden layers are used, which lets the model perform more nonlinear operations and improves its performance. In the embodiment of the invention, the first hidden layer has 200 units; its output passes through dropout with probability 0.25 before entering the second hidden layer, which has 50 units. These 50 units produce a 50-dimensional feature vector that takes part in subsequent operations.
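A forward pass through the two hidden layers might look like the sketch below. The layer sizes (500, 200, 50) and the dropout rate 0.25 come from the text; the ReLU activations, biases, inverted-dropout scaling and random initialisation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, rng):
    """Inverted dropout (assumed variant): zero each node with probability p, rescale survivors."""
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

# Hypothetical, randomly initialised parameters; only the sizes
# 500 -> 200 -> 50 and the dropout rate 0.25 come from the text.
W1, b1 = rng.standard_normal((500, 200)) * 0.05, np.zeros(200)
W2, b2 = rng.standard_normal((200, 50)) * 0.05, np.zeros(50)

def hidden_layers(f_m, training=True):
    h1 = np.maximum(f_m @ W1 + b1, 0.0)   # first hidden layer, 200 units (ReLU assumed)
    if training:
        h1 = dropout(h1, 0.25, rng)       # dropout before the second hidden layer
    return np.maximum(h1 @ W2 + b2, 0.0)  # second hidden layer, 50 units (ReLU assumed)

h = hidden_layers(rng.standard_normal(500))   # 50-dimensional sentence feature h
```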

The attention mechanism uses a relationship-type vector matrix W_c, which must be learned to obtain suitable relationship-type feature vectors; W_c is learned at this layer. The operations above yield the feature vector of the sentence, denoted h, which is the output of the second hidden layer (see step S32):

s = hW_c    (9)

Matrix multiplication yields s ∈ R^c, where c = 14 is the number of relationship types and each element of s is the score of one relationship type. Finally, softmax classification is applied to s:

p_i = exp(s_i) / Σ_{j=1}^{c} exp(s_j)

where p_i is the probability of relationship type i and s_i is the score of category i; the softmax formula exponentiates the scores and normalizes them, yielding a probability value for each category.
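Equation (9) and the softmax step can be sketched in NumPy as follows; h and W_c are filled with random placeholder values, and c = 14 follows the text. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the probabilities unchanged.

```python
import numpy as np

def softmax(s):
    """Exponentiate the scores and normalise them into probabilities."""
    z = np.exp(s - np.max(s))  # shifting by the max avoids overflow; result is unchanged
    return z / z.sum()

rng = np.random.default_rng(0)
c = 14                                  # number of relationship types
h = rng.standard_normal(50)             # second-hidden-layer output (placeholder values)
W_c = rng.standard_normal((50, c))      # relationship-type vector matrix (placeholder values)
s = h @ W_c                             # Eq. (9): score of each relationship type
p = softmax(s)                          # probability of each relationship type
```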

Unlike a hard max, which selects only the largest value and ignores all smaller ones, softmax gives large scores a high probability while still assigning smaller scores a nonzero probability. Moreover, when softmax is combined with the cross-entropy loss function, the derivative of the loss with respect to the scores is very simple (p - y, where y is the one-hot label), which makes the gradient computation cheap and is another advantage of softmax.
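The simple derivative mentioned above can be checked numerically: for a one-hot label y, the gradient of the cross-entropy loss with respect to the scores s is p - y. The sketch below compares this closed form against a central finite-difference estimate; the scores and the label index are arbitrary placeholders.

```python
import numpy as np

def softmax(s):
    z = np.exp(s - np.max(s))  # numerically stable softmax
    return z / z.sum()

def cross_entropy(s, label):
    """Cross-entropy loss of the softmax probabilities for the true class."""
    return -np.log(softmax(s)[label])

def grad_scores(s, label):
    """Closed-form gradient of the loss w.r.t. the scores: p - y."""
    y = np.zeros_like(s)
    y[label] = 1.0
    return softmax(s) - y

rng = np.random.default_rng(0)
s = rng.standard_normal(14)   # arbitrary scores for 14 relationship types
label = 3                     # arbitrary true class
eps = 1e-6
# Central finite differences, one coordinate at a time.
numeric = np.array([(cross_entropy(s + eps * e, label) - cross_entropy(s - eps * e, label)) / (2 * eps)
                    for e in np.eye(14)])
analytic = grad_scores(s, label)
```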

Finally, p ∈ R^c, where c = 14 is the number of relationship types, is obtained as the output of the model.

The model is trained by backpropagation with the cross-entropy loss function, which updates the model parameters.

The invention also provides a CNN-based Chinese relation classification system. As shown in Fig. 7, the system includes a processor 72 and a memory 71 storing instructions executable by the processor 72;

the processor 72 may be a general purpose processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, among others.

The memory 71 is used for storing the program codes and transmitting the program codes to the CPU. Memory 71 may include volatile memory, such as Random Access Memory (RAM); the memory 71 may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory 71 may also comprise a combination of memories of the kind described above.

Specifically, the CNN-based Chinese relation classification system provided by the embodiment of the invention includes a processor 72 and a memory 71; the memory 71 stores a computer program executable on the processor 72, which, when executed by the processor 72, performs the following steps:

When the word vector matrix of the sentence, the entity distance vector matrix, and the attention-weighted word vector matrix are concatenated to obtain the input matrix of the CNN, the computer program, when executed by the processor 72, implements the following steps:

concatenating the word vector matrix and the entity distance vector matrix of the sentence with the attention-weighted word vector matrix to obtain the input matrix of the CNN comprises the following steps:

Wherein, when the attention-weighted word vector matrix is obtained from the word vector matrix and the SDP weight sequence, the computer program, when executed by the processor 72, implements the following steps:

Wherein, when the word vector matrix is weighted according to the SDP weight sequence to obtain the weighted word vector matrix, the computer program, when executed by the processor 72, implements the following steps:

introducing the SDP weight sequence to obtain the final weight output;

Wherein, in establishing the correlation matrix describing the degree of correlation between each word in the sentence and each relationship type, the computer program, when executed by the processor 72, implements the following steps:

the following formula is used:

T = W_f U₁ W_c;
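Read as matrix products, the formula gives one correlation value per word and relationship type. The sketch below assumes W_f is the n×d_w word vector matrix, U₁ is a d_w×d_c weight matrix and W_c is a d_c×c relationship-type matrix; the sizes d_w = 300 and d_c = 50 are assumptions (d_c = 50 matches the 50-dimensional h multiplied by W_c in Eq. (9)), and all values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_w, d_c, c = 100, 300, 50, 14      # d_w and d_c are assumed sizes

W_f = rng.standard_normal((n, d_w))    # word vector matrix, one word per row (placeholder)
U_1 = rng.standard_normal((d_w, d_c))  # learned correlation weights (hypothetical init)
W_c = rng.standard_normal((d_c, c))    # relationship-type vector matrix, one column per type

T = W_f @ U_1 @ W_c                    # T[i, j]: correlation of word i with relationship type j
```

Each entry is the bilinear form T[i, j] = w_i U₁ c_j between word vector w_i and the j-th relationship-type vector c_j.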

Wherein the computer program, when executed by the processor 72, further implements the following steps:

after the final weight output is obtained, the influence of noise is suppressed through a scaling factor, using the following formula:

h = W_sdp + μR;

where μ is a scaling factor, h is the final weight of each word, R is the weight of each word in the sentence, and W_sdp is the SDP weight of each word in the sentence.

Weighting the word vector matrix by the final weight output means multiplying the word vector matrix WF by the diagonal matrix of the final weights of the words, which yields the weighted word vector matrix.
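The final-weight formula and the diagonal weighting can be checked on a toy example. The 3-word, 2-dimensional matrices, the weight values and μ = 0.5 are hypothetical; the point is that diag(h)·WF simply scales row i of WF by the final weight h_i.

```python
import numpy as np

def weighted_word_vectors(WF, w_sdp, R, mu):
    """Return the final word weights h and the weighted word vector matrix."""
    h = w_sdp + mu * R      # h = W_sdp + mu * R: final weight of each word
    ABWF = np.diag(h) @ WF  # diagonal weighting scales row i of WF by h[i]
    return h, ABWF

WF = np.arange(6.0).reshape(3, 2)   # hypothetical 3-word, 2-dimensional word vectors
w_sdp = np.array([1.0, 0.0, 1.0])   # hypothetical SDP weights
R = np.array([0.2, 0.5, 0.1])       # hypothetical per-word weights
h, ABWF = weighted_word_vectors(WF, w_sdp, R, mu=0.5)
```

Multiplying by a diagonal matrix is equivalent to the broadcast h[:, None] * WF, which avoids building the n×n matrix for long sentences.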

Wherein, when the input matrix is fed into the convolutional layer of the CNN to obtain the feature vector of the sentence, the computer program, when executed by the processor 72, implements the following steps:

obtaining the feature vector of the input matrix by concatenating the feature values.

Wherein, when the feature vector of the sentence is fed into the fully connected layer of the CNN to obtain the probability of each relationship type, the computer program, when executed by the processor 72, implements the following steps:

An embodiment of the invention also provides a computer-readable storage medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in user equipment. Alternatively, the processor and the storage medium may reside as discrete components in a communication device.

The CNN-based Chinese relation classification method and system provided by the invention have been described in detail above. Any obvious modification made by those skilled in the art without departing from the true spirit of the invention would constitute an infringement of the patent rights of the invention and would carry corresponding legal liability.

Claims

1. A CNN-based Chinese relation classification method is characterized by comprising the following steps:

2. The CNN-based Chinese relation classification method of claim 1, wherein concatenating the word vector matrix of the sentence, the entity distance vector matrix, and the attention-weighted word vector matrix to obtain the CNN input matrix comprises the following steps:

calculating the entity distance vectors PF1 and PF2 from each word in the sentence to the two entities respectively, and concatenating the obtained entity distance vectors to form the entity distance vector matrix;

3. The CNN-based Chinese relation classification method of claim 2, wherein the attention-weighted word vector matrix is obtained from the word vector matrix and the SDP weight sequence, comprising the following steps:

4. The CNN-based Chinese relation classification method of claim 3, wherein weighting the word vector matrix according to the SDP weight sequence to obtain the weighted word vector matrix comprises the following steps:

introducing the SDP weight sequence to obtain the final weight output;

5. The CNN-based Chinese relation classification method of claim 4, wherein the correlation matrix describing the degree of correlation between each word in the sentence and each relationship type is established using the following formula:

T = W_f U₁ W_c;

6. The CNN-based Chinese relation classification method of claim 4, characterized in that:

the SDP weight sequence is introduced, and

h = W_sdp + μR;

7. The CNN-based Chinese relation classification method of claim 6, wherein:

8. The CNN-based Chinese relation classification method of claim 1, wherein feeding the input matrix into the convolutional layer of the CNN to obtain the feature vector of the sentence comprises the following steps:

obtaining the feature vector of the input matrix by concatenating the feature values.

9. The CNN-based Chinese relation classification method of claim 1, wherein feeding the feature vector of the sentence into the fully connected layer of the CNN to obtain the probability of each relationship type comprises the following steps:

10. A CNN-based Chinese relation classification system is characterized by comprising a processor and a memory; the memory having stored thereon a computer program operable on the processor, the computer program when executed by the processor implementing the steps of: