CN108804677A - Deep learning question classification method and system combining a multi-layer attention mechanism - Google Patents

Deep learning question classification method and system combining a multi-layer attention mechanism Download PDF

Info

Publication number
CN108804677A
CN108804677A CN201810599036.4A
Authority
CN
China
Prior art keywords
vector
interrogative
matrix
word vector
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810599036.4A
Other languages
Chinese (zh)
Other versions
CN108804677B (en)
Inventor
余本功
许庆堂
陈杨楠
陈能英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN201810599036.4A priority Critical patent/CN108804677B/en
Publication of CN108804677A publication Critical patent/CN108804677A/en
Application granted granted Critical
Publication of CN108804677B publication Critical patent/CN108804677B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a deep learning question classification method combining a multi-layer attention mechanism, relating to the technical field of information processing. The method includes: building an interrogative word vector set, which contains interrogative word vectors and ordinary word vectors enriched with interrogative information; extracting window mappings of the question from the set by convolution; extracting the temporal features of the question from the window mappings; and classifying the question according to the temporal features. The method strengthens the semantic information of the interrogative words in a question and fuses a convolutional neural network with a long short-term memory (LSTM) model through attention mechanisms in deep learning, effectively improving the accuracy of question classification.

Description

Deep learning question classification method and system combining a multi-layer attention mechanism
Technical field
The present invention relates to the technical field of information processing, and in particular to a deep learning question classification method and system combining a multi-layer attention mechanism.
Background art
Traditional question classification techniques fall into two main classes: rule-based methods and machine learning-based methods. Rule-based methods mainly build large rule bases by hand and classify questions by rule matching. Machine learning-based methods treat question classification as a supervised learning task: features are crafted manually, external corpora are introduced to expand the question text, and classical machine learning algorithms such as support vector machines and naive Bayes are used for classification.
In recent years, as deep learning technology has matured, machine learning-based question classification has gradually shifted toward deep learning. However, as question answering research deepens, higher demands are placed on the accuracy and practicality of question classification methods. Existing deep learning question classification methods mostly suffer from the following two problems:
First, the word vectors used by existing deep learning models pay no particular attention to interrogative word information. Unlike plain text classification, recognizing the type of a question depends heavily on the interrogative word as a classification feature. This is because question texts are short, with insufficient semantic and word co-occurrence information, so the interrogative word in a question strongly influences the classification result.
Second, existing question classification methods do not fuse the feature extraction abilities of the two model families well. Existing deep learning question classification methods mainly use convolutional neural networks and long short-term memory (LSTM) models, each with its own strengths and weaknesses: convolutional neural networks capture the complex mapping from raw data to high-level semantics well, with an expressive power far beyond traditional machine learning models, while LSTM models can model the temporal features of text and effectively exploit the context of the question.
Summary of the invention
The object of the present invention is to provide a deep learning question classification method combining a multi-layer attention mechanism, which strengthens the semantic information of the interrogative words in a question and fuses a convolutional neural network with a long short-term memory model through attention mechanisms in deep learning, effectively improving the accuracy of question classification.
To achieve the above object, in one aspect, an embodiment of the present invention provides a deep learning question classification method combining a multi-layer attention mechanism, the method including: building an interrogative word vector set, which contains interrogative word vectors and ordinary word vectors enriched with interrogative information; extracting window mappings of the question from the set by convolution; extracting the temporal features of the question from the window mappings; and classifying the question according to the temporal features.
Preferably, building the interrogative word vector set specifically includes: collecting the interrogative words of Chinese questions to build an interrogative dictionary; reading the text of the question and comparing it against the interrogative dictionary to obtain the interrogative and ordinary words in the question, and building a word vector set containing ordinary word column vectors and interrogative word column vectors; building a diagonal attention matrix characterizing the contextual correlation and association strength between the ordinary word column vectors and the interrogative word column vector; normalizing the diagonal attention matrix to obtain the input-layer attention matrix; and, from the input-layer attention matrix, building the interrogative word vector set based on the attention mechanism in deep learning and generating the word vector arrangement matrix.
Preferably, the diagonal attention matrix is expressed by formula (1):
A^r = diag(a_1^r, ..., a_n^r), a_i^r = f(w_i, e) = β^T (w_i ⊙ e)   formula (1)
where A^r is the diagonal attention matrix; a_i^r is the i-th element on the diagonal of A^r, expressing the relative importance between the i-th ordinary word column vector w_i and the interrogative word column vector e; f is the inner-product scoring function of w_i and e; β is a parameter vector; ⊙ denotes the elementwise product; e is the interrogative word column vector; and w_i is the i-th ordinary word column vector.
The i-th element on the diagonal of the input-layer attention matrix is expressed by formula (2):
a_i^I = exp(a_i^r) / Σ_{j=1}^{n} exp(a_j^r)   formula (2)
where a_i^I is the i-th element on the diagonal of the input-layer attention matrix, expressing the relative importance between the word at the i-th position of the word vector set (the ordinary word column vector w_i or the interrogative word column vector e) and the interrogative word column vector e; a_i^r and a_j^r are the i-th and j-th elements on the diagonal of the diagonal attention matrix; exp is the exponential function; Σ is the summation; and n is the number of ordinary word column vectors and interrogative word column vectors contained in the word vector set.
The i-th ordinary word vector in the interrogative word vector set is expressed by formula (3):
x_i = a_i^I · w_i   formula (3)
where x_i is the i-th ordinary word vector in the interrogative word vector set.
The interrogative word vector in the interrogative word vector set is expressed by formula (3'):
x_j = a_j^I · e   formula (3')
where x_j is the interrogative word vector in the interrogative word vector set.
The word vector arrangement matrix is expressed by formula (4):
X = [x_1, x_2, ..., x_n]   formula (4)
where X is the word vector arrangement matrix with dimension R^{l×n}; l is the dimension of the interrogative and ordinary word vectors; n is the number of interrogative and ordinary word vectors contained in the interrogative word vector set; and the commas denote column-wise concatenation.
Preferably, extracting the window mappings of the question from the interrogative word vector set by convolution specifically includes: building the window matrices of the ordinary and interrogative word vectors in the interrogative word vector set; constructing multiple filters; generating multiple feature maps by convolving the filters with the window matrices; and transposing and rearranging the multiple feature maps to obtain the window mapping.
Preferably, the window matrix is expressed by formula (5):
W_i = [x_i, x_{i+1}, ..., x_{i+k-1}]   formula (5)
where W_i is the window matrix corresponding to the interrogative or ordinary word vector at the i-th position of the interrogative word vector set, used for extracting semantic features from the set; x_i, x_{i+1} and x_{i+k-1} are the interrogative or ordinary word vectors at positions i, i+1 and i+k-1; k is an empirically chosen window width; and the commas denote vector concatenation.
The feature map generated by the j-th filter is expressed by formula (6):
c_j = f(W_i * m + b)   formula (6)
where c_j is the feature map the j-th filter produces over the interrogative or ordinary word vectors, strengthening the semantic features of the word vectors within each window; f is the activation function of the convolution; m is the filter parameter matrix; and b is a bias term.
The window mapping is expressed by formula (7):
E = [c_1; c_2; ...; c_j; ...; c_d]   formula (7)
where E is the window mapping, representing the sequential structure of the question; c_j is the feature map generated by the j-th filter; and d is the number of filters.
Preferably, extracting the temporal features of the question from the window mapping specifically includes: building a connection-layer attention matrix from the window mapping, expressing the importance of the ordinary and interrogative words in the question; normalizing the connection-layer attention matrix to obtain the normalized connection-layer attention matrix; and extracting the temporal features of the question from the window mapping and the normalized connection-layer attention matrix.
Preferably, the connection-layer attention matrix is expressed by formula (8):
A^c = diag(a_1^c, ..., a_n^c), a_j^c = tanh(U E_j + b)   formula (8)
where A^c is the connection-layer attention matrix; a_j^c is the j-th element on its diagonal; tanh is the hyperbolic tangent function; U is a weight parameter matrix; E_j is the j-th column vector of the window mapping; and b is a bias term.
The j-th element on the diagonal of the normalized connection-layer attention matrix is expressed by formula (9):
a_j^{c'} = exp(a_j^c) / Σ_{k=1}^{n} exp(a_k^c)   formula (9)
where a_j^{c'} is the j-th element on the diagonal of the normalized connection-layer attention matrix; exp is the exponential function; Σ is the summation; and n is the number of interrogative and ordinary word vectors contained in the interrogative word vector set.
The temporal features are expressed by formula (10):
G_j = a_j^{c'} · E_j   formula (10)
where G_j is the temporal feature corresponding to the interrogative or ordinary word vector at the j-th position of the interrogative word vector set, characterizing the importance of that word vector.
Preferably, the activation function f of the convolution is the rectified linear function, and the feature map corresponding to the interrogative or ordinary word vector at the j-th position of the interrogative word vector set is expressed by formula (11):
c_j = f(W_i * m + b) = max(0, W_i * m + b)   formula (11)
where c_j is the feature map corresponding to the interrogative or ordinary word vector at the j-th position of the interrogative word vector set, strengthening its semantic features; f is the activation function of the convolution; m is the filter parameter matrix; b is a bias term; W_i is the window matrix corresponding to the interrogative or ordinary word vector at the i-th position of the interrogative word vector set; and max(0, x) is the rectified linear function.
In another aspect, an embodiment of the present invention provides a deep learning question classification system combining a multi-layer attention mechanism, comprising: an input layer for inputting the word vector arrangement matrix; a convolutional neural network for extracting the window mappings of the question; a bidirectional long short-term memory network for extracting the temporal features of the question; and a classifier for classifying the question according to the temporal features.
Through the above technical solution, the deep learning question classification method and system construct the interrogative word vector set via the interrogative dictionary and the input-layer attention matrix, strengthening the semantic information of the interrogative words in the question, and feed the set into a convolutional neural network to extract local features. In addition, the method and system fuse the convolutional neural network and the long short-term memory network through an attention mechanism: the attention mechanism screens the convolution features most useful for question classification and passes them to the bidirectional long short-term memory network for high-level temporal feature extraction, effectively improving the accuracy of question classification.
Other features and advantages of the present invention are described in detail in the detailed description below.
Description of the drawings
The accompanying drawings are provided for a further understanding of the invention and constitute a part of the specification; together with the detailed description below they serve to explain the invention, but do not limit it. In the drawings:
Fig. 1 is a flowchart of a deep learning question classification method combining a multi-layer attention mechanism according to an embodiment of the present invention;
Fig. 2 is a bar chart comparing the classification accuracy of the deep learning question classification method of one embodiment of the present invention and four other models on the problems in three data sets.
Detailed description
The detailed embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the embodiments described here only illustrate and explain the invention and are not intended to limit it.
Question classification is a basic component of question answering research, and its accuracy directly affects how well the system understands natural language. On the one hand, by determining the target answer type of a question, question classification provides semantic restrictions and constraints for subsequent information retrieval and answer extraction, narrowing the search scope of candidate answers and improving the accuracy of the question answering system. For example, "Where is the most famous Hui-cuisine restaurant?" is a question about a place; during answer extraction, only the place-type entities among the candidate answers need to be matched, which accurately and effectively improves question answering precision. On the other hand, question classification provides a basis for choosing the answering strategy; for example, "What are the characteristics of authentic Anhui cuisine?" is a description-type question, so the generated answer should emphasize describing and introducing the features of the dish, giving the user the answer knowledge they most want to obtain.
Fig. 1 is a flowchart of a deep learning question classification method combining a multi-layer attention mechanism according to an embodiment of the present invention. As shown in Fig. 1, one embodiment of the present invention provides a deep learning question classification method combining a multi-layer attention mechanism, which may include:
In step S101, build the interrogative word vector set, which contains interrogative word vectors and ordinary word vectors enriched with interrogative information;
In step S102, extract the window mappings of the question from the interrogative word vector set by convolution;
In step S103, extract the temporal features of the question from the window mappings;
In step S104, classify the question according to the temporal features.
The above steps constitute a deep learning question classification model. After classification, the model computes the classification error; when the error exceeds a preset value, the parameters of the model are updated repeatedly until the classification result converges.
The deep learning question classification method combining a multi-layer attention mechanism provided by the invention combines the advantage of the long short-term memory network in capturing global context with that of the convolutional neural network in capturing local features of the question, and has the ability to extract key features such as interrogative words; the fused feature vector is finally fed to a classifier to complete the question classification task. The method specifically includes:
(1) Building the interrogative word vector set
Collect the interrogative words of Chinese questions to build the interrogative dictionary; for example, "who", "where" and "how many" are interrogative words. Read the text of the question and compare it against the interrogative dictionary to obtain the interrogative and ordinary words in the question, and build the word vector set, which contains ordinary word column vectors and interrogative word column vectors. Build a diagonal attention matrix to characterize the contextual correlation and association strength between the ordinary words and the interrogative word. The diagonal attention matrix may, for example, be expressed by formula (1):
A^r = diag(a_1^r, ..., a_n^r), a_i^r = f(w_i, e) = β^T (w_i ⊙ e)   formula (1)
where A^r is the diagonal attention matrix; a_i^r is the i-th element on the diagonal of A^r, expressing the relative importance between the i-th ordinary word column vector w_i and the interrogative word column vector e; f is the inner-product scoring function of w_i and e; β is a parameter vector; ⊙ denotes the elementwise product; e is the interrogative word column vector; and w_i is the i-th ordinary word column vector. The value of the parameter vector β may, for example, be randomly initialized and continually updated by gradient descent during model training.
The diagonal attention matrix is normalized to obtain the input-layer attention matrix; the i-th element on the diagonal of the input-layer attention matrix may, for example, be expressed by formula (2):
a_i^I = exp(a_i^r) / Σ_{j=1}^{n} exp(a_j^r)   formula (2)
where a_i^I is the i-th element on the diagonal of the input-layer attention matrix, expressing the relative importance between the word at the i-th position of the word vector set (the ordinary word column vector w_i or the interrogative word column vector e) and the interrogative word column vector e; a_i^r and a_j^r are the i-th and j-th elements on the diagonal of the diagonal attention matrix; exp is the exponential function; Σ is the summation; and n is the number of ordinary word column vectors and interrogative word column vectors contained in the word vector set.
According to the input-layer attention matrix, the interrogative word vector set is built and the word vector arrangement matrix is generated. The i-th ordinary word vector in the interrogative word vector set may, for example, be expressed by formula (3):
x_i = a_i^I · w_i   formula (3)
where x_i is the i-th ordinary word vector in the interrogative word vector set.
The interrogative word vector in the interrogative word vector set may, for example, be expressed by formula (3'):
x_j = a_j^I · e   formula (3')
where x_j is the interrogative word vector in the interrogative word vector set.
The word vector arrangement matrix may, for example, be expressed by formula (4):
X = [x_1, x_2, ..., x_n]   formula (4)
where X is the word vector arrangement matrix with dimension R^{l×n}; l is the dimension of the interrogative and ordinary word vectors; n is the number of interrogative and ordinary word vectors contained in the interrogative word vector set, which equals the number of ordinary and interrogative word column vectors contained in the word vector set; and the commas denote column-wise concatenation.
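Because the formula images of this step did not survive extraction, the following NumPy sketch shows one plausible reading of the input-layer attention: score each word against the interrogative vector, normalize with a softmax, and re-weight the columns of the embedding matrix. The function name, the choice of f as a β-weighted elementwise inner product, and the toy data are illustrative assumptions, not taken verbatim from the patent.

```python
import numpy as np

def input_layer_attention(W, e, beta):
    """Re-weight each word vector of a question by its relevance to the
    interrogative word vector e (one hypothetical reading of this step).

    W    : (l, n) matrix whose columns are the word vectors of the question
    e    : (l,)   interrogative word column vector
    beta : (l,)   parameter vector of the assumed scoring function f
    """
    # diagonal attention scores: relevance of each column of W to e
    scores = beta @ (W * e[:, None])          # shape (n,)
    # softmax normalization gives the input-layer attention weights
    exp = np.exp(scores - scores.max())       # max-shift for numerical stability
    a = exp / exp.sum()
    # scale every column by its weight -> word vector arrangement matrix X
    X = W * a[None, :]
    return X, a

# toy question: n = 4 words with l = 3 dimensional embeddings
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
e = rng.normal(size=3)
beta = rng.normal(size=3)
X, a = input_layer_attention(W, e, beta)
print(a)  # four attention weights summing to 1
```

In this reading, the attention weights act as the diagonal of the normalized attention matrix, so multiplying them into the columns of W is equivalent to the matrix product W · diag(a).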
(2) Feature extraction by convolution
A convolutional neural network can extract local features from the question by convolution; a local feature may be the local semantic feature between adjacent ordinary or interrogative words in the question. In one embodiment of the present invention, d filters of the same size slide over the interrogative word vector set to extract the text information of the ordinary and interrogative words at different positions of the question, yielding feature maps of the semantic dependency relations around the interrogative word. A filter may, for example, be expressed as m ∈ R^{l×k}, i.e. the convolution filter m is a matrix of l rows and k columns. The filter may, for example, be one-dimensional, meaning that it slides along a single dimension only.
The window matrix W_i corresponding to the i-th ordinary or interrogative word of the question consists of k word vectors (either k ordinary word vectors, or ordinary word vectors together with the interrogative word vector), and may, for example, be expressed by formula (5):
W_i = [x_i, x_{i+1}, ..., x_{i+k-1}]   formula (5)
where W_i is the window matrix corresponding to the i-th ordinary or interrogative word of the question, used for extracting semantic features from the interrogative word vector set; x_i, x_{i+1} and x_{i+k-1} are the interrogative or ordinary word vectors at positions i, i+1 and i+k-1 of the interrogative word vector set; k is an empirically chosen window width; and the commas denote vector concatenation.
The convolution filter m is convolved with each of the window matrices, generating multiple feature maps; the feature map generated by the j-th filter is expressed by formula (6):
c_j = f(W_i * m + b)   formula (6)
where c_j is the feature map the filter produces over the interrogative or ordinary word vectors, strengthening the semantic features of the word vectors within each window; f is the activation function of the convolution; m is the filter parameter matrix; and b is a bias term. The initial values of the matrix m and of b are obtained by random initialization and continually updated during model training.
Research has shown that the rectified linear function has one-sided suppression and a relatively wide excitation boundary, and obtains faster feature learning by introducing sparsity into the network. Therefore, in one embodiment of the present invention, the activation function f of the convolution may, for example, be the rectified linear function, and the feature map corresponding to the interrogative or ordinary word vector at the j-th position of the interrogative word vector set may, for example, be expressed by formula (11):
c_j = f(W_i * m + b) = max(0, W_i * m + b)   formula (11)
where c_j is the feature map corresponding to the interrogative or ordinary word vector at the j-th position of the interrogative word vector set, strengthening its semantic features; f is the activation function of the convolution; m is the filter parameter matrix; b is a bias term; W_i is the window matrix corresponding to the interrogative or ordinary word vector at the i-th position; and max(0, x) is the rectified linear function.
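The convolution step described above can be sketched as follows. This is a minimal NumPy illustration assuming the filter response at each window is the elementwise product of the window matrix with the filter, summed and passed through ReLU; the function names, the bias values and the toy data are hypothetical.

```python
import numpy as np

def conv_feature_map(X, m, b, k):
    """Slide one l-by-k filter m over the word vector arrangement matrix X,
    applying ReLU(W_i * m + b) at every window position."""
    l, n = X.shape
    c = np.empty(n - k + 1)
    for i in range(n - k + 1):
        Wi = X[:, i:i + k]                           # window matrix W_i
        c[i] = max(0.0, float(np.sum(Wi * m) + b))   # ReLU activation
    return c

def window_mapping(X, filters, biases, k):
    """Stack the d feature maps so that column j of the result gathers the
    d filter responses at window position j."""
    return np.stack([conv_feature_map(X, m, b, k)
                     for m, b in zip(filters, biases)])

# toy run: 4 filters of width k = 2 over a 3 x 6 arrangement matrix
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 6))
filters = [rng.normal(size=(3, 2)) for _ in range(4)]
E = window_mapping(X, filters, [0.1] * 4, k=2)
print(E.shape)  # (4, 5): d = 4 filters, 5 sliding positions
```

The stacking in `window_mapping` corresponds to the transpose-and-rearrange step of the method: each column of E is one window vector E_j that the later attention layer scores.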
To retain the sequential character of the question, the d equal-length feature maps c_j (j = 1, 2, ..., d) are transposed and rearranged to obtain the window mapping corresponding to the window matrices W_i. The window mapping may, for example, be expressed by formula (7):
E = [c_1; c_2; ...; c_j; ...; c_d]   formula (7)
where E is the window mapping, representing the sequential structure of the question; c_j is the feature map generated by the j-th filter; and d is the number of filters.
(3) Extracting the temporal features of the question
For the question classification task, the classification rules of a question depend on the preceding and following context. Therefore, in one embodiment of the present invention, the window mapping extracted by the convolutional neural network is fed into a bidirectional long short-term memory network to further extract the temporal features of the question.
The overall network fusing the convolutional neural network with the long short-term memory network can automatically identify the parts of the question most relevant to its classification. The specific approach to fusing the two networks is as follows:
A connection-layer attention matrix is constructed to characterize the importance of the ordinary and interrogative words in the question, so that relatively important ordinary and interrogative words are attended to while the information loss and information redundancy of the feature extraction process are reduced. The connection-layer attention matrix may, for example, be expressed by formula (8):
A^c = diag(a_1^c, ..., a_n^c), a_j^c = tanh(U E_j + b)   formula (8)
where A^c is the connection-layer attention matrix; a_j^c is the j-th element on its diagonal; tanh is the hyperbolic tangent function; U is a weight parameter matrix; E_j is the j-th column vector of the window mapping; and b is a bias term. The initial value of the weight parameter matrix U is obtained by random initialization and continually updated during model training.
The connection-layer attention matrix is normalized to obtain the normalized connection-layer attention matrix; the j-th element on the diagonal of the normalized connection-layer attention matrix may, for example, be expressed by formula (9):
a_j^{c'} = exp(a_j^c) / Σ_{k=1}^{n} exp(a_k^c)   formula (9)
where a_j^{c'} is the j-th element on the diagonal of the normalized connection-layer attention matrix; exp is the exponential function; Σ is the summation; and n is the number of interrogative and ordinary word vectors contained in the interrogative word vector set.
Articulamentum attention matrix after normalization is multiplied with the window mapping that convolutional neural networks export, to be asked The temporal aspect of sentence, and using the temporal aspect of question sentence as the input of two-way length memory network in short-term.
Formula (10) for example may be used to indicate in temporal aspect corresponding with i-th of the generic word or interrogative of question sentence:
Wherein, GiFor with i-th of the generic word or the corresponding temporal aspect of interrogative in question sentence, characterize i-th of generic word Or the importance of interrogative.
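The connection-layer attention described above can be sketched in NumPy as follows, assuming U acts as a weight vector so that each window position receives one scalar score; this is an illustrative reading of the step, not the patent's exact parameterization, and all names and data are hypothetical.

```python
import numpy as np

def connection_layer_attention(E, U, b):
    """Re-weight the columns of the window mapping E before the BiLSTM.

    E : (d, n) window mapping; column E_j holds the d filter responses
        at window position j
    U : (d,)   scoring weights (assumed to be a vector here, so each
        position gets one scalar score)
    b : scalar bias
    """
    scores = np.tanh(E.T @ U + b)        # one tanh score per window position
    exp = np.exp(scores - scores.max())
    a = exp / exp.sum()                  # normalized attention weights
    G = E * a[None, :]                   # temporal features G_j = a_j * E_j
    return G, a

rng = np.random.default_rng(2)
E = rng.normal(size=(4, 5))
G, a = connection_layer_attention(E, rng.normal(size=4), 0.0)
print(a)  # five weights summing to 1, one per window position
```

The re-weighted columns G_j would then be consumed position by position as the input sequence of the bidirectional long short-term memory network.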
Finally, the classifier classifies the question according to the temporal features, completing the classification task.
In one embodiment of the present invention, a deep learning question classification system combining a multi-layer attention mechanism is provided, which may include:
an input layer, for inputting the word vector arrangement matrix;
a convolutional neural network, for extracting the window mappings of the question;
a bidirectional long short-term memory network, for extracting the temporal features of the question;
a classifier, for classifying the question according to the temporal features.
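Putting the components above together, the following toy sketch chains the three stages into a single forward pass. The bidirectional long short-term memory network is replaced here by simple mean pooling purely for illustration, and every name, shape and value is a hypothetical stand-in rather than the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    z = z - z.max()
    ez = np.exp(z)
    return ez / ez.sum()

def classify(W, e, beta, filters, biases, k, U, b_c, W_out):
    # 1. input-layer attention: interrogative-aware re-weighting of word vectors
    a = softmax(beta @ (W * e[:, None]))
    X = W * a[None, :]
    # 2. convolution: d filters slide over X to produce the window mapping E
    cols = []
    for i in range(X.shape[1] - k + 1):
        Wi = X[:, i:i + k]
        cols.append([max(0.0, float(np.sum(Wi * m) + bf))
                     for m, bf in zip(filters, biases)])
    E = np.array(cols).T                      # shape (d, positions)
    # 3. connection-layer attention over the window positions
    ac = softmax(np.tanh(E.T @ U + b_c))
    G = E * ac[None, :]
    # 4. stand-in for the BiLSTM: mean-pool the temporal features, then classify
    return softmax(W_out @ G.mean(axis=1))

l, n, d, k, classes = 3, 6, 4, 2, 5
p = classify(rng.normal(size=(l, n)), rng.normal(size=l), rng.normal(size=l),
             [rng.normal(size=(l, k)) for _ in range(d)], [0.0] * d, k,
             rng.normal(size=d), 0.0, rng.normal(size=(classes, d)))
print(p)  # a probability distribution over the hypothetical question classes
```

In a real implementation, step 4 would be a trained bidirectional recurrent layer followed by a softmax classifier, and all parameters would be learned by backpropagation until the classification error converges.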
In one embodiment of the invention, the effectiveness of the deep learning question classification method provided by the invention was verified on the following three data sets:
(1) a data set provided by Baidu's laboratory (hereinafter "Baidu"), containing 6,205 records, i.e. questions together with their corresponding answers; for example, one question is: Who is the author of the book Fundamentals of Machine Design? with the corresponding answer: Yang Kezhen, Cheng Guangyun, Li Zhong;
(2) the open question set of the question answering evaluation of the 2016 China Computer Federation (CCF) International Conference on Natural Language Processing and Chinese Computing (hereinafter "NLPCC 2016"), containing 9,604 records; for example, one question is: What kind of book is Journey to the West? with the corresponding answer: fantasy;
(3) the open question set of the question answering evaluation of the 2017 CCF International Conference on Natural Language Processing and Chinese Computing (hereinafter "NLPCC 2017"), containing 9,518 records; for example, one question is: What is Li Ming's date of birth? with the corresponding answer: January 1963.
In order to compare with the classification results of the deep learning question classification method provided in embodiments of the present invention, in one embodiment of the present invention the following four models were also used to classify the questions in the above three data sets:
(1) a support vector machine (SVM) model: the SVM model with a linear kernel function proposed by Li et al., which represents text with a bag-of-words model and weights words with the term frequency-inverse document frequency (TF-IDF) algorithm; it is a traditional classification model with good performance;
(2) a convolutional neural network (CNN) model: the basic convolutional neural network model proposed by Kim et al., composed of a convolutional layer, a pooling layer, and a fully connected layer;
(3) a long short-term memory (LSTM) model: a bidirectional LSTM model, suitable for processing and predicting text sequences with relatively long intervals and delays in a time series;
(4) a convolutional long short-term memory (C-LSTM) model: Zhou et al. combined a convolutional neural network with an LSTM model, feeding the convolutional features into the LSTM, and adopted a novel vector rearrangement pattern.
The model used by the deep learning question classification method provided in embodiments of the present invention is hereinafter referred to as the MCA-LSTM model. On the basis of the C-LSTM model, the MCA-LSTM model adds an attention-based interrogative word-vector arrangement matrix and a connection-layer attention matrix, and can adaptively identify the most important parts of a sentence.
The accuracy of the classification results of the deep learning question classification method provided by the present invention and of the above four models on the questions in the above three data sets is shown in Table 1:
Table 1: Comparison of the accuracy of the classification results of the different methods
As can be seen from Table 1, because the distribution of the major-class data differs across the data sets, the accuracy of each model fluctuates to a different degree, but this does not affect the reference value of the comparison between the models. Fig. 2 shows a bar chart comparing the classification accuracy, on the three data sets, of the deep learning question classification method of one embodiment of the present invention and of the other four models. As can be seen from Fig. 2, MCA-LSTM shows a clear advantage on all three data sets.
As can also be seen from Fig. 2, the classification results of the MCA-LSTM model used in the deep learning question classification method of the provided embodiments are better than those of the traditional SVM model, which depends heavily on hand-crafted features; designing features by hand requires a great deal of manual work and does not generalize well to other data sets and tasks, whereas the MCA-LSTM model can automatically learn semantic sentence representations, needs no manual feature extraction, and has better scalability. Comparing the results of the single convolutional neural network and the single LSTM network, the convolutional neural network achieves higher accuracy on the NLPCC 2016 and NLPCC 2017 data sets, which shows that the convolution operation extracts text features more effectively when the texts are short and the amount of data is sufficient. The C-LSTM model and the MCA-LSTM model of the present invention classify better than the single convolutional neural network and the single LSTM model, which shows that the combined model can further mine deep sequence features while extracting local text features, and is more conducive to understanding the question. Comparing the MCA-LSTM model with the C-LSTM model shows that the multi-level attention mechanism proposed by the present invention enables the model to focus, during training, on the feature information of the classification target and thus to better identify interrogative-related features, which demonstrates the effectiveness of the attention mechanism in question classification tasks.
Through the above embodiments, the deep learning question classification method and system construct the interrogative word-vector set by means of the interrogative dictionary and the input-layer attention matrix, which enhances the semantic information of the interrogatives in the question, and input the interrogative word-vector set into the convolutional neural network to extract local features. In addition, the method and system fuse the convolutional neural network and the LSTM network through an attention mechanism: the attention mechanism screens out the convolutional features most useful for question classification and passes them to the bidirectional long short-term memory network for high-level temporal feature extraction, thereby effectively improving the precision of question classification.
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the specific details of the above embodiments; within the scope of the technical concept of the present invention, various simple variants can be made to the technical solution of the present invention, and these simple variants all fall within the protection scope of the present invention. It should further be noted that the specific technical features described in the above embodiments can, in the absence of contradiction, be combined in any suitable manner; in order to avoid unnecessary repetition, the present invention does not separately describe the various possible combinations.
In addition, the various embodiments of the present invention can also be combined arbitrarily; as long as such combinations do not depart from the idea of the present invention, they should likewise be regarded as part of the disclosure of the present invention.

Claims (9)

1. A deep learning question classification method combining a multi-level attention mechanism, characterized by comprising:
constructing an interrogative word-vector set, the interrogative word-vector set comprising interrogative word vectors and common word vectors that carry interrogative information;
extracting a window mapping of a question sentence by a convolution operation according to the interrogative word-vector set;
extracting temporal features of the question sentence according to the window mapping;
classifying the question according to the temporal features.
2. The deep learning question classification method according to claim 1, characterized in that constructing the interrogative word-vector set specifically comprises:
collecting the interrogatives in Chinese questions to construct an interrogative dictionary;
reading the text of the question sentence and comparing it with the interrogative dictionary to obtain the interrogatives and common words in the question sentence, and establishing a word-vector set, the word-vector set comprising common-word column vectors and interrogative column vectors;
establishing a diagonal concern matrix for characterizing the contextual correlation and connection strength between the common words and the interrogatives;
normalizing the diagonal concern matrix to obtain an input-layer attention matrix;
constructing the interrogative word-vector set based on the attention mechanism in deep learning according to the input-layer attention matrix, and generating an interrogative word-vector arrangement matrix.
3. The deep learning question classification method according to claim 2, characterized in that the diagonal concern matrix is expressed by formula (1):
A^r_ii = f(w_i, e) = β^T(w_i ∘ e)   formula (1)
wherein A^r is the diagonal concern matrix; A^r_ii, the i-th element on the diagonal of the diagonal concern matrix, represents the relative importance between the i-th common-word column vector w_i and the interrogative column vector e; f is the inner-product operation function of the common-word column vector w_i and the interrogative column vector e; β is a parameter vector; ∘ denotes the element-wise product; e is the interrogative column vector; and w_i is the i-th common-word column vector;
the i-th element on the diagonal of the input-layer attention matrix is expressed by formula (2):
Ã^r_ii = exp(A^r_ii) / Σ_{j=1}^{n} exp(A^r_jj)   formula (2)
wherein Ã^r_ii, the i-th element on the diagonal of the input-layer attention matrix, represents the relative importance between the common-word column vector w_i (or the interrogative column vector e) at the i-th position of the word-vector set and the interrogative column vector e; A^r_ii is the i-th element on the diagonal of the diagonal concern matrix; A^r_jj is the j-th element on the diagonal of the diagonal concern matrix; exp is the exponential function; Σ is the summing function; and n is the number of common-word column vectors and interrogative column vectors contained in the word-vector set;
the i-th common word vector in the interrogative word-vector set is expressed by formula (3):
x_i = Ã^r_ii · w_i   formula (3)
wherein x_i is the i-th common word vector in the interrogative word-vector set;
the interrogative word vector in the interrogative word-vector set is expressed by formula (3'):
x_j = Ã^r_jj · e   formula (3')
wherein x_j is the interrogative word vector in the interrogative word-vector set;
the interrogative word-vector arrangement matrix is expressed by formula (4):
X = [x_1, x_2, ..., x_n]   formula (4)
wherein X is the interrogative word-vector arrangement matrix, whose dimension is R^{l×n}; l is the dimension of the interrogative word vectors and the common word vectors; n is the number of interrogative word vectors and common word vectors contained in the interrogative word-vector set; and the commas denote column-wise concatenation.
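By way of illustration only, the input-layer attention construction of formulas (1) to (4) can be sketched numerically as follows; the sizes, the random vectors, and the concrete scoring function f (here read as β^T(w_i ∘ e)) are assumptions for the sketch, not part of the claims:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy question: n = 4 word positions, word-vector dimension
# l = 5, with position 2 holding the interrogative column vector e.
l, n, q = 5, 4, 2
W = rng.standard_normal((l, n))      # column vectors w_1 .. w_n
e = W[:, q]                          # interrogative column vector
beta = rng.standard_normal(l)        # parameter vector beta

# Diagonal concern matrix (formula (1), assumed scoring): each diagonal
# element scores one word vector against the interrogative vector.
scores = np.array([beta @ (W[:, i] * e) for i in range(n)])

# Input-layer attention matrix (formula (2)): softmax over the diagonal.
att = np.exp(scores) / np.exp(scores).sum()

# Interrogative word-vector set and arrangement matrix (formulas (3),
# (3'), (4)): every column vector is re-weighted by its attention weight.
X = W * att                          # broadcasts att over the n columns

print(X.shape)
```

The softmax step is the only part of the pipeline that is pinned down by the claim text itself; the scoring function is one plausible reading of the "inner product with parameter vector β" description.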
4. The deep learning question classification method according to claim 3, characterized in that extracting the window mapping of the question sentence by a convolution operation according to the interrogative word-vector set specifically comprises:
establishing, according to the interrogative word-vector set, window matrices of the common word vectors and interrogative word vectors in the interrogative word-vector set;
constructing a plurality of filters;
generating a plurality of feature mappings by convolution of the plurality of filters with the window matrices;
transposing and rearranging the plurality of feature mappings to obtain the window mapping.
5. The deep learning question classification method according to claim 4, characterized in that the window matrix is expressed by formula (5):
W_i = [x_i, x_{i+1}, ..., x_{i+k-1}]   formula (5)
wherein W_i is the window matrix corresponding to the interrogative word vector or the common word vector at the i-th position in the interrogative word-vector set, and is used for extracting semantic features in the interrogative word-vector set; x_i is the interrogative word vector or common word vector at the i-th position; x_{i+1} is the one at the (i+1)-th position; x_{i+k-1} is the one at the (i+k-1)-th position; k is an empirical value; and the commas denote concatenation;
the feature mapping generated by the j-th filter is expressed by formula (6):
c_j = f(W_i * m + b)   formula (6)
wherein c_j is the feature mapping generated by the j-th filter acting on the interrogative word vectors or common word vectors, and is used for strengthening the semantic features of the interrogative word vector or common word vector at the j-th position; f is the activation function of the convolution operation; m is the filter parameter matrix; and b is a bias term;
the window mapping is expressed by formula (7):
E = [c_1; c_2; ...; c_j; ...; c_d]   formula (7)
wherein E is the window mapping and characterizes the temporal order of the question sentence; c_j is the feature mapping generated by the j-th filter; and d is the number of the plurality of filters.
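By way of illustration only, formulas (5) to (7) can be sketched as follows; all sizes, the random values, and the choice of ReLU as the activation function f are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical arrangement matrix X: n = 5 columns of dimension l = 4;
# window width k = 3 and d = 2 filters are assumed toy values.
l, n, k, d = 4, 5, 3, 2
X = rng.standard_normal((l, n))

# Window matrix W_i = [x_i, x_{i+1}, ..., x_{i+k-1}]  (formula (5)).
def window(X, i, k):
    return X[:, i:i + k]             # shape (l, k)

# Feature mapping of filter j (formula (6)), with ReLU as activation f.
m = rng.standard_normal((d, l, k))   # filter parameter matrices
b = rng.standard_normal(d)           # bias terms
C = np.array([[max(0.0, np.sum(window(X, i, k) * m[j]) + b[j])
               for i in range(n - k + 1)] for j in range(d)])

# Window mapping E = [c_1; c_2; ...; c_d]  (formula (7)): the d feature
# maps stacked row-wise; column t gathers all filter responses at step t.
E = C

print(E.shape)
```

Reading E column by column is what makes the subsequent recurrent step possible: each column is one time step carrying all d filter responses.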
6. The deep learning question classification method according to claim 5, characterized in that extracting the temporal features of the question sentence according to the window mapping specifically comprises:
establishing a connection-layer attention matrix according to the window mapping, for indicating the importance of the common words and interrogatives in the question sentence;
normalizing the connection-layer attention matrix to obtain a normalized connection-layer attention matrix;
extracting the temporal features of the question sentence according to the window mapping and the normalized connection-layer attention matrix.
7. The deep learning question classification method according to claim 6, characterized in that the connection-layer attention matrix is expressed by formula (8):
A^c_jj = tanh(U·E_j + b)   formula (8)
wherein A^c is the connection-layer attention matrix; A^c_jj is the j-th element on the diagonal of the connection-layer attention matrix; tanh is the hyperbolic tangent function; U is a weighting parameter matrix; E_j is the j-th column vector of the window mapping; and b is the bias term;
the j-th element on the diagonal of the normalized connection-layer attention matrix is expressed by formula (9):
Ã^c_jj = exp(A^c_jj) / Σ_{i=1}^{n} exp(A^c_ii)   formula (9)
wherein Ã^c_jj is the j-th element on the diagonal of the normalized connection-layer attention matrix; exp is the exponential function; Σ is the summing function; and n is the number of interrogative word vectors and common word vectors contained in the interrogative word-vector set;
the temporal feature is expressed by formula (10):
G_j = Ã^c_jj · E_j   formula (10)
wherein G_j is the temporal feature corresponding to the interrogative word vector or common word vector at the j-th position in the interrogative word-vector set, and characterizes the importance of the interrogative word vector or common word vector at the j-th position in the interrogative word-vector set.
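By way of illustration only, formulas (8) to (10) can be sketched as follows; the scalar-score reading of formula (8) and the column-weighting form of formula (10) are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical window mapping E: n = 4 column vectors of dimension d = 3
# (in the claims, E comes from the convolution step).
d, n = 3, 4
E = rng.standard_normal((d, n))

# Connection-layer attention score per column (formula (8), assumed to
# yield one scalar per column): a_j = tanh(U . E_j + b).
U = rng.standard_normal(d)           # weighting parameters
b = rng.standard_normal()            # bias term
a = np.tanh(E.T @ U + b)             # shape (n,)

# Normalized connection-layer attention (formula (9)): softmax over a.
att = np.exp(a) / np.exp(a).sum()

# Temporal features (formula (10), assumed form G_j = att_j * E_j):
# columns the attention deems important reach the BiLSTM with larger
# magnitude, so the recurrent layer concentrates on them.
G = E * att

print(G.shape)
```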
8. The deep learning question classification method according to claim 7, characterized in that the activation function f of the convolution operation is the rectified linear unit function, and the feature mapping corresponding to the interrogative word vector or common word vector at the j-th position in the interrogative word-vector set is expressed by formula (11):
c_j = f(W_i * m + b) = max(0, W_i * m + b)   formula (11)
wherein c_j is the feature mapping corresponding to the interrogative word vector or common word vector at the j-th position in the interrogative word-vector set, and is used for strengthening the semantic features of the interrogative word vector or common word vector at the j-th position; f is the activation function of the convolution operation; m is the filter parameter matrix; b is a bias term; W_i is the window matrix corresponding to the interrogative word vector or common word vector at the i-th position in the interrogative word-vector set; and max(0, x) is the rectified linear unit function.
9. A deep learning question classification system combining a multi-level attention mechanism, characterized by comprising:
an input layer, for inputting an interrogative word-vector arrangement matrix;
a convolutional neural network, for extracting the window mapping of a question sentence;
a bidirectional long short-term memory network, for extracting the temporal features of the question sentence;
a classifier, for classifying the question according to the temporal features.
CN201810599036.4A 2018-06-12 2018-06-12 Deep learning problem classification method and system combining multi-level attention mechanism Active CN108804677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810599036.4A CN108804677B (en) 2018-06-12 2018-06-12 Deep learning problem classification method and system combining multi-level attention mechanism


Publications (2)

Publication Number Publication Date
CN108804677A true CN108804677A (en) 2018-11-13
CN108804677B CN108804677B (en) 2021-08-31

Family

ID=64085475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810599036.4A Active CN108804677B (en) 2018-06-12 2018-06-12 Deep learning problem classification method and system combining multi-level attention mechanism

Country Status (1)

Country Link
CN (1) CN108804677B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508377A (en) * 2018-11-26 2019-03-22 南京云思创智信息科技有限公司 Text feature, device, chat robots and storage medium based on Fusion Model
CN109766424A (en) * 2018-12-29 2019-05-17 安徽省泰岳祥升软件有限公司 It is a kind of to read the filter method and device for understanding model training data
CN110033022A (en) * 2019-03-08 2019-07-19 腾讯科技(深圳)有限公司 Processing method, device and the storage medium of text
CN110148428A (en) * 2019-05-27 2019-08-20 哈尔滨工业大学 A kind of acoustic events recognition methods indicating study based on subspace
CN110209824A (en) * 2019-06-13 2019-09-06 中国科学院自动化研究所 Text emotion analysis method based on built-up pattern, system, device
CN110263160A (en) * 2019-05-29 2019-09-20 中国电子科技集团公司第二十八研究所 A kind of Question Classification method in computer question answering system
CN110472238A (en) * 2019-07-25 2019-11-19 昆明理工大学 Text snippet method based on level interaction attention
CN110727765A (en) * 2019-10-10 2020-01-24 合肥工业大学 Problem classification method and system based on multi-attention machine mechanism and storage medium
CN111159569A (en) * 2019-12-13 2020-05-15 西安交通大学 Social network user behavior prediction method based on user personalized features
CN111461394A (en) * 2020-02-24 2020-07-28 桂林电子科技大学 Student score prediction method based on deep matrix decomposition
CN111914097A (en) * 2020-07-13 2020-11-10 吉林大学 Entity extraction method and device based on attention mechanism and multi-level feature fusion
CN112861443A (en) * 2021-03-11 2021-05-28 合肥工业大学 Advanced learning fault diagnosis method integrated with priori knowledge
CN114462387A (en) * 2022-02-10 2022-05-10 北京易聊科技有限公司 Sentence pattern automatic discrimination method under no-label corpus
CN116363817A (en) * 2023-02-02 2023-06-30 淮阴工学院 Chemical plant dangerous area invasion early warning method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092596A (en) * 2017-04-24 2017-08-25 重庆邮电大学 Text emotion analysis method based on attention CNNs and CCR
US20170262995A1 (en) * 2016-03-11 2017-09-14 Qualcomm Incorporated Video analysis with convolutional attention recurrent neural networks
CN107590138A (en) * 2017-08-18 2018-01-16 浙江大学 A kind of neural machine translation method based on part of speech notice mechanism
CN107766447A (en) * 2017-09-25 2018-03-06 浙江大学 It is a kind of to solve the method for video question and answer using multilayer notice network mechanism
CN108108771A (en) * 2018-01-03 2018-06-01 华南理工大学 Image answering method based on multiple dimensioned deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUNTING ZHOU et al.: "A C-LSTM Neural Network for Text Classification", Computer Science *
LIANG Bin et al.: "Target-specific sentiment analysis based on multi-attention convolutional neural networks", Journal of Computer Research and Development *


Also Published As

Publication number Publication date
CN108804677B (en) 2021-08-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant