CN108804677A - Deep learning question classification method and system combining a multi-layer attention mechanism - Google Patents
- Publication number
- CN108804677A CN108804677A CN201810599036.4A CN201810599036A CN108804677A CN 108804677 A CN108804677 A CN 108804677A CN 201810599036 A CN201810599036 A CN 201810599036A CN 108804677 A CN108804677 A CN 108804677A
- Authority
- CN
- China
- Prior art keywords
- vector
- question word
- matrix
- word vector
- window
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a deep learning question classification method combining a multi-layer attention mechanism, in the technical field of information processing. The method includes: building a question-word vector set, which contains the question-word vectors and ordinary word vectors enriched with question-word information; extracting window mappings of the question sentence from the question-word vector set by convolution; extracting the temporal features of the question sentence from the window mappings; and classifying the question according to the temporal features. The method strengthens the semantic information of the question words in a question sentence and fuses a convolutional neural network with a long short-term memory (LSTM) model through the attention mechanisms of deep learning, effectively improving the accuracy of question classification.
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to a deep learning question classification method and system combining a multi-layer attention mechanism.
Background technology
Traditional question classification techniques fall into two main classes: rule-based methods and machine-learning-based methods. Rule-based methods classify questions chiefly by manually building large rule bases and applying rule matching. Machine-learning-based methods treat question classification as a supervised learning task: features are designed by hand, external corpora are introduced to expand the question text, and classical machine learning algorithms such as support vector machines and naive Bayes are used for classification.
In recent years, as deep learning techniques have matured, machine-learning-based question classification has gradually shifted toward deep learning. However, as question answering research deepens, higher demands are placed on the accuracy and practicality of question classification methods. Most existing deep learning question classification methods suffer from the following two problems:
First, the word vectors used by existing deep learning models pay no special attention to question-word information. Unlike plain text classification, recognizing the type of a question depends heavily on the question word as a classification feature, because question text is short and its semantic and word co-occurrence information is sparse, so the question words in a question strongly influence the classification result.
Second, existing question classification methods do not fuse the feature extraction abilities of the two main models well. Existing deep learning question classification methods mainly use convolutional neural networks and long short-term memory (LSTM) models, each with its own strengths and weaknesses: a convolutional neural network captures the complex mapping from raw data to high-level semantics, with expressive power far beyond traditional machine learning models, while an LSTM model can model the temporal features of text and effectively exploit the contextual information of the question text.
Summary of the invention
The object of the present invention is to provide a deep learning question classification method combining a multi-layer attention mechanism. The method strengthens the semantic information of the question words in a question sentence and fuses a convolutional neural network with an LSTM model through the attention mechanisms of deep learning, effectively improving the accuracy of question classification.
To achieve this goal, in one aspect, embodiments of the present invention provide a deep learning question classification method combining a multi-layer attention mechanism. The method includes: building a question-word vector set, which contains the question-word vectors and ordinary word vectors enriched with question-word information; extracting window mappings of the question sentence from the question-word vector set by convolution; extracting the temporal features of the question sentence from the window mappings; and classifying the question according to the temporal features.
Preferably, building the question-word vector set specifically includes: collecting the question words in Chinese questions to build a question-word dictionary; reading the text of the question sentence and comparing it against the question-word dictionary to identify the question words and ordinary words in the question sentence, and building a word vector set containing ordinary-word column vectors and question-word column vectors; building a diagonal attention matrix that characterizes the contextual correlation and binding strength between the ordinary-word column vectors and the question-word column vector; normalizing the diagonal attention matrix to obtain the input-layer attention matrix; and, based on the input-layer attention matrix and the attention mechanism of deep learning, building the question-word vector set and generating the question-word vector permutation matrix.
Preferably, the diagonal attention matrix is expressed by formula (1):
A^r = diag(A^r_11, A^r_22, ..., A^r_nn), A^r_ii = f(w_i, e) formula (1)
where A^r is the diagonal attention matrix, A^r_ii is the i-th element on its diagonal and indicates the relative importance between the i-th ordinary-word column vector w_i and the question-word column vector e, f is the inner-product scoring function of w_i and e with parameter vector β, e is the question-word column vector, and w_i is the i-th ordinary-word column vector;
the i-th element on the diagonal of the input-layer attention matrix is expressed by formula (2):
A'^r_ii = exp(A^r_ii) / Σ_{j=1..n} exp(A^r_jj) formula (2)
where A'^r_ii is the i-th diagonal element of the input-layer attention matrix and indicates the relative importance between the word vector at position i of the word vector set (the ordinary-word column vector w_i or the question-word column vector e) and the question-word column vector e, A^r_ii and A^r_jj are the i-th and j-th diagonal elements of the diagonal attention matrix, exp is the exponential function, Σ is the summation function, and n is the number of ordinary-word column vectors and question-word column vectors in the word vector set;
the i-th ordinary word vector in the question-word vector set is expressed by formula (3):
x_i = A'^r_ii · w_i formula (3)
where x_i is the i-th ordinary word vector in the question-word vector set;
the question-word vector in the question-word vector set is expressed by formula (3'):
x_j = A'^r_jj · e formula (3')
where x_j is the question-word vector in the question-word vector set;
the question-word vector permutation matrix is expressed by formula (4):
X = [x_1, x_2, ..., x_n] formula (4)
where X is the question-word vector permutation matrix of dimension R^(l×n), l is the dimension of the question-word and ordinary word vectors, n is the number of question-word and ordinary word vectors in the question-word vector set, and the commas denote column-wise concatenation.
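As a hedged sketch, formulas (1)-(4) can be traced numerically as follows; the exact form of the scoring function f is not reproduced in the published text, so the β-weighted element-wise product below, like all the toy dimensions and random vectors, is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

l, n = 50, 6                  # vector dimension, number of words in the question
W = rng.normal(size=(l, n))   # columns: word vectors of the question sentence
e = W[:, 2]                   # assume position 2 holds the question-word vector
beta = rng.normal(size=l)     # parameter vector beta of the scoring function f

# Formula (1): diagonal scores A^r_ii = f(w_i, e), sketched here as the
# element-wise product scored by beta (the exact f is an assumption).
scores = np.array([beta @ (W[:, i] * e) for i in range(n)])

# Formula (2): softmax normalization over the diagonal -> input-layer attention.
att = np.exp(scores - scores.max())
att /= att.sum()

# Formulas (3)/(3')/(4): rescale every column and stack the results into the
# question-word vector permutation matrix X of dimension l x n.
X = W * att                   # broadcasting multiplies column i by att[i]
assert X.shape == (l, n)
```

The softmax in formula (2) guarantees the diagonal weights sum to one, so X is simply the original word vectors rescaled by their relevance to the question word.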
Preferably, extracting the window mappings of the question sentence from the question-word vector set by convolution specifically includes: building, from the question-word vector set, the window matrices of the ordinary word vectors and question-word vector in the set; building multiple filters; convolving the filters with the window matrices to generate multiple feature maps; and transposing and rearranging the feature maps to obtain the window mappings.
Preferably, the window matrix is expressed by formula (5):
W_i = [x_i, x_{i+1}, ..., x_{i+k-1}] formula (5)
where W_i is the window matrix corresponding to the question-word or ordinary word vector at position i in the question-word vector set, used to extract semantic features from the set; x_i, x_{i+1}, and x_{i+k-1} are the question-word or ordinary word vectors at positions i, i+1, and i+k-1; k is an empirical value; and the commas denote concatenation;
the feature map generated by the j-th filter is expressed by formula (6):
c_j = f(W_i * m + b) formula (6)
where c_j is the feature map produced by applying the j-th filter to the question-word or ordinary word vectors, used to strengthen the semantic features of the question-word or ordinary word vector at position j; f is the activation function of the convolution; m is the filter parameter matrix; and b is the bias term;
the window mapping is expressed by formula (7):
E = [c_1; c_2; …; c_i; …; c_d] formula (7)
where E is the window mapping, representing the temporal order of the question sentence, c_i is the feature map generated by the i-th filter, and d is the number of filters.
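The convolution stage of formulas (5)-(7) can likewise be sketched; the rectified linear unit of formula (11) is used as the activation f, and all sizes and weights below are illustrative placeholders, not the patent's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
l, n, k, d = 50, 6, 3, 4                   # vector dim, words, window size k, d filters

X = rng.normal(size=(l, n))                # question-word vector permutation matrix
filters = rng.normal(size=(d, l, k))       # d filters m in R^(l x k)
b = 0.1                                    # bias term

relu = lambda z: np.maximum(0.0, z)        # activation f of formula (11)

# Formulas (5)-(6): slide each filter over the n-k+1 window matrices
# W_i = X[:, i:i+k]; each convolution is the sum of the element-wise product.
C = np.array([[relu(np.sum(X[:, i:i + k] * m) + b)
               for i in range(n - k + 1)] for m in filters])

# Formula (7): the d equal-length feature maps stacked row-wise form the
# window mapping E; column E[:, j] describes position j of the question.
E = C                                      # shape (d, n-k+1)
assert E.shape == (d, n - k + 1)
```

Because each filter slides along the word positions only, this is the one-dimensional convolution described for the filters in the detailed description.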
Preferably, extracting the temporal features of the question sentence from the window mappings specifically includes: building, from the window mappings, a connection-layer attention matrix that indicates the importance of the ordinary words and question words in the question sentence; normalizing the connection-layer attention matrix to obtain the normalized connection-layer attention matrix; and extracting the temporal features of the question sentence from the window mappings and the normalized connection-layer attention matrix.
Preferably, the connection-layer attention matrix is expressed by formula (8):
A^c_jj = tanh(U E_j + b) formula (8)
where A^c is the connection-layer attention matrix, A^c_jj is the j-th element on its diagonal, tanh is the hyperbolic tangent function, U is the weighting parameter matrix, E_j is the j-th column vector of the window mapping, and b is the bias term;
the j-th element on the diagonal of the normalized connection-layer attention matrix is expressed by formula (9):
A'^c_jj = exp(A^c_jj) / Σ_{i=1..n} exp(A^c_ii) formula (9)
where A'^c_jj is the j-th diagonal element of the normalized connection-layer attention matrix, exp is the exponential function, Σ is the summation function, and n is the number of question-word and ordinary word vectors in the question-word vector set;
the temporal features are expressed by formula (10):
G_j = A'^c_jj · E_j formula (10)
where G_j is the temporal feature corresponding to the question-word or ordinary word vector at position j in the question-word vector set, characterizing the importance of the question-word or ordinary word vector at position j.
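A minimal sketch of formulas (8)-(10) follows; reducing U E_j to one scalar score per position is an assumption, since the published text does not reproduce the formula images, and all weights are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
d, p = 4, 4                            # filters, positions in the window mapping

E = rng.normal(size=(d, p))            # window mapping from the convolution stage
U = rng.normal(size=d)                 # weighting parameters (one row of U)
b = 0.0                                # bias term

# Formula (8): a scalar attention score per column E_j of the window mapping.
scores = np.tanh(U @ E + b)            # shape (p,)

# Formula (9): softmax normalization over the diagonal of the
# connection-layer attention matrix.
att = np.exp(scores - scores.max())
att /= att.sum()

# Formula (10): reweight each column; G is the temporal-feature sequence
# that would be fed to the bidirectional LSTM.
G = E * att
assert G.shape == E.shape
```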
Preferably, the activation function f of the convolution is the rectified linear unit, and the feature map corresponding to the question-word or ordinary word vector at position j in the question-word vector set is expressed by formula (11):
c_j = f(W_i * m + b) = max(0, W_i * m + b) formula (11)
where c_j is the feature map corresponding to the question-word or ordinary word vector at position j in the question-word vector set, used to strengthen its semantic features; f is the activation function of the convolution; m is the filter parameter matrix; b is the bias term; W_i is the window matrix corresponding to the question-word or ordinary word vector at position i; and max(0, x) is the rectified linear unit.
In another aspect, embodiments of the present invention provide a deep learning question classification system combining a multi-layer attention mechanism, comprising: an input layer for inputting the question-word vector permutation matrix; a convolutional neural network for extracting the window mappings of the question sentence; a bidirectional LSTM network for extracting the temporal features of the question sentence; and a classifier for classifying the question according to the temporal features.
Through the above technical solution, the deep learning question classification method and system build the question-word vector set from the question-word dictionary and the input-layer attention matrix, strengthening the semantic information of the question words in the question sentence, and feed the question-word vector set into a convolutional neural network to extract local features. The method and system also fuse the convolutional neural network and the LSTM network through an attention mechanism: the attention mechanism selects the convolution features most useful for question classification and passes them to the bidirectional LSTM network for high-level temporal feature extraction, effectively improving the accuracy of question classification.
Other features and advantages of the present invention are described in detail in the detailed description below.
Description of the drawings
The drawings provide a further understanding of the present invention and form part of the specification; together with the detailed description below they explain, but do not limit, the present invention. In the drawings:
Fig. 1 is a flow chart of the deep learning question classification method combining a multi-layer attention mechanism according to an embodiment of the present invention;
Fig. 2 is a bar chart comparing the classification accuracy of the deep learning question classification method of one embodiment of the present invention and four other models on the questions in the data sets.
Detailed description
Specific embodiments of the present invention are described in detail below with reference to the drawings. It should be understood that the specific embodiments described here merely illustrate and explain the present invention and do not limit it.
Question classification is a basic component of question answering research, and its accuracy directly affects how well the system understands natural language. On the one hand, by determining the target answer type of a question, question classification provides semantic restrictions and constraints for subsequent information retrieval and answer extraction, narrowing the search range of candidate answers and improving the accuracy of the question answering system. For example, "Where is the most famous Anhui-cuisine restaurant?" is a question about a place; subsequent answer extraction need only match place-type entities among the candidate answers, which accurately and effectively improves question answering precision. On the other hand, question classification provides a basis for the selection strategy of the question answering mode. For example, "What are the characteristics of authentic Anhui cuisine?" is a description-type question, so the generated answer should focus on describing and introducing the features of the dish, supplying the user with the answer knowledge they most want to obtain.
Fig. 1 is a flow chart of the deep learning question classification method combining a multi-layer attention mechanism according to an embodiment of the present invention. As shown in Fig. 1, in one embodiment of the present invention, a deep learning question classification method combining a multi-layer attention mechanism is provided, and the method may include:
In step S101, building the question-word vector set, which contains the question-word vectors and the ordinary word vectors enriched with question-word information;
In step S102, extracting the window mappings of the question sentence from the question-word vector set by convolution;
In step S103, extracting the temporal features of the question sentence from the window mappings;
In step S104, classifying the question according to the temporal features.
The above steps constitute the deep learning question classification model. After the model classifies a question, the classification error is computed; when the error exceeds a preset value, the parameters of the model are updated repeatedly until the classification result converges.
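The update-until-convergence loop described above can be sketched in miniature; the quadratic "classification error" in one parameter and the learning rate below are toy stand-ins for the model's real loss and optimizer, used only to show the control flow:

```python
# Toy stand-in for the classification model: one parameter w, and the
# "classification error" is a quadratic in w (illustrative, not the patent's).
def error(w, target=2.0):
    return (w - target) ** 2

preset = 1e-3      # preset error threshold
lr = 0.1           # learning rate (assumed value)
w = 0.0

# Repeatedly update the parameter while the error exceeds the preset value;
# gradient descent drives the error toward convergence.
while error(w) > preset:
    grad = 2 * (w - 2.0)        # gradient of the toy error
    w -= lr * grad
```

After the loop exits, the classification error is at or below the preset threshold, which is the stopping condition described in the text.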
The deep learning question classification method combining a multi-layer attention mechanism provided by the present invention combines the advantage of the LSTM network in capturing global context with that of the convolutional neural network in capturing the local features of the question sentence, and has the ability to extract key features such as question words; the fused feature vector is finally fed to the classifier to complete the question classification task. The method specifically includes:
(1) Building the question-word vector set
Collect the question words in Chinese questions to build the question-word dictionary; for example, "who", "where", and "how many" are question words. Read the text of the question sentence and compare it against the question-word dictionary to identify the question words and ordinary words in the question sentence, and build the word vector set containing ordinary-word column vectors and question-word column vectors. Build the diagonal attention matrix, which characterizes the contextual correlation and binding strength between the ordinary words and the question words. The diagonal attention matrix may, for example, be expressed by formula (1):
A^r = diag(A^r_11, A^r_22, ..., A^r_nn), A^r_ii = f(w_i, e) formula (1)
where A^r is the diagonal attention matrix, A^r_ii is the i-th element on its diagonal and indicates the relative importance between the i-th ordinary-word column vector w_i and the question-word column vector e, f is the inner-product scoring function of w_i and e with parameter vector β, e is the question-word column vector, and w_i is the i-th ordinary-word column vector. The value of the parameter vector β may, for example, be randomly initialized and then updated continually by gradient descent during model training.
Normalize the diagonal attention matrix to obtain the input-layer attention matrix; the i-th element on the diagonal of the input-layer attention matrix may, for example, be expressed by formula (2):
A'^r_ii = exp(A^r_ii) / Σ_{j=1..n} exp(A^r_jj) formula (2)
where A'^r_ii is the i-th diagonal element of the input-layer attention matrix and indicates the relative importance between the word vector at position i of the word vector set (the ordinary-word column vector w_i or the question-word column vector e) and the question-word column vector e, A^r_ii and A^r_jj are the i-th and j-th diagonal elements of the diagonal attention matrix, exp is the exponential function, Σ is the summation function, and n is the number of ordinary-word column vectors and question-word column vectors in the word vector set.
Based on the input-layer attention matrix, the question-word vector set is built and the question-word vector permutation matrix is generated. The i-th ordinary word vector in the question-word vector set may, for example, be expressed by formula (3):
x_i = A'^r_ii · w_i formula (3)
where x_i is the i-th ordinary word vector in the question-word vector set;
the question-word vector in the question-word vector set may, for example, be expressed by formula (3'):
x_j = A'^r_jj · e formula (3')
where x_j is the question-word vector in the question-word vector set;
the question-word vector permutation matrix may, for example, be expressed by formula (4):
X = [x_1, x_2, ..., x_n] formula (4)
where X is the question-word vector permutation matrix of dimension R^(l×n), l is the dimension of the question-word and ordinary word vectors, n is the number of question-word and ordinary word vectors in the question-word vector set, and the commas denote column-wise concatenation. The number of question-word and ordinary word vectors in the question-word vector set equals the number of ordinary-word column vectors and question-word column vectors in the word vector set.
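Step (1) can be traced end to end on a toy question; the dictionary entries, word vectors, and the element-wise scoring function f below are illustrative assumptions, not the patent's trained values:

```python
import numpy as np

rng = np.random.default_rng(5)

question_words = {"who", "where", "how many"}   # toy question-word dictionary
tokens = ["who", "is", "the", "author"]         # tokenized toy question

# Compare the question text against the dictionary to separate question
# words from ordinary words.
is_qword = [t in question_words for t in tokens]
assert is_qword == [True, False, False, False]

l = 8                                              # word-vector dimension
vectors = {t: rng.normal(size=l) for t in tokens}  # toy word vector set
e = vectors["who"]                                 # question-word column vector
beta = rng.normal(size=l)                          # parameter vector beta

# Input-layer attention (formulas (1)-(2)) and the reweighted
# question-word vector set / permutation matrix (formulas (3)/(3')/(4)).
scores = np.array([beta @ (vectors[t] * e) for t in tokens])
att = np.exp(scores - scores.max())
att /= att.sum()
X = np.stack([att[i] * vectors[t] for i, t in enumerate(tokens)], axis=1)
assert X.shape == (l, len(tokens))
```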
(2) Feature extraction by convolution
The convolutional neural network extracts local features from the question sentence by convolution; a local feature may be the local semantic feature between adjacent ordinary words or question words in the question sentence. In one embodiment of the present invention, d filters of the same size slide over the question-word vector set to extract the text information of the ordinary words and question words at different positions in the question sentence, yielding feature maps of the semantic dependency relations of the question words. A filter may, for example, be expressed as m ∈ R^(l×k), i.e. the convolution filter m is a matrix of l rows and k columns. The filter may, for example, be one-dimensional, that is, it slides along only one dimension.
The window matrix W_i corresponding to the i-th ordinary word or question word in the question sentence consists of k word vectors (ordinary word vectors, possibly including the question-word vector) and may, for example, be expressed by formula (5):
W_i = [x_i, x_{i+1}, ..., x_{i+k-1}] formula (5)
where W_i is the window matrix corresponding to the i-th ordinary word or question word in the question sentence, used to extract semantic features from the question-word vector set; x_i, x_{i+1}, and x_{i+k-1} are the question-word or ordinary word vectors at positions i, i+1, and i+k-1 in the question-word vector set; k is an empirical value; and the commas denote vector concatenation.
The convolution filter m is convolved with each of the window matrices in turn, generating multiple feature maps; the feature map generated by the j-th filter is expressed by formula (6):
c_j = f(W_i * m + b) formula (6)
where c_j is the feature map produced by applying the filter to the question-word or ordinary word vectors, used to strengthen the semantic features of the question-word or ordinary word vector at position j; f is the activation function of the convolution; m is the filter parameter matrix; and b is the bias term. The initial values of m and b are obtained by random initialization and updated continually during model training.
Research has shown that the rectified linear unit has one-sided inhibition and a relatively wide excitation boundary, and can achieve a faster feature learning rate by introducing sparsity into the network. Therefore, in one embodiment of the present invention, the activation function f of the convolution may, for example, be the rectified linear unit, and the feature map corresponding to the question-word or ordinary word vector at position j in the question-word vector set may, for example, be expressed by formula (11):
c_j = f(W_i * m + b) = max(0, W_i * m + b) formula (11)
where c_j is the feature map corresponding to the question-word or ordinary word vector at position j in the question-word vector set, used to strengthen its semantic features; f is the activation function of the convolution; m is the filter parameter matrix; b is the bias term; W_i is the window matrix corresponding to the question-word or ordinary word vector at position i; and max(0, x) is the rectified linear unit.
To preserve the temporal character of the question sentence, the d equal-length feature maps c_j, j = 1, 2, ..., d, are transposed and rearranged to obtain the window mapping corresponding to each window matrix W_i; the window mapping corresponding to the question-word or ordinary word vector at position j in the question-word vector set may, for example, be expressed by formula (7):
E = [c_1; c_2; …; c_j; …; c_d] formula (7)
where E is the window mapping, representing the temporal order of the question sentence, c_j is the feature map generated by the j-th filter, and d is the number of filters.
(3) Extracting the temporal features of the question sentence
For the question classification task, the classification rules of a question depend on the preceding and following context. In this embodiment of the invention the window mappings extracted by the convolutional neural network are therefore fed into a bidirectional LSTM network to further extract the temporal features of the question sentence.
The overall network obtained by fusing the convolutional neural network with the LSTM network can automatically identify the parts of a question sentence most relevant to question classification. The fusion proceeds as follows:
Build the connection-layer attention matrix, which characterizes the importance of the ordinary words and question words in the question sentence, so that the relatively important ordinary words and question words receive attention while information loss and information redundancy in the feature extraction process of the question sentence are reduced. The connection-layer attention matrix may, for example, be expressed by formula (8):
A^c_jj = tanh(U E_j + b) formula (8)
where A^c is the connection-layer attention matrix, A^c_jj is the j-th element on its diagonal, tanh is the hyperbolic tangent function, U is the weighting parameter matrix, E_j is the j-th column vector of the window mapping, and b is the bias term; the initial value of the weighting parameter matrix U is obtained by random initialization and updated continually during model training.
Normalize the connection-layer attention matrix to obtain the normalized connection-layer attention matrix; the j-th element on the diagonal of the normalized connection-layer attention matrix may, for example, be expressed by formula (9):
A'^c_jj = exp(A^c_jj) / Σ_{i=1..n} exp(A^c_ii) formula (9)
where A'^c_jj is the j-th diagonal element of the normalized connection-layer attention matrix, exp is the exponential function, Σ is the summation function, and n is the number of question-word and ordinary word vectors in the question-word vector set.
Multiply the normalized connection-layer attention matrix by the window mapping output by the convolutional neural network to obtain the temporal features of the question sentence, which serve as the input of the bidirectional LSTM network.
The temporal feature corresponding to the j-th ordinary word or question word of the question sentence may, for example, be expressed by formula (10):
G_j = A'^c_jj · E_j formula (10)
where G_j is the temporal feature corresponding to the j-th ordinary word or question word in the question sentence, characterizing the importance of that word.
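The attention-weighted columns G_j then form the input sequence of the bidirectional network. As a hedged sketch, a plain tanh recurrence stands in for the LSTM cell below (the gating of a real LSTM is omitted for brevity, and all weights are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(6)
d, p, h = 4, 5, 3                  # feature dim, positions, hidden size

G = rng.normal(size=(d, p))        # temporal features from formula (10)
Wx = rng.normal(size=(h, d)) * 0.1 # input weights (random placeholders)
Wh = rng.normal(size=(h, h)) * 0.1 # recurrent weights (random placeholders)

def scan(cols):
    """One directional pass of a tanh recurrence over a sequence of columns."""
    hstate, outs = np.zeros(h), []
    for x in cols:
        hstate = np.tanh(Wx @ x + Wh @ hstate)
        outs.append(hstate)
    return outs

fwd = scan(G.T)                    # left-to-right over the positions
bwd = scan(G.T[::-1])[::-1]        # right-to-left, re-aligned to positions
# Bidirectional representation: concatenate both directions per position;
# the classifier would act on this sequence of states.
H = np.array([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])
assert H.shape == (p, 2 * h)
```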
Finally, the classifier classifies the question according to the temporal features, completing the classification task.
In one embodiment of the present invention, a deep learning question classification system combining a multi-layer attention mechanism is provided, and the system may include:
an input layer, for inputting the question-word vector permutation matrix;
a convolutional neural network, for extracting the window mappings of the question sentence;
a bidirectional LSTM network, for extracting the temporal features of the question sentence;
a classifier, for classifying the question according to the temporal features.
In one embodiment of the present invention, the effectiveness of the deep learning question classification method provided by the present invention is demonstrated on the following three data sets:
(1) the data set provided by Baidu's laboratory (hereinafter Baidu), containing 6205 records, i.e. question sentences with their corresponding answers; for example, one question is "Who is the author of the book Fundamentals of Machine Design?" and the corresponding answer is "Yang Kezhen, Cheng Guangyun, Li Zhong";
(2) the open question set of the 2016 CCF International Conference on Natural Language Processing and Chinese Computing question answering evaluation (hereinafter NLPCC 2016), containing 9604 records; for example, one question is "What kind of book is Journey to the West?" and the corresponding answer is "fantasy";
(3) the open question set of the 2017 CCF International Conference on Natural Language Processing and Chinese Computing question answering evaluation (hereinafter NLPCC 2017), containing 9518 records; for example, one question is "What is Li Ming's date of birth?" and the corresponding answer is "January 1963".
In order to be compared with the classification results of the deep learning question classification method provided in embodiments of the present invention,
In one embodiment of this invention, following four model is also respectively adopted to have carried out point the problems in above-mentioned three kinds of data sets
Class:
(1) support vector machines model, based on the SVM models using linear kernel function that Li et al. people proposes, using bag of words
Model carries out text representation, and carries out weight calculation to word with word frequency-inverse document frequency TF-IDF algorithms, is effect
Preferable traditional classification model;
(2) convolutional neural networks CNN models, the basic convolutional neural networks model proposed by kim et al., and convolutional layer,
Pond layer and full articulamentum composition;
(3) a kind of long short-term memory LSTM models, two-way length memory models in short-term, are suitable for processing and predicted time sequence
It is middle to be spaced and postpone relatively long text sequence;
(4) the long short-term memory C-LSTM models of convolution-, Zhou et al. is by convolutional neural networks and grows memory models phase in short-term
In conjunction with inputing to long memory models in short-term after proposing feature by convolution, use novel vectorial rearrangement pattern.
The model used by the deep learning question classification method provided in the embodiments of the present invention is hereinafter referred to as the MCA-LSTM model. On the basis of the C-LSTM model, the MCA-LSTM model adds an attention-based interrogative word vector arrangement matrix and a connection-layer attention matrix, so that it can adaptively identify the most important parts of a sentence.
The accuracy of the classification results on the questions in the above three data sets, for the deep learning question classification method provided by the present invention and for the above four models, is shown in Table 1:
Table 1: Comparison of the accuracy of the classification results of the different methods
As can be seen from Table 1, because the number of entries in each major class is distributed differently across the data sets, the accuracy of each model fluctuates to a different degree; this, however, does not affect the comparison between the models. Fig. 2 shows a bar chart comparing the classification accuracy, on the three data sets, of the deep learning question classification method of one embodiment of the present invention against the other four models. As can be seen from Fig. 2, MCA-LSTM shows a clear advantage on all three data sets.
Fig. 2 also shows that the classification results of the MCA-LSTM model used in the deep learning question classification method of the embodiments of the present invention are better than those of the SVM, a conventional model that relies heavily on hand-designed features. Designing features by hand requires a great deal of labour and does not generalise well to other data sets and tasks; by contrast, the MCA-LSTM model can learn semantic sentence representations automatically, needs no manually extracted features, and therefore scales better. Comparing the results of the single convolutional neural network with those of the single long short-term memory network, the convolutional neural network achieves higher accuracy on the NLPCC 2016 and NLPCC 2017 data sets, showing that the convolution operation extracts text features more effectively when the texts are short and the data are sufficiently plentiful. The C-LSTM model and the MCA-LSTM model of the present invention classify better than either the single convolutional neural network or the single long short-term memory model, showing that combining the models can mine deeper sequence features while extracting local text features, which further benefits the understanding of the question. Finally, comparing the MCA-LSTM model with the C-LSTM model shows that the multi-layer attention mechanism proposed by the present invention lets the model focus, during training, on the feature information characteristic of the target class, so that interrogative-related features are better identified; this verifies the effectiveness of the attention mechanism in the question classification task.
Through the above embodiments, the deep learning question classification method and system construct the interrogative word vector set from the interrogative word dictionary and the input-layer attention matrix, which enhances the semantic information of the interrogative words in the question, and feed the interrogative word vector set into a convolutional neural network to extract local features. In addition, the method and system fuse the convolutional neural network with the long short-term memory network through an attention mechanism: the attention mechanism screens out the convolution features most useful for question classification and passes them to a bidirectional long short-term memory network for high-level temporal feature extraction, thereby effectively improving the accuracy of question classification.
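The processing chain summarised above (input-layer attention over the word vectors, convolution, connection-layer attention, recurrence, classification) can be sketched as follows. The toy dimensions, the random parameters, and the plain tanh recurrence standing in for the bidirectional LSTM are illustrative assumptions only, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(v):
    w = np.exp(v - v.max())
    return w / w.sum()

l, n, k, d, h, n_classes = 4, 6, 3, 5, 8, 3    # all toy sizes
words = rng.normal(size=(l, n))                # column word vectors of a question
e = rng.normal(size=l)                         # interrogative word vector

# 1. Input-layer attention: weight each word by its affinity to the interrogative.
X = words * softmax(words.T @ e)

# 2. Convolution: ReLU over k-wide windows with d filters (window mapping E).
filt = rng.normal(size=(d, l, k))
E = np.array([[max(0.0, float(np.sum(X[:, i:i + k] * filt[j])))
               for i in range(n - k + 1)] for j in range(d)])

# 3. Connection-layer attention: re-weight the convolution features.
G = E * softmax(np.tanh(E.T @ rng.normal(size=d)))

# 4. Stand-in recurrence over positions (the patent uses a bidirectional LSTM).
W_h = 0.1 * rng.normal(size=(h, h))
W_x = 0.1 * rng.normal(size=(h, d))
state = np.zeros(h)
for t in range(G.shape[1]):
    state = np.tanh(W_h @ state + W_x @ G[:, t])

# 5. Classifier: softmax over question categories.
probs = softmax(rng.normal(size=(n_classes, h)) @ state)
```

With trained rather than random parameters, the final `probs` vector would give the predicted question category.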
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings; however, the present invention is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present invention, various simple variants of the technical solution of the present invention may be made, and these simple variants all fall within the scope of protection of the present invention. It should further be noted that the specific technical features described in the above embodiments may, where they are not contradictory, be combined in any suitable manner; in order to avoid unnecessary repetition, the present invention does not separately describe the various possible combinations.
In addition, the various embodiments of the present invention may also be combined arbitrarily, and such combinations, as long as they do not depart from the idea of the present invention, should likewise be regarded as part of the disclosure of the present invention.
Claims (9)
1. A deep learning question classification method combining a multi-layer attention mechanism, characterised by comprising:
building an interrogative word vector set, the interrogative word vector set comprising an interrogative word vector and ordinary word vectors that contain interrogative word information;
extracting a window mapping of a question using a convolution operation, according to the interrogative word vector set;
extracting temporal features of the question according to the window mapping;
classifying the question according to the temporal features.
2. The deep learning question classification method according to claim 1, characterised in that building the interrogative word vector set specifically comprises:
collecting the interrogative words of Chinese questions to build an interrogative word dictionary;
reading the text of the question and comparing it with the interrogative word dictionary to obtain the interrogative word and the ordinary words in the question, and establishing a word vector set, the word vector set comprising ordinary word column vectors and an interrogative word column vector;
establishing a diagonal attention matrix for characterising the contextual correlation and the strength of association between the ordinary words and the interrogative word;
normalising the diagonal attention matrix to obtain an input-layer attention matrix;
building the interrogative word vector set according to the input-layer attention matrix, based on the attention mechanism in deep learning, and generating an interrogative word vector arrangement matrix.
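The steps of this claim can be sketched as follows; the scalar `beta` (standing in for the parameter vector), the plain dot-product score, and the toy dimensions are illustrative assumptions:

```python
import numpy as np

def input_layer_attention(words, e, beta):
    """Input-layer attention sketch: weight each word column vector by its
    affinity to the interrogative word column vector e."""
    # Diagonal attention scores: inner product of every column with e,
    # scaled by the (here scalar) parameter beta.
    scores = beta * (words.T @ e)          # shape (n,)
    # Softmax normalisation of the diagonal -> input-layer attention weights.
    a = np.exp(scores - scores.max())
    a = a / a.sum()
    # Re-weight every word vector by its attention weight.
    X = words * a                          # (l, n) arrangement matrix
    return a, X

rng = np.random.default_rng(0)
l, n = 4, 6                                # toy embedding size and sentence length
words = rng.normal(size=(l, n))            # columns are word vectors
e = rng.normal(size=l)                     # interrogative word column vector
a, X = input_layer_attention(words, e, beta=1.0)
```

The columns of `X` then form the interrogative word vector arrangement matrix that claim 3 denotes X.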
3. The deep learning question classification method according to claim 2, characterised in that the diagonal attention matrix is expressed by formula (1):
A_r^(ii) = f(w_i, e; β)    formula (1)
where A_r is the diagonal attention matrix, A_r^(ii) is the i-th element on the diagonal of the diagonal attention matrix and represents the relative importance between the i-th ordinary word column vector w_i and the interrogative word column vector e, f is the inner-product function of the ordinary word column vector w_i and the interrogative word column vector e, β is a parameter vector, e is the interrogative word column vector, and w_i is the i-th ordinary word column vector;
the i-th element on the diagonal of the input-layer attention matrix is expressed by formula (2):
A^(ii) = exp(A_r^(ii)) / Σ_{j=1..n} exp(A_r^(jj))    formula (2)
where A^(ii) is the i-th element on the diagonal of the input-layer attention matrix and represents the relative importance between the column vector at the i-th position of the word vector set (the ordinary word column vector w_i or the interrogative word column vector e) and the interrogative word column vector e; A_r^(ii) is the i-th element on the diagonal of the diagonal attention matrix, A_r^(jj) is the j-th element on the diagonal of the diagonal attention matrix, exp is the exponential function, Σ is the summation function, and n is the total number of ordinary word column vectors and interrogative word column vectors contained in the word vector set;
the i-th ordinary word vector in the interrogative word vector set is expressed by formula (3):
x_i = A^(ii) w_i    formula (3)
where x_i is the i-th ordinary word vector in the interrogative word vector set;
the interrogative word vector in the interrogative word vector set is expressed by formula (3'):
x_j = A^(jj) e    formula (3')
where x_j is the interrogative word vector in the interrogative word vector set;
the interrogative word vector arrangement matrix is expressed by formula (4):
X = [x_1, x_2, ..., x_n]    formula (4)
where X is the interrogative word vector arrangement matrix, whose dimension is R^(l×n); l is the dimension of the interrogative word vector and of the ordinary word vectors, n is the number of interrogative word vectors and ordinary word vectors contained in the interrogative word vector set, and the commas denote column-wise concatenation.
4. The deep learning question classification method according to claim 3, characterised in that extracting the window mapping of the question using a convolution operation, according to the interrogative word vector set, specifically comprises:
establishing, according to the interrogative word vector set, the window matrices of the ordinary word vectors and the interrogative word vector in the interrogative word vector set;
building a plurality of filters;
generating a plurality of feature mappings by convolving the plurality of filters with the window matrices;
transposing and rearranging the plurality of feature mappings to obtain the window mapping.
5. The deep learning question classification method according to claim 4, characterised in that the window matrix is expressed by formula (5):
W_i = [x_i, x_{i+1}, ..., x_{i+k-1}]    formula (5)
where W_i is the window matrix corresponding to the interrogative word vector or the ordinary word vector at the i-th position of the interrogative word vector set, used to extract semantic features from the interrogative word vector set; x_i is the interrogative word vector or the ordinary word vector at the i-th position, x_{i+1} is the interrogative word vector or the ordinary word vector at the (i+1)-th position, x_{i+k-1} is the interrogative word vector or the ordinary word vector at the (i+k-1)-th position, k is an empirical value, and the commas denote concatenation;
the feature mapping generated by the j-th filter is expressed by formula (6):
c_j = f(W_i * m + b)    formula (6)
where c_j is the feature mapping generated by applying the j-th filter to the interrogative word vector or the ordinary word vector, used to strengthen the semantic features of the interrogative word vector or the ordinary word vector at the j-th position; f is the activation function of the convolution operation, m is the filter parameter matrix, and b is a bias term;
the window mapping is expressed by formula (7):
E = [c_1; c_2; ...; c_j; ...; c_d]    formula (7)
where E is the window mapping, which characterises the temporal order of the question; c_j is the feature mapping generated by the j-th filter, and d is the number of filters.
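A minimal sketch of the window matrices and feature mappings of claims 4 and 5, assuming a ReLU activation (as claim 8 later specifies) and toy dimensions; the nested loops trade speed for clarity:

```python
import numpy as np

def window_mapping(X, k, filters, bias):
    """Slide k-wide window matrices over the arrangement matrix X and
    apply d filters, giving the window mapping E."""
    l, n = X.shape
    d = filters.shape[0]
    E = np.zeros((d, n - k + 1))
    for i in range(n - k + 1):
        W_i = X[:, i:i + k]                # W_i = [x_i, ..., x_{i+k-1}]
        for j in range(d):
            # c_j = f(W_i * m + b) with f = ReLU.
            E[j, i] = max(0.0, float(np.sum(W_i * filters[j]) + bias[j]))
    return E                               # row j is the feature mapping c_j

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 6))                # toy arrangement matrix, l=4, n=6
E = window_mapping(X, k=3,
                   filters=rng.normal(size=(5, 4, 3)),  # d=5 filter matrices m
                   bias=np.zeros(5))
```

Each row of `E` collects one filter's responses over all window positions, matching the row-wise concatenation of formula (7).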
6. The deep learning question classification method according to claim 5, characterised in that extracting the temporal features of the question according to the window mapping specifically comprises:
establishing a connection-layer attention matrix according to the window mapping, for indicating the importance of the ordinary words and the interrogative word in the question;
normalising the connection-layer attention matrix to obtain a normalised connection-layer attention matrix;
extracting the temporal features of the question according to the window mapping and the normalised connection-layer attention matrix.
7. The deep learning question classification method according to claim 6, characterised in that the connection-layer attention matrix is expressed by formula (8):
A_c^(jj) = tanh(U E_j + b)    formula (8)
where A_c is the connection-layer attention matrix, A_c^(jj) is the j-th element on the diagonal of the connection-layer attention matrix, tanh is the hyperbolic tangent function, U is a weighting parameter matrix, E_j is the j-th column vector of the window mapping, and b is the bias term;
the j-th element on the diagonal of the normalised connection-layer attention matrix is expressed by formula (9):
A_s^(jj) = exp(A_c^(jj)) / Σ_{i=1..n} exp(A_c^(ii))    formula (9)
where A_s^(jj) is the j-th element on the diagonal of the normalised connection-layer attention matrix, exp is the exponential function, Σ is the summation function, and n is the number of interrogative word vectors and ordinary word vectors contained in the interrogative word vector set;
the temporal features are expressed by formula (10):
G_j = A_s^(jj) E_j    formula (10)
where G_j is the temporal feature corresponding to the interrogative word vector or the ordinary word vector at the j-th position of the interrogative word vector set, and characterises the importance of the interrogative word vector or the ordinary word vector at the j-th position of the interrogative word vector set.
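The connection-layer attention of claims 6 and 7 can be sketched as follows; using a weight vector `U` in place of the weighting parameter matrix, and toy dimensions, are simplifying assumptions:

```python
import numpy as np

def connection_layer_attention(E, U, b):
    """Connection-layer attention sketch: score each column of the
    window mapping and re-weight it."""
    # Score column E_j with tanh(U . E_j + b), cf. formula (8).
    scores = np.tanh(E.T @ U + b)          # one score per column
    # Softmax normalisation of the diagonal elements, cf. formula (9).
    a = np.exp(scores - scores.max())
    a = a / a.sum()
    # Temporal features G_j = a_j * E_j, cf. formula (10).
    G = E * a
    return a, G

rng = np.random.default_rng(2)
E = rng.normal(size=(5, 4))                # toy window mapping, d=5 filters
a, G = connection_layer_attention(E, U=rng.normal(size=5), b=0.1)
```

The columns of `G` are the attention-weighted features that would be fed to the bidirectional long short-term memory network.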
8. The deep learning question classification method according to claim 7, characterised in that the activation function f of the convolution operation is the rectified linear unit function, and the feature mapping corresponding to the interrogative word vector or the ordinary word vector at the j-th position of the interrogative word vector set is expressed by formula (11):
c_j = f(W_i * m + b) = max(0, W_i * m + b)    formula (11)
where c_j is the feature mapping corresponding to the interrogative word vector or the ordinary word vector at the j-th position of the interrogative word vector set, used to strengthen the semantic features of the interrogative word vector or the ordinary word vector at the j-th position; f is the activation function of the convolution operation, m is the filter parameter matrix, b is a bias term, W_i is the window matrix corresponding to the interrogative word vector or the ordinary word vector at the i-th position of the interrogative word vector set, and max(0, x) is the rectified linear unit function.
9. A deep learning question classification system combining a multi-layer attention mechanism, characterised by comprising:
an input layer for inputting an interrogative word vector arrangement matrix;
a convolutional neural network for extracting the window mapping of a question;
a bidirectional long short-term memory network for extracting the temporal features of the question;
a classifier for classifying the question according to the temporal features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810599036.4A CN108804677B (en) | 2018-06-12 | 2018-06-12 | Deep learning problem classification method and system combining multi-level attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108804677A true CN108804677A (en) | 2018-11-13 |
CN108804677B CN108804677B (en) | 2021-08-31 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107092596A (en) * | 2017-04-24 | 2017-08-25 | 重庆邮电大学 | Text emotion analysis method based on attention CNNs and CCR |
US20170262995A1 (en) * | 2016-03-11 | 2017-09-14 | Qualcomm Incorporated | Video analysis with convolutional attention recurrent neural networks |
CN107590138A (en) * | 2017-08-18 | 2018-01-16 | 浙江大学 | A kind of neural machine translation method based on part of speech notice mechanism |
CN107766447A (en) * | 2017-09-25 | 2018-03-06 | 浙江大学 | It is a kind of to solve the method for video question and answer using multilayer notice network mechanism |
CN108108771A (en) * | 2018-01-03 | 2018-06-01 | 华南理工大学 | Image answering method based on multiple dimensioned deep learning |
Non-Patent Citations (2)
Title |
---|
CHUNTING ZHOU et al.: "A C-LSTM Neural Network for Text Classification", Computer Science * |
LIANG Bin et al.: "Aspect-level sentiment analysis based on multi-attention convolutional neural networks" (in Chinese), Journal of Computer Research and Development * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||