CN107832326B - Natural language question-answering method based on deep convolutional neural network - Google Patents
Natural language question-answering method based on a deep convolutional neural network
- Publication number: CN107832326B
- Application number: CN201710841026.2A
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a natural language question-answering method based on a deep convolutional neural network. The method comprises the following steps: 1) expressing the natural language question and the information in a database information set as vectors with a sequence structure, forming vector matrices; 2) processing the vector matrices with a deep convolutional neural network to extract corresponding deep semantic features; 3) calculating, from the deep semantic features, the semantic relevance between the natural language question and the information in the database information set; 4) selecting information from the database information set according to the calculated semantic relevance to generate an answer to the natural language question. The invention better extracts deep, generalized semantic features and accurately locates supporting data information, thereby achieving a better natural language question-answering effect.
Description
Technical Field
The invention relates to a method that extracts semantic features of natural language questions and candidate information with a deep convolutional neural network in order to enhance relevance calculation and thereby improve the accuracy of natural language question answering. It belongs to the field of natural language question answering.
Background
With the development of information technology and the internet, information overload has become increasingly serious. Effectively understanding user requirements and bridging the gap between a query and inconsistently expressed existing information, so as to retrieve what the user needs from a large amount of information, has therefore become a very important problem.
A user's query typically takes the form of a question expressed in natural language. The resource database providing answer information can take many forms. It may be a structured knowledge base of triples of the form (subject, predicate, object); for example, the triple (China, capital, Beijing) encodes the knowledge that the capital of China is Beijing. It may be a text collection composed of a large number of ordinary natural language sentences, drawn from platforms such as encyclopedias, news, and social media, or combinations thereof; for example, the sentence "I came to Beijing, the capital of China, to attend university." also contains the knowledge that the capital of China is Beijing. Likewise, the resource database may combine multiple pieces of information in various forms. An important process in natural language question answering is to evaluate the semantic relevance between the information in the resource database and the question queried by the user, so as to select the most effective information to help answer the user's question.
Natural language questions are flexible and variable, and the organization of information in a resource database is complex, so effectively extracting features to calculate the semantic relevance between candidate information and a natural language question is a challenging task. A convolutional neural network can automatically organize the structure between adjacent words, extract the overall semantic features of a text, and abstract and summarize semantic information. A deep convolutional neural network has more layers and a more complex structure: with fewer parameters, it can process the semantics of a larger input window as a whole and model them as a deeper, more abstract feature representation. This helps handle the complexity of the organization of natural language questions and candidate information, as well as the inconsistency of expression between them, and better represents the semantics of questions and candidate information in a unified feature space. As a result, the semantic relevance between questions and candidate information is calculated more accurately, improving the accuracy of natural language question answering.
Disclosure of Invention
The invention aims to provide a method that better extracts semantic features of natural language questions and candidate information, to assist in calculating the semantic relevance between them and thereby improve the accuracy of natural language question answering. That is, for a natural language question q and a database information set D = {d_i}, corresponding feature vectors f_q and f_{d_i} are extracted with a deep neural network method; the similarity S = {s_qi} between the question q and each database information item d_i is calculated; on this basis, the several pieces of information most relevant to the question are selected, and answers to the question are generated accordingly.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a natural language question-answering method based on a deep convolutional neural network comprises the following steps:
1) expressing the information in the natural language question and the database information set into vectors with a sequence structure, and forming a vector matrix;
2) processing the vector matrix by adopting a deep convolutional neural network, and extracting corresponding deep semantic features;
3) calculating the semantic correlation degree of the natural language problem and the information in the database information set according to the deep semantic features;
4) and selecting information in the database information set according to the calculated semantic relevance to generate an answer of the natural language question.
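The four steps above can be sketched end-to-end as a minimal skeleton; `embed`, `extract`, `relevance`, and `generate` are hypothetical slots standing in for the components the method describes, not the patented implementation:

```python
def answer(question, database, embed, extract, relevance, generate):
    """Skeleton of the four steps: vectorize, extract features,
    score relevance, and generate an answer from the best entry."""
    f_q = extract(embed(question))                    # steps 1-2 for the question
    scored = [(d, relevance(f_q, extract(embed(d))))  # steps 1-3 per database entry
              for d in database]
    best = max(scored, key=lambda pair: pair[1])      # step 4: most relevant entry
    return generate(question, best[0])
```

With toy character-overlap components, the skeleton already behaves as described: the database entry most similar to the question is selected as the answer basis.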
The database information set in the step 1) is an original database information set or a candidate information set which is obtained by information screening and has a reduced range.
The specific steps of the process of the invention are further illustrated below:
(1) For each question q and the whole database information set D, reduce the range of effective information by low-overhead means such as retrieval, and preliminarily sort the screened results, obtaining a candidate information set D_q with a reduced range.
(2) Natural language questions and candidate information are formulated as vector representations with a sequence structure that can serve as deep convolutional neural network input. That is, each input representation E ∈ {q} ∪ D_q can be expressed as a set of items T = {t_n}, where each t_i has a vector representation and a total order relation can be defined on the item set T. E can then be expressed in matrix form according to this order: M_E = (v1; v2; …; vn), where v_i is the vector representation corresponding to t_i, and v_i precedes v_j in M_E whenever t_i ≺ t_j. Here ≺ is a symbol indicating an abstract, broad order relation, not limited to naturally defined orders such as numerical or lexicographic order.
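A minimal sketch of forming the vector matrix M_E from an ordered token sequence; the toy vocabulary and random 5-dimensional embeddings are illustrative stand-ins for the pretrained word vectors the method would actually use:

```python
import numpy as np

# Hypothetical toy vocabulary; in practice, pretrained word vectors
# (e.g. word2vec) would be used instead of random ones.
rng = np.random.default_rng(0)
vocab = {"capital": 0, "of": 1, "china": 2, "beijing": 3}
embeddings = rng.standard_normal((len(vocab), 5))  # one 5-dim vector per word

def to_matrix(tokens, vocab, embeddings):
    """Stack word vectors preserving word order: M_E = (v1; v2; ...; vn)."""
    return np.stack([embeddings[vocab[t]] for t in tokens])

M_q = to_matrix(["capital", "of", "china"], vocab, embeddings)
```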
(3) The input matrix M_E is processed with a multi-layer deep convolutional neural network with residual connections: M_0 = M_E, M_{i+1} = M_i + CNN(M_i), where CNN denotes a convolutional neural network. When M_i and M_{i+1} have different feature numbers, the left term of the preceding formula is first passed through a one-dimensional convolution of width 1 to adjust the feature dimension. Between layers, additional pooling layers may be added to reduce the size of the feature matrix. The final output feature vector of E is f_E = pooling(M_N), the global pooling result of the last convolutional layer's output.
Residual connections refer to adding the output of a previous convolutional layer directly to the output of the current convolutional layer, the sum replacing the current layer's output for subsequent processing; if the feature dimensions of the two tensors being added differ, a convolution of width 1 adjusts the dimension. This linkage lets the deep convolutional network adaptively adjust its effective depth to some extent during learning, and reduces the network depth's adverse influence on gradient propagation.
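The residual update M_{i+1} = M_i + CNN(M_i), with a width-1 projection when feature dimensions differ, can be sketched in numpy; this is an illustration under assumptions (tanh activation, same-length zero-padded convolution, a plain matrix P as the width-1 adjustment), not the patented implementation:

```python
import numpy as np

def conv1d(M, W, b):
    """Width-m convolution over a sequence: M is (n, d_in), W is (m, d_in, d_out).
    Zero-padding keeps the output length n; tanh plays the role of the
    nonlinear activation g."""
    m, d_in, d_out = W.shape
    n = M.shape[0]
    pad_l = m // 2
    Mp = np.vstack([np.zeros((pad_l, d_in)), M, np.zeros((m - 1 - pad_l, d_in))])
    out = np.empty((n, d_out))
    for i in range(n):
        # contract the (m, d_in) window against the kernel to get d_out features
        out[i] = np.tensordot(Mp[i:i + m], W, axes=([0, 1], [0, 1])) + b
    return np.tanh(out)

def residual_block(M, W, b, P=None):
    """M_{i+1} = M_i + CNN(M_i); when feature dims differ, the width-1
    projection P adjusts M_i before the addition."""
    C = conv1d(M, W, b)
    if M.shape[1] != C.shape[1]:
        M = M @ P  # width-1 convolution degenerates to a per-position matmul
    return M + C
```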
(4) According to f_q and f_{d_i}, a score s_qi is calculated for each candidate information representation, giving the relevance S = {s_qi} of each candidate to the question. Several pieces of information are selected from D_q accordingly as the main basis for generating answers.
(5) Answers to the questions are generated from the selected information and its weights, using an appropriate natural language generation technique.
In step (1), not screening the information at all is also feasible if computational power is sufficient; however, since the scale of D tends to be large in question-answering tasks, it is usually necessary to narrow the candidate size by excluding irrelevant information with simple means. If the screening result D_q cannot be sorted, or no sufficiently effective sorting is available, returning the candidate information unordered is also feasible; this can be fully compensated by the subsequent deep convolutional neural network training process.
In step (2), the vector representation may be a pre-trained low-dimensional dense semantic vector, such as a word vector trained with a neural language model, or a word vector obtained by reducing a high-dimensional matrix with singular value decomposition (SVD) (e.g., the result of LSA, latent semantic analysis); it may also be an original high-dimensional sparse vector such as a one-hot vector. The total order relation can be a natural one such as word order, or an artificially defined one; for example, for a full-segmentation word segmentation result, the order may be defined as initial-position order first, with final position considered afterward. In fact, the deep convolutional network's requirement on the input is slightly weaker than a full total order: suppose the jth convolution kernel of the ith layer has width m_ij and the subsets of items involved in its computation are {T_ij}; it suffices to define an m_ij-ary adjacency relation on each T_ij such that a total order can be defined on its elements, and so on recursively for each layer. This requirement exists mainly so that the convolution process can be performed efficiently.
In step (3), the interlayer relationship of the deep convolutional network with residual connections can be represented by the following formula:

M_{i+1} = CNN^1_{Σ_j k_ij}(M_i) + concat_j( CNN^{m_ij}_{k_ij}(M_i) )

where M_i is the feature matrix after the ith layer; CNN^1_{Σ_j k_ij} is a convolution calculation of width 1 used to adjust a possible difference in feature number between M_i and M_{i+1}, and when Σ_j k_ij = Σ_j k_{i-1,j} it can be omitted as the identity transformation; concat is a connection function that joins a series of input tensors on the last component; m_ij is the width and k_ij the number of the jth convolution kernels of the ith layer. CNN_{mk} denotes a convolution operation with an activation function, of width m and filter number k, satisfying:

CNN_{mk}(M_E) = CNN_{mk}((v_1; v_2; …; v_n)) = (o_{1−⌊m/2⌋}; o_{2−⌊m/2⌋}; …; o_{n−⌊m/2⌋})

where o_i = g(W_{mj}^T (v_i; v_{i+1}; …; v_{i+m−1}) + b_{mj}); if a subscript of v is less than 1 or greater than n, zero-vector padding is used, i.e., v_i = 0 for i > n or i < 1. Here g denotes a nonlinear activation function, such as the sigmoid or hyperbolic tangent function; W_{mj} is the weight matrix used in the convolutional layer calculation; T denotes matrix transposition; b_{mj} is the bias vector used in the convolutional layer calculation, which may be omitted if necessary.
Finally, after N convolutional layers are stacked, the input can be characterized by the following equation:

f_E = pooling(M_N)

where pooling can be any global pooling operation, such as max pooling or average pooling.
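A minimal sketch of the global pooling that produces f_E = pooling(M_N), assuming the feature matrix holds one row per sequence position and one column per feature, so pooling collapses the sequence dimension:

```python
import numpy as np

def global_max_pool(M):
    """Collapse the n x d feature matrix to a d-vector: one value per
    feature column, here its maximum over all positions."""
    return M.max(axis=0)

def global_avg_pool(M):
    """Same collapse, using the mean instead of the maximum."""
    return M.mean(axis=0)
```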
Besides the method above, step 3) can also be realized with the following three methods:
the method comprises the following steps:
wherein M isiThe characteristic matrix after the ith layer is taken as the characteristic matrix;for adjusting MiAnd Mi+1The width of the difference of the characteristic numbers possibly existing between the two is 1, and the number of the filters is sigmajkijAt sigma, in the convolutional layer calculation ofjkij=∑jki-1jCan be ignored as constant conversion; concat is a connection function, and a series of tensors input are connected on the last component; flatten is a flattening function, and an input matrix is flattened into a vector; m isijIs the width, k, of the jth convolution kernel of the ith layerijFor the number of jth convolution kernels at the ith layer,represents a width of m at one timeijThe number of filters is kijWith an activation function.
Method 2:

M_0 = M_E, M_{i+1} = concat_j( CNN^{m_ij}_{k_ij}(M_i) ), f_E = pooling(M_N)

where M_i is the feature matrix after the ith layer; concat is a connection function that joins a series of input tensors on the last component; pooling is a pooling function that converts the input matrix into a vector by computing information such as the maximum, minimum, average, or median of each row of the matrix; m_ij is the width and k_ij the number of the jth convolution kernels of the ith layer; CNN^{m_ij}_{k_ij} denotes a convolution operation of width m_ij with k_ij filters and an activation function.
Method 3:

M_0 = M_E, M_{i+1} = concat_j( CNN^{m_ij}_{k_ij}(M_i) ), f_E = flatten(M_N)

where M_i is the feature matrix after the ith layer; concat is a connection function that joins a series of input tensors on the last component; flatten is a flattening function that flattens the input matrix into a vector; m_ij is the width and k_ij the number of the jth convolution kernels of the ith layer; CNN^{m_ij}_{k_ij} denotes a convolution operation of width m_ij with k_ij filters and an activation function.
If necessary, the feature vector f_E obtained above can be transformed further, e.g., processed by several fully connected neural network layers to obtain f'_E, which then replaces f_E in subsequent calculations and operations. One or more layers of recurrent neural networks (e.g., LSTM or GRU structures) may be added before or after any convolutional layer to further process the sequence of feature vectors. Pooling operations (e.g., max pooling or k-max pooling) may likewise be added before or after any convolutional layer for additional processing of the feature vector sequence. These optional operations may be selected with reference to the characteristics of the data set used when implementing the method.
When training the multi-layer convolutional network, techniques such as dropout layers or L1/L2 regularization terms can be added to limit the model's expressive capacity, enhance generalization, and prevent over-learning. During training, to prevent the distribution of the whole feature matrix from drifting substantially with the number of layers due to deep convolution, which can cause vanishing or exploding gradients, normalization can be added between layers, e.g., normalizing by the mean and variance of the distribution of part or all of the feature matrix over a training input batch.
In step (4), there are many methods for calculating the score, such as:
Using a multilayer perceptron after concatenation: s_qi = MLP(concat(f_q, f_{d_i})), where MLP denotes the composition of per-layer perceptron operations h_{i+1} = g(W_i h_i + b_i), i ∈ {1, 2, …, N_f}, and g denotes a nonlinear activation function;
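The first scoring option (concatenate the two feature vectors, then apply a multilayer perceptron) can be sketched as follows; the tanh activation and the scalar final layer are assumptions for illustration:

```python
import numpy as np

def mlp_score(f_q, f_d, weights, biases):
    """Score a (question, candidate) pair: concatenate the two feature
    vectors, then apply stacked perceptron layers h -> g(W h + b), with a
    nonlinear activation on hidden layers and a scalar output layer."""
    h = np.concatenate([f_q, f_d])
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b
        if i < len(weights) - 1:
            h = np.tanh(h)  # activation g on hidden layers only
    return float(h[0])
</```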
Or using the number of wins in pairwise comparisons, for example: s_qi = num_{j≠i}( P(d_i is more relevant to q than d_j) > 1/2 ), where the notation num indicates counting over j with i held fixed, the subscript indicating that j = i is skipped, and P denotes probability.
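The pairwise-comparison idea can be illustrated with a minimal win-count over raw scores (a simplification: here a "win" is just a higher score rather than a probability exceeding 1/2):

```python
def win_counts(scores):
    """Pairwise win count: each candidate's final score is the number of
    other candidates it outscores."""
    return [sum(1 for j, sj in enumerate(scores) if j != i and si > sj)
            for i, si in enumerate(scores)]
```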
In step (5), answer generation can take many forms, usually depending closely on the specific question-answer data form and task setting. Simpler examples: take the highest-scoring entry of D_q as the answer to the question (when D is a set of texts), or take the object of the highest-scoring triple as the answer (when D is a set of knowledge base triples). More complex methods, such as memory neural networks (MNN), may also be used.
The invention also provides a server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method described above.
Compared with the prior art, the invention has the following positive effects:
the invention uses deep convolutional neural network to extract semantic features so as to better extract and summarize semantic representation of question and answer related information. In the natural language question-answering task, compared with the traditional method, the method can extract semantic information with higher level, so that the method can better adapt to the characteristics of expression flexibility, information organization inconsistency and the like in the natural language question-answering task, and can better evaluate the semantic correlation degree of the question and the candidate information so as to improve the effect of natural language question-answering.
Drawings
Fig. 1 is a framework diagram of a natural language question answering method in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention is based on the knowledge base data set and question set provided by the Chinese knowledge base question answering evaluation task of the Sixth Conference on Natural Language Processing and Chinese Computing (NLPCC 2017). Those skilled in the art will appreciate that other candidate information sets and question sets may be used in an implementation.
Specifically, this embodiment uses 32110 question-answer pairs, of which 7631 questions serve as tests and 24479 for training, for example: "Do you know when Querdaloguburg was completed?". About 43 million pieces of information organized as knowledge base triples provide candidate information, for example: (Kuudaroubau, completion time, 1890).
Fig. 1 is a schematic diagram of a natural language question-answering method based on a deep convolutional neural network according to an embodiment of the present invention, where the method includes the following steps:
step 1: and reducing the candidate information size according to the problem.
Given the characteristics of the data set, the information screening in this embodiment requires the subject of a candidate information entry to be a substring of the question. Features are then extracted with rules, and a gradient boosted decision tree (GBDT) model trained on the training data scores all substrings; only the top three substrings whose scores are no less than 1/100 of the top score are kept as possible candidate entities. On average, only a few dozen database information entries per question have a candidate entity as subject (e.g., the average number of candidate information entries over the 7631 test set questions is 62.93).
Then, using a full-segmentation word attention model, the top 20 ranked information entries for each question are selected as the candidate information set D_q of question q for subsequent processing. (See Yuxuan Lai, Yang Lin, Jianhao Chen, Yansong Feng, and Dongyan Zhao: Open Domain Question Answering System Based on Knowledge Base. In Proceedings of NLPCC 2016.)
Step 2: representing the question and the candidate information as sequences of vectors.
The question q and the triples in the candidate information set D_q are both represented as word vector sequences. For the question, the words are the segmentation result of the remainder after removing the triple subject; for a triple, the words are the segmentation result of its predicate part. Thus, each question and each candidate information entry can be expressed as a word vector matrix. Word vectors trained with the Google word2vec model on a Chinese encyclopedia corpus are used.
Step 3: extracting deep semantic features using a deep convolutional neural network.
After the word vector matrix is obtained, the deep convolutional neural network is adopted to extract deep semantic information. The convolutional neural network uses 2-3 convolutional layers, each convolutional layer having 256 convolutional kernels of width 1, 512 convolutional kernels of width 2, and 256 convolutional kernels of width 3. Residual connection is added between the convolutional layers.
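As a bookkeeping note on the layer configuration just described: each layer concatenates the outputs of its kernel groups, so its feature width is the sum of the group sizes (the counts below are the embodiment's configuration):

```python
# width -> number of kernels, per the embodiment's per-layer configuration
kernels = {1: 256, 2: 512, 3: 256}

# Concatenation over kernel groups (the concat_j in the formulas) means the
# layer's output feature width is simply the sum of the group sizes.
layer_features = sum(kernels.values())
print(layer_features)  # 1024 features per layer
```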
Step 4: calculating the semantic relevance from the semantic features.
After obtaining the semantic features, the features of the question and of the candidate information are integrated by elementwise (bitwise) multiplication; a specific similarity score is then calculated by a multilayer perceptron model with 1 hidden layer of 1024 hidden nodes.
Step 5: generating answers from the selected information according to semantic relevance.
Due to the characteristics of the data adopted by the embodiment of the invention, the object character string of the information with the highest degree of correlation is directly selected as the answer of the question.
During model training, to deal with the uneven distribution of positive and negative samples in the question-answering task, each round samples negative examples using the model obtained in the previous round of training (for the first round, the result of step one, or random sampling): negative samples judged closer to the positive samples receive a higher sampling probability when generating the next round's training data. In essence, subsequent training specifically reinforces against the weaknesses of the previous round. The sampling probability used for negative samples is as follows:
where p_{t_i} is the probability that a given negative sample is selected at time t_i, and rank_{i-1} is the ranking of that sample among all candidates under the model at time t_{i-1}. In actual training, one time node is taken per iteration, and 7 rounds of training are performed in total.
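Since the exact sampling formula is not reproduced above, the following is a hypothetical rank-based weighting that illustrates the stated idea (higher-ranked negatives, i.e. those the previous-round model considered closer to the positives, are sampled more often); the 1/rank weight is an assumption, not the patent's formula:

```python
import numpy as np

def negative_sampling_probs(ranks):
    """Hypothetical rank-based sampling weights: assume weight 1/rank, so
    negatives ranked near the top by the previous round's model (harder
    negatives) get higher sampling probability. Normalized to sum to 1."""
    weights = 1.0 / np.asarray(ranks, dtype=float)
    return weights / weights.sum()
```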
Table 1 shows the natural language question-answering effect of the method. The evaluation index is top-1 accuracy, i.e., the frequency with which the first-ranked answer to a question is the correct answer.

Number of layers: | 1      | 2      | 3      | 5
Top-1 accuracy:   | 43.57% | 43.82% | 43.85% | 42.13%

TABLE 1
As can be seen from the table, within a certain range the deep convolutional neural network performs better as the number of model layers increases. This shows that the method extracts deep semantic information more effectively, thereby calculating the semantic relevance between questions and candidate information more accurately and achieving a better natural language question-answering effect.
In summary, the embodiment of the invention constructs a reliable natural language question-answering system based on the knowledge base data set and question set provided by the Chinese knowledge base question answering evaluation task of NLPCC 2017. In selecting supporting information, the proposed method effectively extracts and summarizes the semantic features of questions and candidate information, thereby calculating their semantic relevance more accurately and achieving a better natural language question-answering effect.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include such modifications and variations.
Claims (6)
1. A natural language question-answering method based on a deep convolutional neural network is characterized by comprising the following steps:
1) expressing the information in the natural language question and the database information set into vectors with a sequence structure, and forming a vector matrix;
2) processing the vector matrix by adopting a deep convolutional neural network, and extracting corresponding deep semantic features;
3) calculating the semantic correlation degree of the natural language problem and the information in the database information set according to the deep semantic features;
4) selecting information in the database information set according to the calculated semantic relevance to generate an answer to the natural language question; wherein step 2) uses a multilayer convolutional neural network to generate deep semantic features f_E from the vector matrix M_E of step 1), namely, the following conditions are satisfied:

M_0 = M_E, M_{i+1} = CNN^1_{Σ_j k_ij}(M_i) + concat_j( CNN^{m_ij}_{k_ij}(M_i) ), f_E = pooling(M_N)

where M_i is the feature matrix after the ith layer; CNN^1_{Σ_j k_ij} represents a convolution of width 1 with Σ_j k_ij filters and an activation function, used to adjust a possible difference in feature number between M_i and M_{i+1}, and can be omitted as the identity transformation when Σ_j k_ij = Σ_j k_{i-1,j}; concat is a connection function that joins a series of input tensors on the last component, the subscript j denoting connection over dimension j; pooling is a pooling function that converts the input matrix into a vector by computing the maximum, minimum, average, and median of each row of the matrix; m_ij is the width and k_ij the number of the jth convolution kernels of the ith layer; CNN^{m_ij}_{k_ij} denotes a convolution operation of width m_ij with k_ij filters and an activation function;
or, step 2) uses a multilayer convolutional neural network to generate deep semantic features f_E from the vector matrix M_E obtained in step 1), i.e., the following conditions are satisfied:
M_0 = M_E, M_{i+1} = concat_j(conv^{m_ij, k_ij}(M_i)) + conv^{1, Σ_j k_ij}(M_i), f_E = flatten(M_N), wherein M_i is the feature matrix after the i-th layer; conv^{1, Σ_j k_ij} is a convolution of width 1 with Σ_j k_ij filters, used in the convolutional-layer calculation to adjust for the difference in feature counts that may exist between M_i and M_{i+1}; when Σ_j k_ij = Σ_j k_{i-1,j} it can be omitted as an identity transformation; concat is a connection function that joins a series of input tensors along the last dimension; flatten is a flattening function that flattens the input matrix into a vector; m_ij is the width of the j-th convolution kernel in the i-th layer, k_ij is the number of filters of the j-th convolution kernel in the i-th layer, and conv^{m_ij, k_ij} denotes a convolution of width m_ij with k_ij filters and an activation function;
or, step 2) uses a multilayer convolutional neural network to generate deep semantic features f_E from the vector matrix M_E obtained in step 1), i.e., the following conditions are satisfied:
M_0 = M_E, M_{i+1} = concat_j(conv^{m_ij, k_ij}(M_i)), f_E = pooling(M_N), wherein M_i is the feature matrix after the i-th layer; concat is a connection function that joins a series of input tensors along the last dimension; pooling is a pooling function that turns the input matrix into a vector by computing the maximum, minimum, average and median of each row of the matrix; m_ij is the width of the j-th convolution kernel in the i-th layer, k_ij is the number of filters of the j-th convolution kernel in the i-th layer, and conv^{m_ij, k_ij} denotes a convolution of width m_ij with k_ij filters and an activation function;
or, step 2) uses a multilayer convolutional neural network to generate deep semantic features f_E from the vector matrix M_E obtained in step 1), i.e., the following conditions are satisfied:
M_0 = M_E, M_{i+1} = concat_j(conv^{m_ij, k_ij}(M_i)), f_E = flatten(M_N), wherein M_i is the feature matrix after the i-th layer; concat is a connection function that joins a series of input tensors along the last dimension; flatten is a flattening function that flattens the input matrix into a vector; m_ij is the width of the j-th convolution kernel in the i-th layer, k_ij is the number of filters of the j-th convolution kernel in the i-th layer, and conv^{m_ij, k_ij} denotes a convolution of width m_ij with k_ij filters and an activation function.
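As a minimal, non-authoritative sketch of the simpler variants of claim 1 (parallel convolutions of several widths concatenated on the channel dimension, without the width-1 adjustment convolution, followed by the max/min/mean/median pooling), the layer recursion could look like the following. All array shapes, the random weights, and the ReLU activation are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def conv1d(M, W, b):
    """'Same'-padded 1D convolution over a (length, channels) matrix.
    W has shape (width, in_channels, filters); ReLU as the activation."""
    width, _, filters = W.shape
    pad = width // 2
    Mp = np.pad(M, ((pad, width - 1 - pad), (0, 0)))
    out = np.empty((M.shape[0], filters))
    for t in range(M.shape[0]):
        window = Mp[t:t + width]                  # (width, in_channels)
        out[t] = np.tensordot(window, W, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)                   # activation function

def layer(M, kernels):
    """One layer: concat_j(conv^{m_ij, k_ij}(M_i)) -- parallel convolutions
    of several widths, joined on the last (channel) dimension."""
    return np.concatenate([conv1d(M, W, b) for W, b in kernels], axis=-1)

def pooling(M):
    """Row-wise max/min/mean/median, joined into one feature vector f_E."""
    return np.concatenate([M.max(0), M.min(0), M.mean(0), np.median(M, 0)])

rng = np.random.default_rng(0)
M_E = rng.normal(size=(7, 4))                     # 7 tokens, 4-dim embeddings
# two layers; layer i has kernels of widths m_ij with k_ij filters each
k1 = [(rng.normal(size=(m, 4, 3)), np.zeros(3)) for m in (1, 2, 3)]
k2 = [(rng.normal(size=(m, 9, 5)), np.zeros(5)) for m in (2, 3)]
M1 = layer(M_E, k1)                               # (7, 9)
M2 = layer(M1, k2)                                # (7, 10)
f_E = pooling(M2)                                 # (40,)
```

The flatten variant would simply replace `pooling(M2)` with `M2.ravel()`, and the first two variants would add a width-1 convolution of `M_i` to the concatenation result before the next layer.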
2. The method of claim 1, wherein the database information set of step 1) is an original database information set or a candidate information set with a reduced scope obtained by information screening.
3. The method of claim 1, wherein the deep semantic features f_E are processed by a plurality of fully connected layers to obtain a new vector representation f'_E of the sentence, and the semantic relevance of the natural language question to the information in the database information set is then calculated according to f'_E.
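Claim 3's pipeline (fully connected layers over f_E, then a relevance score) could be sketched as follows. Cosine similarity is one plausible relevance measure; the patent does not commit to a specific one, and all weights and dimensions here are illustrative assumptions:

```python
import numpy as np

def dense(f, W, b):
    """One fully connected layer with ReLU, mapping f_E to f'_E."""
    return np.maximum(W @ f + b, 0.0)

def relevance(fq, fc):
    """Semantic relevance as cosine similarity (one plausible choice)."""
    return float(fq @ fc / (np.linalg.norm(fq) * np.linalg.norm(fc) + 1e-12))

rng = np.random.default_rng(1)
W, b = rng.normal(size=(16, 40)), np.zeros(16)
f_question = dense(rng.normal(size=40), W, b)            # f'_E of the question
candidates = [dense(rng.normal(size=40), W, b) for _ in range(3)]
scores = [relevance(f_question, fc) for fc in candidates]
best = int(np.argmax(scores))   # index of the most relevant database entry
```

Step 4) of claim 1 then selects the highest-scoring entries to build the answer.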
4. The method of claim 1, wherein additional pooling functions are added between convolutional layer operations.
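The additional pooling between convolutional layers in claim 4 can be illustrated with a minimal max-pooling sketch; the window size and the choice of max pooling are assumptions, since the claim does not fix the pooling operator:

```python
import numpy as np

def maxpool(M, size=2):
    """Non-overlapping max pooling along the sequence axis of a
    (length, channels) feature matrix, halving its length for size=2."""
    L = (M.shape[0] // size) * size               # drop a ragged tail
    return M[:L].reshape(-1, size, M.shape[1]).max(axis=1)

M = np.arange(12.0).reshape(6, 2)                 # toy 6x2 feature matrix
P = maxpool(M)                                    # shape (3, 2)
```

Inserting such a step between `layer` calls shortens the feature matrix, so deeper convolutions see a wider effective context.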
5. The method of claim 1, wherein one or more layers of recurrent neural networks are added to process the feature matrix before or after any convolutional layer operation.
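The recurrent layers of claim 5 operate on the same (length, channels) feature matrices as the convolutions. A plain Elman-style recurrent layer, shown here only as an assumed minimal stand-in (the claim would equally cover LSTM or GRU cells), could be:

```python
import numpy as np

def simple_rnn(M, Wx, Wh, b):
    """Plain (Elman) recurrent layer over the rows of a (length, channels)
    feature matrix; returns the hidden state at every position."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x in M:                        # one token/row at a time
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 4))            # feature matrix from some layer
Wx, Wh, b = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), np.zeros(8)
H = simple_rnn(M, Wx, Wh, b)           # shape (5, 8)
```

Because the output is again a (length, channels) matrix, it can be placed before or after any convolutional layer, as the claim states.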
6. A server, characterized in that the server comprises a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710841026.2A CN107832326B (en) | 2017-09-18 | 2017-09-18 | Natural language question-answering method based on deep convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107832326A CN107832326A (en) | 2018-03-23 |
CN107832326B true CN107832326B (en) | 2021-06-08 |
Family
ID=61643392
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710841026.2A Active CN107832326B (en) | 2017-09-18 | 2017-09-18 | Natural language question-answering method based on deep convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107832326B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108563782B (en) * | 2018-04-25 | 2023-04-18 | 平安科技(深圳)有限公司 | Commodity information format processing method and device, computer equipment and storage medium |
CN109271926B (en) * | 2018-09-14 | 2021-09-10 | 西安电子科技大学 | Intelligent radiation source identification method based on GRU deep convolutional network |
CN109740126B (en) * | 2019-01-04 | 2023-11-21 | 平安科技(深圳)有限公司 | Text matching method and device, storage medium and computer equipment |
CN111666482B (en) * | 2019-03-06 | 2022-08-02 | 珠海格力电器股份有限公司 | Query method and device, storage medium and processor |
CN110348014B (en) * | 2019-07-10 | 2023-03-24 | 电子科技大学 | Semantic similarity calculation method based on deep learning |
CN110516145B (en) * | 2019-07-10 | 2020-05-01 | 中国人民解放军国防科技大学 | Information searching method based on sentence vector coding |
CN110990549B (en) * | 2019-12-02 | 2023-04-28 | 腾讯科技(深圳)有限公司 | Method, device, electronic equipment and storage medium for obtaining answer |
CN112434152B (en) * | 2020-12-01 | 2022-10-14 | 北京大学 | Education choice question answering method and device based on multi-channel convolutional neural network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106844368A (en) * | 2015-12-03 | 2017-06-13 | 华为技术有限公司 | For interactive method, nerve network system and user equipment |
CN107066464A (en) * | 2016-01-13 | 2017-08-18 | 奥多比公司 | Semantic Natural Language Vector Space |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9904976B2 (en) * | 2015-01-16 | 2018-02-27 | Nec Corporation | High performance portable convolutional neural network library on GP-GPUs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||