CN107832326B - Natural language question-answering method based on deep convolutional neural network - Google Patents


Info

Publication number
CN107832326B
CN107832326B (application CN201710841026.2A)
Authority
CN
China
Prior art keywords
matrix
natural language
width
vector
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710841026.2A
Other languages
Chinese (zh)
Other versions
CN107832326A (en)
Inventor
来雨轩 (Yuxuan Lai)
冯岩松 (Yansong Feng)
贾爱霞 (Aixia Jia)
赵东岩 (Dongyan Zhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN201710841026.2A priority Critical patent/CN107832326B/en
Publication of CN107832326A publication Critical patent/CN107832326A/en
Application granted granted Critical
Publication of CN107832326B publication Critical patent/CN107832326B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a natural language question-answering method based on a deep convolutional neural network. The method comprises the following steps: 1) representing the natural language question and the pieces of information in a database information set as vectors with a sequence structure, forming vector matrices; 2) processing the vector matrices with a deep convolutional neural network to extract the corresponding deep semantic features; 3) calculating, from the deep semantic features, the semantic relevance between the natural language question and each piece of information in the database information set; 4) selecting information from the database information set according to the calculated semantic relevance and generating the answer to the natural language question. The invention better extracts deep, generalized semantic features and accurately locates the supporting information, thereby achieving better natural language question-answering results.

Description

Natural language question-answering method based on deep convolutional neural network
Technical Field
The invention relates to a method that uses a deep convolutional neural network to extract semantic features of natural language questions and candidate information, enhancing relevance calculation and thereby improving the accuracy of natural language question answering. It belongs to the field of natural language question answering.
Background
With the development of information technology and the Internet, information overload has become increasingly serious. Effectively understanding user needs and bridging the gap between the expression of a query and the expression of the existing information, so that the required information can be retrieved from a large volume of data, has become a very important problem.
A user's query typically takes the form of a question expressed in natural language. The resource database providing the answer information can take many forms. It may be a structured knowledge base composed of triples of the form (subject, predicate, object); for example, the triple (China, capital, Beijing) encodes the knowledge that the capital of China is Beijing. It may also be a text collection composed of ordinary natural-language sentences, drawn from platforms such as encyclopedias, news, and social media, or combinations thereof; for example, the sentence "I came to Beijing, the capital of China, to attend university." likewise contains the knowledge that the capital of China is Beijing. Similarly, the resource database may combine multiple forms of information. An important step in natural language question answering is evaluating the semantic relevance between the information in the resource database and the user's question, so as to select the most useful information for answering it.
Natural language questions are flexible and highly variable, and the information in a resource database is organized in complex ways, so effectively extracting features to compute the semantic relevance between candidate information and a natural language question is a challenging task. A convolutional neural network can automatically organize the structure between adjacent words, extract the overall semantic features of a text, and abstract and summarize its semantic information. A deep convolutional neural network, with more layers and a more complex structure, can process the semantics of a larger input window with fewer parameters and model them as deeper, more abstract feature representations. This helps handle the complex organization of natural language questions and candidate information and the inconsistency of expression between them, and represents the semantics of both in a unified feature space. As a result, the semantic relevance between questions and candidate information can be computed more accurately, improving the accuracy of natural language question answering.
Disclosure of Invention
The invention aims to provide a method that better extracts the semantic features of natural language questions and candidate information, to assist in calculating the semantic relevance between them and thereby improve the accuracy of natural language question answering. That is, for a natural language question q and a database information set D = {d_i}, a deep neural network extracts the corresponding feature vectors f_q and f_{d_i}, from which the similarity S = {s_qi} between the question q and each piece of database information d_i is calculated. Based on these scores, the pieces of information most relevant to the question are selected, and answers to the question are generated accordingly.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a natural language question-answering method based on a deep convolutional neural network comprises the following steps:
1) representing the natural language question and the pieces of information in the database information set as vectors with a sequence structure, forming vector matrices;
2) processing the vector matrices with a deep convolutional neural network to extract the corresponding deep semantic features;
3) calculating, from the deep semantic features, the semantic relevance between the natural language question and each piece of information in the database information set;
4) selecting information from the database information set according to the calculated semantic relevance and generating the answer to the natural language question.
The database information set in step 1) is either the original database information set or a reduced candidate information set obtained by information screening.
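The four steps above can be sketched end to end. The following is a minimal illustration, assuming toy hash-based token vectors and cosine relevance in place of the trained deep network; all function names are illustrative, not from the patent:

```python
import numpy as np

def embed(text, dim=8):
    """Step 1: represent a text as a matrix of token vectors (toy hashing)."""
    vecs = []
    for tok in text.split():
        rng = np.random.default_rng(abs(hash(tok)) % (2**32))
        vecs.append(rng.standard_normal(dim))
    return np.stack(vecs)

def deep_features(matrix):
    """Step 2 stand-in: global max pooling over the token axis."""
    return matrix.max(axis=0)

def relevance(fq, fd):
    """Step 3: cosine similarity between feature vectors."""
    return float(fq @ fd / (np.linalg.norm(fq) * np.linalg.norm(fd) + 1e-12))

def answer(question, database):
    """Step 4: return the database entry most relevant to the question."""
    fq = deep_features(embed(question))
    scores = [relevance(fq, deep_features(embed(d))) for d in database]
    return database[int(np.argmax(scores))]
```

In the real method, `deep_features` is the deep convolutional network of step 2) and `answer` would feed the selected entries into a natural-language generation step.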
The specific steps of the process of the invention are further illustrated below:
(1) For each question q and the whole database information set D, reduce the range of effective information by low-overhead means such as retrieval, and preliminarily sort the screened results, obtaining a reduced candidate information set D_q ⊆ D.
(2) Represent the natural language question and the candidate information as vector representations with a sequence structure that can serve as input to a deep convolutional neural network. That is, each input E ∈ {q} ∪ D_q can be expressed as a set of items T = {t_1, …, t_n}, where each item t_i has a vector representation and a total order can be defined on the item set T. E can then be written in matrix form according to this order: M_E = (v_1; v_2; …; v_n), where v_i is the vector representation of t_i and t_1 ≼ t_2 ≼ … ≼ t_n. Here ≼ denotes the order relation; it expresses an abstract, general notion of order and is not limited to naturally defined orders such as numerical or lexicographic order.
(3) Process the input matrix M_E with a multi-layer deep convolutional neural network with residual connections: M_0 = M_E, M_{i+1} = M_i + CNN(M_i), where CNN denotes a convolutional neural network. When M_i and M_{i+1} have different feature dimensions, the first term is processed by a width-1 convolution to adjust the feature dimension. Additional pooling layers may be inserted between layers to reduce the size of the feature matrix. The final feature vector of E is f_E = pooling(M_N), the global pooling of the last convolutional layer's output.
Residual connections add the output of a previous convolutional layer directly to the output of the current convolutional layer, and this sum replaces the current layer's output for subsequent processing; if the two tensors being added differ in feature dimension, a width-1 convolution adjusts the dimension. Such connections let the deep convolutional network adaptively adjust its effective depth during learning and reduce the adverse effect of network depth on gradient propagation.
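A minimal numeric sketch of such a residual step, assuming random stand-in weights and modeling the width-1 adjustment convolution as an independent linear map at each sequence position:

```python
import numpy as np

rng = np.random.default_rng(0)

def width1_conv(M, out_dim):
    """Width-1 convolution = independent linear map at each sequence position."""
    W = rng.standard_normal((M.shape[1], out_dim)) / np.sqrt(M.shape[1])
    return M @ W

def residual_step(M, conv_layer):
    """Add the previous output M to the conv output, projecting M if dims differ."""
    out = conv_layer(M)
    skip = M if M.shape[1] == out.shape[1] else width1_conv(M, out.shape[1])
    return skip + out

# Example: a conv layer (here itself sketched as width-1) that doubles features.
M0 = rng.standard_normal((5, 4))  # 5 tokens, 4 features
M1 = residual_step(M0, lambda M: np.tanh(width1_conv(M, 8)))
```

When the layer preserves the feature dimension, `residual_step` reduces to the plain addition M + CNN(M) described above.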
(4) From f_q and f_{d_i}, calculate a relevance score s_qi for each candidate information representation, giving S = {s_qi}, the relevance of each candidate to the question. Based on these scores, select several pieces of information from D_q as the main basis for generating the answer.
(5) Using a suitable natural language generation technique, generate the answer to the question from the selected information and its weights.
In step (1), it is also feasible not to screen the information if computational power is sufficient; however, since the scale of D tends to be large in question-answering tasks, it is usually necessary to narrow the candidate set by excluding irrelevant information with simple means. If the screening results D_q cannot be sorted, or no effective ranking is available, returning the candidates unordered is also feasible; this can be fully compensated by the subsequent training of the deep convolutional neural network.
In step (2), the vector representation may be a pre-trained low-dimensional dense semantic vector, such as a word vector trained with a neural language model or a word vector obtained by reducing a high-dimensional matrix via singular value decomposition (SVD) (e.g., the result of LSA, latent semantic analysis); it may also be an original high-dimensional sparse vector such as a one-hot vector. The total order may be a natural one, such as word order, or an artificially defined one; for example, for a full-segmentation result, items can be ordered first by the position of their initial character and then by the position of their final character. In fact, the deep convolutional network's requirement on the input is slightly weaker than a full total order: assuming the jth convolution kernel of the ith layer has width m_ij and the subsets of items involved in its computations are {T_ij}, it suffices that a total order can be defined within each T_ij, that an adjacency relation can be defined among the elements of {T_ij} themselves, and so on recursively through the layers. This requirement exists mainly so that the convolution operation can be carried out efficiently.
In step (3), the interlayer relationship of the deep convolutional network with residual connections can be written as:

M_{i+1} = CNN^1_{Σ_j k_ij}(M_i) + concat_j( CNN^{m_ij}_{k_ij}(M_i) )

where M_i is the feature matrix after the ith layer; CNN^1_{Σ_j k_ij} is a width-1 convolution used to adjust the possible difference in feature number between M_i and M_{i+1}, which reduces to the identity (and can be omitted) when Σ_j k_ij = Σ_j k_{i-1,j}; concat is a connection function joining a series of input tensors along their last component; m_ij is the width of the jth convolution kernel of the ith layer and k_ij the number of such kernels. CNN^m_k denotes a convolution operation with an activation function, width m and k filters, satisfying:

CNN^m_k(M_E) = CNN^m_k((v_1; v_2; …; v_n)) = (o_1; o_2; …; o_n)

where

o_i = g( W_m^T (v_{i-⌊m/2⌋}; …; v_{i+⌊m/2⌋}) + b_m )

and zero vectors are used as padding when a subscript of v is less than 1 or greater than n, i.e. v_j = 0 for j > n or j < 1. Here g denotes a nonlinear activation function such as the sigmoid or hyperbolic tangent function, W_m is the weight matrix used in the convolutional layer calculation, T denotes matrix transposition, and b_m is the bias vector of the convolutional layer, which may be omitted when appropriate.

Finally, after the N convolutional layers, the input can be characterized by:

f_E = pooling(M_N)

where pooling may be any global pooling operation, such as max pooling or average pooling.
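The width-m convolution with zero padding and the final global pooling can be sketched as follows, with tanh as the activation g and random placeholder weights instead of trained parameters:

```python
import numpy as np

def conv1d(M, W, b, g=np.tanh):
    """Apply a width-m convolution to M (n tokens x d features).

    W has shape (m*d, k): output position i is
    g(W^T [v_{i-m//2}; ...; v_{i+m//2}] + b), with zero vectors
    padding positions outside 1..n.
    """
    n, d = M.shape
    m = W.shape[0] // d
    padded = np.vstack([np.zeros((m // 2, d)), M,
                        np.zeros((m - 1 - m // 2, d))])
    windows = np.stack([padded[i:i + m].ravel() for i in range(n)])
    return g(windows @ W + b)

def global_max_pool(M):
    """f_E = pooling(M_N): one value per feature column."""
    return M.max(axis=0)

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 4))            # 6 tokens, 4 features
W = rng.standard_normal((3 * 4, 5)) * 0.1  # width 3, 5 filters
out = conv1d(M, W, np.zeros(5))
f = global_max_pool(out)
```

Because of the zero padding, the output has the same sequence length as the input, so layers of different kernel widths can be concatenated feature-wise as in the interlayer formula above.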
Besides the method above, step (3) can also be realized by any of the following three methods:
the method comprises the following steps:
Figure BDA0001410752010000043
wherein M isiThe characteristic matrix after the ith layer is taken as the characteristic matrix;
Figure BDA0001410752010000044
for adjusting MiAnd Mi+1The width of the difference of the characteristic numbers possibly existing between the two is 1, and the number of the filters is sigmajkijAt sigma, in the convolutional layer calculation ofjkij=∑jki-1jCan be ignored as constant conversion; concat is a connection function, and a series of tensors input are connected on the last component; flatten is a flattening function, and an input matrix is flattened into a vector; m isijIs the width, k, of the jth convolution kernel of the ith layerijFor the number of jth convolution kernels at the ith layer,
Figure BDA0001410752010000045
represents a width of m at one timeijThe number of filters is kijWith an activation function.
Method 2:

M_{i+1} = concat_j( CNN^{m_ij}_{k_ij}(M_i) ),  f_E = pooling(M_N)

where M_i is the feature matrix after the ith layer; concat is a connection function joining a series of input tensors along their last component; pooling is a pooling function that converts the input matrix into a vector by computing statistics such as the maximum, minimum, mean, or median of each row of the matrix; m_ij is the width of the jth convolution kernel of the ith layer, k_ij the number of such kernels, and CNN^{m_ij}_{k_ij} denotes a convolution operation with an activation function, width m_ij and k_ij filters.
Method 3:

M_{i+1} = concat_j( CNN^{m_ij}_{k_ij}(M_i) ),  f_E = flatten(M_N)

where M_i is the feature matrix after the ith layer; concat is a connection function joining a series of input tensors along their last component; flatten is a flattening function that flattens the input matrix into a vector; m_ij is the width of the jth convolution kernel of the ith layer, k_ij the number of such kernels, and CNN^{m_ij}_{k_ij} denotes a convolution operation with an activation function, width m_ij and k_ij filters.
If necessary, the feature vector f_E obtained above may be further transformed, e.g. processed by several fully connected neural network layers to obtain f'_E, which then replaces f_E in subsequent calculations and operations. One or more recurrent neural network layers (e.g., LSTM or GRU structures) may be added before or after any convolutional layer to further process the sequence of feature vectors. Pooling operations (e.g., any of the pooling operations above, or k-max pooling) may likewise be added before or after any convolutional layer. These optional operations can be chosen with reference to the characteristics of the data set used when implementing the method.
When training the multi-layer convolutional network, techniques such as dropout layers or L1/L2 regularization terms can be added to limit the model's expressive capacity, enhance generalization, and prevent overfitting. During training, to prevent deep convolution operations from drastically changing the distribution of the feature matrix as the number of layers grows, which would cause vanishing or exploding gradients, normalization can be applied between layers, e.g. normalizing part or all of the feature matrix by the mean and variance of its distribution over a training batch.
In step (4), there are many ways to calculate the score, for example:

Using cosine similarity directly:

s_qi = cos(f_q, f_{d_i}) = (f_q · f_{d_i}) / (‖f_q‖ ‖f_{d_i}‖)

Using a multi-layer perceptron on the concatenated features:

s_qi = (g_{N_f} ∘ … ∘ g_2 ∘ g_1)(concat(f_q, f_{d_i}))

where ∘ denotes function composition and each perceptron layer is g_i(x) = g(W_i x + b_i), i ∈ {1, 2, …, N_f}, with g a nonlinear activation function.

Or using the number of wins in pairwise comparisons, e.g.:

s_qi = num_{j≠i} P(d_i ≻ d_j | q)

where num indicates counting over j with j = i skipped, and P denotes a probability.

In fact, any function of the form s_qi = F(f_q, f_{d_i}) can be used to calculate the score.
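Two of the scoring options above can be sketched with random untrained weights; the single hidden layer of the MLP here is purely for illustration:

```python
import numpy as np

def cosine_score(fq, fd):
    """s_qi = cos(f_q, f_d): dot product over the norm product."""
    return float(fq @ fd / (np.linalg.norm(fq) * np.linalg.norm(fd)))

def mlp_score(fq, fd, W1, b1, w2, b2):
    """s_qi = w2 . g(W1 [f_q; f_d] + b1) + b2, with g = tanh."""
    h = np.tanh(W1 @ np.concatenate([fq, fd]) + b1)
    return float(w2 @ h + b2)

rng = np.random.default_rng(2)
fq, fd = rng.standard_normal(4), rng.standard_normal(4)
W1, b1 = rng.standard_normal((8, 8)), np.zeros(8)
w2, b2 = rng.standard_normal(8), 0.0
s_cos = cosine_score(fq, fq)   # identical vectors score 1
s_mlp = mlp_score(fq, fd, W1, b1, w2, b2)
```

In a trained system, W1, b1, w2, b2 would of course be learned jointly with the convolutional network rather than drawn at random.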
In step (5), the answer can be generated in many ways, usually depending closely on the specific question-answer data format and task setting. Simpler examples include taking the highest-scoring d_i ∈ D_q directly as the answer (when D is a text collection), or taking the object of the highest-scoring d_i as the answer (when D is a set of knowledge base triples). More complex methods, such as memory networks (MemNN), may also be used.
The invention also provides a server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method described above.
Compared with the prior art, the invention has the following positive effects:
the invention uses deep convolutional neural network to extract semantic features so as to better extract and summarize semantic representation of question and answer related information. In the natural language question-answering task, compared with the traditional method, the method can extract semantic information with higher level, so that the method can better adapt to the characteristics of expression flexibility, information organization inconsistency and the like in the natural language question-answering task, and can better evaluate the semantic correlation degree of the question and the candidate information so as to improve the effect of natural language question-answering.
Drawings
Fig. 1 is a framework diagram of a natural language question answering method in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention is based on the knowledge base data set and question set provided by the Chinese knowledge base question answering evaluation task of the Sixth Conference on Natural Language Processing and Chinese Computing (NLPCC 2017). Those skilled in the art will appreciate that other candidate information sets and question sets may be used in an implementation.
Specifically, this embodiment uses 32,110 question-answer pairs, of which 7,631 questions are used for testing and 24,479 for training, e.g.: "Do you know when Querdaloguburg was completed?". About 43 million pieces of information, organized as knowledge base triples, provide the candidate information, e.g.: (Querdaloguburg, completion time, 1890).
Fig. 1 is a schematic diagram of a natural language question-answering method based on a deep convolutional neural network according to an embodiment of the present invention, where the method includes the following steps:
step 1: and reducing the candidate information size according to the problem.
Given the characteristics of the data set, during information screening this embodiment requires that the subject of a candidate piece of information be a substring of the question. Rule-based features are then extracted and a gradient boosted decision tree (GBDT) model is trained on the training data to score all substrings, keeping only the top three substrings whose scores are no less than 1/100 of the top score as possible candidate entities. On average, only a few dozen database entries per question have a candidate entity as subject (e.g., the average number of candidate entries over the 7,631 test-set questions is 62.93).
Then, using a full-segmentation word attention model, the top 20 ranked pieces of information for each question are selected as its candidate information set D_q for subsequent processing. (See Yuxuan Lai, Yang Lin, Jianhao Chen, Yansong Feng, and Dongyan Zhao: Open Domain Question Answering System Based on Knowledge Base. In Proceedings of NLPCC 2016.)
Step 2: the question and the candidate information are represented as a representation of a sequence of vectors.
Both the question q and the triples in the candidate information set D_q are represented as word vector sequences. For a question, the words are the segmentation result of the remainder after the triple subject is removed; for a triple, the words are the segmentation result of its predicate. Each question and each candidate entry can thus be expressed as a word vector matrix. The word vectors were trained with the Google word2vec model on a Chinese encyclopedia corpus.
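A toy sketch of this lookup, turning a segmented text into the matrix M_E = (v_1; v_2; …; v_n) row by row; the tiny vocabulary and 4-dimensional vectors are illustrative stand-ins for the word2vec output:

```python
import numpy as np

rng = np.random.default_rng(5)
vocab = {"首都": 0, "北京": 1, "中国": 2, "<unk>": 3}
E = rng.standard_normal((len(vocab), 4))  # embedding table, one row per word

def to_matrix(tokens):
    """Look up each token's vector; unknown tokens map to the <unk> row."""
    rows = [E[vocab.get(t, vocab["<unk>"])] for t in tokens]
    return np.stack(rows)

M = to_matrix(["中国", "首都", "北京"])  # 3 tokens -> 3 x 4 matrix
```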
Step 3: extract deep semantic features using a deep convolutional neural network.
After the word vector matrices are obtained, the deep convolutional neural network extracts deep semantic information. The network uses 2-3 convolutional layers, each with 256 convolution kernels of width 1, 512 of width 2, and 256 of width 3. Residual connections are added between the convolutional layers.
Step 4: calculate the semantic relevance from the semantic features.
After the semantic features are obtained, the features of the question and the candidate information are combined by elementwise multiplication, and the similarity score is then computed by a multi-layer perceptron with one hidden layer of 1024 nodes.
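A sketch of this fusion step, assuming random untrained weights and a hidden layer shrunk from 1024 to 16 nodes for brevity:

```python
import numpy as np

rng = np.random.default_rng(3)

def fuse_and_score(fq, fd, hidden=16):
    """Elementwise product of the two feature vectors, then a 1-hidden-layer MLP.

    Weights are drawn fresh from rng here purely for illustration; a real
    system would learn them during training.
    """
    fused = fq * fd                                   # elementwise multiplication
    W1 = rng.standard_normal((hidden, fused.size)) * 0.1
    w2 = rng.standard_normal(hidden) * 0.1
    return float(w2 @ np.tanh(W1 @ fused))

fq, fd = rng.standard_normal(8), rng.standard_normal(8)
score = fuse_and_score(fq, fd)
```

The elementwise product keeps the combined feature the same size as each input vector, unlike concatenation, which doubles it.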
And 5: generating answers based on semantic relatedness selected information
Given the characteristics of the data used in this embodiment, the object string of the most relevant piece of information is selected directly as the answer to the question.
During model training, to deal with the uneven distribution of positive and negative samples in the question-answering task, each round samples negative examples using the model obtained in the previous round (the first round uses the result of step 1, or random sampling): negative samples that the previous model ranked closer to the positive samples are given a higher probability of being selected when generating the next round's training data. In essence, subsequent training specifically reinforces against the weaknesses of the previous round. The sampling probability p_{t_i} of a given negative sample at round t_i is a decreasing function of rank_{i-1}, the ranking of that sample among all candidates under the model of round t_{i-1}. In actual training, one time node is taken per iteration, for 7 rounds of training in total.
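A hedged sketch of this rank-driven negative sampling; the inverse-rank weighting below is an illustrative choice, not the patent's exact probability formula:

```python
import numpy as np

def sampling_probs(prev_ranks):
    """prev_ranks[i] = rank of negative i under the previous model (1 = best).

    Lower rank (closer to the positive sample) -> higher sampling weight.
    """
    weights = 1.0 / np.asarray(prev_ranks, dtype=float)
    return weights / weights.sum()

def sample_negatives(prev_ranks, n, rng):
    """Draw n negatives for the next training round, weighted by probability."""
    p = sampling_probs(prev_ranks)
    return rng.choice(len(prev_ranks), size=n, p=p)

rng = np.random.default_rng(4)
probs = sampling_probs([1, 2, 4, 8])
picks = sample_negatives([1, 2, 4, 8], n=5, rng=rng)
```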
Table 1 shows the natural language question-answering performance of the method; the evaluation metric is top-1 accuracy, i.e. the frequency with which the top-ranked answer to a question is the correct answer.

Number of layers:  1        2        3        5
Top-1 accuracy:    43.57%   43.82%   43.85%   42.13%

Table 1
As the table shows, within a certain range the deep convolutional neural network performs better as the number of model layers increases, indicating that the method extracts deep semantic information effectively, computes the semantic relevance between questions and candidate information more accurately, and thus achieves better question-answering results.
In summary, the embodiment of the invention builds a reliable natural language question-answering system on the knowledge base data set and question set provided by the Chinese knowledge base question answering evaluation task of NLPCC 2017. When selecting supporting information, the proposed method effectively extracts and summarizes the semantic features of questions and candidate information, computing their semantic relevance more accurately and achieving better question-answering results.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include such modifications and variations.

Claims (6)

1. A natural language question-answering method based on a deep convolutional neural network is characterized by comprising the following steps:
1) representing the natural language question and the pieces of information in the database information set as vectors with a sequence structure, forming vector matrices;
2) processing the vector matrices with a deep convolutional neural network to extract the corresponding deep semantic features;
3) calculating, from the deep semantic features, the semantic relevance between the natural language question and each piece of information in the database information set;
4) selecting information from the database information set according to the calculated semantic relevance and generating the answer to the natural language question; wherein step 2) uses a multi-layer convolutional neural network to generate deep semantic features f_E from the vector matrix M_E of step 1), satisfying:
M_0 = M_E
M_{i+1} = CNN^1_{Σ_j k_ij}(M_i) + concat_j( CNN^{m_ij}_{k_ij}(M_i) )
f_E = pooling(M_N)
where M_i is the feature matrix after the ith layer; CNN^1_{Σ_j k_ij} denotes a convolution operation with an activation function, width 1 and Σ_j k_ij filters, used to adjust the possible difference in feature number between M_i and M_{i+1}, which reduces to the identity (and can be omitted) when Σ_j k_ij = Σ_j k_{i-1,j}; concat is a connection function joining a series of input tensors along their last component, the subscript j denoting connection over dimension j; pooling is a pooling function converting the input matrix into a vector by computing the maximum, minimum, mean, and median of each row of the matrix; m_ij is the width of the jth convolution kernel of the ith layer, k_ij the number of such kernels, and CNN^{m_ij}_{k_ij} denotes a convolution operation with an activation function, width m_ij and k_ij filters;
or, step 2) using a multilayer convolutional neural network to generate the vector matrix M according to step 1)EGenerating deep semantic features fENamely, the following conditions are satisfied:
M0=ME
Figure FDA0002944640850000014
fE=flatten(MN) Wherein M isiThe characteristic matrix after the ith layer is taken as the characteristic matrix;
Figure FDA0002944640850000015
for adjusting MiAnd Mi+1The width of the difference of the characteristic numbers possibly existing between the two is 1, and the number of the filters is sigmajkijAt sigma, in the convolutional layer calculation ofjkij=∑jki-1jCan be ignored as constant conversion; concat is a connection function, and a series of tensors input are connected on the last component; flatten is a flattening function, and an input matrix is flattened into a vector; m isijIs the width, k, of the jth convolution kernel of the ith layerijFor the number of jth convolution kernels at the ith layer,
Figure FDA0002944640850000016
represents a width of m at one timeijThe number of filters is kijConvolution operations with activation functions;
or, step 2) using a multilayer convolutional neural network to generate the vector matrix M according to step 1)EGenerating deep semantic features fENamely, the following conditions are satisfied:
M0=ME
Figure FDA0002944640850000021
fE=pooling(MN)
wherein M isiThe characteristic matrix after the ith layer is taken as the characteristic matrix; concat is a connection function, and a series of tensors input are connected on the last component; posing is a pooling function, and input matrixes are formed into a vector by calculating the maximum value, the minimum value, the average value and the median of each row of the matrix; m isijIs the width, k, of the jth convolution kernel of the ith layerijFor the number of jth convolution kernels at the ith layer,
Figure FDA0002944640850000022
represents a width of m at one timeijThe number of filters is kijConvolution operations with activation functions;
or, step 2) using a multilayer convolutional neural network to generate the vector matrix M according to step 1)EGenerating deep semantic features fENamely, the following conditions are satisfied:
M0=ME
Figure FDA0002944640850000023
fE=flatten(MN)
wherein M isiThe characteristic matrix after the ith layer is taken as the characteristic matrix; concat is a connection function, and a series of tensors input are connected on the last component; flatten is a flattening function, and an input matrix is flattened into a vector; m isijIs the width, k, of the jth convolution kernel of the ith layerijFor the number of jth convolution kernels at the ith layer,
Figure FDA0002944640850000024
represents a width of m at one timeijThe number of filters is kijWith an activation function.
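For illustration only (not part of the claims), the third alternative of claim 1 — several convolution widths m_ij per layer, concatenation of their outputs on the feature axis, and row-wise pooling of the final feature matrix M_N — can be sketched in NumPy. The ReLU activation, the random filter weights, and the truncation-based length alignment before concatenation are assumptions of this sketch, not details fixed by the claim:

```python
import numpy as np

def conv1d(M, W):
    """Valid convolution over a (length, channels) matrix M with a
    kernel tensor W of shape (width, channels, filters), ReLU applied."""
    m, _, k = W.shape
    L = M.shape[0] - m + 1
    out = np.empty((L, k))
    for t in range(L):
        # contract the (width, channels) window against W -> (filters,)
        out[t] = np.einsum('mc,mck->k', M[t:t + m], W)
    return np.maximum(out, 0.0)  # ReLU activation (an assumption)

def cnn_features(M_E, layer_specs, rng):
    """Each layer applies convolutions of several widths m_ij with k_ij
    filters each and concatenates the results on the feature axis; the
    final feature matrix is pooled per feature into the vector f_E."""
    M = M_E
    for widths in layer_specs:              # one (m_ij, k_ij) list per layer i
        parts = []
        for m_ij, k_ij in widths:
            W = rng.standard_normal((m_ij, M.shape[1], k_ij)) * 0.1
            parts.append(conv1d(M, W))
        L = min(p.shape[0] for p in parts)  # align lengths before concat
        M = np.concatenate([p[:L] for p in parts], axis=-1)
    # pooling: maximum, minimum, mean and median over the sequence axis
    return np.concatenate([M.max(0), M.min(0), M.mean(0), np.median(M, 0)])
```

With two layers of widths {2, 3} and {3}, a 20-token input with 8-dimensional embeddings yields a fixed-length f_E of 6 features × 4 pooling statistics, regardless of the input length.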
2. The method of claim 1, wherein the database information set of step 1) is an original database information set or a candidate information set with a reduced scope obtained by information screening.
3. The method of claim 1, wherein the deep semantic features f_E are processed by a plurality of fully connected layers to obtain a new vector representation f'_E of the sentence, and the semantic relevance between the natural language question and the information in the database information set is then calculated according to f'_E.
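As a minimal sketch of claim 3 (again illustrative only): the question features and the candidate features are each refined by fully connected layers into f'_E and then scored. Sharing one weight stack for both sides, the ReLU activation, and cosine similarity as the relevance function are assumptions here; the claim specifies none of them:

```python
import numpy as np

def fc_relevance(f_q, f_d, weights):
    """Refine the question features f_q and candidate features f_d with
    shared fully connected layers, then score relevance by cosine
    similarity (an illustrative choice of relevance function)."""
    def project(f):
        for W, b in weights:                # fully connected stack
            f = np.maximum(W @ f + b, 0.0)  # ReLU activation (assumed)
        return f
    q, d = project(f_q), project(f_d)
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-9))
```

Candidates in the database information set can then be ranked by this score, with the top-ranked entry used to generate the answer as in step 4) of claim 1.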
4. The method of claim 1, wherein additional pooling functions are added between convolutional layer operations.
5. The method of claim 1, wherein one or more layers of recurrent neural networks are added to process the feature matrix before or after any convolutional layer operation.
6. A server, characterized in that the server comprises a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 5.
CN201710841026.2A 2017-09-18 2017-09-18 Natural language question-answering method based on deep convolutional neural network Active CN107832326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710841026.2A CN107832326B (en) 2017-09-18 2017-09-18 Natural language question-answering method based on deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN107832326A CN107832326A (en) 2018-03-23
CN107832326B true CN107832326B (en) 2021-06-08

Family

ID=61643392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710841026.2A Active CN107832326B (en) 2017-09-18 2017-09-18 Natural language question-answering method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN107832326B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563782B * 2018-04-25 2023-04-18 Ping An Technology (Shenzhen) Co., Ltd. Commodity information format processing method and device, computer equipment and storage medium
CN109271926B * 2018-09-14 2021-09-10 Xidian University Intelligent radiation source identification method based on GRU deep convolutional network
CN109740126B * 2019-01-04 2023-11-21 Ping An Technology (Shenzhen) Co., Ltd. Text matching method and device, storage medium and computer equipment
CN111666482B * 2019-03-06 2022-08-02 Gree Electric Appliances Inc. of Zhuhai Query method and device, storage medium and processor
CN110348014B * 2019-07-10 2023-03-24 University of Electronic Science and Technology of China Semantic similarity calculation method based on deep learning
CN110516145B * 2019-07-10 2020-05-01 National University of Defense Technology Information searching method based on sentence vector coding
CN110990549B * 2019-12-02 2023-04-28 Tencent Technology (Shenzhen) Co., Ltd. Method, device, electronic equipment and storage medium for obtaining answer
CN112434152B * 2020-12-01 2022-10-14 Peking University Education choice question answering method and device based on multi-channel convolutional neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844368A * 2015-12-03 2017-06-13 Huawei Technologies Co., Ltd. Method for human-computer interaction, neural network system and user equipment
CN107066464A * 2016-01-13 2017-08-18 Adobe Inc. Semantic Natural Language Vector Space

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9904976B2 (en) * 2015-01-16 2018-02-27 Nec Corporation High performance portable convulational neural network library on GP-GPUs


Similar Documents

Publication Publication Date Title
CN107832326B (en) Natural language question-answering method based on deep convolutional neural network
CN111415740B (en) Method and device for processing inquiry information, storage medium and computer equipment
Liu et al. Probabilistic reasoning via deep learning: Neural association models
Nie et al. Data-driven answer selection in community QA systems
Tandon et al. Webchild: Harvesting and organizing commonsense knowledge from the web
CN109783817A (en) A kind of text semantic similarity calculation model based on deeply study
CN105183833B (en) Microblog text recommendation method and device based on user model
Cohen et al. End to end long short term memory networks for non-factoid question answering
CN111737426B (en) Method for training question-answering model, computer equipment and readable storage medium
CN116134432A (en) System and method for providing answers to queries
CN112115716A (en) Service discovery method, system and equipment based on multi-dimensional word vector context matching
CN109344246B (en) Electronic questionnaire generating method, computer readable storage medium and terminal device
Panda Developing an efficient text pre-processing method with sparse generative Naive Bayes for text mining
CN116992007B (en) Limiting question-answering system based on question intention understanding
Wu et al. ECNU at SemEval-2017 task 3: Using traditional and deep learning methods to address community question answering task
CN110321421A (en) Expert recommendation method and computer storage medium for website Knowledge Community system
CN112632261A (en) Intelligent question and answer method, device, equipment and storage medium
Wan Sentiment analysis of Weibo comments based on deep neural network
CN106681986A (en) Multi-dimensional sentiment analysis system
Polignano et al. Identification Of Bot Accounts In Twitter Using 2D CNNs On User-generated Contents.
Zhao et al. Interactive attention networks for semantic text matching
CN111581364A (en) Chinese intelligent question-answer short text similarity calculation method oriented to medical field
CN114722176A (en) Intelligent question answering method, device, medium and electronic equipment
CN111581365B (en) Predicate extraction method
Liu et al. Attention based r&cnn medical question answering system in chinese

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant