CN108021555A - A kind of Question sentence parsing measure based on depth convolutional neural networks - Google Patents
- Publication number
- CN108021555A CN108021555A CN201711162561.1A CN201711162561A CN108021555A CN 108021555 A CN108021555 A CN 108021555A CN 201711162561 A CN201711162561 A CN 201711162561A CN 108021555 A CN108021555 A CN 108021555A
- Authority
- CN
- China
- Prior art keywords
- question
- neural network
- convolutional neural
- sentence
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Fuzzy Systems (AREA)
- Human Computer Interaction (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides a question similarity measurement method based on a deep convolutional neural network, comprising the following steps: S1, generate a raw corpus from pages related to the knowledge domain, collect the Chinese characters occurring in the raw corpus, and generate a character vector for each Chinese character; S2, replace each Chinese character in a question with its character vector to obtain the character vector set corresponding to the question, and compute the corresponding sentence-meaning vector from the character vector set with a convolutional neural network; S3, combine the questions in pairs, and obtain the similarity between two questions by computing the absolute value of the cosine of their sentence-meaning vectors. By analyzing individual characters, the method avoids the influence of word-segmentation errors on subsequent analysis; by extracting whole-sentence features from the entire question with the convolutional neural network, it avoids the sentence-meaning fragmentation caused by word-similarity matrices.
Description
Technical Field
The invention relates to a question similarity measuring method, in particular to a question similarity measuring method based on a deep convolutional neural network.
Background
The main functions of the financial self-service robot are business consultation, business handling, cash access, user guidance and the like. The business consultation function can be understood as a Chinese question-answering system aiming at the bank field, and the key technology is to carry out similarity calculation on the questions asked by the user and the questions in a bank question bank and return answers corresponding to the most similar questions. Because natural languages, especially spoken languages, have a variety of different expression modes for questions with the same meaning, how to calculate the similarity between questions according to the real semantics of the questions becomes a problem to be solved urgently.
Traditional question similarity calculation methods generally fall into two types: keyword-matching-based methods and machine-learning-based methods. Keyword-matching methods mainly calculate the similarity between two questions by comparing the frequency, position, order and other information of the same keywords in the two questions. They are computationally simple, but often perform poorly on long sentences, and especially on synonymous expressions phrased differently. Machine-learning methods mainly analyze a domain knowledge base to build a model between questions and question semantics, and use it to calculate the similarity between different questions. They are computationally more complex but handle synonyms better, and have therefore gradually become mainstream.
In recent years, with the success of deep learning techniques in fields such as speech and images, they have also been introduced into similarity calculation. Chinese patent CN106776545A, "a method for calculating similarity between short texts through a deep convolutional neural network", discloses a typical process: first segment the question into words, then convert each word into a word vector, and finally input a similarity matrix formed from all word vectors of the two questions into a convolutional neural network to calculate the similarity.
The method mainly has the following problems:
First, Chinese word segmentation cannot be completely accurate, and its accuracy is closely tied to the specific domain. For example, in the banking domain, the many technical terms make segmentation accuracy generally lower, and this lower accuracy affects subsequent calculation.
Second, such methods often use a similarity matrix between word vectors as the measure of question similarity; this splits the similarity between questions into similarities between words and destroys the overall semantics of the questions.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a question similarity measurement method based on a deep convolutional neural network, which is used for calculating the similarity between questions according to the implicit semantics between the questions.
The purpose of the invention is realized by the following technical scheme: a question similarity measurement method based on a deep convolutional neural network comprises the following steps:
s1, generating a raw corpus through related pages in the knowledge field, crawling Chinese characters appearing in the raw corpus, and generating a corresponding character vector of each Chinese character;
s2, replacing each Chinese character in the question with the corresponding character vector to obtain a character vector set corresponding to the question; the word vector set obtains corresponding sentence meaning vectors through the calculation of a convolution neural network;
and S3, combining the question sentences in pairs, and calculating the cosine function absolute values of the sentence meaning vectors corresponding to the two question sentences to obtain the similarity between the two question sentences.
The technical scheme of the invention is further defined as follows: the method for generating the raw corpus through the knowledge domain related pages in the step S1 comprises the following steps:
s11, compiling a web crawler by using a python language, and crawling knowledge-related webpages;
s12, preprocessing the webpage, removing webpage marks, invalid characters, mathematical formulas, pictures and tables, combining all the webpages, and generating an original raw corpus;
and S13, segmenting the original raw corpus according to punctuations, segmenting each sentence into a plurality of clauses, wherein each clause occupies one line, and combining all the clauses to generate a final raw corpus.
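Steps S11-S13 above can be sketched as a short script. This is a minimal illustration, not the patent's implementation; the helper name `build_corpus` and the exact punctuation set are assumptions.

```python
import re

def build_corpus(pages):
    """Split crawled page text into clauses, one clause per line.

    `pages` is a list of already-cleaned page strings (markup, invalid
    characters, formulas, pictures and tables removed, per step S12).
    Hypothetical helper for illustration only.
    """
    clauses = []
    for text in pages:
        # Step S13: split on Chinese and ASCII sentence/clause punctuation.
        for clause in re.split(r"[。！？；，.!?;,\n]", text):
            clause = clause.strip()
            if clause:
                clauses.append(clause)
    # Merge all clauses into the final raw corpus, one clause per line.
    return "\n".join(clauses)

corpus = build_corpus(["我要转账，怎么办？谢谢"])
```

The resulting corpus is the plain-text training input for the word2vec step that follows.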
As a further improvement of the present invention, in step S1, character vectors are generated with the skip-gram algorithm of the word2vec tool; the window size of the skip-gram algorithm is set to 2, and the dictionary contains 3500 common characters plus a UNK token, which replaces uncommon characters outside the 3500.
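The 3500-common-characters-plus-UNK dictionary can be sketched as follows. This is an illustrative sketch (with a tiny vocabulary size for demonstration); the function names are assumptions, not the patent's code.

```python
from collections import Counter

def build_vocab(corpus, vocab_size=3500):
    """Keep the `vocab_size` most frequent characters; all others map to UNK."""
    counts = Counter(ch for ch in corpus if not ch.isspace())
    common = [ch for ch, _ in counts.most_common(vocab_size)]
    char2id = {ch: i for i, ch in enumerate(common)}
    char2id["UNK"] = len(char2id)  # dictionary size is vocab_size + 1
    return char2id

def encode(sentence, char2id):
    """Replace each character with its dictionary id; unknowns become UNK."""
    unk = char2id["UNK"]
    return [char2id.get(ch, unk) for ch in sentence]

vocab = build_vocab("我要转账我要取款", vocab_size=4)
ids = encode("我要存款", vocab)  # "存" is out of vocabulary
```

With the full 3500-character dictionary, this yields the 3501-entry vocabulary mentioned in the embodiment below.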
As a further improvement of the present invention, the convolutional neural network of step S2 includes a convolutional layer and a pooling layer. The convolutional layer uses a kernel size of 2 × 200, where 2 means that only the association between 2 adjacent characters is considered and 200 is the dimension of the character vector; the number of convolution kernels in the convolutional layer is 100-200. The pooling layer employs 1-max pooling, i.e., taking the maximum of each feature dimension over the convolved features.
As a further improvement of the present invention, the method by which the convolutional neural network in step S2 calculates the corresponding sentence-meaning vector is as follows:
1) The sentence S contains n Chinese characters, each corresponding to a d-dimensional character vector v_i; after replacement the sentence is represented as S' = {v_1, v_2, …, v_n};
2) S' is input into the convolutional layer of the convolutional neural network to obtain the convolved result m^k = {m_1^k, m_2^k, …, m_(n-1)^k}, calculated as:
m_i^k = f(W_k · c_i + b_k)
wherein c_i = [v_i, v_(i+1)] (0 < i < n) is the vector formed by concatenating two adjacent character vectors, W_k is the k-th convolution kernel matrix of the convolutional neural network, b_k is the bias vector corresponding to the k-th convolution kernel, and f is the activation function;
3) The convolved result m^k is input into the pooling layer to obtain the pooled result p_k; the pooling is 1-max pooling, calculated as:
p_k = max(m_1^k, m_2^k, …, m_(n-1)^k)
wherein max is the maximum function, taking the maximum over all inputs m_i^k from the previous layer;
4) Steps 2) and 3) are executed repeatedly, 1-3 times;
5) The output p_k of the pooling layer of the last repetition is the sentence-meaning vector of sentence S.
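A single convolution-plus-1-max-pooling pass can be sketched numerically as below. This is a toy illustration with tiny dimensions, not the patent's implementation: the kernel is flattened to a vector, and the activation f is assumed to be tanh (the patent does not name it).

```python
import math

def conv_1max(sent_vecs, kernels, biases):
    """One convolution + 1-max pooling pass over a character-vector sentence.

    sent_vecs: list of n character vectors, each of dimension d.
    kernels:   list of K weight vectors, each of dimension 2*d
               (a window of 2 adjacent characters, as in the 2 x d kernel).
    biases:    list of K scalar biases.
    Returns the K-dimensional sentence-meaning vector (p_1, ..., p_K).
    """
    n = len(sent_vecs)
    pooled = []
    for W, b in zip(kernels, biases):
        feats = []
        for i in range(n - 1):
            c = sent_vecs[i] + sent_vecs[i + 1]  # concatenate [v_i, v_{i+1}]
            feats.append(math.tanh(sum(w * x for w, x in zip(W, c)) + b))
        pooled.append(max(feats))  # 1-max pooling over positions
    return pooled

# Toy example: n = 3 characters, d = 2, one kernel.
p = conv_1max([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
              [[0.5, 0.5, 0.5, 0.5]], [0.0])
```

With K kernels the output has K dimensions, matching the 100-200 kernels described above.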
As a further improvement of the present invention, in step S3 the formula for calculating the absolute value of the cosine of the sentence-meaning vectors corresponding to the two questions is:
sim(x, y) = |x · y| / (‖x‖ ‖y‖)
wherein x and y respectively denote the sentence-meaning vectors corresponding to question 1 and question 2, and the range of sim(x, y) is [0, 1].
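The absolute-cosine formula above can be written directly; a minimal sketch:

```python
import math

def sim(x, y):
    """Absolute cosine similarity: sim(x, y) = |x . y| / (||x|| ||y||), in [0, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return abs(dot) / (nx * ny)

# Opposite vectors: plain cosine would give -1, the absolute value gives 1.
s_opposite = sim([1.0, 2.0], [-1.0, -2.0])
s_orthogonal = sim([1.0, 0.0], [0.0, 1.0])
```

The absolute value is what keeps the range in [0, 1], matching the range of the sigmoid activation as discussed in the effects section.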
As a further improvement of the invention, the training method for the convolutional neural network that calculates the sentence-meaning vectors is:
1) Cluster the questions by their answers so that questions with the same answer fall into the same cluster; combine questions within the same cluster in pairs to generate positive samples, combine questions from different clusters in pairs to generate negative samples, and merge all positive and negative samples into a training set;
2) Configure a convolutional neural network based on the TensorFlow framework, wherein the maximum number of training iterations is 1000, the loss function is the L2-regularized mean square error, the batch size is 400, the number of convolution-kernel features is 200, the convolution kernel size is 2 × 200, and the pooling layer uses 1-max pooling;
3) Take a sample from the training set, replace it with the corresponding character-vector sample, and compute the corresponding sentence-meaning vectors through the whole neural network;
4) Calculate the absolute cosine value between the sample sentence-meaning vectors to obtain the sample similarity, and adjust the weights of the neural network according to the error between this similarity and the sample label;
5) Repeat steps 2) to 4) until the maximum number of training iterations is reached.
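The pair-generation in training step 1) can be sketched as follows. This is an illustrative sketch; the function name and the labels 1/0 for positive/negative samples are assumptions.

```python
from itertools import combinations

def build_training_set(clusters):
    """Build (q1, q2, label) pairs from answer clusters.

    `clusters` maps an answer id to the list of questions sharing that
    answer. Questions in the same cluster form positive pairs (label 1);
    questions from different clusters form negative pairs (label 0).
    """
    samples = []
    for qs in clusters.values():
        samples += [(a, b, 1) for a, b in combinations(qs, 2)]
    for k1, k2 in combinations(list(clusters), 2):
        samples += [(a, b, 0) for a in clusters[k1] for b in clusters[k2]]
    return samples

data = build_training_set({"a1": ["q1", "q2"], "a2": ["q3"]})
```

In practice the negative pairs vastly outnumber the positives, so real training would likely subsample them; the patent does not specify this.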
The outstanding effects of the invention are:
(1) Individual-character analysis is adopted, avoiding the influence of word-segmentation errors on subsequent analysis.
(2) The convolutional neural network extracts whole-sentence features from the entire question, avoiding the sentence-meaning fragmentation caused by using a word similarity matrix.
(3) An absolute-value function is added to the original cosine similarity formula so that its range is [0, 1]. This resolves the mismatch between the ranges of the sigmoid function (a common neural-network activation) and the cosine function, and also avoids negative similarities.
(4) The data required for training the deep learning model is generated, establishing the association between questions and their semantics.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the convolutional neural network structure of the present invention.
FIG. 3 is a flowchart of the similarity analysis method according to an embodiment of the present invention.
Detailed Description
Example one
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1-3, the question similarity measurement method based on the deep convolutional neural network of the present invention includes the following steps:
s1, generating a raw corpus through related pages in the knowledge field, crawling Chinese characters appearing in the raw corpus, and generating a corresponding character vector of each Chinese character;
s2, replacing each Chinese character in the question with the corresponding character vector to obtain a character vector set corresponding to the question; the word vector set obtains corresponding sentence meaning vectors through the calculation of a convolution neural network;
and S3, combining the question sentences in pairs, and calculating the cosine function absolute values of the sentence meaning vectors corresponding to the two question sentences to obtain the similarity between the two question sentences.
The operation of steps S1 to S3 above is described in detail below. This embodiment applies the method of the invention to similarity detection for questions in the financial domain.
And S1, generating a financial field corpus. The specific implementation steps are as follows:
Step 1, write a web crawler in the Python language and crawl finance-related web pages; the sites crawled in this embodiment include major bank websites, the finance sections of major portal sites, and professional finance websites.
Step 2, preprocess the web pages by removing page markup, invalid characters, mathematical formulas, pictures and tables, then merge all pages to generate the original raw corpus.
Step 3, further process the raw corpus: split it by punctuation so that each sentence becomes individual clauses, one clause per line, and merge all clauses to generate the final financial corpus.
Then generate the financial-domain character vectors. The specific implementation steps are as follows:
step 1, configuring a word2vec program based on the tensoflow framework. The specific configuration is as follows, the algorithm adopts a skip-gram algorithm, the loss function adopts nce _ loss, the sliding window is 2, the feature size of the word vector is 200, the training times are 3000000, the mini-batch size is 128, the anti-sampling times are 10, and the dictionary size is 3501. The specific implementation process can be adjusted according to the situation.
Step 2, use this program to learn from the financial corpus generated above, producing a corresponding character vector for each character in the dictionary.
Step S2: the calculation by which the convolutional neural network obtains the corresponding sentence-meaning vector is as follows:
Step 1, the sentence S contains n Chinese characters, each corresponding to a d-dimensional character vector v_i; after replacement the sentence is represented as S' = {v_1, v_2, …, v_n};
Step 2, S' is input into the convolutional layer of the convolutional neural network to obtain the convolved result m^k = {m_1^k, m_2^k, …, m_(n-1)^k}, calculated as:
m_i^k = f(W_k · c_i + b_k)
wherein c_i = [v_i, v_(i+1)] (0 < i < n) is the vector formed by concatenating two adjacent character vectors, W_k is the k-th convolution kernel matrix of the convolutional neural network, b_k is the bias vector corresponding to the k-th convolution kernel, and f is the activation function;
Step 3, the convolved result m^k is input into the pooling layer to obtain the pooled result p_k; the pooling is 1-max pooling, calculated as:
p_k = max(m_1^k, m_2^k, …, m_(n-1)^k)
wherein max is the maximum function, taking the maximum over all inputs m_i^k from the previous layer;
Step 4, steps 2 and 3 are executed repeatedly, 1-3 times;
Step 5, the output p_k of the pooling layer of the last repetition is the sentence-meaning vector of sentence S.
In step S3, the formula for calculating the absolute value of the cosine of the sentence-meaning vectors corresponding to the two questions is:
sim(x, y) = |x · y| / (‖x‖ ‖y‖)
wherein x and y respectively denote the sentence-meaning vectors corresponding to question 1 and question 2, and the range of sim(x, y) is [0, 1].
Step S2: the convolutional neural network that generates sentence-meaning vectors is trained as follows:
Step 1, cluster the questions by their answers so that questions with the same answer fall into the same cluster; combine questions within the same cluster in pairs to generate positive samples, combine questions from different clusters in pairs to generate negative samples, and merge all positive and negative samples into a training set.
Step 2, configure a convolutional neural network program based on the TensorFlow framework. The specific configuration: the maximum number of training iterations is 1000, the loss function is the L2-regularized mean square error (MSE), the batch size is 400, the number of convolution-kernel features is 200, the convolution kernel size is 2 × 200, and the pooling layer uses 1-max pooling.
Step 3, use the training sample set T generated in step 1.
Step 4, randomly take a training sample from T, sample = (Sen_1, Sen_2, p), and replace it with two character-vector sets S_vec1, S_vec2. The specific method: the i-th Chinese character C_i of question Sen_1 is replaced by the character vector generated in step S1. For example, for the question "I want to transfer money", whose Chinese characters are "I", "want", "transfer" and "account", assume the vector for "I" is {0.5, 0.7, 0.6} and the vectors for "want", "transfer" and "account" are {0.1, 0.2, 0.5}, {0.2, 0.3, 0.7} and {0.9, 0.2, 0.7} respectively; the whole sentence is then represented as the set of all these vectors, i.e., {{0.5, 0.7, 0.6}, {0.1, 0.2, 0.5}, {0.2, 0.3, 0.7}, {0.9, 0.2, 0.7}}.
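The character-to-vector replacement in step 4 can be sketched directly with the example vectors above. The lookup-table name is an assumption; the vectors are the illustrative ones from the example.

```python
def to_vector_set(question, char_vecs):
    """Replace each character of a question with its character vector.

    `char_vecs` is the (hypothetical) character-to-vector table produced
    by word2vec in step S1.
    """
    return [char_vecs[ch] for ch in question]

# Example vectors from the text, for the question "我要转账" ("I want to transfer money").
char_vecs = {"我": [0.5, 0.7, 0.6], "要": [0.1, 0.2, 0.5],
             "转": [0.2, 0.3, 0.7], "账": [0.9, 0.2, 0.7]}
vecs = to_vector_set("我要转账", char_vecs)
```

A real implementation would fall back to the UNK vector for characters outside the 3500-character dictionary.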
Step 5, input S_vec1 and S_vec2 into the convolutional neural network to obtain the sentence-meaning vectors S_rep1 and S_rep2.
Step 6, use the absolute-cosine formula sim(x, y) = |x · y| / (‖x‖ ‖y‖) to calculate the similarity between S_rep1 and S_rep2, then adjust the weights of the neural network according to the error between this similarity and the sample label.
Step 7, repeat steps 2 to 6 until a termination condition is met (such as reaching the maximum number of training iterations or a specified error). This embodiment uses the maximum number of training iterations as the termination condition.
In addition to the above steps, this embodiment can use the trained convolutional neural network to measure question similarity. The specific steps are:
Step 1, load the trained convolutional neural network model.
Step 2, load the bank question-answer library, and convert each question S_req_i in the library into its sentence-meaning vector S_rep_i using the method of step S2.
Step 3, receive the user's question S_request and convert it into its sentence-meaning vector S_rep_request in the same way.
Step 4, use the improved cosine function to calculate in turn the similarity sim_i between S_rep_request and each S_rep_i, take the maximum similarity sim_max, and return the answer of the question corresponding to sim_max as the final answer.
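The retrieval loop of steps 2-4 can be sketched as a nearest-question lookup over precomputed sentence vectors. This is an illustrative sketch; the data layout (a list of (vector, answer) pairs) is an assumption.

```python
import math

def best_answer(request_vec, bank):
    """Return the answer of the stored question most similar to the request.

    `bank` is a list of (sentence_vector, answer) pairs precomputed for the
    question-answer library; similarity is the absolute cosine from above.
    """
    def sim(x, y):
        dot = sum(a * b for a, b in zip(x, y))
        return abs(dot) / (math.sqrt(sum(a * a for a in x)) *
                           math.sqrt(sum(b * b for b in y)))
    # Take the entry with maximum similarity and return its answer.
    return max(bank, key=lambda item: sim(request_vec, item[0]))[1]

ans = best_answer([1.0, 0.0],
                  [([0.9, 0.1], "answer A"), ([0.1, 0.9], "answer B")])
```

Because the library vectors are precomputed once in step 2, each incoming question costs only one CNN forward pass plus a linear scan of similarities.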
This embodiment adopts individual-character analysis, avoiding the influence of word-segmentation errors on subsequent analysis. The convolutional neural network extracts whole-sentence features from the entire question, avoiding the sentence-meaning fragmentation caused by using a word similarity matrix. An absolute-value function is added to the original cosine similarity formula so that its range is [0, 1], resolving the mismatch between the ranges of the sigmoid function (a common neural-network activation) and the cosine function and avoiding negative similarities. The data required for deep-learning model training is generated, establishing the association between questions and their semantics. In addition to the above embodiments, the present invention may have other embodiments; all technical solutions formed by equivalent substitution or equivalent transformation fall within the protection scope of the present invention.
Claims (7)
1. A question similarity measurement method based on a deep convolutional neural network is characterized by comprising the following steps:
s1, generating a raw corpus through related pages in the knowledge field, crawling Chinese characters appearing in the raw corpus, and generating a corresponding character vector of each Chinese character;
s2, replacing each Chinese character in the question with the corresponding character vector to obtain a character vector set corresponding to the question; the word vector set obtains corresponding sentence meaning vectors through the calculation of a convolutional neural network;
and S3, combining the question sentences in pairs, and calculating the cosine function absolute value of the sentence meaning vector corresponding to the two question sentences to obtain the similarity between the two question sentences.
2. The method for measuring question similarity based on the deep convolutional neural network according to claim 1, wherein the method for generating the corpus by the knowledge domain related pages in the step S1 is as follows:
s11, compiling a web crawler by using a python language, and crawling knowledge-related webpages;
s12, preprocessing the webpage, removing webpage marks, invalid characters, mathematical formulas, pictures and tables, combining all the webpages, and generating an original raw corpus;
and S13, segmenting the original raw corpus according to punctuations, segmenting each sentence into a plurality of clauses, wherein each clause occupies one line, and combining all the clauses to generate a final raw corpus.
3. The question similarity measurement method based on the deep convolutional neural network according to claim 1, wherein in step S1 character vectors are generated using the skip-gram algorithm of the word2vec tool; the window size of the skip-gram algorithm is set to 2, and the dictionary contains 3500 common characters plus a UNK token, which replaces uncommon characters outside the 3500.
4. The question similarity measurement method based on the deep convolutional neural network according to claim 1, wherein the convolutional neural network of step S2 includes a convolutional layer and a pooling layer; the convolutional layer uses a kernel size of 2 × 200, where 2 means that only the association between 2 adjacent characters is considered and 200 is the dimension of the character vector, and the number of convolution kernels in the convolutional layer is 100-200; the pooling layer employs 1-max pooling, i.e., taking the maximum of each feature dimension over the convolved features.
5. The question similarity measurement method based on the deep convolutional neural network according to claim 1, wherein the calculation by which the convolutional neural network in step S2 obtains the corresponding sentence-meaning vector is:
1) The sentence S contains n Chinese characters, each corresponding to a d-dimensional character vector v_i; after replacement the sentence is represented as S' = {v_1, v_2, …, v_n};
2) S' is input into the convolutional layer of the convolutional neural network to obtain the convolved result m^k = {m_1^k, m_2^k, …, m_(n-1)^k}, calculated as:
m_i^k = f(W_k · c_i + b_k)
wherein c_i = [v_i, v_(i+1)] (0 < i < n) is the vector formed by concatenating two adjacent character vectors, W_k is the k-th convolution kernel matrix of the convolutional neural network, b_k is the bias vector corresponding to the k-th convolution kernel, and f is the activation function;
3) The convolved result m^k is input into the pooling layer to obtain the pooled result p_k; the pooling is 1-max pooling, calculated as:
p_k = max(m_1^k, m_2^k, …, m_(n-1)^k)
wherein max is the maximum function, taking the maximum over all inputs m_i^k from the previous layer;
4) Steps 2) and 3) are executed repeatedly, 1-3 times;
5) The output p_k of the pooling layer of the last repetition is the sentence-meaning vector of sentence S.
6. The question similarity measurement method based on the deep convolutional neural network according to claim 1, wherein the formula for calculating the absolute value of the cosine of the sentence-meaning vectors corresponding to the two questions in step S3 is:
sim(x, y) = |x · y| / (‖x‖ ‖y‖)
wherein x and y respectively denote the sentence-meaning vectors corresponding to question 1 and question 2, and the range of sim(x, y) is [0, 1].
7. The question similarity measurement method based on the deep convolutional neural network according to claim 1, wherein the training method for the convolutional neural network that calculates the sentence-meaning vectors is:
1) Cluster the questions by their answers so that questions with the same answer fall into the same cluster; combine questions within the same cluster in pairs to generate positive samples, combine questions from different clusters in pairs to generate negative samples, and merge all positive and negative samples into a training set;
2) Configure a convolutional neural network based on the TensorFlow framework, wherein the maximum number of training iterations is 1000, the loss function is the L2-regularized mean square error, the batch size is 400, the number of convolution-kernel features is 200, the convolution kernel size is 2 × 200, and the pooling layer uses 1-max pooling;
3) Take a sample from the training set, replace it with the corresponding character-vector sample, and compute the corresponding sentence-meaning vectors through the whole neural network;
4) Calculate the absolute cosine value between the sample sentence-meaning vectors to obtain the sample similarity, and adjust the weights of the neural network according to the error between this similarity and the sample label;
5) Repeat steps 2) to 4) until the maximum number of training iterations is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711162561.1A CN108021555A (en) | 2017-11-21 | 2017-11-21 | A kind of Question sentence parsing measure based on depth convolutional neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711162561.1A CN108021555A (en) | 2017-11-21 | 2017-11-21 | A kind of Question sentence parsing measure based on depth convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108021555A true CN108021555A (en) | 2018-05-11 |
Family
ID=62080014
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711162561.1A Pending CN108021555A (en) | 2017-11-21 | 2017-11-21 | A kind of Question sentence parsing measure based on depth convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108021555A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9659248B1 (en) * | 2016-01-19 | 2017-05-23 | International Business Machines Corporation | Machine learning and training a computer-implemented neural network to retrieve semantically equivalent questions using hybrid in-memory representations |
CN106776545A (en) * | 2016-11-29 | 2017-05-31 | 西安交通大学 | Method for calculating similarity between short texts via a deep convolutional neural network |
CN106815311A (en) * | 2016-12-21 | 2017-06-09 | 杭州朗和科技有限公司 | Question matching method and device |
CN106844741A (en) * | 2017-02-13 | 2017-06-13 | 哈尔滨工业大学 | Domain-specific question answering method |
CN106897568A (en) * | 2017-02-28 | 2017-06-27 | 北京大数医达科技有限公司 | Method and apparatus for structuring medical records |
- 2017-11-21: Application CN201711162561.1A filed in China (published as CN108021555A); status: Pending
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109102809A (en) * | 2018-06-22 | 2018-12-28 | 北京光年无限科技有限公司 | Dialogue method and system for an intelligent robot |
CN108984694A (en) * | 2018-07-04 | 2018-12-11 | 龙马智芯(珠海横琴)科技有限公司 | Webpage processing method and device, storage medium, and electronic device |
CN109062892A (en) * | 2018-07-10 | 2018-12-21 | 东北大学 | Chinese sentence similarity calculation method based on Word2Vec |
CN109241249A (en) * | 2018-07-16 | 2019-01-18 | 阿里巴巴集团控股有限公司 | Method and device for determining a burst problem |
CN109241249B (en) * | 2018-07-16 | 2021-09-14 | 创新先进技术有限公司 | Method and device for determining burst problem |
CN109145290A (en) * | 2018-07-25 | 2019-01-04 | 东北大学 | Semantic similarity calculation method based on word vectors and a self-attention mechanism |
CN109145290B (en) * | 2018-07-25 | 2020-07-07 | 东北大学 | Semantic similarity calculation method based on word vector and self-attention mechanism |
CN109101494A (en) * | 2018-08-10 | 2018-12-28 | 哈尔滨工业大学(威海) | Method, device, and computer-readable storage medium for calculating Chinese sentence semantic similarity |
CN110969005A (en) * | 2018-09-29 | 2020-04-07 | 航天信息股份有限公司 | Method and device for determining similarity between entity corpora |
CN110969005B (en) * | 2018-09-29 | 2023-10-31 | 航天信息股份有限公司 | Method and device for determining similarity between entity corpora |
CN109543179A (en) * | 2018-11-05 | 2019-03-29 | 北京康夫子科技有限公司 | Method and system for normalizing colloquial symptom descriptions |
CN111666482B (en) * | 2019-03-06 | 2022-08-02 | 珠海格力电器股份有限公司 | Query method and device, storage medium and processor |
CN111666482A (en) * | 2019-03-06 | 2020-09-15 | 珠海格力电器股份有限公司 | Query method and device, storage medium and processor |
CN109918491B (en) * | 2019-03-12 | 2022-07-29 | 焦点科技股份有限公司 | Intelligent customer service question matching method based on knowledge base self-learning |
CN109918491A (en) * | 2019-03-12 | 2019-06-21 | 焦点科技股份有限公司 | Intelligent customer service question matching method based on knowledge base self-learning |
CN111753081A (en) * | 2019-03-28 | 2020-10-09 | 百度(美国)有限责任公司 | Text classification system and method based on deep SKIP-GRAM network |
CN111753081B (en) * | 2019-03-28 | 2023-06-09 | 百度(美国)有限责任公司 | System and method for text classification based on deep SKIP-GRAM network |
CN110032635B (en) * | 2019-04-22 | 2023-01-20 | 齐鲁工业大学 | Question pair matching method and device based on deep feature fusion neural network |
CN110032635A (en) * | 2019-04-22 | 2019-07-19 | 齐鲁工业大学 | Question pair matching method and device based on a deep feature fusion neural network |
CN110309503A (en) * | 2019-05-21 | 2019-10-08 | 昆明理工大学 | Subjective question scoring model and scoring method based on deep learning BERT-CNN |
CN110348024A (en) * | 2019-07-23 | 2019-10-18 | 天津汇智星源信息技术有限公司 | Intelligent identification system based on a legal knowledge graph |
CN111669410A (en) * | 2020-07-24 | 2020-09-15 | 中国航空油料集团有限公司 | Industrial control network negative sample data generation method, device, server and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108021555A (en) | Question sentence parsing and measurement method based on a deep convolutional neural network | |
CN110442760B (en) | Synonym mining method and device for question-answer retrieval system | |
CN107562792B (en) | Question-answer matching method based on deep learning | |
CN106997376B (en) | Question and answer sentence similarity calculation method based on multi-level features | |
CN106776545B (en) | Method for calculating similarity between short texts through deep convolutional neural network | |
CN107818164A (en) | Intelligent question answering method and system | |
CN111831789B (en) | Question-answering text matching method based on multi-layer semantic feature extraction structure | |
CN111563384B (en) | Evaluation object identification method and device for E-commerce products and storage medium | |
CN112035730B (en) | Semantic retrieval method and device and electronic equipment | |
CN111368049A (en) | Information acquisition method and device, electronic equipment and computer readable storage medium | |
CN104615767A (en) | Searching-ranking model training method and device and search processing method | |
CN110362678A (en) | Method and apparatus for automatically extracting Chinese text keywords | |
CN112052319B (en) | Intelligent customer service method and system based on multi-feature fusion | |
CN111444704A (en) | Network security keyword extraction method based on deep neural network | |
CN113486645A (en) | Text similarity detection method based on deep learning | |
Rahman et al. | NLP-based automatic answer script evaluation | |
CN110334204B (en) | Exercise similarity calculation recommendation method based on user records | |
CN112434533A (en) | Entity disambiguation method, apparatus, electronic device, and computer-readable storage medium | |
Mahmoodvand et al. | Semi-supervised approach for Persian word sense disambiguation | |
CN114169447B (en) | Event detection method based on self-attention convolution bidirectional gating cyclic unit network | |
CN116757188A (en) | Cross-language information retrieval training method based on alignment query entity pairs | |
CN110287396A (en) | Text matching method and device | |
CN111767388B (en) | Candidate pool generation method | |
CN111159405B (en) | Irony detection method based on background knowledge | |
CN110413956B (en) | Text similarity calculation method based on bootstrapping |
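The similar documents above cluster around one shared idea: encode each sentence with a convolutional network (embed words, slide filters over word windows, max-pool) and compare the resulting fixed-size vectors. As a rough illustration of that general technique only — not the patented method, and with vocabulary, dimensions, and random weights that are purely illustrative assumptions — a minimal sketch:

```python
import math
import random

random.seed(0)  # fixed seed so the illustrative weights are reproducible
VOCAB = {"how": 0, "to": 1, "reset": 2, "change": 3, "my": 4, "password": 5}
DIM, WIN, NFILT = 8, 3, 4  # embedding size, filter window, number of filters
EMB = [[random.gauss(0, 1) for _ in range(DIM)] for _ in VOCAB]
FILTERS = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(WIN)]
           for _ in range(NFILT)]

def encode(sentence):
    """Embed the words, convolve each filter over word windows, max-pool."""
    words = [EMB[VOCAB[w]] for w in sentence.split()]
    feats = []
    for f in FILTERS:
        # response of this filter at each window position
        responses = [sum(words[i + j][d] * f[j][d]
                         for j in range(WIN) for d in range(DIM))
                     for i in range(len(words) - WIN + 1)]
        feats.append(max(responses))  # max-pooling over positions
    return feats

def similarity(a, b):
    """Cosine similarity between the two pooled sentence vectors."""
    va, vb = encode(a), encode(b)
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    return dot / (na * nb)
```

In a trained system the embeddings and filters would be learned from question pairs rather than sampled at random; the sketch only shows the data flow the cited documents have in common.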
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 2018-05-11 |