CN112507097A - Method for improving generalization capability of question-answering system - Google Patents
- Publication number
- CN112507097A
- Authority
- CN
- China
- Prior art keywords
- similar
- words
- question
- standard
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/319—Inverted lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Animal Behavior & Ethology (AREA)
- Human Computer Interaction (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method that automatically improves the generalization ability and recall of a question-answering system. Using automatically generated similar words and similar questions, it improves retrieval and ranking in the question-answering system at two granularities, words and sentences. This raises the recall rate of the system, strengthens its generalization ability, avoids manual effort as far as possible, and greatly improves the usability of the system.
Description
Technical Field
The invention relates to the fields of natural language processing and machine learning, and in particular to a method for improving the generalization ability of a question-answering system.
Background
From the perspective of user experience, the business-consulting question-answering systems currently used in industry mainly follow two modes. One is the question-and-answer type, in which each valid user question receives a definite reply or answer; the other is the search type, in which a list of similar questions is returned for the user question.
Both modes depend on a question-answer library, that is, a set of question-answer pairs. When the system receives a user question, it must retrieve a list of related questions from the library and then rank them. A search-type system directly returns the list of related questions, while a question-and-answer system adds a judgment mechanism on top of this to decide whether the user question has an exact answer.
Therefore, whatever the type of business-consulting question-answering system, the knowledge in the library must be retrieved and ranked according to the user's question. Retrieval is the first step in every question-answering system, and the ranking algorithm orders the retrieval results. To some extent, the accuracy of retrieval directly determines the accuracy of the entire question-answering system.
Given a question set and a user question, the task is to screen out the related questions. For reasons of latency, inverted indexes are mostly used at present: an index from words to questions is built so that the list of related questions can be screened out quickly, and a ranking algorithm then sorts the list before it is returned.
However, this method can only index existing knowledge (the question-answer library); that is, the inverted index covers only the segmented words that already occur in the library. If the user's question contains words that do not occur in the library, the inverted index retrieves nothing for them, so an inverted index alone cannot handle the many colloquial variations of a question. Clearly, the inverted index itself has no generalization ability. For example, suppose the library contains the question "water fee cannot be paid" and the user asks "water fee is what cannot be paid"; the system cannot match the pairs "cannot-cannot", "cause-what" and "payment-payment".
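The limitation can be illustrated with a minimal sketch. The toy library, the whitespace tokenizer and the function names are assumptions made for illustration only; they are not taken from the patent.

```python
from collections import defaultdict

def build_inverted_index(questions):
    """Map each token to the set of ids of the questions that contain it."""
    index = defaultdict(set)
    for qid, text in enumerate(questions):
        for token in text.lower().split():
            index[token].add(qid)
    return index

def retrieve(index, query):
    """Return the ids of all questions sharing at least one token with the query."""
    hits = set()
    for token in query.lower().split():
        hits |= index.get(token, set())
    return hits

library = ["water fee cannot be paid"]
index = build_inverted_index(library)

# Shares the tokens "water" and "fee" with the library question, so it is recalled.
print(retrieve(index, "water fee payment fails"))
# Same intent, but no token in common: a plain inverted index recalls nothing.
print(retrieve(index, "unable to settle my bill"))
```

A query whose words all lie outside the indexed vocabulary is never recalled, however close its meaning is.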
The traditional approach builds similar word lists and similar question lists manually, using rules and templates (for example, patents CN201810768888.1 and CN201911081549.7). This is time-consuming, labor-intensive and difficult to maintain.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method for improving the generalization ability of a question-answering system, which improves retrieval and ranking in two ways: retrieving similar words through word vectors and generating similar questions.
To solve the technical problem, the invention adopts the following technical scheme: obtain a similar word list through a word embedding matrix; obtain a similar question list through similar question generation; obtain the similarity between standard words and similar words, and between standard questions and similar questions, through word vectors and sentence vectors; and use these results to improve the retrieval and ranking of the system. The scheme comprises the following steps:
(1) Train word vectors. Either use open-source word vectors directly or train them in-house, depending on how much industry data is available. For in-house training, first extract industry keywords with tf-idf from the dialogue corpus supplied by the industry user. Using these keywords, crawl a large amount of weakly industry-related knowledge from Baidu consultation, Baidu knowledge and Baidu encyclopedia. (Because the keywords are extracted automatically, they may not be specific to one industry, so not all of the crawled knowledge is industry-related; for example, "handling" is a service type in both the financial industry and the tax industry.) Then train the word embedding matrix with word2vec.
(2) Extract the similar word list. From the word embedding matrix, build a word vector index with a fast high-dimensional vector similarity indexing technique (mature options include the kd tree, Annoy and Faiss) so that similar words of the industry keywords can be extracted quickly; then compute and store the similarity between words using cosine similarity.
(3) Obtain the training corpus for similar question generation. Using the industry keywords from step (1), crawl a large number of question-answer pairs from Baidu knowledge (about one million question-answer pairs can be crawled with 1000 keywords, although proxy IPs are needed). Then obtain vector representations of all questions with pre-training model I (roberta-large works better, but other pre-trained models can be used) and, as in step (2), cluster the questions with a high-dimensional vector index to construct a large number of similar question pairs.
(4) Train the similar question generation model. Training uses pre-training model II, unilm, an improved bert-based pre-trained generative model (other pre-trained generative models, such as mass and ernie-gen, can also be used). The training samples are similar question pairs, and the training objective is text generation.
(5) Obtain the similar question list. Taking the questions in the industry question-answer library supplied by the client as standard questions, use the pre-trained model to generate a similar question list for the library. Compute and store the similarity between sentences using cosine similarity.
(6) Optimize the inverted index. When the inverted index is built, the similar questions and the similar word list obtained in the previous steps are merged into it, which raises the recall rate of the system; during ranking, the similarity between standard words and similar words and between standard questions and similar questions is taken into account. In this way the generalization ability of the whole system can be greatly improved.
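The tf-idf keyword extraction mentioned in step (1) can be sketched as follows. This is a minimal illustration under stated assumptions: the documents are already segmented into words, the smoothed idf formula and the function name `extract_keywords` are choices made here, and the toy corpora are invented.

```python
import math
from collections import Counter

def extract_keywords(industry_docs, multi_industry_docs, top_n=10):
    """Rank words by term frequency in the industry corpus times a smoothed
    inverse document frequency computed on a multi-industry corpus."""
    tf = Counter(word for doc in industry_docs for word in doc)
    total = sum(tf.values())
    df = Counter(word for doc in multi_industry_docs for word in set(doc))
    n_docs = len(multi_industry_docs)
    scores = {
        word: (count / total) * math.log((1 + n_docs) / (1 + df[word]))
        for word, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda word: (-scores[word], word))[:top_n]

# Invented, pre-segmented toy corpora.
industry_docs = [
    ["how", "to", "apply", "tax", "exemption"],
    ["tax", "invoice", "for", "exemption"],
]
multi_industry_docs = [
    ["how", "to", "open", "an", "account"],
    ["how", "to", "pay", "the", "water", "fee"],
    ["tax", "invoice", "for", "goods"],
    ["how", "to", "apply", "for", "a", "loan"],
]

keywords = extract_keywords(industry_docs, multi_industry_docs, top_n=2)
print(keywords)
```

Words that are frequent in the industry data but rare across other industries rise to the top, while function words such as "how" and "to" are pushed down by their high document frequency.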
The beneficial effects of the invention are as follows: the invention provides a method that automatically improves the generalization ability and recall of a question-answering system. Using automatically generated similar words and similar questions, it improves retrieval and ranking at two granularities, words and sentences, which raises the recall rate of the system, strengthens its generalization ability, avoids manual effort as far as possible, and greatly improves the usability of the system.
Drawings
FIG. 1 is an architecture diagram of a prior art search question and answer system;
FIG. 2 is a flow chart of similar vocabulary acquisition;
FIG. 3 is a flow chart of similar problem table acquisition;
FIG. 4 is a flowchart of the search question-answering system according to embodiment 1.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Example 1
This embodiment is based on an intelligent question-answering/retrieval system. It provides a method for improving the generalization ability of the system by optimizing the inverted index and the ranking algorithm from two aspects, similar word generation and similar question generation, which effectively improves the generalization ability of the whole question-answering system. The method is exposed externally as a service and can also be used on any device that carries an intelligent dialogue system, such as WeChat official accounts, intelligent robots and virtual robots.
The architecture of an existing dialogue/search question-answering system is shown in FIG. 1; it mainly consists of two parts, retrieval and ranking. Given the question-answer pairs or knowledge base of an industry client, an inverted index is normally built to search for the user Query and return a list of questions related to it; the list is then ranked, and the ranked result is returned to the user or front end for display. This method, however, has no semantic generalization ability. This embodiment therefore uses deep learning to optimize retrieval (index construction) and ranking so as to improve the generalization ability of the whole system.
First, a similar word list is obtained through the word embedding matrix, as shown in FIG. 2:
1) Extract the industry keywords. Count term frequencies using the industry knowledge provided by the client; count inverse document frequencies using a multi-industry corpus; rank the keywords by tf-idf weight and extract the industry keywords. Note that the more industries the corpus used for the inverse document frequency covers, the better.
2) Train the word vector matrix. Using the keywords extracted in the previous step as seeds, crawl related industry knowledge from search platforms such as Baidu knowledge, Baidu consultation and Baidu encyclopedia. Baidu is the largest search engine in China, and its knowledge coverage is wide enough. Not all knowledge retrieved by keyword is industry knowledge, but this does not affect the training of word vectors. Word vectors are trained with word2vec; however, if the collected industry knowledge is under 5 GB, open-source pre-trained word vectors (e.g. the Tencent open-source word vectors) are recommended.
3) Obtain the similar word list of the industry keywords. From the word vector matrix, build a vector index with a high-dimensional vector indexing technique (such as the kd tree, Annoy or Faiss), traverse the industry keyword list in order, and extract the ten most similar words for each keyword. Note that most of the similar words extracted this way contain the keyword itself. Taking the Tencent word vectors as an example, the similar words of "exempt" are:
exempt from exemption preferential exemption policy exemption direct exemption payment exemption partial cost exemption
To avoid this, 100 similar words are extracted at a time and the words containing the standard word are removed, with the following result:
tax free preferential tax rate and tax free preferential tax free tax
It can be seen that the semantics of the similar words are basically all related to "exemption".
In addition, to guard against similar words that differ greatly from the standard word, the similarity between the standard word and each similar word is stored:
exempt: 0.7265  exempt: 0.7184  tax coupon: 0.6753  …
The similarity between a similar word and the standard word can be understood as the contribution of the similar word to the standard word. If the user Query contains the standard word, its contribution to the standard word is 1; if the user Query contains a similar word, its contribution to the standard word is the similarity value of that similar word. In this way the similar word list of the industry keywords is constructed.
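The word-level procedure above (nearest-neighbour lookup, removal of words that contain the standard word, stored similarities, contribution degree) can be sketched as follows. This is an illustration under assumptions: a brute-force cosine scan stands in for the kd tree, Annoy or Faiss index, the two-dimensional vectors are invented, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_words(standard, vectors, candidates=100, keep=10):
    """Take the `candidates` nearest words, drop those containing the standard
    word, and return up to `keep` (word, similarity) pairs."""
    scored = sorted(
        ((w, cosine(vectors[standard], vec)) for w, vec in vectors.items() if w != standard),
        key=lambda item: -item[1],
    )[:candidates]
    return [(w, round(s, 4)) for w, s in scored if standard not in w][:keep]

def contribution(query_tokens, standard, table):
    """1.0 if the query contains the standard word, otherwise the best
    similarity among the similar words it contains (0.0 if none)."""
    if standard in query_tokens:
        return 1.0
    return max((s for w, s in table if w in query_tokens), default=0.0)

# Invented toy embedding; real vectors would come from word2vec or open-source vectors.
vectors = {
    "exempt": [1.0, 0.0],
    "exemption": [0.9, 0.1],   # contains the standard word, so it is removed
    "waive": [0.8, 0.6],
    "invoice": [0.0, 1.0],
}
table = similar_words("exempt", vectors)
print(table)
print(contribution(["waive", "the", "fee"], "exempt", table))
```

Taking the maximum over the matched similar words is one possible reading of the contribution rule; the text does not say how several matches are combined.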
As shown in FIG. 3, the diagram of similar question list acquisition, the method generates similar questions with a text generation technique based on a pre-trained model. The specific steps are as follows:
1) Crawl industry-related questions. User Queries are mostly colloquial, whereas the questions in the question-answer library are more formal, so a large number of colloquial industry-related questions must be collected. Baidu knowledge is the largest Chinese question-answer community, and most of its questions are colloquial, so a large number of industry-related questions are crawled from it. Traverse the industry keywords (obtained in the previous step) in order and crawl the related questions of each keyword (taking similar questions in the tax industry as an example, about 2,000 keywords yield more than 2 million related questions).
2) Generate similar question pairs. First, extract sentence vectors with pre-training model I; a sentence vector can be represented by the output of the first token (cls) or by the average of all token vectors. (Having compared advanced pre-trained models such as bert, roberta, xlnet and albert experimentally, the invention proposes the roberta-large model for extracting sentence vectors, as its semantic quality is relatively good.) Then build a sentence vector index with a high-dimensional vector index (such as a kd tree or an Annoy index), traverse all industry-related questions in order, and extract the top k most similar questions for each from the index. (k can be chosen according to the size of the industry question set: if k is too large, the similar questions drift away from the original; if k is too small, the training set shrinks. For a question set of more than 2 million, k = 4 is suggested.)
FIG. 4 shows similar questions extracted with the sentence vector index. The first entry of similar_queries is always identical to the standard question, and the other similar questions all have some semantic relevance to the standard question.
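The pairing step can be sketched as follows, assuming the sentence vectors are already available. The brute-force neighbour search replaces the kd tree or Annoy index for brevity, the vectors are invented, and `mean_pool` and `similar_pairs` are hypothetical names. Unlike the extraction shown in the figure, this sketch skips the question itself instead of returning it as the first neighbour.

```python
import math

def mean_pool(token_vectors):
    """Average the token vectors of one sentence into a single sentence vector."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_pairs(questions, sentence_vectors, k=4):
    """For every question, pair it with its k nearest neighbours by cosine similarity."""
    pairs = []
    for i, question in enumerate(questions):
        ranked = sorted(
            (j for j in range(len(questions)) if j != i),
            key=lambda j: -cosine(sentence_vectors[i], sentence_vectors[j]),
        )
        pairs.extend((question, questions[j]) for j in ranked[:k])
    return pairs

questions = ["a", "b", "c"]                      # placeholders for crawled questions
sentence_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
pairs = similar_pairs(questions, sentence_vectors, k=1)
print(pairs)
print(mean_pool([[1.0, 3.0], [3.0, 5.0]]))
```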
The training set is composed of similar question pairs and non-similar question pairs in proportion; the non-similar question pairs can be selected at random. Examples are as follows:
tax-free invoice for tax-free agricultural products 1
Data 0 of tax free invoice for value-added tax
Here 1 denotes a similar question pair and 0 denotes a non-similar question pair.
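Assembling such a labelled training set can be sketched as follows. The one-negative-per-positive ratio, the function name `build_training_set` and the English example questions are assumptions for illustration.

```python
import random

def build_training_set(similar_pairs, all_questions, seed=0):
    """Label each similar pair 1 and add one randomly chosen non-similar pair labelled 0."""
    rng = random.Random(seed)
    known = set(similar_pairs) | {(b, a) for a, b in similar_pairs}
    samples = [(a, b, 1) for a, b in similar_pairs]
    for a, _ in similar_pairs:
        # Negatives must differ from the question and must not be a known similar pair.
        candidates = [q for q in all_questions if q != a and (a, q) not in known]
        if candidates:
            samples.append((a, rng.choice(candidates), 0))
    return samples

# Invented example questions.
all_questions = [
    "how to issue a tax-free invoice",
    "how do I make out a tax-free invoice",
    "where to pay the water fee",
    "how to reset my password",
]
pairs = [(all_questions[0], all_questions[1])]
samples = build_training_set(pairs, all_questions)
for sample in samples:
    print(sample)
```

A randomly drawn negative can occasionally be related to the question in meaning; with a large and varied question set this is rare.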
3) Train the similar question generation model. Several Transformer-based pre-trained text generation models are currently available (such as mass, ernie-gen and unilm, any of which can be used); the invention adopts unilm as pre-training model II to train the similar question generation model. The model takes a question pair and a label as input. The loss function consists of two parts: the loss from generating the similar question and the loss from the classification task.
4) Obtain the similar question list. The ultimate goal is to generate similar questions for the standard questions in the question library, so the standard questions in the library are traversed and a set of similar questions is generated for each using the pre-trained model.
The invention adopts a top-k decoding strategy. The model is set to generate n similar questions at a time (n can be set fairly large; n = 100 in the invention). The similar questions are generated word by word, and each word is randomly sampled from the top-k most probable words. Taking the invention as an example, top-k is set to 5: among the 5 most probable candidates, 100 random draws are made according to their probabilities, and the sampled results become the next words of the 100 similar questions.
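Top-k sampling can be sketched as follows. The function `toy_model` is a hypothetical stand-in for the trained generation model: it returns a next-word probability table for a given prefix. The end marker `<eos>`, the length cap and the function names are assumptions as well.

```python
import random

def top_k_sample(next_word_probs, k=5, rng=random):
    """Draw one word from the k most probable candidates, in proportion to their probabilities."""
    top = sorted(next_word_probs.items(), key=lambda item: -item[1])[:k]
    words, weights = zip(*top)
    return rng.choices(words, weights=weights, k=1)[0]

def generate(step_fn, n=100, k=5, max_len=20, end="<eos>", seed=0):
    """Generate n questions word by word; step_fn(prefix) returns {word: probability}."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        words = []
        while len(words) < max_len:
            word = top_k_sample(step_fn(words), k, rng)
            if word == end:
                break
            words.append(word)
        results.append(" ".join(words))
    return results

def toy_model(prefix):
    """Hypothetical stand-in for the generation model, keyed on the prefix length."""
    table = {
        0: {"how": 0.6, "where": 0.4},
        1: {"to": 1.0},
        2: {"issue": 0.5, "make": 0.3, "fill": 0.2},
    }
    return table.get(len(prefix), {"<eos>": 1.0})

outputs = generate(toy_model, n=10, k=2)
print(outputs)
```

With k = 2, the third word can only be "issue" or "make"; the less likely "fill" is never sampled. A smaller k keeps the generated questions closer to the most probable wording, and a larger k gives more variety.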
Because some generated similar questions are too close to the standard question, all generated similar questions are filtered: a similar question that introduces no new word and only alters meaningless elements such as punctuation and stop words (by adding, deleting or changing them) is filtered out. Near-duplicate similar questions (for example, ones that differ only in punctuation) are removed as well.
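The filtering rule can be sketched as follows. The stop-word list, the whitespace tokenizer and the English example questions are assumptions for illustration; a real system would use a Chinese word segmenter and a fuller stop-word list.

```python
import string

STOP_WORDS = {"the", "a", "an", "do", "i", "please"}  # hypothetical stop-word list

def content_words(text):
    """Lower-case the text, strip punctuation and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]

def filter_generated(standard, generated):
    """Keep questions that add at least one new content word and are not near-duplicates."""
    base = set(content_words(standard))
    seen = set()
    kept = []
    for question in generated:
        words = content_words(question)
        if not set(words) - base:      # no new word: only punctuation or stop words changed
            continue
        key = tuple(words)
        if key in seen:                # same content words as an earlier kept question
            continue
        seen.add(key)
        kept.append(question)
    return kept

standard = "how to issue a tax-free invoice"
generated = [
    "how to issue the tax-free invoice?",
    "how to make out a tax-free invoice",
    "how to make out a tax-free invoice!",
    "please how to issue a tax-free invoice",
]
kept = filter_generated(standard, generated)
print(kept)
```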
Continuing the earlier example "how tax free invoices for agricultural products", the list of valid similar questions generated for it is as follows:
how to make tax-free invoice for agricultural products
Tax free invoice for making agricultural products
How to issue tax-free invoice for agricultural products
How to fill in tax-free invoices for agricultural products
How to issue special invoice for agricultural product free of tax
How to do tax-free invoice for agricultural products
Tax-free value-added tax invoice for agricultural products
Invoices for agricultural products without tax
Tax free invoice for agricultural product import
How to issue zero tax rate invoice for agricultural products
How to issue value-added tax invoice for agricultural products
Because some similar questions deviate from the standard question, similarity is used to measure how close each similar question is to the standard question. The invention uses pre-training model II to produce sentence vectors for the similar questions and evaluates the similarity between questions with cosine similarity. The similarity between a similar question and the standard question can be understood as the contribution of the similar question to the standard question. If the user Query matches the standard question, its contribution to the standard question is 1; if the user Query matches a similar question, its contribution to the standard question is the similarity value of that similar question.
As shown in FIG. 4, the diagram of the improved search question-answering system, the retrieval and ranking of the system are optimized with the similar word list and the similar question list obtained in the steps above.
1) Optimize the inverted index. Given an industry question-answer set, the capacity of the inverted index can be greatly expanded with the similar words and with the new words introduced by the similar questions.
2) Optimize the ranking algorithm. For the list of related questions produced by the inverted index, a scoring algorithm gives each related question a score (any scoring algorithm can be used here; the invention uses a tf-idf algorithm with custom weights). Every score is multiplied by the weight of the similar words and the weight of the similar questions, and sorting the results gives the final ranking. Because the similar words and similar questions generated for different industries may differ in quality, it is suggested that the score also be multiplied by a weight coefficient w. The final scoring formula can be expressed as follows:
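A form consistent with this description and with claim 10 is the product below. Apart from the coefficient w, the symbol names are assumptions introduced here: score(q) is the score that the scoring algorithm gives to related question q, s_word(q) is the similar-word weight (1 when the Query contains the standard word, otherwise the stored word similarity), and s_question(q) is the similar-question weight (1 when the Query matches the standard question, otherwise the stored sentence similarity).

```latex
\mathrm{score}_{\mathrm{final}}(q) = w \cdot s_{\mathrm{word}}(q) \cdot s_{\mathrm{question}}(q) \cdot \mathrm{score}(q)
```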
The method of this embodiment trains industry word vectors on a large amount of industry knowledge (open-source word vectors can also be used; they are more general but may be less accurate for a specific industry). It then screens industry keywords with tf-idf from the data in the question-answer library, constructs the similar word list (including weights), and adds it to the inverted index, so that when the user Query contains similar words that do not appear in the original question, the question can still be retrieved.
Using the similar word list essentially expands the entries of the inverted index: the recall of a question is raised by adding similar words for the keywords in that question. However, this only raises recall at word granularity and can only produce words that occur in the corpus. Therefore, the pre-trained model is used to generate similar questions for every question in the question set, which improves the retrieval ability of the question-answering system at sentence granularity.
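One way to picture the expanded index is sketched below. The layout (each token maps to question ids with a contribution weight), the rule of keeping the highest weight per entry, and all example data are assumptions; the text does not prescribe how the weights are stored.

```python
from collections import defaultdict

def build_expanded_index(standard_questions, similar_words, similar_questions):
    """Build index[token][question_id] = contribution weight.
    Original words get 1.0; similar words and the words of similar questions
    get their stored similarity. The highest weight per entry is kept."""
    index = defaultdict(dict)

    def add(token, qid, weight):
        index[token][qid] = max(weight, index[token].get(qid, 0.0))

    for qid, text in enumerate(standard_questions):
        for token in text.split():
            add(token, qid, 1.0)
            for word, sim in similar_words.get(token, []):
                add(word, qid, sim)
        for alternative, sim in similar_questions.get(text, []):
            for token in alternative.split():
                add(token, qid, sim)
    return index

# Invented example data.
standard_questions = ["water fee cannot be paid"]
similar_words = {"cannot": [("unable", 0.72)]}
similar_questions = {
    "water fee cannot be paid": [("why does my water bill payment fail", 0.65)],
}
index = build_expanded_index(standard_questions, similar_words, similar_questions)
print(index["unable"], index["bill"], index["water"])
```

A Query containing "unable" or "bill" now reaches question 0 even though neither word occurs in the stored question, and the stored weight is available for ranking.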
In addition, cosine similarity is used to compute the similarity between standard words and similar words and between standard questions and similar questions, and the question score is multiplied by these weights in the final ranking. A user Query that matches standard words and standard questions therefore scores higher, and the more similar the matched items, the higher the score. This effectively improves the generalization ability of the whole system and raises its recall rate.
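The effect of the weights on the final ordering can be seen in a small worked sketch. The tuple layout, the function name `rerank` and the numbers are assumptions chosen for illustration.

```python
def rerank(candidates, w=1.0):
    """candidates: (question, base_score, word_weight, question_weight) tuples.
    Returns (question, final_score) pairs sorted from best to worst."""
    scored = [
        (question, w * word_weight * question_weight * base_score)
        for question, base_score, word_weight, question_weight in candidates
    ]
    return sorted(scored, key=lambda item: -item[1])

candidates = [
    ("C", 4.0, 1.0, 0.25),  # matched through a similar question with similarity 0.25
    ("B", 2.0, 0.75, 1.0),  # matched through a similar word with similarity 0.75
    ("A", 2.0, 1.0, 1.0),   # exact match on standard word and standard question
]
ranked = rerank(candidates)
print(ranked)
```

Question C has the highest raw score, but it was reached only through a weakly similar question, so after weighting the exact match A ranks first.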
Application cases of NLP technology in industry are still relatively few, and the industry as a whole is in an exploratory stage. The business-consulting dialogue system is one of the more mature cases and is applied across industries, but industry clients usually just want a stable, complete, plug-and-play consulting system and do not want to spend much labor on improving its generalization ability. The method of the invention automatically improves the generalization ability and recall of the system: using automatically generated similar words and similar questions, it improves retrieval and ranking in the question-answering system at two granularities, words and sentences, which raises the recall rate of the system, strengthens its generalization ability, avoids manual effort as far as possible, and greatly improves the usability of the system.
The foregoing describes only the basic principle and the preferred embodiments of the invention; modifications and substitutions made by those skilled in the art fall within the scope of protection of the invention.
Claims (10)
1. A method for improving the generalization ability of a question-answering system, characterized by comprising the following steps:
s01) obtaining a similar word list through a word embedding matrix;
s02) obtaining a similar question list through similar question generation;
s03) obtaining the similarity between standard words and similar words, and between standard questions and similar questions, through word vectors and sentence vectors;
s04) optimizing the question-answering system based on the similar word list, the similar question list, and the similarity between standard words and similar words and between standard questions and similar questions: given an industry question-answer set, expanding the capacity of the inverted index with the information produced by the similar words and similar questions; for the list of related questions produced by the inverted index, obtaining a score for each related question with a scoring algorithm, multiplying every score by the weights of the similar words and similar questions, and sorting to obtain the final ranking; wherein the weights of the similar words and similar questions are the similarities between standard words and similar words and between standard questions and similar questions.
2. The method for improving the generalization ability of a question-answering system according to claim 1, wherein the process of obtaining the similar word list through the word embedding matrix is as follows:
s11) selecting open-source word vectors or training word vectors; when training word vectors, first extracting industry keywords with a keyword extraction algorithm from the dialogue corpus given by an industry user, crawling weakly industry-related knowledge from the network with the industry keywords as seeds, and then training the word embedding matrix with word2vec;
s12) obtaining the similar word list of the industry keywords: building a word vector index from the word embedding matrix with a high-dimensional vector indexing technique, traversing the industry keyword list in order, and extracting the similar words of each industry keyword.
3. The method for improving the generalization ability of a question-answering system according to claim 2, wherein, when the similar words of the industry keywords are extracted, words containing the standard word are removed.
4. The method for improving the generalization ability of a question-answering system according to claim 2, wherein the similarity between similar words and standard words is calculated with cosine similarity and stored, the standard words being the extracted industry keywords; the similarity between a similar word and the standard word is used as the contribution of the similar word to the standard word: if the user question contains the standard word, its contribution to the standard word is 1; if the user question contains a similar word, its contribution to the standard word is the similarity value of that similar word; and the similar word list of the industry keywords is constructed from the similar words and the similarities between the similar words and the standard words.
5. The method of improving generalization ability of a question-answering system according to claim 1, wherein said step of: the process of obtaining the similar problem list through the similar problem generation is as follows:
s21), crawling industry related problems, and crawling the related problems of each industry keyword according to the industry keywords;
s22), generating similar problem pairs, firstly, extracting a sentence vector by using a pre-training model I, constructing a sentence vector index by using a high-dimensional vector index, then traversing all industry-related problems in sequence, and extracting the first k most similar problems from the sentence vector index;
s23), training a similar problem generation model, adopting a pre-training model II to train the similar problem generation model, inputting a training set and a label, wherein the training set is composed of similar problems and non-similar problems in equal proportion, a loss function is composed of 2 parts, one is loss generated by generating the similar problems, and the other is loss generated by a classification task;
s24), obtaining a similar problem table, traversing standard problems in a problem library, and generating a similar problem set by using a pre-training model II; and adopting a topk coding strategy, setting a model to generate n similar problems at one time, wherein the similar problems are generated word by word, randomly sampling from the previous topk most probable words when generating one word, and taking the sampling result as the next word of the n similar problems.
6. The method of improving generalization ability of a question-answering system according to claim 5, wherein: if the generated similar questions are too close to the standard question, all of the generated similar questions are filtered out; if a similar question introduces no new words and only changes punctuation or stop words, it is filtered out; and similar questions that are near-duplicates of one another are also filtered out.
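The three filtering rules of claim 6 might look like the following sketch. It makes several assumptions the claim does not state: whitespace tokenization of English text, a small illustrative stop-word set, Jaccard overlap as the closeness measure, the two thresholds, and applying the "too close" rule per candidate.

```python
import string

# Illustrative stop-word list; a real system would use a curated one.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "please"}


def content_tokens(text):
    """Lowercase, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [t for t in cleaned.split() if t not in STOP_WORDS]


def jaccard(a, b):
    """Set overlap between two token lists, in [0, 1]."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def filter_generated(standard, candidates, too_close=0.9, near_dup=0.9):
    """Keep only candidates that pass the three filtering rules."""
    std_tokens = content_tokens(standard)
    kept, kept_tokens = [], []
    for cand in candidates:
        tokens = content_tokens(cand)
        # No new words: only punctuation or stop words were changed.
        if not set(tokens) - set(std_tokens):
            continue
        # Too close to the standard question.
        if jaccard(tokens, std_tokens) >= too_close:
            continue
        # Near-duplicate of a candidate that was already kept.
        if any(jaccard(tokens, prev) >= near_dup for prev in kept_tokens):
            continue
        kept.append(cand)
        kept_tokens.append(tokens)
    return kept
```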
7. The method of improving generalization ability of a question-answering system according to claim 5, wherein: the deviation between the standard question and a similar question is evaluated using similarity, and the similarity between the similar question and the standard question serves as the contribution degree of the similar question to the standard question; if the user question matches the standard question, the contribution degree to the standard question is 1, and if the user question matches a similar question, the contribution degree to the standard question is the similarity value of that similar question.
8. The method of improving generalization ability of a question-answering system according to claim 5, wherein: pre-training model I is one of BERT, RoBERTa, XLNet and ALBERT.
9. The method of improving generalization ability of a question-answering system according to claim 5, wherein: pre-training model II is a UniLM model.
10. The method of improving generalization ability of a question-answering system according to claim 1, wherein: during inverted-index retrieval, each score is first multiplied by the weight of the corresponding similar word or similar question, and the result is then multiplied by a weight coefficient to obtain the final score.
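The scoring rule of claim 10 can be sketched as follows. The layout of `inverted_index` (standard word to postings with a base score), the `similar_words` mapping, summing scores across query terms, and the default `coefficient` are all assumptions made for illustration; the claim only states that each score is multiplied by the similar-word or similar-question weight and then by a weight coefficient.

```python
from collections import defaultdict


def score_candidates(query_terms, inverted_index, similar_words, coefficient=1.0):
    """Rank question ids by weighted inverted-index score.

    inverted_index: standard word -> list of (question_id, base_score)
    similar_words:  similar word  -> (standard word, contribution degree)
    A term that is itself a standard word contributes with weight 1.0.
    """
    scores = defaultdict(float)
    for term in query_terms:
        standard, weight = similar_words.get(term, (term, 1.0))
        for question_id, base_score in inverted_index.get(standard, []):
            # final score = base score x contribution weight x coefficient
            scores[question_id] += base_score * weight * coefficient
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A query that hits only through a similar word is thereby ranked below an exact hit on the standard word, in proportion to the stored similarity.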
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011494614.1A CN112507097B (en) | 2020-12-17 | 2020-12-17 | Method for improving generalization capability of question-answering system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112507097A (en) | 2021-03-16 |
CN112507097B (en) | 2022-11-18 |
Family
ID=74922111
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011494614.1A Active CN112507097B (en) | 2020-12-17 | 2020-12-17 | Method for improving generalization capability of question-answering system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112507097B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115795018B (en) * | 2023-02-13 | 2023-05-09 | 广州海昇计算机科技有限公司 | Multi-strategy intelligent search question-answering method and system for power grid field |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106484664A (en) * | 2016-10-21 | 2017-03-08 | 竹间智能科技(上海)有限公司 | Similarity calculating method between a kind of short text |
CN108287822A (en) * | 2018-01-23 | 2018-07-17 | 北京容联易通信息技术有限公司 | A kind of Chinese Similar Problems generation System and method for |
CN108345585A (en) * | 2018-01-11 | 2018-07-31 | 浙江大学 | A kind of automatic question-answering method based on deep learning |
CN109271505A (en) * | 2018-11-12 | 2019-01-25 | 深圳智能思创科技有限公司 | A kind of question answering system implementation method based on problem answers pair |
CN109325040A (en) * | 2018-07-13 | 2019-02-12 | 众安信息技术服务有限公司 | A kind of extensive method, device and equipment in FAQ question and answer library |
CN109344236A (en) * | 2018-09-07 | 2019-02-15 | 暨南大学 | One kind being based on the problem of various features similarity calculating method |
CN110321419A (en) * | 2019-06-28 | 2019-10-11 | 神思电子技术股份有限公司 | A kind of question and answer matching process merging depth representing and interaction models |
CN110413761A (en) * | 2019-08-06 | 2019-11-05 | 浩鲸云计算科技股份有限公司 | A kind of method that the territoriality in knowledge based library is individually talked with |
CN110442760A (en) * | 2019-07-24 | 2019-11-12 | 银江股份有限公司 | A kind of the synonym method for digging and device of question and answer searching system |
CN110825866A (en) * | 2020-01-13 | 2020-02-21 | 江苏联著实业股份有限公司 | Automatic question-answering system and device based on deep network and text similarity |
CN110866100A (en) * | 2019-11-07 | 2020-03-06 | 北京声智科技有限公司 | Phonetics generalization method and device and electronic equipment |
CN111104794A (en) * | 2019-12-25 | 2020-05-05 | 同方知网(北京)技术有限公司 | Text similarity matching method based on subject words |
CN111125334A (en) * | 2019-12-20 | 2020-05-08 | 神思电子技术股份有限公司 | Search question-answering system based on pre-training |
CN111400458A (en) * | 2018-12-27 | 2020-07-10 | 上海智臻智能网络科技股份有限公司 | Automatic generalization method and device |
CN111597313A (en) * | 2020-04-07 | 2020-08-28 | 深圳追一科技有限公司 | Question answering method, device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112507097B (en) | 2022-11-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||