CN110866102A - Search processing method - Google Patents

Search processing method

Info

Publication number
CN110866102A
CN110866102A (application CN201911082817.7A)
Authority
CN
China
Prior art keywords
document
massive
keyword
documents
probabilities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911082817.7A
Other languages
Chinese (zh)
Inventor
潘心冰
李明明
曾光
张红若
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Co Ltd filed Critical Inspur Software Co Ltd
Priority to CN201911082817.7A priority Critical patent/CN110866102A/en
Publication of CN110866102A publication Critical patent/CN110866102A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model

Abstract

The embodiment of the invention discloses a retrieval processing method that can improve the efficiency of retrieval over massive document collections. The retrieval processing method comprises the following steps: obtaining a question and extracting at least one keyword from it; determining a massive document library for retrieving the answer corresponding to the question; extracting documents associated with the question from the massive document library, according to their degree of association with the at least one keyword, to form an associated document set; and retrieving the answer corresponding to the question from the associated document set. Because the associated documents are first selected from the massive document library according to the question, and the answer is then searched for only within those documents, the efficiency of massive-scale retrieval is improved.

Description

Search processing method
Technical Field
The invention relates to the field of retrieval, in particular to a retrieval processing method.
Background
In the information age, information grows explosively, and rapidly retrieving the answer to a user's question from massive amounts of information has become one of the key problems in the field of intelligent dialogue systems. As the number of documents grows, for example in massive collections such as product manuals and legal documents, the volume of data to be searched is huge, which often results in slow queries or even query failure.
Disclosure of Invention
The embodiment of the invention provides a retrieval processing method that can improve the efficiency of massive-scale retrieval.
The embodiment of the invention adopts the following technical scheme:
a search processing method, comprising:
obtaining a question, and extracting at least one keyword from the question;
determining a massive document library for retrieving answers corresponding to the questions;
extracting documents associated with the question from the massive document library, according to the degree of association with the keywords, to form an associated document set;
and retrieving the result corresponding to the question from the associated document set.
Optionally, the extracting, according to the degree of association with the at least one keyword, documents associated with the question from the massive document library to form an associated document set includes:
obtaining the topic of each document in the massive document library, and matching each of the at least one keyword against the topic of each document in the massive document library to obtain a first series of probabilities for the keywords;
matching each of the at least one keyword against each document in the massive document library by semantic similarity to obtain a second series of probabilities for the keywords;
and extracting documents associated with the question from the massive document library according to the first series of probabilities and the second series of probabilities to form the associated document set.
Optionally, the obtaining the topic of each document in the massive document library includes:
constructing a topic model based on the LDA algorithm;
and determining the topic of each document in the massive document library according to the topic model.
Optionally, the determining the topic of each document in the massive document library according to the topic model includes:
determining a series of candidate topics of each document in the massive document library and the probability of each candidate topic according to the topic model;
and determining the topic of each document in the massive document library according to the probability of each candidate topic, where each document in the massive document library may have one or more topics.
Optionally, the matching each of the at least one keyword against each document in the massive document library by semantic similarity to obtain a second series of probabilities for the keywords includes:
establishing a semantic similarity model of the massive document library according to at least one of the TF-IDF algorithm, the BM25 algorithm, and the ES algorithm;
and matching each of the at least one keyword against each document in the massive document library by semantic similarity, based on the semantic similarity model, to obtain the second series of probabilities for the keywords.
Optionally, the extracting, according to the first series of probabilities and the second series of probabilities, the documents associated with the question from the massive document library to form the associated document set includes:
determining, according to the first series of probabilities and the second series of probabilities, a composite probability of the degree of association between each document in the massive document library and the question;
and sorting the documents in the massive document library according to the composite probability, and extracting the documents associated with the question from the massive document library to form the associated document set.
Optionally, the determining, according to the first series of probabilities and the second series of probabilities, the composite probability of the degree of association between each document in the massive document library and the question includes:
adding the first series of probabilities and the second series of probabilities with weights to obtain the composite probability of the degree of association between each document in the massive document library and the question.
Optionally, the sorting the documents in the massive document library according to the composite probability, and extracting the documents associated with the question from the massive document library to form the associated document set includes:
sorting the documents in the massive document library from high to low according to the composite probability;
and taking a set number of documents, starting from the first document in the ordering, as the documents associated with the question to form the associated document set.
Optionally, the obtaining a question and extracting at least one keyword from the question includes:
receiving the question input by a user;
and performing a preprocessing operation on the question to obtain the at least one keyword, where the preprocessing operation comprises one or more of word segmentation, error correction, stop-word removal, entity recognition, compression of long and complex sentences, and coreference resolution.
Optionally, the retrieving the result corresponding to the question from the associated document set includes:
establishing a deep learning model;
and querying the answer corresponding to the question from the associated document set according to the deep learning model.
According to the retrieval processing method of the above technical scheme, a question is obtained, at least one keyword is extracted from the question, a massive document library for retrieving the answer corresponding to the question is determined, documents associated with the question are extracted from the massive document library according to the degree of association with the at least one keyword to form an associated document set, and the answer corresponding to the question is retrieved from the associated document set. Because the associated documents are selected from the massive document library according to the question and the answer is searched for only within those documents, the efficiency of massive-scale retrieval is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a retrieval processing method according to an embodiment of the present invention;
fig. 2 is a second flowchart of the search processing method according to the embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The embodiment of the invention fuses topic extraction with semantic similarity analysis to extract the documents associated with the question from a massive document library to form an associated document set; a machine reading comprehension method based on deep learning then retrieves the answer to the query from those associated articles, so that answers are retrieved quickly and accurately from massive documents.
Example 1
As shown in fig. 1, the present embodiment provides a retrieval processing method, including:
11. A question is obtained, and at least one keyword is extracted from it.
12. A massive document library for retrieving the answer corresponding to the question is determined.
13. Documents associated with the question are extracted from the massive document library, according to their degree of association with the keywords, to form an associated document set.
14. The result corresponding to the question is retrieved from the associated document set.
In one embodiment, the extracting documents associated with the question from the massive document library, according to the degree of association with the at least one keyword, to form an associated document set comprises:
obtaining the topic of each document in the massive document library, and matching each of the at least one keyword against the topic of each document in the massive document library to obtain a first series of probabilities for the keywords;
matching each of the at least one keyword against each document in the massive document library by semantic similarity to obtain a second series of probabilities for the keywords;
and extracting documents associated with the question from the massive document library according to the first series of probabilities and the second series of probabilities to form the associated document set.
In one embodiment, the obtaining the topic of each document in the massive document library includes:
constructing a topic model based on the LDA algorithm;
and determining the topic of each document in the massive document library according to the topic model.
Specifically, taking the LDA (Latent Dirichlet Allocation) document topic generation model as an example, topic extraction means deriving a document's topics from its content. More specifically, a document is regarded as a word sequence {a, b, c, d, …}, and each word corresponds to different topics with corresponding probabilities: for example, word a corresponds to topic A with probability p1, word b corresponds to topic B with probability p2, and so on. Different word sequences, such as abc, acd, and cda, form different candidate document topics, and the sequence with the highest probability is taken as the document topic. The LDA modeling process is the process of generating these topic probabilities.
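To make the simplified word-sequence description above concrete, here is a minimal Python sketch. The word-to-topic probabilities are made-up illustrative numbers; in a real system they would come from a trained LDA model (e.g. gensim's LdaModel). Each topic is scored by accumulating per-word probabilities, and the highest-scoring topic is taken as the document topic.

```python
# Toy word -> topic probability table (illustrative values, not from a real
# trained model). In the patent's notation: word a corresponds to topic A
# with probability p1, word b to topic B with probability p2, etc.
word_topic_probs = {
    "a": {"A": 0.7, "B": 0.2},
    "b": {"A": 0.1, "B": 0.8},
    "c": {"A": 0.5, "B": 0.4},
}

def document_topic(words):
    """Score each candidate topic by summing the per-word probabilities
    over the document's word sequence, then return the best topic."""
    scores = {}
    for w in words:
        for topic, p in word_topic_probs.get(w, {}).items():
            scores[topic] = scores.get(topic, 0.0) + p
    return max(scores, key=scores.get)

print(document_topic(["a", "c"]))  # -> "A": topic A dominates for these words
```

This collapses LDA's generative model into a lookup table for illustration only; the actual inference (Gibbs sampling or variational Bayes) is what produces such probabilities.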
In one embodiment, the determining the topic of each document in the massive document library according to the topic model includes:
determining a series of candidate topics of each document in the massive document library and the probability of each candidate topic according to the topic model;
and determining the topic of each document in the massive document library according to the probability of each candidate topic, where each document may have one or more topics.
For example, each document in the massive document library yields, through the LDA algorithm, a series of topics with associated probability values; the topics are sorted by probability value, and the topic with the top probability is taken as the document topic, following the word-sequence modeling described above.
As another example, a series of topics with probability values is obtained for each document in the massive document library through the LDA algorithm; the topics are sorted by probability value, and the first few (a settable number of) topics are taken as the document topics. These topics reflect the content of the document: the document can be regarded as a set of topics obeying a certain probability distribution, so that a topic drawn at random from the document appears with a certain probability. Each topic in turn consists of words, and a word drawn at random from the topic obeys a certain probability distribution, i.e., each word is included in the topic with a certain probability. This forms a probability distribution from words to topics and from topics to text.
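The top-N selection just described can be sketched in a few lines; the topic labels and probabilities below are illustrative, and the parameter n plays the role of the settable number of topics to keep.

```python
def top_topics(topic_probs, n=2):
    """Rank a document's candidate topics by probability (high to low)
    and keep only the top n as the document's topics."""
    ranked = sorted(topic_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, _ in ranked[:n]]

# Hypothetical candidate topics for one document, with probabilities.
doc_topics = {"tea": 0.55, "agriculture": 0.30, "health": 0.10, "travel": 0.05}
print(top_topics(doc_topics, n=2))  # -> ['tea', 'agriculture']
```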
In one embodiment, the matching each of the at least one keyword against each document in the massive document library by semantic similarity to obtain the second series of probabilities for the keywords includes:
establishing a semantic similarity model of the massive document library according to at least one of the TF-IDF algorithm, the BM25 algorithm, and the ES algorithm;
and matching each of the at least one keyword against each document in the massive document library by semantic similarity, based on the semantic similarity model, to obtain the second series of probabilities for the keywords.
Specifically, a semantic similarity algorithm is used to calculate the similarity between the keywords and the documents. It may be TF-IDF, BM25, or the ES (Elasticsearch) algorithm, or a neural-network-based model such as DSSM (Deep Structured Semantic Model), CNN-DSSM (Convolutional Neural Network DSSM), or LSTM-DSSM (Long Short-Term Memory DSSM).
Specifically, TF-IDF has two components. TF (term frequency) represents how often a given word appears in a document; IDF (inverse document frequency) indicates the importance of a word. For a massive text collection: segment each text into words, then compute the term frequency (TF) of each word in the current article, i.e., its number of occurrences divided by the total number of words in the document; compute the IDF of each word, i.e., divide the total number of documents by the number of documents containing the word and take the logarithm of the quotient, which measures how broadly important the word is; finally compute TF-IDF as the term frequency (TF) multiplied by the inverse document frequency (IDF). TF-IDF is proportional to the number of times a word appears in a document and inversely proportional to how often the word appears across the whole corpus. TF-IDF thus yields a probability, namely the degree of association between a question word and the current article, for use in subsequent steps. Similarly, algorithms such as BM25, ES, DSSM, CNN-DSSM, and LSTM-DSSM model the massive texts and output ranked lists of texts similar to the question. BM25 adds document and query weights, making it effectively an improved version of TF-IDF that can raise the accuracy of the output; ES is built on Lucene underneath.
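The TF-IDF computation described step by step above can be sketched in a few lines of Python. This is the plain, unsmoothed form given in the text (real implementations such as scikit-learn's apply additional smoothing); the corpus below is a toy stand-in for the massive document library.

```python
import math

def tf_idf(word, doc, corpus):
    """TF-IDF as described above: term frequency in the current document,
    times the log of (total documents / documents containing the word)."""
    tf = doc.count(word) / len(doc)          # occurrences / total words
    df = sum(1 for d in corpus if word in d)  # documents containing the word
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Toy corpus of pre-segmented documents (illustrative).
corpus = [
    ["chrysanthemum", "tea", "growing", "environment"],
    ["green", "tea", "brewing"],
    ["contract", "law", "clauses"],
]
score = tf_idf("chrysanthemum", corpus[0], corpus)
print(round(score, 4))
```

Note how "tea", which occurs in two of the three documents, scores lower in `corpus[0]` than "chrysanthemum", which occurs in only one, matching the inverse-proportionality property stated above.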
The algorithms differ slightly in detail, but for massive-scale reading comprehension their inputs and outputs are the same. The embodiment of the present invention is therefore not limited to the three methods above; other modules for calculating semantic similarity may also be adopted.
In one embodiment, the extracting documents associated with the question from the massive document library according to the first series of probabilities and the second series of probabilities to form the associated document set comprises:
determining, according to the first series of probabilities and the second series of probabilities, a composite probability of the degree of association between each document in the massive document library and the question;
and sorting the documents in the massive document library according to the composite probability, and extracting the documents associated with the question from the massive document library to form the associated document set.
In one embodiment, the determining a composite probability of the degree of association between each document in the massive document library and the question according to the first series of probabilities and the second series of probabilities includes:
adding the first series of probabilities and the second series of probabilities with weights to obtain the composite probability of the degree of association between each document in the massive document library and the question.
In one embodiment, the sorting the documents in the massive document library according to the composite probability, and extracting the documents associated with the question from the massive document library to form the associated document set includes:
sorting the documents in the massive document library from high to low according to the composite probability;
and taking a set number of documents, starting from the first document in the ordering, as the documents associated with the question to form the associated document set.
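A minimal sketch of the combine, sort, and truncate procedure above, assuming the two probability series are already available per document; lam1 and lam2 are assumed tunable weights (the patent does not fix their values), and top_k is the settable number of documents to keep.

```python
def rank_documents(p_topic, p_semantic, lam1=0.5, lam2=0.5, top_k=2):
    """Combine the topic-match and semantic-similarity probability series
    by a weighted sum, sort documents high to low, and keep the top_k."""
    combined = {
        doc: lam1 * p_topic[doc] + lam2 * p_semantic[doc]
        for doc in p_topic
    }
    ranked = sorted(combined, key=combined.get, reverse=True)
    return ranked[:top_k]

# Hypothetical per-document probabilities from the two matching stages.
p_topic = {"Doc1": 0.8, "Doc2": 0.3, "Doc3": 0.5}
p_semantic = {"Doc1": 0.7, "Doc2": 0.6, "Doc3": 0.2}
print(rank_documents(p_topic, p_semantic))  # -> ['Doc1', 'Doc2']
```

The returned list is the associated document set; only these documents are handed to the downstream answer-extraction model, which is where the efficiency gain comes from.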
In one embodiment, the obtaining a question and extracting at least one keyword from the question comprises:
receiving the question input by a user;
and performing preprocessing operations such as word segmentation and stop-word removal on the question to obtain the at least one keyword. Word segmentation may use open-source tools such as jieba or HanLP; for example, for the question "How is the weather today?", the segmentation result is roughly: "today", "weather", "how". Stop words include Chinese function words such as "的" and "地". The preprocessing operation comprises one or more of word segmentation, error correction, stop-word removal, entity recognition, compression of long and complex sentences, and coreference resolution, which can be combined according to the application scenario.
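A stdlib-only sketch of the segment-and-filter step. A real Chinese pipeline would use a segmenter such as jieba or HanLP as mentioned above; here an English question and a tiny illustrative stop-word list stand in for them.

```python
import re

# Illustrative stop-word list; a production system would use a much
# larger, language-appropriate list.
STOP_WORDS = {"the", "is", "of", "what", "a", "how"}

def extract_keywords(question):
    """Lowercase, tokenize on letter runs, and drop stop words: a stand-in
    for the segmentation + stop-word-removal preprocessing step."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(extract_keywords("What is the growing environment of chrysanthemum tea?"))
# -> ['growing', 'environment', 'chrysanthemum', 'tea']
```

Error correction, entity recognition, and coreference resolution would be additional stages in the same pipeline, each consuming and producing the token list.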
In one embodiment, the retrieving the result corresponding to the question from the associated document set includes:
establishing a deep learning model;
and querying the answer corresponding to the question from the associated document set according to the deep learning model.
Specifically, when obtaining answers to questions from massive texts, the answer can be obtained from the associated document set based on the deep learning model. The deep learning model is obtained by training on a large data set; combined with the algorithm, the model is used to extract the answer corresponding to the question from the documents.
The retrieval processing method of this embodiment obtains a question, extracts at least one keyword from the question, determines a massive document library for retrieving the answer corresponding to the question, extracts documents associated with the question from the massive document library according to the degree of association with the at least one keyword to form an associated document set, and retrieves the answer corresponding to the question from the associated document set. Because the associated documents are selected from the massive document library according to the question and the answer is searched for only within those documents, the efficiency of massive-scale retrieval is improved.
Example 2
The present embodiment describes the retrieval processing method of the present invention in detail with reference to a specific example. As shown in fig. 2, the method includes:
21. A massive document library is acquired.
This embodiment uses a massive document library storing a large number of documents; the documents contained in it are Doc1, Doc2, Doc3, ….
22. A question is acquired and processed to obtain keywords.
For example, Doc1 is an article introducing chrysanthemum tea, and the input question is "What is the growing environment of chrysanthemum tea?". Preprocessing the question analyzes and processes it, including removing stop words (for example, "of"), correcting typos, segmenting words, and deleting non-keyword words, converting the question into the keywords: "chrysanthemum tea" and "growing environment".
23. The keywords are matched against the documents in the massive document library by topic and by semantic similarity.
Specifically, topic and semantic similarity matching models are generated for the documents in the massive document library (in other embodiments, this step may also precede step 22). Taking the LDA document topic generation model as an example, a document is regarded as a word sequence {a, b, c, d, …}; each word corresponds to different topics with corresponding probabilities (for example, word a corresponds to topic A with probability P1, word b to topic B with probability P2, and so on), and different word sequences, such as abc, acd, and cda, form different document topics, with the highest-probability sequence taken as the document topic. In this way each word corresponds to the document with a probability P1(w|d).
Further, a model is built through a semantic similarity algorithm (such as the TF-IDF algorithm) to obtain the probability P2(w|d) of each word given a document. The probabilities of the question words given each document are then obtained through the two models: from the topic model, a series of probabilities P1("chrysanthemum tea"|Doc1), P1("growing environment"|Doc1), P1("chrysanthemum tea"|Doc2), …; and from the semantic similarity algorithm, P2("chrysanthemum tea"|Doc1), P2("growing environment"|Doc1), P2("chrysanthemum tea"|Doc2), ….
24. The matching results of the keywords against the documents in the massive document library are comprehensively ranked by combining topic matching and semantic similarity matching.
Specifically, the probabilities generated by the two methods are weighted and summed to obtain the composite probability that document Doc generates word w: P(w|d) = λ1·P1(w|d) + λ2·P2(w|d). For example, the probability of "chrysanthemum tea" given Doc1 is P("chrysanthemum tea"|Doc1) = λ1·P1("chrysanthemum tea"|Doc1) + λ2·P2("chrysanthemum tea"|Doc1), and likewise for "growing environment". From these, the probability of "chrysanthemum tea" and "growing environment" together given Doc1, that is, the probability that Doc1 generates the question "What is the growing environment of chrysanthemum tea?", i.e., the degree to which the question is related to Doc1, can be calculated. In the same way, the probabilities that the question is related to Doc2, Doc3, Doc4, … are obtained.
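As a worked numeric example of the composite probability, the sketch below uses hypothetical per-keyword probabilities for Doc1 and assumes, as in a unigram generation model (the patent does not state this explicitly), that the per-keyword composite probabilities are multiplied to score the whole question.

```python
# Hypothetical per-keyword probabilities for Doc1 (illustrative numbers).
p1 = {"chrysanthemum tea": 0.6, "growing environment": 0.4}  # topic match
p2 = {"chrysanthemum tea": 0.8, "growing environment": 0.5}  # semantic match

lam1, lam2 = 0.5, 0.5  # assumed weights for the two series

def question_doc_probability(keywords):
    """Composite probability that Doc1 generates the whole question:
    weighted sum per keyword, multiplied across keywords (an assumption)."""
    prob = 1.0
    for w in keywords:
        prob *= lam1 * p1[w] + lam2 * p2[w]
    return prob

# (0.5*0.6 + 0.5*0.8) * (0.5*0.4 + 0.5*0.5) = 0.7 * 0.45 = 0.315
print(round(question_doc_probability(["chrysanthemum tea", "growing environment"]), 4))
```

Repeating this for Doc2, Doc3, … yields the per-document scores that feed the comprehensive ranking in step 24.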
25. The answer corresponding to the question is determined from the comprehensive ranking based on a deep learning model.
Specifically, after the probabilities are ranked, the set of articles most relevant to the question is obtained; the question and this article set are submitted to a deep learning model (such as BERT) to obtain the answer corresponding to the question.
The retrieval processing method of this embodiment obtains a question, extracts keywords from the question, determines a massive document library for retrieving the answer corresponding to the question, extracts documents associated with the question from the massive document library according to the degree of association with the keywords to form an associated document set, and retrieves the answer corresponding to the question from the associated document set. Because the associated documents are selected from the massive document library according to the question and the answer is searched for only within those documents, the efficiency of massive-scale retrieval is improved.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A search processing method, comprising:
obtaining a question, and extracting at least one keyword from the question;
determining a massive document library for retrieving answers corresponding to the questions;
extracting documents associated with the question from the massive document library, according to the degree of association with the at least one keyword, to form an associated document set;
and retrieving the answer corresponding to the question from the associated document set.
2. The method according to claim 1, wherein the extracting documents associated with the question from the massive document library, according to the degree of association with the at least one keyword, to form an associated document set comprises:
obtaining the topic of each document in the massive document library, and matching each of the at least one keyword against the topic of each document in the massive document library to obtain a first series of probabilities for the keywords;
matching each of the at least one keyword against each document in the massive document library by semantic similarity to obtain a second series of probabilities for the keywords;
and extracting documents associated with the question from the massive document library according to the first series of probabilities and the second series of probabilities to form the associated document set.
3. The method of claim 2, wherein the obtaining the topic of each document in the massive document library comprises:
constructing a topic model based on the LDA algorithm;
and determining the topic of each document in the massive document library according to the topic model.
4. The method of claim 3, wherein the determining the topic of each document in the massive document library according to the topic model comprises:
determining at least one candidate topic of each document in the massive document library and the probability of each candidate topic according to the topic model;
and determining the topic of each document in the massive document library according to the probability of each candidate topic, where each document in the massive document library may have one or more topics.
5. The method of claim 2, wherein the matching each of the at least one keyword against each document in the massive document library by semantic similarity to obtain a second series of probabilities for the keywords comprises:
establishing a semantic similarity model of the massive document library according to at least one of the TF-IDF algorithm, the BM25 algorithm, and the ES algorithm;
and matching each of the at least one keyword against each document in the massive document library by semantic similarity, based on the semantic similarity model, to obtain the second series of probabilities for the keywords.
6. The method according to any one of claims 2 to 5, wherein extracting documents relevant to the question from the massive document library according to the first series of probabilities and the second series of probabilities to form the associated document set comprises:
determining a combined probability of relevance between each document in the massive document library and the question according to the first series of probabilities and the second series of probabilities;
and ranking the documents in the massive document library according to the combined probability, and extracting documents relevant to the question from the massive document library to form the associated document set.
7. The method of claim 6, wherein determining the combined probability of relevance between each document in the massive document library and the question according to the first series of probabilities and the second series of probabilities comprises:
computing a weighted sum of the first series of probabilities and the second series of probabilities to obtain the combined probability of relevance between each document in the massive document library and the question.
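The weighted addition in claim 7 reduces to a one-line formula per document. In this sketch the two probability series and the equal weights are assumed values; the patent does not specify the weights, which would normally be tuned.

```python
# Hypothetical per-document probabilities from the two matching stages:
# topic matching (first series) and semantic similarity (second series).
first_series = {"d1": 0.80, "d2": 0.10, "d3": 0.55}
second_series = {"d1": 0.40, "d2": 0.90, "d3": 0.60}

def combined_probability(doc_id, w_topic=0.5, w_sim=0.5):
    """Weighted sum of the two series for one document; the weights
    here are assumptions, not values given in the patent."""
    return w_topic * first_series[doc_id] + w_sim * second_series[doc_id]

combined = {d: combined_probability(d) for d in first_series}
```

With equal weights, "d1" combines to 0.60, "d3" to 0.575, and "d2" to 0.50, so a document strong in only one signal can still be outranked by one that is moderately strong in both.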
8. The method of claim 6, wherein ranking the documents in the massive document library according to the combined probability and extracting documents relevant to the question from the massive document library to form the associated document set comprises:
ranking the documents in the massive document library from the highest combined probability to the lowest;
and taking a set number of documents, starting from the first document in the ranking, as the documents relevant to the question to form the associated document set.
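The ranking-and-truncation step of claim 8 is a sort followed by a top-k cut. The combined probabilities and the "set number" of 2 below are illustrative assumptions.

```python
# Hypothetical combined relevance probabilities for each document.
combined_probs = {"d1": 0.60, "d2": 0.50, "d3": 0.575, "d4": 0.20}

def associated_document_set(probs, set_number=2):
    """Sort documents from highest to lowest combined probability and
    take the first `set_number` as the associated document set."""
    ranking = sorted(probs, key=probs.get, reverse=True)
    return ranking[:set_number]

top_docs = associated_document_set(combined_probs)
print(top_docs)  # -> ['d1', 'd3']
```

Only the documents in this truncated set are passed on to the answer-retrieval stage, which keeps the later, more expensive matching confined to a small candidate pool.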
9. The method of any one of claims 1 to 5, wherein obtaining a question and extracting at least one keyword from the question comprises:
receiving the question input by a user;
and performing a preprocessing operation on the question to obtain the at least one keyword, wherein the preprocessing operation comprises one or more of word segmentation, error correction, stop-word removal, entity recognition, compression of long or complex sentences, and coreference resolution.
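Of the preprocessing operations claim 9 lists, segmentation and stop-word removal are simple enough to sketch. The stop-word list and the whitespace "segmenter" below are toy assumptions; a real pipeline would use a proper Chinese word segmenter and add the remaining stages.

```python
# Assumed stop-word list; whitespace splitting stands in for a real
# word segmenter, which Chinese text would require.
STOP_WORDS = {"the", "a", "of", "is", "what", "in"}

def extract_keywords(question):
    """Minimal preprocessing: segment, lowercase, drop stop words.
    Error correction, entity recognition, sentence compression and
    coreference resolution would be further stages in this pipeline."""
    tokens = question.lower().split()  # toy segmentation
    return [t for t in tokens if t not in STOP_WORDS]

print(extract_keywords("What is the capital of France"))
# -> ['capital', 'france']
```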
10. The method according to any one of claims 1 to 5, wherein retrieving the answer corresponding to the question from the associated document set comprises:
establishing a deep learning model;
and querying the answer corresponding to the question from the associated document set according to the deep learning model.
CN201911082817.7A 2019-11-07 2019-11-07 Search processing method Pending CN110866102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911082817.7A CN110866102A (en) 2019-11-07 2019-11-07 Search processing method

Publications (1)

Publication Number Publication Date
CN110866102A true CN110866102A (en) 2020-03-06

Family

ID=69654403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911082817.7A Pending CN110866102A (en) 2019-11-07 2019-11-07 Search processing method

Country Status (1)

Country Link
CN (1) CN110866102A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052326A (en) * 2020-09-30 2020-12-08 民生科技有限责任公司 Intelligent question and answer method and system based on long and short text matching
CN112711657A (en) * 2021-01-06 2021-04-27 北京中科深智科技有限公司 Question-answering method and question-answering system
CN113239148A (en) * 2021-05-14 2021-08-10 廖伟智 Scientific and technological resource retrieval method based on machine reading understanding
WO2022088672A1 (en) * 2020-10-29 2022-05-05 平安科技(深圳)有限公司 Machine reading comprehension method and apparatus based on bert, and device and storage medium
CN115408491A (en) * 2022-11-02 2022-11-29 京华信息科技股份有限公司 Text retrieval method and system for historical data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699625A (en) * 2013-12-20 2014-04-02 北京百度网讯科技有限公司 Method and device for retrieving based on keyword
CN104050235A (en) * 2014-03-27 2014-09-17 浙江大学 Distributed information retrieval method based on set selection
CN109271505A (en) * 2018-11-12 2019-01-25 深圳智能思创科技有限公司 A kind of question answering system implementation method based on problem answers pair
CN109766423A (en) * 2018-12-29 2019-05-17 上海智臻智能网络科技股份有限公司 Answering method and device neural network based, storage medium, terminal
CN109977399A (en) * 2019-03-05 2019-07-05 国网青海省电力公司 A kind of data analysing method and device based on NLP technology


Similar Documents

Publication Publication Date Title
CN110442760B (en) Synonym mining method and device for question-answer retrieval system
CN106649818B (en) Application search intention identification method and device, application search method and server
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
CN106156204B (en) Text label extraction method and device
CN109885773B (en) Personalized article recommendation method, system, medium and equipment
CN109508414B (en) Synonym mining method and device
CN110866102A (en) Search processing method
CN106599054B (en) Method and system for classifying and pushing questions
CN111291188B (en) Intelligent information extraction method and system
CN111159363A (en) Knowledge base-based question answer determination method and device
CN106708929B (en) Video program searching method and device
CN112632228A (en) Text mining-based auxiliary bid evaluation method and system
Noaman et al. Naive Bayes classifier based Arabic document categorization
CN111190997A (en) Question-answering system implementation method using neural network and machine learning sequencing algorithm
CN110134777B (en) Question duplication eliminating method and device, electronic equipment and computer readable storage medium
CN112307182B (en) Question-answering system-based pseudo-correlation feedback extended query method
Shawon et al. Website classification using word based multiple n-gram models and random search oriented feature parameters
CN106570196B (en) Video program searching method and device
Kurniawan et al. Indonesian twitter sentiment analysis using Word2Vec
CN107239455B (en) Core word recognition method and device
Al Mostakim et al. Bangla content categorization using text based supervised learning methods
CN107729509B (en) Discourse similarity determination method based on recessive high-dimensional distributed feature representation
Saputra et al. Keyphrases extraction from user-generated contents in healthcare domain using long short-term memory networks
CN114996455A (en) News title short text classification method based on double knowledge maps
CN111061939B (en) Scientific research academic news keyword matching recommendation method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200306