CN112035730A - Semantic retrieval method and device and electronic equipment - Google Patents

Info

Publication number
CN112035730A
Authority
CN
China
Prior art keywords
score
answer
candidate
answers
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011221206.9A
Other languages
Chinese (zh)
Other versions
CN112035730B (en)
Inventor
周阳
钱泓锦
刘占亮
窦志成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiyuan Artificial Intelligence Research Institute
Original Assignee
Beijing Zhiyuan Artificial Intelligence Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhiyuan Artificial Intelligence Research Institute
Priority to CN202011221206.9A
Publication of CN112035730A
Application granted
Publication of CN112035730B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semantic retrieval method, a semantic retrieval device, and electronic equipment. The method comprises: receiving query information sent by a user; correcting errors in the text of the query information to obtain a corrected text; performing user intention analysis on the corrected text and determining a first score of the identified user intention; for simple fact question answering, retrieving based on a pre-constructed knowledge graph to obtain a first candidate answer set, and determining a second score of each candidate answer in the first candidate answer set according to its relevance; for common question answering, retrieving based on pre-constructed vectorized FAQ question-answer pairs to obtain a second candidate answer set, and determining a third score of each candidate answer in the second candidate answer set according to its relevance; and ranking the candidate answers according to the first score, the second score, and the third score to obtain an answer. Compared with keyword-based retrieval, the method better meets the user's query needs.

Description

Semantic retrieval method and device and electronic equipment
Technical Field
The invention relates to the technical field of information processing, in particular to a semantic retrieval method and device and electronic equipment.
Background
Within the massive amount of information on the internet, users usually rely on a search engine to find what they need. Current search engines, however, often retrieve poorly: the user still has to sift through a large number of returned web pages, so the need for quick and convenient retrieval is not met. This has given rise to intelligent services that digitize information by intelligent means, but it remains difficult to mine the associations among data, so much of the data is not used effectively.
Most existing search engines rely on traditional techniques such as keyword matching, PageRank, and inverted indexes. To meet users' query needs as far as possible, they often perform lexical analysis on the user query, including word segmentation, part-of-speech tagging, and named entity recognition, and then run a combined query. Although this improves query results, it remains at the level of shallow semantic analysis and cannot understand the user's query intention.
In knowledge-graph-based retrieval and question-answering systems, most queries are simple fact queries, that is, one-hop queries. More complex multi-hop queries often yield poor retrieval results, or no results at all.
Disclosure of Invention
The invention provides a semantic retrieval method, a semantic retrieval device, and electronic equipment, which effectively address the problems that existing retrieval methods cannot understand the query intention and that their query results do not meet users' needs.
A semantic retrieval method comprising:
receiving query information sent by a user;
correcting the text in the query information to obtain a corrected text;
performing user intention analysis on the corrected text based on a question template library, and determining a first score of the identified user intention, wherein the user intention comprises simple fact question answering and common question answering;
for simple fact question answering, retrieving based on a pre-constructed knowledge graph to obtain a first candidate answer set, and determining a second score of each candidate answer in the first candidate answer set according to its relevance;
for common question answering, retrieving based on pre-constructed vectorized FAQ question-answer pairs to obtain a second candidate answer set, and determining a third score of each candidate answer in the second candidate answer set according to its relevance;
and ranking the candidate answers according to the first score, the second score and the third score to obtain an answer.
Further, correcting the text in the query information to obtain a corrected text includes:
segmenting the text with a Chinese word segmenter, and performing error detection at both character granularity and word granularity to generate a candidate set of suspected error positions;
traversing all suspected error positions, looking up phonetically similar and visually similar words in a pre-stored dictionary to replace the words at the suspected error positions, and calculating the sentence perplexity with a language model;
ranking the replacement results by sentence perplexity to obtain the optimal corrected word;
and generating the corrected text from the optimal corrected word.
Further, for simple fact question answering, retrieving based on the pre-constructed knowledge graph to obtain the first candidate answer set includes:
extracting entity information, relationship information and attribute information from the corrected text, linking this information to the entities, relationships or attributes in the knowledge graph by means of a synonym dictionary, and generating an SQL query statement;
and filling the extracted words into the corresponding slots of the SQL query statement, and executing the query to obtain the first candidate answer set.
Further, for common question answering, retrieving based on the pre-constructed vectorized FAQ question-answer pairs to obtain the second candidate answer set includes:
vectorizing the corrected text, searching the vectorized FAQ question-answer pairs for similar vectors, obtaining the corresponding answers, and generating the second candidate answer set.
Further, searching the vectorized FAQ question-answer pairs for similar vectors comprises:
calculating the similarity between the vectorized corrected text and the questions in the vectorized FAQ question-answer pairs, and returning the answer corresponding to the question with the highest similarity; and/or
calculating the similarity between the vectorized corrected text and the answers in the vectorized FAQ question-answer pairs, and returning the answer with the highest similarity.
Further, ranking the candidate answers according to the first score, the second score, and the third score to obtain an answer includes:
computing a weighted sum of the first score and the second score for simple fact question answering to obtain a fourth score of each candidate answer in the first candidate answer set;
computing a weighted sum of the first score and the third score for common question answering to obtain a fifth score of each candidate answer in the second candidate answer set;
ranking all the candidate answers according to the fourth score and the fifth score, and selecting the highest-ranked answer;
and generating a reply from the selected answer and an answer template, and returning it to the user.
Further, the question template library is pre-constructed in the following way:
collecting historical user query information, and constructing the question template library from that user query information;
the vectorized FAQ question-answer pairs are pre-constructed in the following way:
collecting users' common questions, writing standard answers, and vectorizing the common questions and the standard answers to obtain the vectorized FAQ question-answer pairs.
A semantic retrieval apparatus comprising:
the receiving module is used for receiving query information sent by a user;
the error correction module is used for correcting the text in the query information to obtain a corrected text;
an intent determination module for performing a user intent analysis on the corrected text based on a question template library, determining a first score of the identified user intent, the user intent including simple fact question answering and common question answering;
the first retrieval module is used for, in the case of simple fact question answering, retrieving based on a pre-constructed knowledge graph to obtain a first candidate answer set, and determining a second score of each candidate answer in the first candidate answer set according to its relevance;
the second retrieval module is used for, in the case of common question answering, retrieving based on pre-constructed vectorized FAQ question-answer pairs to obtain a second candidate answer set, and determining a third score of each candidate answer in the second candidate answer set according to its relevance;
and the answer generation module is used for ranking the candidate answers according to the first score, the second score and the third score to obtain an answer.
An electronic device comprises a processor and a memory, wherein the memory stores a plurality of instructions, and the processor is used for reading the instructions and executing the semantic retrieval method.
A computer-readable storage medium having stored thereon a plurality of instructions readable by a processor and performing the semantic retrieval method described above.
The semantic retrieval method, the semantic retrieval device and the electronic equipment at least have the following beneficial effects:
(1) natural language understanding at the semantic level better matches the user's real intention, improves retrieval efficiency and accuracy, and meets the user's query needs better than keyword-based retrieval;
(2) with the synonym dictionary, the recognized entities, attributes and relationships can be normalized, so that non-standard or imprecise expressions in the user's query are mapped to standard descriptions; this avoids failures to link an entity to the correct entity node in the knowledge graph because of a non-standard description, and improves the robustness of the knowledge-graph-based retrieval system;
(3) for queries that are not simple fact queries, such as FAQ queries, the answer that best matches the user's intention can be found through a semantic-level vectorized retrieval service.
Drawings
Fig. 1 is a flowchart of an embodiment of a semantic retrieval method provided in the present invention.
Fig. 2 is a flowchart of an embodiment of a text error correction method in the semantic retrieval method provided by the present invention.
Fig. 3 is a flowchart of an embodiment of a knowledge-graph-based retrieval method in the semantic retrieval method provided by the present invention.
Fig. 4 is a flowchart of an embodiment of a retrieval method based on vectorized FAQ question-answer pairs in the semantic retrieval method provided by the present invention.
Fig. 5 is a flowchart illustrating an embodiment of a method for ranking candidate answers to obtain answers in the semantic retrieval method according to the present invention.
Fig. 6 is a schematic structural diagram of an embodiment of a semantic retrieval apparatus according to the present invention.
Fig. 7 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Referring to fig. 1, in some embodiments, there is provided a semantic retrieval method comprising:
step S101, receiving query information sent by a user;
step S102, correcting the text in the query information to obtain a corrected text;
step S103, performing user intention analysis on the corrected text based on a question template library, and determining a first score of the identified user intention, wherein the user intention comprises simple fact question answering and common question answering;
step S104, for simple fact question answering, retrieving based on a pre-constructed knowledge graph to obtain a first candidate answer set, and determining a second score of each candidate answer in the first candidate answer set according to its relevance;
step S105, for common question answering, retrieving based on pre-constructed vectorized FAQ question-answer pairs to obtain a second candidate answer set, and determining a third score of each candidate answer in the second candidate answer set according to its relevance;
step S106, ranking the candidate answers according to the first score, the second score and the third score to obtain an answer.
The semantic retrieval method provided by the embodiment can better match the real intention of the user, improves the retrieval efficiency and accuracy, and can better meet the query requirement of the user compared with the retrieval based on the key words.
Specifically, before the above method is performed, a question template library, a knowledge graph, and vectorized FAQ (Frequently Asked Questions) question-answer pairs are constructed in advance.
The knowledge graph uses entity-relation-entity triples, which organize large amounts of discrete information in a structured way. For example, in the triple "high-tech enterprise certification - handling time - working days 09:00-12:00 and 13:30-17:00", the head entity is "high-tech enterprise certification", the tail entity is "working days 09:00-12:00 and 13:30-17:00", and the relationship between the two entities is "handling time".
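As an illustration of the triple form described above, the following minimal sketch stores an English rendering of the example triple and answers a one-hop query over it. The `Triple` type, the `one_hop` helper, and the second triple are hypothetical and are not taken from the patent.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One entity-relation-entity fact in the knowledge graph."""
    head: str
    relation: str
    tail: str


# A tiny in-memory graph. The first triple follows the example in the text;
# the second one is made up purely for illustration.
KG = [
    Triple("high-tech enterprise certification", "handling time",
           "working days 09:00-12:00 and 13:30-17:00"),
    Triple("high-tech enterprise certification", "handling location",
           "service hall (hypothetical value)"),
]


def one_hop(kg: List[Triple], head: str, relation: str) -> List[str]:
    """Return the tail entities reachable from `head` via `relation`."""
    return [t.tail for t in kg if t.head == head and t.relation == relation]
```

A simple fact query such as "when can high-tech enterprise certification be handled" then reduces to `one_hop(KG, "high-tech enterprise certification", "handling time")`.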
FAQ question-answer pairs generally cover the most common questions in service handling. Frequently asked questions can be collected and labeled manually and standard answers written for them; the questions and the corresponding answers are then vectorized with one unified semantic vectorization model to obtain the vectorized FAQ question-answer pairs. Common vectorization schemes include BM25 and TF-IDF, as well as deep-learning semantic models such as BERT. After vectorization, a vector search tool such as Faiss or Annoy can be used for fast matching.
During cold start, users' habitual query phrasings, such as "what is the phone number of xxx" or "where is the address of xxx", can be recorded through various channels, such as manual service windows and email. The question template library is constructed from the historical user query information collected in this way.
In some embodiments, referring to fig. 2, in step S102, correcting the text in the query information to obtain a corrected text includes:
step S1021, segmenting the text with a Chinese word segmenter, and performing error detection at both character granularity and word granularity to generate a candidate set of suspected error positions;
step S1022, traversing all suspected error positions, looking up phonetically similar and visually similar words in a pre-stored dictionary to replace the words at the suspected error positions, and calculating the sentence perplexity with a language model;
step S1023, ranking the replacement results by sentence perplexity to obtain the optimal corrected word;
step S1024, generating the corrected text from the optimal corrected word.
Because user input may contain wrongly written characters, colloquial descriptions, and non-standard wording (for example, "high-tech enterprise" being shortened to an informal abbreviation), Chinese text error correction is required. Error correction has two main steps: error detection and error correction. In error detection, the text is first segmented by a Chinese word segmenter. Because the sentence contains wrongly written characters, the segmentation result is often itself wrong, so errors are detected at both character granularity and word granularity, and the suspected errors from the two granularities are merged into a candidate set of suspected error positions. In error correction, all suspected error positions are traversed, the word at each position is replaced with phonetically similar and visually similar words, the sentence perplexity is calculated with a language model, and the results for all candidates are compared and ranked to obtain the optimal corrected word. This approach to text error correction is controllable, flexible, fast, and light on resources.
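The detect-replace-rank loop described above can be sketched as follows. This is a toy under stated assumptions, not the patented implementation: it works on space-separated English tokens instead of segmented Chinese text, detects suspects only at token level (out-of-vocabulary tokens), and uses an add-one smoothed bigram model as the language model. The corpus, the `CONFUSION` dictionary, and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy training corpus for the language model (hypothetical data).
CORPUS = [
    "high tech enterprise",
    "enterprise certification",
    "high tech enterprise certification",
]

# Hypothetical confusion dictionary: token -> similar-sounding or similar-looking tokens.
CONFUSION = {"teck": ["tech", "deck"]}


def train_bigram(corpus):
    """Count unigrams and bigrams, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(tokens, unigrams, bigrams):
    """Perplexity of a non-empty token list under an add-one smoothed bigram model."""
    vocab_size = len(unigrams) + 1  # one extra slot for unseen tokens
    padded = ["<s>"] + tokens
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / len(tokens))


def correct(sentence, confusion, unigrams, bigrams):
    """Replace each suspected token with the candidate giving the lowest perplexity."""
    tokens = sentence.split()
    # Error detection: here simply the tokens never seen in the corpus.
    suspects = [i for i, tok in enumerate(tokens) if tok not in unigrams]
    for i in suspects:
        candidates = [tokens[i]] + confusion.get(tokens[i], [])
        tokens[i] = min(
            candidates,
            key=lambda c: perplexity(tokens[:i] + [c] + tokens[i + 1:], unigrams, bigrams),
        )
    return " ".join(tokens)


UNIGRAMS, BIGRAMS = train_bigram(CORPUS)
```

With this toy model, `correct("high teck enterprise", CONFUSION, UNIGRAMS, BIGRAMS)` selects "tech" for the suspected position, because that replacement yields the lowest sentence perplexity among the candidates.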
In some embodiments, in step S103, short text classification is used to identify whether the user intention is simple fact question answering or common question answering. To improve the robustness of the system, each intention is given a score, which serves as the first score. A higher first score indicates a greater likelihood that the intention matches the user's true query intention.
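One way to picture the first score is as a per-intention similarity between the query and the question template library. The sketch below is an assumption-laden stand-in for the short text classifier: it scores each intention by the best Jaccard token overlap with that intention's templates. The template contents, the intention labels `simple_fact` and `faq`, and the scoring function are all hypothetical.

```python
# Hypothetical question template library, grouped by intention.
TEMPLATES = {
    "simple_fact": [
        "what is the phone of xxx",
        "where is the address of xxx",
    ],
    "faq": [
        "how do i apply for xxx",
        "what should i do if xxx",
    ],
}


def jaccard(a, b):
    """Jaccard overlap between two token sequences, in [0, 1]."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def intent_scores(query, templates=TEMPLATES):
    """Return a first score per intention: the best template overlap for that intention."""
    tokens = query.lower().split()
    return {
        intent: max(jaccard(tokens, pattern.split()) for pattern in patterns)
        for intent, patterns in templates.items()
    }
```

Keeping a score for every intention, rather than a single hard label, lets the later ranking step recover when the classifier is only weakly confident.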
In some embodiments, referring to fig. 3, in step S104, for simple fact question answering, retrieving based on the pre-constructed knowledge graph to obtain the first candidate answer set includes:
step S1041, extracting entity information, relationship information and attribute information from the corrected text, linking this information to the entities, relationships or attributes in the knowledge graph by means of a synonym dictionary, and generating an SQL query statement;
step S1042, filling the extracted words into the corresponding slots of the SQL query statement, and executing the query to obtain the first candidate answer set.
Specifically, the entity linking step has two parts: recognition and disambiguation. The recognition part mainly uses entity recognition from lexical analysis to obtain the entities, relationships and attributes in the user query. For some specialized fields, a domain dictionary is also added to the lexical analysis. The disambiguation part mainly looks up the recognized entities in the knowledge graph, including aliases and abbreviations, to form a candidate entity set. A Learning to Rank method is then used to select the appropriate entity from the candidate set.
A second score is determined for each candidate answer in the first candidate answer set according to its relevance; the higher the second score, the more likely the retrieval result meets the user's needs.
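The linking and slot-filling steps can be sketched as follows, assuming the triples are stored in a relational table. Everything specific here is hypothetical: the `triples` table layout, the synonym dictionary contents, and the substring matching that stands in for entity recognition. Learning to Rank disambiguation and the second score are omitted.

```python
import sqlite3

# Hypothetical synonym dictionary: surface form -> canonical name in the graph.
SYNONYMS = {
    "high-tech certification": "high-tech enterprise certification",
    "high-tech enterprise certification": "high-tech enterprise certification",
    "opening hours": "handling time",
    "handling time": "handling time",
}
ENTITIES = {"high-tech enterprise certification"}
RELATIONS = {"handling time"}

# Query template with two slots, filled through bound parameters.
SQL_TEMPLATE = "SELECT tail FROM triples WHERE head = ? AND relation = ?"


def build_db():
    """Create an in-memory table holding the example triple."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (head TEXT, relation TEXT, tail TEXT)")
    conn.execute(
        "INSERT INTO triples VALUES (?, ?, ?)",
        ("high-tech enterprise certification", "handling time",
         "working days 09:00-12:00 and 13:30-17:00"),
    )
    return conn


def link(text):
    """Map surface forms found in the text to a canonical entity and relation."""
    entity = relation = None
    for surface, canonical in SYNONYMS.items():
        if surface in text:
            if canonical in ENTITIES:
                entity = canonical
            elif canonical in RELATIONS:
                relation = canonical
    return entity, relation


def answer(conn, text):
    """Fill the query slots with the linked names and return the candidate answers."""
    entity, relation = link(text.lower())
    if entity is None or relation is None:
        return []
    rows = conn.execute(SQL_TEMPLATE, (entity, relation)).fetchall()
    return [row[0] for row in rows]
```

Filling the slots through bound parameters rather than string concatenation keeps user-supplied text out of the SQL statement itself.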
In some embodiments, referring to fig. 4, in step S105, for common question answering, retrieving based on the pre-constructed vectorized FAQ question-answer pairs to obtain the second candidate answer set includes:
step S1051, vectorizing the corrected text;
step S1052, searching the vectorized FAQ question-answer pairs for similar vectors, obtaining the corresponding answers, and generating the second candidate answer set.
Searching the vectorized FAQ question-answer pairs for similar vectors comprises:
calculating the similarity between the vectorized corrected text and the questions in the vectorized FAQ question-answer pairs, and returning the answer corresponding to the question with the highest similarity; and/or
calculating the similarity between the vectorized corrected text and the answers in the vectorized FAQ question-answer pairs, and returning the answer with the highest similarity.
That is, searching the vectorized FAQ question-answer pairs for similar vectors covers both similar-question matching and question-answer matching. In practice, similar-question matching is mainly used, and the vectorization is mainly based on semantic-level BERT vectors.
A third score is determined for each candidate answer in the second candidate answer set according to its relevance; the higher the third score, the more likely the retrieval result meets the user's needs.
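The two matching modes can be sketched as follows. To stay self-contained, the sketch swaps the BERT vectors for plain bag-of-words counts and uses cosine similarity as the relevance, which then serves as the third score; the FAQ entries and all names are hypothetical, and a production system would use a real embedding model and a vector index.

```python
import math
from collections import Counter

# Hypothetical FAQ question-answer pairs.
FAQ = [
    ("how do i apply for high-tech enterprise certification",
     "Submit the application form at the service window."),
    ("what materials are needed for a business license",
     "Bring your ID card and the lease contract."),
]


def embed(text):
    """Bag-of-words vector; a stand-in for a semantic embedding model."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


# Vectorize questions and answers once, ahead of query time.
INDEX = [(embed(q), embed(a), a) for q, a in FAQ]


def retrieve(query, top_k=1, match="question"):
    """Return (third_score, answer) pairs, best first.

    match="question" compares the query with stored questions;
    match="answer" compares it with stored answers.
    """
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, q_vec if match == "question" else a_vec), ans)
        for q_vec, a_vec, ans in INDEX
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For example, `retrieve("how to apply for high-tech certification")` ranks the first FAQ entry highest under similar-question matching.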
In some embodiments, referring to fig. 5, in step S106, ranking the candidate answers according to the first score, the second score, and the third score to obtain an answer includes:
step S1061, computing a weighted sum of the first score and the second score for simple fact question answering to obtain a fourth score of each candidate answer in the first candidate answer set;
step S1062, computing a weighted sum of the first score and the third score for common question answering to obtain a fifth score of each candidate answer in the second candidate answer set;
step S1063, ranking all candidate answers according to the fourth score and the fifth score, from the highest score to the lowest, and selecting the highest-ranked answer;
step S1064, generating a reply from the selected answer and an answer template, and returning it to the user.
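The weighted summation and ranking in steps S1061 to S1064 can be sketched as follows. The patent does not give the weights or the answer template, so the values `w_intent=0.4`, `w_relevance=0.6`, and the template string below are assumptions chosen only for illustration.

```python
# Hypothetical answer template used to phrase the final reply.
ANSWER_TEMPLATE = "Here is what I found: {answer}"


def rank(intent_scores, kg_candidates, faq_candidates,
         w_intent=0.4, w_relevance=0.6):
    """Merge both candidate sets and sort them by their combined score.

    intent_scores:  first scores, e.g. {"simple_fact": 0.8, "faq": 0.2}
    kg_candidates:  (answer, second_score) pairs from the knowledge graph
    faq_candidates: (answer, third_score) pairs from the FAQ retrieval
    """
    ranked = []
    for answer, second in kg_candidates:
        fourth = w_intent * intent_scores["simple_fact"] + w_relevance * second
        ranked.append((fourth, answer))
    for answer, third in faq_candidates:
        fifth = w_intent * intent_scores["faq"] + w_relevance * third
        ranked.append((fifth, answer))
    ranked.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return ranked


def respond(ranked):
    """Fill the answer template with the top-ranked answer."""
    if not ranked:
        return "Sorry, no answer was found."
    return ANSWER_TEMPLATE.format(answer=ranked[0][1])
```

Because the fourth and fifth scores share the same scale, candidates from the knowledge graph and from the FAQ pairs can be compared directly in one sorted list.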
After the system goes online, logs are checked regularly and new questions raised by users are collected. Once standard answers have been written and labeled, the new questions and answers are vectorized and added to the vectorized FAQ question-answer pairs and/or used to update the knowledge graph, enabling continuous optimization.
In some embodiments, referring to fig. 6, there is provided a semantic retrieval apparatus including:
a receiving module 201, configured to receive query information sent by a user;
the error correction module 202 is configured to correct errors of the text in the query information to obtain a corrected text;
an intent determination module 203, configured to perform user intent analysis on the corrected text based on a question template library, and determine a first score of the identified user intent, where the user intent includes simple fact question answering and common question answering;
the first retrieval module 204 is configured to, for simple fact question answering, retrieve based on a pre-constructed knowledge graph to obtain a first candidate answer set, and determine a second score of each candidate answer in the first candidate answer set according to its relevance;
the second retrieval module 205 is configured to, for common question answering, retrieve based on pre-constructed vectorized FAQ question-answer pairs to obtain a second candidate answer set, and determine a third score of each candidate answer in the second candidate answer set according to its relevance;
and the answer generating module 206 is configured to rank the candidate answers according to the first score, the second score, and the third score to obtain an answer.
Specifically, the error correction module 202 is further configured to segment the text with a Chinese word segmenter, and perform error detection at both character granularity and word granularity to generate a candidate set of suspected error positions; traverse all suspected error positions, look up phonetically similar and visually similar words in a pre-stored dictionary to replace the words at the suspected error positions, and calculate the sentence perplexity with a language model; rank the replacement results by sentence perplexity to obtain the optimal corrected word; and generate the corrected text from the optimal corrected word.
The first retrieval module 204 is further configured to extract entity information, relationship information, and attribute information from the corrected text, link this information to the entities, relationships, or attributes in the knowledge graph by means of a synonym dictionary, and generate an SQL query statement; and fill the extracted words into the corresponding slots of the SQL query statement, and execute the query to obtain the first candidate answer set.
The second retrieval module 205 is further configured to vectorize the corrected text, search the vectorized FAQ question-answer pairs for similar vectors, obtain the corresponding answers, and generate the second candidate answer set.
The second retrieval module 205 is further configured to calculate the similarity between the vectorized corrected text and the questions in the vectorized FAQ question-answer pairs, and return the answer corresponding to the question with the highest similarity; and/or calculate the similarity between the vectorized corrected text and the answers in the vectorized FAQ question-answer pairs, and return the answer with the highest similarity.
The answer generating module 206 is further configured to compute a weighted sum of the first score and the second score for simple fact question answering to obtain a fourth score of each candidate answer in the first candidate answer set; compute a weighted sum of the first score and the third score for common question answering to obtain a fifth score of each candidate answer in the second candidate answer set; rank all the candidate answers according to the fourth score and the fifth score, and select the highest-ranked answer; and generate a reply from the selected answer and an answer template, and return it to the user.
For the specific working principle, please refer to the above method embodiments, which are not described herein again.
Referring to fig. 7, in some embodiments, there is further provided an electronic device including a processor 301 and a memory 302, where the memory 302 stores a plurality of instructions, and the processor 301 is configured to read the plurality of instructions and execute the semantic retrieval method described above, for example, including: receiving query information sent by a user; correcting the text in the query information to obtain a corrected text; performing user intention analysis on the corrected text based on a question template library, and determining a first score of the identified user intention, wherein the user intention comprises simple fact question answers and common question answers; for simple fact question answering, retrieving based on a pre-constructed knowledge graph to obtain a first candidate answer set, and determining a second score of each candidate answer in the first candidate answer set according to the relevancy; for common question answers, retrieving based on a pre-constructed vectorized FAQ question-answer pair to obtain a second candidate answer set, and determining a third score of each candidate answer in the second candidate answer set according to the relevancy; and sequencing the candidate answers according to the first score, the second score and the third score to obtain answers.
In some embodiments, there is also provided a computer-readable storage medium storing a plurality of instructions that are readable by a processor and perform the semantic retrieval method described above, for example, comprising: receiving query information sent by a user; correcting the text in the query information to obtain a corrected text; performing user intention analysis on the corrected text based on a question template library, and determining a first score of the identified user intention, wherein the user intention comprises simple fact question answers and common question answers; for simple fact question answering, retrieving based on a pre-constructed knowledge graph to obtain a first candidate answer set, and determining a second score of each candidate answer in the first candidate answer set according to the relevancy; for common question answers, retrieving based on a pre-constructed vectorized FAQ question-answer pair to obtain a second candidate answer set, and determining a third score of each candidate answer in the second candidate answer set according to the relevancy; and sequencing the candidate answers according to the first score, the second score and the third score to obtain answers.
In summary, the semantic retrieval method, the semantic retrieval device and the electronic equipment provided in the embodiments have at least the following advantages:
(1) natural language understanding at the semantic level matches the user's real intention more closely than keyword-based retrieval, which improves retrieval efficiency and accuracy and better meets the user's query needs;
(2) based on the synonym dictionary, the identified entities, attributes and relations can be normalized, so that non-standard or imprecise entity mentions in the user's query sentence are mapped to a standard form; this avoids the failure to link an entity to the correct entity node in the knowledge graph because of a non-standard description, and improves the robustness of the knowledge-graph-based retrieval system;
(3) for queries other than simple fact queries, such as FAQ queries, the answer that best matches the user's intention can be found through a semantic-level vectorized retrieval service.
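The vectorized retrieval mentioned in advantage (3) can be sketched as a nearest-neighbor search under cosine similarity. The example below is only a sketch: the bag-of-words `embed` function is a stand-in for whatever sentence encoder an implementation would actually use, and the FAQ entries, function names and query are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed(text, vocab):
    """Toy bag-of-words vector; a real system would use a semantic encoder."""
    tokens = text.lower().split()
    return [float(tokens.count(w)) for w in vocab]


# Hypothetical FAQ question-answer pairs, vectorized once in advance.
faq = [
    ("how do i reset my password", "Use the 'Forgot password' link."),
    ("how do i change my email address", "Edit it under account settings."),
]
vocab = sorted({w for question, _ in faq for w in question.split()})
faq_vectors = [(embed(question, vocab), answer) for question, answer in faq]


def retrieve(query):
    """Return (similarity, answer) for the most similar FAQ question."""
    query_vec = embed(query, vocab)
    scored = [(cosine(query_vec, vec), answer) for vec, answer in faq_vectors]
    return max(scored, key=lambda pair: pair[0])


score, answer = retrieve("reset password")
print(answer)
```

The returned similarity can serve as the relevance score of the FAQ candidate; with a learned encoder in place of `embed`, paraphrases that share no words with the stored question could also be matched.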
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A semantic retrieval method, comprising:
receiving query information sent by a user;
correcting the text in the query information to obtain a corrected text;
performing user intention analysis on the corrected text based on a question template library, and determining a first score for the identified user intention, wherein the user intention comprises simple fact question answering and common question answering;
for simple fact question answering, retrieving from a pre-constructed knowledge graph to obtain a first candidate answer set, and determining a second score for each candidate answer in the first candidate answer set according to its relevance;
for common question answering, retrieving from pre-constructed vectorized FAQ question-answer pairs to obtain a second candidate answer set, and determining a third score for each candidate answer in the second candidate answer set according to its relevance;
and ranking the candidate answers according to the first score, the second score and the third score to obtain an answer.
2. The method of claim 1, wherein correcting the text in the query information to obtain a corrected text comprises:
segmenting the text with a Chinese word segmenter, and performing error detection at character granularity and at word granularity to generate a candidate set of suspected error positions;
traversing all suspected error positions, replacing the word at each suspected error position with phonetically similar and visually similar words found in a pre-stored dictionary, and calculating the sentence perplexity with a language model;
sorting the replacement results according to the calculated sentence perplexity to obtain an optimal corrected word;
and generating the corrected text according to the optimal corrected word.
3. The method of claim 2, wherein, for simple fact question answering, retrieving from a pre-constructed knowledge graph to obtain a first candidate answer set comprises:
extracting entity information, relationship information and attribute information from the corrected text, and using a synonym dictionary to link the entity information, relationship information or attribute information to the corresponding entity, relationship or attribute in the knowledge graph, to generate an SQL query statement;
and filling the SQL query statement at the positions of the corresponding extracted word slots, and executing the query to obtain the first candidate answer set.
4. The method of claim 3, wherein, for common question answering, retrieving from pre-constructed vectorized FAQ question-answer pairs to obtain a second candidate answer set comprises:
vectorizing the corrected text, searching the vectorized FAQ question-answer pairs for similar vectors, obtaining the corresponding answers, and generating the second candidate answer set.
5. The method of claim 4, wherein searching the vectorized FAQ question-answer pairs for similar vectors comprises:
calculating the similarity between the vectorized corrected text and the questions in the vectorized FAQ question-answer pairs, and returning the answer corresponding to the question with the highest similarity; and/or
calculating the similarity between the vectorized corrected text and the answers in the vectorized FAQ question-answer pairs, and returning the answer with the highest similarity.
6. The method of claim 5, wherein ranking the candidate answers according to the first score, the second score and the third score to obtain an answer comprises:
computing a weighted sum of the first score and the second score for simple fact question answering to obtain a fourth score for each candidate answer in the first candidate answer set;
computing a weighted sum of the first score and the third score for common question answering to obtain a fifth score for each candidate answer in the second candidate answer set;
sorting all the candidate answers according to the fourth score and the fifth score, and selecting the highest-ranked answer;
and generating, according to the selected answer and an answer template, an answer to be fed back to the user.
7. The method of claim 1, wherein the question template library is pre-constructed in the following manner:
collecting historical user query information, and constructing the question template library according to the user query information;
and the vectorized FAQ question-answer pairs are pre-constructed in the following manner:
collecting common user questions, preparing standard answers, and vectorizing the common questions and the standard answers to obtain the vectorized FAQ question-answer pairs.
8. A semantic retrieval apparatus, comprising:
a receiving module, configured to receive query information sent by a user;
an error correction module, configured to correct the text in the query information to obtain a corrected text;
an intention determination module, configured to perform user intention analysis on the corrected text based on a question template library and determine a first score for the identified user intention, wherein the user intention comprises simple fact question answering and common question answering;
a first retrieval module, configured to, for simple fact question answering, retrieve from a pre-constructed knowledge graph to obtain a first candidate answer set, and determine a second score for each candidate answer in the first candidate answer set according to its relevance;
a second retrieval module, configured to, for common question answering, retrieve from pre-constructed vectorized FAQ question-answer pairs to obtain a second candidate answer set, and determine a third score for each candidate answer in the second candidate answer set according to its relevance;
and an answer generation module, configured to rank the candidate answers according to the first score, the second score and the third score to obtain an answer.
9. An electronic device comprising a processor and a memory, the memory storing a plurality of instructions, the processor being configured to read the plurality of instructions and execute the semantic retrieval method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a plurality of instructions that are readable by a processor to perform the semantic retrieval method of any one of claims 1 to 7.
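For readers who want a concrete picture of the error-correction step recited in claim 2, the following Python sketch replaces the token at each suspected error position with candidates from a confusion dictionary and keeps the variant with the lowest sentence perplexity. It is a sketch under stated assumptions, not the claimed implementation: the add-one-smoothed bigram model stands in for the unspecified language model, the suspected positions are assumed to come from a separate detection step, English tokens are used instead of segmented Chinese text, and the corpus, the `confusion` dictionary and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(tokens, unigrams, bigrams):
    """Sentence perplexity under an add-one-smoothed bigram model."""
    padded = ["<s>"] + tokens + ["</s>"]
    vocab_size = len(unigrams) + 1  # +1 reserves mass for unseen words
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(padded) - 1))


def correct(tokens, suspects, confusion, unigrams, bigrams):
    """At each suspected position keep the lowest-perplexity replacement."""
    best = list(tokens)
    for i in suspects:
        options = [best[i]] + confusion.get(best[i], [])
        scored = []
        for word in options:
            trial = best[:i] + [word] + best[i + 1:]
            scored.append((perplexity(trial, unigrams, bigrams), word))
        best[i] = min(scored)[1]
    return best


# Hypothetical training corpus and confusion dictionary.
corpus = ["what is the capital of china", "what is the population of china"]
unigrams, bigrams = train_bigram(corpus)
confusion = {"capitol": ["capital"]}  # similar-sounding / similar-looking words

query = "what is the capitol of china".split()
corrected = correct(query, suspects=[3], confusion=confusion,
                    unigrams=unigrams, bigrams=bigrams)
print(" ".join(corrected))
```

Because the original token is always included among the options, a position is left unchanged whenever no replacement lowers the perplexity.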
CN202011221206.9A 2020-11-05 2020-11-05 Semantic retrieval method and device and electronic equipment Active CN112035730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011221206.9A CN112035730B (en) 2020-11-05 2020-11-05 Semantic retrieval method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112035730A (en) 2020-12-04
CN112035730B (en) 2021-02-02

Family

ID=73573564

Country Status (1)

Country Link
CN (1) CN112035730B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090089277A1 (en) * 2007-10-01 2009-04-02 Cheslow Robert D System and method for semantic search
CN108509483A (en) * 2018-01-31 2018-09-07 北京化工大学 Method for constructing a mechanical fault diagnosis knowledge base based on a knowledge graph
CN109033277A (en) * 2018-07-10 2018-12-18 广州极天信息技术股份有限公司 Brain-like system, method, device and storage medium based on machine learning
CN109145168A (en) * 2018-07-11 2019-01-04 广州极天信息技术股份有限公司 Expert service robot cloud platform
US10242049B2 (en) * 2015-01-14 2019-03-26 Baidu Online Network Technology (Beijing) Co., Ltd. Method, system and storage medium for implementing intelligent question answering
CN109885660A (en) * 2019-02-22 2019-06-14 上海乐言信息科技有限公司 Knowledge-graph-empowered question answering system and method based on information retrieval
CN110020010A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Data processing method, device and electronic equipment

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597748A (en) * 2020-12-18 2021-04-02 深圳赛安特技术服务有限公司 Corpus generation method, apparatus, device and computer readable storage medium
CN112597748B (en) * 2020-12-18 2023-08-11 深圳赛安特技术服务有限公司 Corpus generation method, corpus generation device, corpus generation equipment and computer-readable storage medium
CN112818102A (en) * 2021-02-01 2021-05-18 杭州微洱网络科技有限公司 Context-based fast question answering method for FAQ (frequently asked questions) knowledge base
CN112905747A (en) * 2021-03-08 2021-06-04 国能大渡河流域水电开发有限公司 Professional system archive question-answering robot system based on semantic analysis technology
CN112988784A (en) * 2021-04-26 2021-06-18 广州思迈特软件有限公司 Data query method, query statement generation method and device
CN113515940A (en) * 2021-07-14 2021-10-19 上海芯翌智能科技有限公司 Method and equipment for text search
WO2023246093A1 (en) * 2022-06-24 2023-12-28 重庆长安汽车股份有限公司 Common question answering method and system, device and medium
CN115495483A (en) * 2022-09-21 2022-12-20 企查查科技有限公司 Data batch processing method, device, equipment and computer readable storage medium
CN117520485A (en) * 2024-01-08 2024-02-06 卓世科技(海南)有限公司 Large language model vector retrieval method based on knowledge graph integration
CN117520485B (en) * 2024-01-08 2024-03-29 卓世科技(海南)有限公司 Large language model vector retrieval method based on knowledge graph integration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant