CN109145099B - Question-answering method and device based on artificial intelligence - Google Patents

Publication number
CN109145099B
Authority
CN
China
Prior art keywords
question
target
feature vector
preset
text segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810942612.0A
Other languages
Chinese (zh)
Other versions
CN109145099A (en
Inventor
陈俊
施振辉
周景博
范斌
罗程亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810942612.0A
Publication of CN109145099A
Application granted
Publication of CN109145099B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a question-answering method and device based on artificial intelligence. The method comprises: acquiring a target question sent by a user, and extracting a first feature vector of the target question; querying a preset question-answer information base to obtain a second feature vector corresponding to each question set, and calculating the matching degree between the second feature vector of each question set and the first feature vector of the target question according to a preset algorithm; comparing each matching degree with a preset threshold, obtaining the maximum matching degree greater than the threshold, and determining the target question set corresponding to that maximum matching degree; and querying the question-answer information base to obtain a pre-stored answer text segment corresponding to the target question set, and feeding the answer text segment back to the user. Answers are thus provided intelligently according to the questions input by the user, the range of questions that the question-answering technology can handle is effectively expanded, and user stickiness to the product is increased.

Description

Question-answering method and device based on artificial intelligence
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a question-answering method and device based on artificial intelligence.
Background
Artificial intelligence, abbreviated AI, is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. It is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. One of its most important current application scenarios is artificial intelligence-based question-answering technology, which automatically returns a suitable answer to a question text input by a user, without human intervention. As the internet and artificial intelligence continue to develop and bring users various conveniences, people increasingly rely on asking questions online to obtain the information they need, for example through search engines (Baidu, Google), question-and-answer communities (Baidu Zhidao, Zhihu) and question-and-answer apps.
In the related art, the methods for providing answers according to questions input by a user mainly include two types:
The first type: answers are edited manually by other users or professionals, for example medical questions answered by doctors, or travel-guide questions answered by other internet users. In this approach, every time a user asks a question, the user must wait for other users or professionals to find the question, analyze it and edit an answer, so the wait may be too long or no response may arrive at all. The answer editors may also ask the platform for a reward, which adds resource costs for the platform.
The second type: the new user question is matched against earlier user questions in a database by means of information retrieval, and if matching succeeds, the original answer of the matching item is returned directly to the asker. However, this approach depends entirely on whether a similar question-answer result exists in the database and on whether the text matching technology is mature enough, so matching failures are common.
Disclosure of Invention
The invention provides a question-answering method and device based on artificial intelligence, which aim to solve the technical problem in the prior art that artificial intelligence-based question-answering services are strongly limited in cost or service quality.
A first embodiment of the present invention provides a question-answering method based on artificial intelligence, including: acquiring a target question sent by a user, and extracting a first feature vector of the target question; querying a preset question-answer information base to obtain a second feature vector corresponding to each question set, and calculating the matching degree between the second feature vector of each question set and the first feature vector of the target question according to a preset algorithm; comparing each matching degree with a preset threshold, obtaining the maximum matching degree greater than the preset threshold, and determining the target question set corresponding to that maximum matching degree; and querying the question-answer information base to obtain a pre-stored answer text segment corresponding to the target question set, and feeding the answer text segment back to the user.
A second embodiment of the present invention provides a question-answering device based on artificial intelligence, including: an extraction module for acquiring a target question sent by a user and extracting a first feature vector of the target question; a query module for querying a preset question-answer information base to obtain a pre-stored second feature vector corresponding to each question set; a matching module for calculating the matching degree between the second feature vector of each question set and the first feature vector of the target question according to a preset algorithm; a determining module for comparing each matching degree with a preset threshold, obtaining the maximum matching degree greater than the preset threshold, and determining the target question set corresponding to that maximum matching degree; the query module being further used for querying the question-answer information base to obtain the pre-stored answer text segment corresponding to the target question set; and a feedback module for feeding the answer text segment back to the user.
A third embodiment of the present invention provides a computer program product; when instructions in the computer program product are executed by a processor, the artificial intelligence-based question-answering method of the foregoing embodiment is implemented.
A fourth embodiment of the present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the artificial intelligence based question-answering method as described in the previous embodiments.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
Questions and answers in the question-answer information base are mined and learned in advance at text-segment granularity, and based on the learning result the question-answer characteristics are abstracted into second feature vectors, which reflect the commonality of the original answers. An answer to the target question is then generated by matching the first feature vector of the target question against the second feature vectors. Because the generated answer is matched out of order at text-segment granularity, the flexibility and coverage of the original answers are increased. Answers are provided intelligently according to the questions input by users, the range of questions that the question-answering technology can handle is effectively expanded, and user stickiness to the product is increased.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow diagram of a question-answering method based on artificial intelligence according to one embodiment of the present invention;
FIG. 2 is a flow diagram of a method of generating a first feature vector according to one embodiment of the invention;
FIG. 3 is a flow diagram of a method of generating a second feature vector according to one embodiment of the invention;
FIG. 4 is a flowchart of a method of generating a first feature vector according to another embodiment of the present invention;
FIG. 5 is a flowchart of a method of generating a second feature vector according to another embodiment of the present invention;
FIG. 6 is a flow diagram of a method for artificial intelligence based question answering according to another embodiment of the present invention;
FIG. 7-1 is a schematic diagram of an application scenario of an artificial intelligence-based question-answering method according to an embodiment of the present invention;
FIG. 7-2 is a diagram illustrating an application scenario of an artificial intelligence-based question-answering method according to another embodiment of the present invention; and
FIG. 8 is a schematic structural diagram of an artificial intelligence-based question-answering device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The question-answering method and device based on artificial intelligence according to the embodiments of the invention are described below with reference to the accompanying drawings. The method and device can be applied to any application program that provides a question-answering service, such as Baidu Zhidao, Zhihu, Douban and the like, where the question-answering scenarios include but are not limited to medical, study and makeup question answering.
As analyzed in the Background, when the prior-art question-answering technology provides a question-answering service, on the one hand, providing answers manually leads to low service efficiency and high cost; on the other hand, providing answers from a database depends on the completeness and convenience of the database, which is clearly difficult to achieve. The range of user questions that the prior art can resolve is therefore strongly limited.
To solve these technical problems, the invention provides an intelligent question-answering service mode that can automatically generate a suitable answer based on the question posed by the user, which enlarges the range of user questions that the question-answering technology can resolve and is highly practical.
FIG. 1 is a flowchart of a question-answering method based on artificial intelligence according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step 101, acquiring a target question sent by a user, and extracting a first feature vector of the target question.
In actual execution, the manner of acquiring the target question depends on how the user provides input. For example, if the user inputs voice, the voice information can be recognized and converted into a target question in text form; if the user inputs text, the target question can be generated after the text information is cleaned.
Step 102, querying a preset question-answer information base to obtain a pre-stored second feature vector corresponding to each question set, and calculating the matching degree between the second feature vector of each question set and the first feature vector of the target question according to a preset algorithm.
It can be understood that, in an embodiment of the present invention, a question-answer information base is preset. The base includes multiple question sets, each of which may be formed according to a shared characteristic of its questions, such as semantics or category. In the base, a corresponding feature vector is set for each question set; this feature vector corresponds to the characteristic used to divide the question sets, and the specific characteristics of a question set in that dimension can be known from it.
Specifically, a first feature vector of the target question is extracted, the preset question-answer information base is queried to obtain the second feature vector corresponding to each question set, and the matching degree between each second feature vector and the first feature vector is calculated according to a preset algorithm, so that the matching degree identifies the preset question set most similar to the current target question.
It should be emphasized that the second feature vector in the embodiment of the present invention corresponds to the features of a question set rather than a single question; it reflects the commonality of the questions and is therefore stable and practical.
It should be understood that, for the purpose of matching, the first feature vector and the second feature vector are feature vectors based on the same feature dimension, for example, if the second feature vector is a semantic feature vector, then the first feature vector is also a semantic feature vector.
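As an illustration of the matching-degree calculation, the following minimal sketch uses cosine similarity. The source does not name the preset algorithm, so cosine similarity and the function name `matching_degree` are assumptions made for this example only.

```python
import math
from typing import Sequence


def matching_degree(first: Sequence[float], second: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors of the same dimension.

    Cosine similarity is an assumed stand-in for the "preset algorithm";
    the source does not specify which measure is used.
    """
    if len(first) != len(second):
        raise ValueError("feature vectors must share the same feature dimension")
    dot = sum(a * b for a, b in zip(first, second))
    norm = math.sqrt(sum(a * a for a in first)) * math.sqrt(sum(b * b for b in second))
    # A zero vector carries no direction, so treat it as matching nothing.
    return dot / norm if norm else 0.0
```

The length check reflects the requirement that both vectors be built on the same feature dimension before they are compared.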
It should be noted that the manner of extracting the first feature vector of the target question differs with the feature dimension it corresponds to, and that each of the first feature vector and the second feature vector may include one or more feature vectors. For convenience of description, the manner of obtaining the second feature vector is described together with the manner of determining the first feature vector, as exemplified below:
Example one:
In this example, the feature dimension to which the feature vector corresponds is a semantic feature.
Specifically, in this example, extracting the first feature vector of the target question in step 101 includes the following steps, as shown in FIG. 2:
Step 201, performing word segmentation processing on the target question according to preset screening conditions, and then extracting target keywords.
The target keywords are keywords that reflect the semantics of the target question.
As a possible implementation, the preset screening condition selects the verbs and nouns that embody the semantics of the target question. After word segmentation is performed on the target question, stop words, function words, numbers and the like are removed, and the nouns and verbs carrying important semantics are extracted as the keywords of the target question.
Of course, to improve the efficiency of determining the target keywords, a deep learning model may be constructed in advance according to the preset screening condition; the target question is input into the deep learning model, and the output of the model is used as the target keywords.
Step 202, obtaining a first semantic feature vector of the target question according to the target keywords.
Specifically, to express the features of the target question in the semantic dimension, a first semantic feature vector of the target question is obtained according to the target keywords. A word list containing keywords and their corresponding semantic feature vectors may be preset, and the first semantic feature vector of the target question is obtained by looking up the extracted target keywords in this word list.
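Steps 201 and 202 can be sketched roughly as follows. This is only an illustration under several assumptions: the whitespace tokenizer stands in for real word segmentation, no part-of-speech filtering is done, `STOP_WORDS` and `WORD_LIST` are hypothetical toy data, and averaging the keyword vectors is one possible way to combine them that the source does not prescribe.

```python
from typing import Dict, List

# Hypothetical stop-word list and keyword-to-vector word list (toy values).
STOP_WORDS = {"what", "is", "the", "a", "of", "my", "why", "does"}
WORD_LIST: Dict[str, List[float]] = {
    "abdominal": [1.0, 0.0, 0.0],
    "cramp": [0.0, 1.0, 0.0],
    "fever": [0.0, 0.0, 1.0],
}


def extract_keywords(question: str) -> List[str]:
    """Split on whitespace, then drop stop words and tokens containing digits."""
    tokens = [t.strip("?.,!;").lower() for t in question.split()]
    return [
        t for t in tokens
        if t and t not in STOP_WORDS and not any(c.isdigit() for c in t)
    ]


def first_semantic_vector(question: str) -> List[float]:
    """Average the word-list vectors of the keywords found in the question."""
    vectors = [WORD_LIST[k] for k in extract_keywords(question) if k in WORD_LIST]
    dim = len(next(iter(WORD_LIST.values())))
    if not vectors:
        return [0.0] * dim  # no known keyword: fall back to the zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Keywords missing from the word list are simply skipped here; a production system would need a policy for out-of-vocabulary words.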
In this example, the steps of obtaining the second feature vector corresponding to each question set are as shown in FIG. 3:
Step 301, performing word segmentation processing on each question in each question set according to preset screening conditions, and extracting the keywords in each question set and the occurrence frequency corresponding to each keyword.
Specifically, word segmentation is performed on each question in each question set according to the preset screening conditions. For example, stop words, function words, numbers and the like are removed from the questions and the segmented words carrying important semantics are retained; these words are then taken as the keywords that reflect the commonality of the questions in each question set.
Step 302, obtaining a second semantic feature vector of each question set according to the keywords in each question set and the occurrence frequency corresponding to each keyword.
Specifically, in the embodiment of the present invention, the second semantic feature vector of each question set is determined according to the keywords in the question set and the occurrence frequency corresponding to each keyword; the second semantic feature vector is an abstract feature representation of the question set at the semantic level.
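One way to turn keywords and their occurrence frequencies into a single vector per question set is a frequency-weighted average, sketched below. The weighting scheme, the function name `second_semantic_vector` and the `word_list` argument are assumptions; the source only states that the vector is determined from the keywords and their frequencies.

```python
from collections import Counter
from typing import Dict, List


def second_semantic_vector(
    question_keywords: List[List[str]], word_list: Dict[str, List[float]]
) -> List[float]:
    """Frequency-weighted average of keyword vectors over one question set.

    question_keywords holds the keyword list of every question in the set;
    word_list maps a keyword to its vector and is assumed to be non-empty.
    """
    counts = Counter(k for kws in question_keywords for k in kws if k in word_list)
    dim = len(next(iter(word_list.values())))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * dim
    vector = [0.0] * dim
    for keyword, freq in counts.items():
        for i, value in enumerate(word_list[keyword]):
            vector[i] += value * freq / total
    return vector
```

Keywords that occur in many questions of the set dominate the result, which is one way to capture what the questions have in common.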
Example two:
In this example, the feature dimension corresponding to the first feature vector is the subject category. In different application scenarios, the subject categories may be, for example, different departments in medicine or different subjects in education.
Specifically, in this example, extracting the first feature vector of the target question in step 101 includes the following steps, as shown in FIG. 4:
Step 401, acquiring subject category information corresponding to the application scenario of the target question.
It can be understood that the subject categories of a question-answering service differ across application scenarios. For example, in a medical diagnosis application, the subject categories include anorectal, skin, five sense organs and other categories related to medical diagnosis; in a comprehensive question-answering application such as Baidu Zhidao, the subject categories include medical treatment, education, beauty and the like.
Therefore, the subject category information corresponding to the application scenario of the target question is acquired, and the subject category to which the target question belongs in that scenario is determined.
Step 402, acquiring a first structural feature vector of the target question according to the distribution vector of the subject category information.
Specifically, to express the features of the target question in the subject category dimension, a first structural feature vector of the target question is obtained according to the target keywords. A word list containing keywords and their corresponding structural feature vectors may be preset, and the first structural feature vector of the target question is obtained by looking up the extracted target keywords in this word list.
In this example, the steps of obtaining the second feature vector corresponding to each question set are as shown in FIG. 5:
Step 501, obtaining subject category information of each question in each question set.
Step 502, acquiring a second structural feature vector of each question set according to the distribution vector of the subject category information of each question in the question set.
Specifically, in this embodiment, the questions are classified in a structured way according to the subject categories to which they belong; for example, all questions in a medical scenario are classified by subject category. A second structural feature vector of each question set is then obtained according to the distribution vector of the subject category information of the questions in that set. The second structural feature vector reflects the subject category characteristics of the question set; based on the similarity between the second structural feature vector and the first feature vector, the subject category corresponding to the target question can be found, and an answer to the target question can be generated effectively.
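A distribution vector over subject categories can be illustrated as the share of questions in each category. The sketch below assumes that reading of "distribution vector"; the category names and the function `structural_vector` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical subject categories for a medical scenario.
CATEGORIES = ["anorectal", "skin", "five_sense_organs"]


def structural_vector(
    labels: Sequence[str], categories: Sequence[str] = CATEGORIES
) -> List[float]:
    """Return the share of questions falling into each subject category."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        return [0.0] * len(categories)
    return [counts[c] / total for c in categories]
```

Under this reading, a single target question labeled `"skin"` yields a one-hot vector, while a question set yields the proportions of its questions per category, so both live in the same feature dimension and can be compared.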
Step 103, comparing each matching degree with a preset threshold, obtaining the maximum matching degree greater than the threshold, and determining the target question set corresponding to that maximum matching degree.
It should be understood that the more likely the target question is to belong to a question set, the better the first feature vector of the target question matches the second feature vector of that question set. Therefore, in the embodiment of the present invention, in order to determine the question set to which the target question belongs and thereby determine an answer to the target question, the matching degree between the first feature vector and each second feature vector is compared with a preset threshold, the maximum matching degree greater than the preset threshold is obtained, and the target question set corresponding to that maximum matching degree is determined.
Step 104, querying the question-answer information base to obtain a pre-stored answer text segment corresponding to the target question set, and feeding the answer text segment back to the user.
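The threshold comparison and maximum selection of step 103 can be sketched as below. The function name `select_target_set` and the use of string identifiers for question sets are assumptions for illustration.

```python
from typing import Dict, Optional


def select_target_set(degrees: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the id of the question set with the largest matching degree
    that is strictly greater than the threshold, or None if there is none."""
    above = {set_id: d for set_id, d in degrees.items() if d > threshold}
    if not above:
        # The source does not say what happens when nothing exceeds the threshold.
        return None
    return max(above, key=above.get)
```

For example, with matching degrees `{"a": 0.4, "b": 0.9, "c": 0.7}` and a threshold of `0.6`, the function returns `"b"`.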
As analyzed above, the target question set is the set of questions most similar to the target question. For example, if the target question is "Why do my abdominal cramps keep coming back?", the determined target question set is the set of questions about abdominal cramps, so the stored answer text segment corresponding to that target question set can be fed back to the user as the answer text segment for the current target question.
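To make the lookup in step 104 concrete, the sketch below models the question-answer information base as a dictionary of question sets. The `QuestionSet` layout, the field names and the way segments are joined into one reply are all assumptions; the source does not describe the storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QuestionSet:
    """One entry of a hypothetical question-answer information base."""
    questions: List[str]
    feature_vector: List[float]  # the pre-stored second feature vector
    answer_segments: List[str] = field(default_factory=list)


def reply_for(base: Dict[str, QuestionSet], target_set_id: str) -> str:
    """Join the pre-stored answer text segments of the target question set."""
    return " ".join(base[target_set_id].answer_segments)
```

Storing the second feature vector next to the answer segments means both queries of the information base (steps 102 and 104) hit the same record.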
Thus, on the one hand, the artificial intelligence-based question-answering method of the embodiment of the invention does not need to edit or match an answer for the current target question in real time. Instead, it determines the most similar question set to which the question belongs and provides the answer corresponding to that question set as the answer to the target question, which improves question-answering efficiency; and because the answer is determined from the original question sets, answering costs do not increase. On the other hand, the original questions and answers are thoroughly organized and used as the data source of the question-answering service. The original questions and answers represent the forms of questions and answers commonly used by users; similar questions are grouped into question sets so that the commonality of users' questions and answers in the application scenario is mined. The target question set is determined from the abstract feature vectors of the target question and of the question sets, which weakens the influence of the particularities of the target question and ensures that a question set can be reliably determined for the target question posed by the user, thereby ensuring the quality of the question-answering service.
In yet another aspect, answer text segments derived from the answers to the original questions serve as the data source of answers, and an answer can be constructed from the answer text segments of the target question set to which the current target question belongs. That is, the original answers are split into answer text segments of smaller granularity, and the answer to the current target question is assembled from these segments. The answers provided by the question-answering service are therefore not limited to the original answers, the tension between variable target questions and the fixed answers in the database is eased, and the capability of handling user questions is expanded.
Based on the above description, the scalability of the artificial intelligence-based question-answering method in the embodiment of the present invention rests on the construction of the question sets and their corresponding answer text segments. In practical applications these can be constructed in various ways; one possible implementation is described below as an example, to help those skilled in the art understand the present invention more clearly.
In an embodiment of the present invention, before querying the question-answer information base to obtain the pre-stored answer text segment corresponding to the target question set, as shown in FIG. 6, the method includes:
Step 601, performing text segmentation processing on each answer in the question-answer information base according to a preset segmentation strategy to generate a plurality of text segments, and combining adjacent text segments to generate a plurality of candidate text segments in which the number of segments is larger than a preset threshold.
Specifically, each answer in the question-answer information base, which contains a large amount of question-answer information, is mined at text-segment granularity. Since an answer is in fact composed of multiple text segments, mining answers by text segment is valuable both for learning from existing answers and for generating answers automatically. In this embodiment, each answer in the question-answer information base is segmented according to a preset segmentation strategy to generate multiple text segments, and adjacent text segments are combined to generate multiple candidate text segments, each containing more text segments than a preset threshold. This limit prevents candidate text segments from lacking definite semantics because they contain too few text segments. The preset threshold is calibrated from a large amount of experimental data; when the number of text segments exceeds it, the corresponding candidate text segment has definite semantics.
As a possible implementation manner, the preset segmentation strategy is to segment the text segments according to punctuation marks (such as commas, semicolons, periods, question marks, and exclamation marks) of pauses of answers.
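Step 601 with the punctuation-based strategy can be sketched as follows. The punctuation set, the function names and the representation of a candidate as a tuple of adjacent segments are assumptions made for this illustration.

```python
import re
from typing import List, Tuple

# Pause punctuation in both ASCII and full-width forms (an assumed set).
PAUSE_PATTERN = re.compile(r"[,;.?!，；。？！]")


def split_segments(answer: str) -> List[str]:
    """Split an answer into text segments at pause punctuation."""
    return [s.strip() for s in PAUSE_PATTERN.split(answer) if s.strip()]


def candidate_segments(segments: List[str], min_count: int) -> List[Tuple[str, ...]]:
    """Combine adjacent segments into every contiguous run that contains
    more than min_count segments."""
    n = len(segments)
    return [
        tuple(segments[i:j])
        for i in range(n)
        for j in range(i + min_count + 1, n + 1)
    ]
```

Note that splitting on every period would also break decimal numbers such as "3.5"; a real segmentation strategy would need to handle such cases.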
Step 602, filtering the candidate text segments according to a preset screening strategy to obtain a plurality of target text segments meeting the conditions, and obtaining a mapping relation between each target text segment and a corresponding question from the question and answer information base.
It should be noted that, among the candidate text segments formed from the text segments of a large number of answers, some represent what is specific to individual answers and some represent what is common among answers, so the candidate text segments are filtered to retain those that represent commonality.
As a possible implementation, the occurrence frequency of each candidate text segment is calculated. The higher the occurrence frequency of a candidate text segment, the better it represents the commonality among multiple answers, so the candidate text segments are filtered according to their occurrence frequency against a preset frequency threshold. The preset frequency threshold is set according to the needs of the application scenario: the higher the threshold, the more candidate text segments are filtered out and the better the resulting target text segments represent the commonality among answers; the lower the threshold, the fewer candidate text segments are filtered out and the higher the coverage of questions by the resulting target text segments.
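The frequency-based filtering can be illustrated with a simple counter. Whether the comparison with the frequency threshold is inclusive is not stated in the source; the sketch assumes "at least `min_frequency` occurrences", and the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def filter_by_frequency(
    candidates: Iterable[Hashable], min_frequency: int
) -> List[Hashable]:
    """Keep each distinct candidate that occurs at least min_frequency times,
    in order of first appearance."""
    counts = Counter(candidates)
    return [c for c, n in counts.items() if n >= min_frequency]
```

Raising `min_frequency` keeps only the most widely shared candidates, while lowering it keeps more candidates and therefore covers more questions, which mirrors the trade-off described above.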
As another possible implementation manner, the inclusion relationship between the candidate text segments is detected, so as to ensure that the target text segment can better reflect the commonality among the multiple answers, the inclusion relationship between the candidate text segments is detected, and the candidate text segments contained in other candidate text segments are filtered out. For example, candidate text segments abcdf (each letter represents a text segment) included in the candidate text segments abcdfh (each letter represents a text segment), are filtered to obtain candidate text segments abcdf, so that the candidate text segments abcdfh which can better represent the commonality among the multiple answers are used as target text segments, and the target text segments can represent the commonality among the answers and can also more comprehensively correspond to the questions.
Further, in order to facilitate the subsequent providing of the answer providing service based on the target question, the mapping relation between each target text segment and the corresponding question is obtained from the question and answer information base, so as to associate each question with the corresponding target text segment.
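The two screening strategies and the question mapping can be sketched together as below. This is an illustrative sketch under stated assumptions: candidates are tuples of text segments, the input is a hypothetical dictionary from each question to the candidates derived from its answer, and both filters are applied in sequence although the text allows either one alone.

```python
from collections import Counter


def is_contained(short, long):
    """True if `short` occurs as a contiguous run inside the longer tuple `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def filter_candidates(candidates_by_question, min_frequency):
    """Return a dict mapping each target text segment to its set of questions.

    candidates_by_question: dict of question -> list of candidate tuples.
    min_frequency: candidates occurring fewer times than this are dropped.
    """
    frequency = Counter()
    questions_of = {}
    for question, candidates in candidates_by_question.items():
        for cand in candidates:
            frequency[cand] += 1
            questions_of.setdefault(cand, set()).add(question)

    # Strategy 1: drop candidates below the frequency threshold.
    frequent = [c for c, n in frequency.items() if n >= min_frequency]
    # Strategy 2: drop candidates contained in another surviving candidate.
    targets = [
        c for c in frequent
        if not any(is_contained(c, other) for other in frequent)
    ]
    return {c: questions_of[c] for c in targets}


common = [("b", "d", "e"), ("b", "d", "e", "f"), ("a", "b", "d")]
mapping = filter_candidates(
    {"q1": common, "q2": common, "q3": common, "q4": [("a", "b", "d", "e")]},
    min_frequency=3,
)
```

In this toy input, abde is dropped for low frequency and bde is dropped because it is contained in bdef, leaving abd and bdef as target text segments.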
Step 603, clustering the target text segments according to a preset algorithm to obtain a plurality of target text segment sets, and generating a question set corresponding to each target text segment set according to the mapping relation.
It can be understood that the answers to many questions are similar. Therefore, in order to further mine the commonality between the answers, the target text segments are clustered according to a preset algorithm to obtain a plurality of target text segment sets, and a question set corresponding to each target text segment set is generated according to the mapping relation. For example, a feature vector is calculated for each target text segment, the target text segments are clustered according to the matching degree of their feature vectors to obtain a plurality of target text segment sets, and a question set corresponding to each target text segment set is then generated from the questions to which its target text segments belong.
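A dependency-free sketch of clustering by feature-vector matching is shown below. It is only an illustration: it uses a smoothed TF-IDF weighting and a greedy single-pass grouping by cosine similarity as a simple stand-in for the preset clustering algorithm, and the names and the similarity threshold are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF dict per document (documents are token lists)."""
    doc_count = len(documents)
    doc_freq = Counter(token for doc in documents for token in set(doc))
    vectors = []
    for doc in documents:
        term_freq = Counter(doc)
        vectors.append({
            t: (n / len(doc)) * (math.log((1 + doc_count) / (1 + doc_freq[t])) + 1.0)
            for t, n in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_segments(targets, similarity_threshold):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    vectors = tfidf_vectors([list(t) for t in targets])
    clusters = []  # lists of indices into `targets`; the first index is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= similarity_threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return [[targets[i] for i in cluster] for cluster in clusters]


targets = [("a", "b", "d"), ("b", "d", "e", "f"), ("x", "y", "z")]
clusters = cluster_segments(targets, similarity_threshold=0.3)
```

Here each text segment is treated as one token; a real system would more likely tokenize the segment text into words before weighting.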
Step 604, selecting the longest target text segment from each target text segment set, and storing the longest target text segment as the reply text segment of the corresponding question set.
Specifically, in practical applications, the determined target text segments may be semantically similar. For example, the target text segments abd and bdef (where each letter represents one text segment) are similar because both contain the text segments b and d. In order to determine the most comprehensive answer among semantically similar answers, the longest target text segment is selected from each target text segment set and stored as the reply text segment of the corresponding question set.
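Selecting the reply text segment and assembling the question set for each cluster can be sketched as follows. The function name and data shapes are hypothetical, and "longest" is interpreted here as the largest number of text segments, which is an assumption; length in characters would be an equally plausible reading.

```python
def build_reply_index(clusters, questions_of):
    """Pair each cluster's longest target segment with the cluster's question set.

    clusters: list of lists of target segments (tuples of text segments).
    questions_of: dict of target segment -> set of questions.
    """
    index = []
    for cluster in clusters:
        reply = max(cluster, key=len)  # longest by number of text segments
        question_set = set().union(*(questions_of[t] for t in cluster))
        index.append((reply, question_set))
    return index


clusters = [[("a", "b", "d"), ("b", "d", "e", "f")], [("x", "y", "z")]]
questions_of = {
    ("a", "b", "d"): {"q1", "q2"},
    ("b", "d", "e", "f"): {"q2", "q3"},
    ("x", "y", "z"): {"q4"},
}
reply_index = build_reply_index(clusters, questions_of)
```

The question set of a cluster is the union of the questions mapped to all of its target text segments, so every question in the set is answered by the single stored reply text segment.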
In order to make the artificial intelligence based question-answering method clearer to those skilled in the art, an example is given below with reference to a specific application scenario in which the first feature vector and the second feature vector each comprise a vector in the semantic feature dimension and a vector in the structural feature dimension:
First, referring to fig. 7-1, each answer A (including A1, A2, …, An) in the question-answer information base is segmented according to a preset segmentation strategy into a plurality of text segments {a, b, c, …}, where it is understood that each text segment is a character string rather than a single character. Then, based on the text segments, a plurality of candidate text segments (N-Grams), each composed of N text segments, are constructed, where N corresponds to the preset threshold value. In order to ensure that the N-Gram candidate text segments have definite semantics, N in this example is a positive integer greater than or equal to 3. The candidate text segments together form a very large list.
Continuing to refer to fig. 7-1, the N-Gram candidate text segments are filtered according to a preset screening strategy to obtain a plurality of N-Gram target text segments that satisfy the conditions. Specifically, the occurrence frequency of each candidate text segment is calculated, and candidate text segments whose occurrence frequency is below a preset frequency threshold are filtered out. For example, in fig. 7-1 the candidate text segment abd occurs 5 times and abde occurs 2 times; candidate text segments occurring fewer than 3 times (the preset frequency threshold) are filtered out, and those occurring 3 or more times are retained, so the candidate text segments abde and abdef in fig. 7-1 are both filtered out. After the low-frequency candidates are removed, this embodiment detects the inclusion relationship between the remaining candidate text segments and filters out those contained in other candidate text segments. For example, the candidate text segment bde in fig. 7-1 is completely contained in the candidate text segment bdef, so bde is filtered out and bdef is retained as a target text segment. For the retained N-Gram target text segments, the mapping relation between each target text segment and its corresponding question is obtained from the question-answer information base at the same time. In the resulting new question-answer pairs, each target text segment is no longer the original answer, but the question is still the original question.
Many answers in the new question-answer pairs are similar. For example, the target text segments abd and bdef finally output in fig. 7-1 overlap, which makes them highly similar. In order to reduce the overlap between different N-Gram target text segments, the target text segments are clustered according to a preset algorithm: TF-IDF features are extracted from each N-Gram target text segment to form a feature vector representation, and the target text segments are then clustered with the K-Means or DBSCAN algorithm to obtain a plurality of clusters. The longest N-Gram target text segment in each cluster is selected as the reply text segment A* of that cluster. Each cluster corresponds to a question set LQ = {Q1, Q2, …}, and each question in LQ can take the corresponding reply text segment A* as its answer.
In order to provide answers subsequently, second feature vectors are extracted for the questions in each question set LQ. A keyword vocabulary {w1: f1, w2: f2, …} is obtained for each question set LQ, where w and f respectively denote a keyword of the questions and the frequency with which that keyword appears in the questions. Based on this vocabulary, a second semantic feature vector (TF-IDF) of each question set LQ is extracted, denoted f_tf-idf.
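One way to build the keyword vocabulary and a TF-IDF vector per question set is sketched below. It is an assumption-laden illustration: the stop-word list and regular-expression tokenizer are hypothetical stand-ins for the word segmentation and screening conditions, each question set is treated as one document, and the smoothed IDF variant is a choice made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list standing in for the preset screening conditions.
STOPWORDS = {"the", "a", "an", "is", "my", "how", "do", "i", "to", "what"}


def keyword_vocabulary(questions):
    """Return {keyword: frequency} over all questions of one question set."""
    counts = Counter()
    for question in questions:
        words = re.findall(r"\w+", question.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


def question_set_tfidf(vocabularies):
    """Return one sparse TF-IDF dict per question set vocabulary."""
    n_sets = len(vocabularies)
    set_freq = Counter(w for vocab in vocabularies for w in vocab)
    vectors = []
    for vocab in vocabularies:
        total = sum(vocab.values())
        vectors.append({
            w: (f / total) * (math.log((1 + n_sets) / (1 + set_freq[w])) + 1.0)
            for w, f in vocab.items()
        })
    return vectors


vocab_a = keyword_vocabulary(["How do I reset my router?", "Router reset steps"])
vocab_b = keyword_vocabulary(["What is the refund policy?", "Router refund"])
f_tfidf = question_set_tfidf([vocab_a, vocab_b])
```

Keywords that appear in many question sets, such as "router" here, receive a lower weight than keywords that distinguish one question set from the others.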
Furthermore, in the embodiment of the present invention, after the abstract representation of each question set at the semantic level is obtained, an abstract representation of each question set at the structural level is also obtained: a second structural feature vector f_struct of the question set is determined by counting the subject categories to which the questions in each question set LQ belong.
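A structural feature vector built from subject-category counts can be sketched as a normalized distribution over a fixed category order. The category names and the normalization to proportions are assumptions for illustration; the text only states that the categories of the questions are counted.

```python
from collections import Counter


def structural_vector(categories, category_order):
    """Return the share of questions falling in each category, in a fixed order.

    categories: one subject category label per question in the question set.
    category_order: the fixed list of categories defining the vector layout.
    """
    counts = Counter(categories)
    total = sum(counts[c] for c in category_order)
    return [counts[c] / total if total else 0.0 for c in category_order]


f_struct = structural_vector(
    ["billing", "network", "network", "network"],
    ["billing", "network", "account"],
)
```

Using the same `category_order` for question sets and for the target question keeps the two structural vectors comparable component by component.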
Thus, as shown in FIG. 7-2, after a target question B input by a user is received, a first semantic feature vector h_tf-idf and a first structural feature vector h_struct of the target question B are extracted, and the matching degree score of each second semantic feature vector and second structural feature vector against the first semantic feature vector and the first structural feature vector is calculated according to the following formula (1):
[Formula (1): published as equation image BDA0001769436640000101; the expression is not present in the extracted text]
wherein α (0 ≤ α ≤ 1) in formula (1) is a parameter that balances the semantic feature vector against the structural feature vector, and ||f_tf-idf|| and ||h_tf-idf|| denote the two-norms of the second semantic feature vector and the first semantic feature vector, respectively.
Further, the reply text segment A* of the question set with the maximum matching degree score is fed back to the client as the reply text.
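Formula (1) itself is not available in the text, so the sketch below uses an assumed scoring form that is consistent with the description: a weighted sum, with weight α, of the cosine similarity of the semantic vectors and the cosine similarity of the structural vectors. The exact published formula may differ. The threshold check follows the comparison with a preset threshold described for the determining step, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_score(f_sem, f_struct, h_sem, h_struct, alpha):
    """Assumed form: alpha * semantic similarity + (1 - alpha) * structural similarity."""
    return alpha * cosine(f_sem, h_sem) + (1.0 - alpha) * cosine(f_struct, h_struct)


def best_reply(question_sets, h_sem, h_struct, alpha, threshold):
    """Return the reply of the best-scoring question set above `threshold`, else None.

    question_sets: list of (f_sem, f_struct, reply_text) tuples.
    """
    best, best_score = None, threshold
    for f_sem, f_struct, reply in question_sets:
        score = match_score(f_sem, f_struct, h_sem, h_struct, alpha)
        if score > best_score:
            best, best_score = reply, score
    return best


question_sets = [
    ([1.0, 0.0], [1.0, 0.0], "A1"),
    ([0.0, 1.0], [0.0, 1.0], "A2"),
]
chosen = best_reply(question_sets, [0.0, 2.0], [0.0, 1.0], alpha=0.5, threshold=0.6)
```

Setting α closer to 1 makes the keyword overlap dominate the score, while α closer to 0 makes the subject-category distribution dominate.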
To sum up, the artificial intelligence based question-answering method according to the embodiment of the present invention mines and learns the question-answer pairs in the question-answer information base in advance at text-segment granularity, and abstracts the question-answer characteristics into second feature vector representations based on the learning result, where the second feature vector represents the commonality of the original answers. An answer to a target question is then generated by matching the first feature vector of the target question against the second feature vectors. Because the generated answers are matched out of order at text-segment granularity, the flexibility and coverage of the original answers are increased, answers are provided intelligently according to the questions input by the user, the range of questions that the question-answering technology can handle is effectively expanded, and user stickiness to the product is improved.
In order to implement the above embodiments, the present invention further provides an artificial intelligence based question-answering device. Fig. 8 is a schematic structural diagram of an artificial intelligence based question-answering device according to an embodiment of the present invention. As shown in fig. 8, the device includes: an extraction module 110, a query module 120, a matching module 130, a determination module 140, and a feedback module 150.
The extraction module 110 is configured to obtain a target question sent by a user and extract a first feature vector of the target question.
The query module 120 is configured to query a preset question-answer information base to obtain a second feature vector corresponding to each question set.
The matching module 130 is configured to calculate a matching degree between the second feature vector of each question set and the first feature vector of the target question according to a preset algorithm.
The determination module 140 is configured to compare each matching degree with a preset threshold, obtain the maximum matching degree that is greater than the preset threshold, and determine the target question set corresponding to the maximum matching degree.
The query module 120 is further configured to query the question-answer information base to obtain the pre-stored reply text segment corresponding to the target question set.
The feedback module 150 is configured to feed back the reply text segment to the user.
It should be noted that the above explanation of the embodiment of the artificial intelligence based question answering method is also applicable to the artificial intelligence based question answering device of the embodiment, and is not repeated herein.
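The cooperation of the modules can be sketched as a small class. This is a hypothetical illustration only: the class name, the callable-based wiring, and the in-memory list standing in for the question-answer information base are assumptions, and module numbers appear in comments purely to relate the code to fig. 8.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class QuestionAnsweringDevice:
    extract: Callable[[str], Vector]              # extraction module 110
    knowledge_base: List[Tuple[Vector, str]]      # queried by query module 120
    match: Callable[[Vector, Vector], float]      # matching module 130
    threshold: float                              # used by determination module 140

    def answer(self, question: str) -> Optional[str]:
        first_vector = self.extract(question)
        scored = [
            (self.match(second_vector, first_vector), reply)
            for second_vector, reply in self.knowledge_base
        ]
        # Keep only matching degrees above the preset threshold.
        above = [item for item in scored if item[0] > self.threshold]
        if not above:
            return None
        # The reply for the maximum matching degree is what module 150 feeds back.
        return max(above, key=lambda item: item[0])[1]
```

A device built this way returns `None` when no question set exceeds the threshold, leaving the fallback behaviour to the caller.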
To sum up, the artificial intelligence based question-answering device according to the embodiment of the present invention mines and learns the question-answer pairs in the question-answer information base in advance at text-segment granularity, and abstracts the question-answer characteristics into second feature vector representations based on the learning result, where the second feature vector represents the commonality of the original answers. An answer to a target question is then generated by matching the first feature vector of the target question against the second feature vectors. Because the generated answers are matched out of order at text-segment granularity, the flexibility and coverage of the original answers are increased, answers are provided intelligently according to the questions input by the user, the range of questions that the question-answering technology can handle is effectively expanded, and user stickiness to the product is improved.
In order to implement the foregoing embodiments, the present invention further provides a computer program product; when instructions in the computer program product are executed by a processor, the artificial intelligence based question-answering method shown in the foregoing embodiments is executed.
In order to implement the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium; when instructions in the storage medium are executed by a processor, the artificial intelligence based question-answering method shown in the above embodiments can be executed.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate. Also, a first feature being "on," "over," or "above" a second feature may mean that the first feature is directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," or "beneath" a second feature may mean that the first feature is directly or obliquely under the second feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. A question-answering method based on artificial intelligence is characterized by comprising the following steps:
the method comprises the steps of obtaining a target problem sent by a user, and extracting a first feature vector of the target problem, wherein the extracting of the first feature vector of the target problem comprises the following steps:
performing word segmentation processing on the target problem according to preset screening conditions, extracting target keywords,
acquiring a first semantic feature vector of the target problem according to the target keyword;
inquiring a preset question-answer information base to obtain a second feature vector corresponding to each question set, and calculating the matching degree between the second feature vector of each question set and the first feature vector of the target question according to a preset algorithm;
comparing all the matching degrees with a preset threshold respectively, acquiring the maximum matching degree which is greater than the preset threshold, and determining a target problem set corresponding to the maximum matching degree;
performing text segmentation processing on each answer in the question-answer information base according to a preset segmentation strategy to generate a plurality of text segments, and combining the adjacent text segments to generate a plurality of candidate text segments with the segment quantity larger than a preset threshold value;
filtering the candidate text segments according to a preset screening strategy to obtain a plurality of target text segments meeting conditions, and obtaining a mapping relation between each target text segment and a corresponding question from the question-answering information base;
clustering the target text segments according to a preset algorithm to obtain a plurality of target text segment sets, and generating a problem set corresponding to each target text segment set according to the mapping relation;
selecting a longest target text segment from each target text segment set, and storing the longest target text segment as a reply text segment of a corresponding question set;
and inquiring the question-answer information base to obtain a pre-stored answer text segment corresponding to the target question set, and feeding back the answer text segment to the user.
2. The method as claimed in claim 1, wherein the filtering the candidate text segments according to a preset filtering policy to obtain a plurality of target text segments satisfying a condition comprises:
calculating the occurrence frequency of each candidate text segment, and filtering out candidate text segments whose occurrence frequency is smaller than a preset frequency threshold; and/or
detecting the inclusion relation among the candidate text segments, and filtering out the candidate text segments contained in other candidate text segments.
3. The method according to claim 1, wherein before the querying the preset question-and-answer information base to obtain the pre-stored second feature vector corresponding to each question set, further comprising:
performing word segmentation processing on each problem in each problem set according to preset screening conditions, and extracting keywords in each problem set and the occurrence frequency corresponding to each keyword;
and acquiring a second semantic feature vector of each question set according to the keywords in each question set and the occurrence frequency corresponding to each keyword.
4. The method according to claim 1, wherein before the querying the preset question-and-answer information base to obtain the pre-stored second feature vector corresponding to each question set, further comprising:
acquiring subject category information of each problem in each problem set;
and acquiring a second structural feature vector of each problem set according to the distribution vector of the subject category information of each problem in the problem sets.
5. A question-answering method based on artificial intelligence is characterized by comprising the following steps:
the method comprises the steps of obtaining a target problem sent by a user, and extracting a first feature vector of the target problem, wherein the extracting of the first feature vector of the target problem comprises the following steps:
acquiring subject category information corresponding to an application scenario of the target problem,
acquiring a first structural feature vector of the target problem according to the distribution vector of the subject category information;
inquiring a preset question-answer information base to obtain a second feature vector corresponding to each question set, and calculating the matching degree between the second feature vector of each question set and the first feature vector of the target question according to a preset algorithm;
comparing all the matching degrees with a preset threshold respectively, acquiring the maximum matching degree which is greater than the preset threshold, and determining a target problem set corresponding to the maximum matching degree;
performing text segmentation processing on each answer in the question-answer information base according to a preset segmentation strategy to generate a plurality of text segments, and combining the adjacent text segments to generate a plurality of candidate text segments with the segment quantity larger than a preset threshold value;
filtering the candidate text segments according to a preset screening strategy to obtain a plurality of target text segments meeting conditions, and obtaining a mapping relation between each target text segment and a corresponding question from the question-answering information base;
clustering the target text segments according to a preset algorithm to obtain a plurality of target text segment sets, and generating a problem set corresponding to each target text segment set according to the mapping relation;
selecting a longest target text segment from each target text segment set, and storing the longest target text segment as a reply text segment of a corresponding question set;
and inquiring the question-answer information base to obtain a pre-stored answer text segment corresponding to the target question set, and feeding back the answer text segment to the user.
6. A question answering device based on artificial intelligence is characterized by comprising:
the extraction module is configured to acquire a target problem sent by a user and extract a first feature vector of the target problem, where the extraction module is specifically configured to:
performing word segmentation processing on the target problem according to preset screening conditions, extracting target keywords,
acquiring a first semantic feature vector of the target problem according to the target keyword;
the query module is used for querying a preset question and answer information base to obtain a second feature vector which is prestored and corresponds to each question set;
the matching module is used for calculating the matching degree between the second characteristic vector of each problem set and the first characteristic vector of the target problem according to a preset algorithm;
the determining module is used for respectively comparing all the matching degrees with a preset threshold value, acquiring the maximum matching degree which is greater than the preset threshold value, and determining a target problem set corresponding to the maximum matching degree;
a first text segment obtaining module, configured to perform text segmentation processing on each answer in the question-answer information base according to a preset segmentation policy to generate a plurality of text segments, combining a plurality of adjacent text segments to generate a plurality of candidate text segments with the segment number larger than a preset threshold value, filtering the candidate text segments according to a preset screening strategy to obtain a plurality of target text segments meeting the conditions, and obtaining the mapping relation between each target text segment and the corresponding question from the question-answering information base, clustering the target text segments according to a preset algorithm to obtain a plurality of target text segment sets, generating a question set corresponding to each target text segment set according to the mapping relation, selecting a longest target text segment from each target text segment set, and storing the longest target text segment as a reply text segment of the corresponding question set;
the query module is further used for querying the question-answer information base to obtain pre-stored answer text segments corresponding to the target question set;
and the feedback module is used for feeding back the reply text segment to the user.
7. A question answering device based on artificial intelligence is characterized by comprising:
the extraction module is configured to acquire a target problem sent by a user and extract a first feature vector of the target problem, where the extraction module is specifically configured to:
acquiring subject category information corresponding to an application scenario of the target problem,
acquiring a first structural feature vector of the target problem according to the distribution vector of the subject category information;
the query module is used for querying a preset question and answer information base to obtain a second feature vector which is prestored and corresponds to each question set;
the matching module is used for calculating the matching degree between the second characteristic vector of each problem set and the first characteristic vector of the target problem according to a preset algorithm;
the determining module is used for respectively comparing all the matching degrees with a preset threshold value, acquiring the maximum matching degree which is greater than the preset threshold value, and determining a target problem set corresponding to the maximum matching degree;
a second text segment obtaining module, configured to perform text segmentation processing on each answer in the question-answer information base according to a preset segmentation policy to generate multiple text segments, combining a plurality of adjacent text segments to generate a plurality of candidate text segments with the segment number larger than a preset threshold value, filtering the candidate text segments according to a preset screening strategy to obtain a plurality of target text segments meeting the conditions, and obtaining the mapping relation between each target text segment and the corresponding question from the question-answering information base, clustering the target text segments according to a preset algorithm to obtain a plurality of target text segment sets, generating a question set corresponding to each target text segment set according to the mapping relation, selecting a longest target text segment from each target text segment set, and storing the longest target text segment as a reply text segment of the corresponding question set;
the query module is further used for querying the question-answer information base to obtain pre-stored answer text segments corresponding to the target question set;
and the feedback module is used for feeding back the reply text segment to the user.
8. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the artificial intelligence based question-answering method according to any one of claims 1 to 4 or 5.
CN201810942612.0A 2018-08-17 2018-08-17 Question-answering method and device based on artificial intelligence Active CN109145099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810942612.0A CN109145099B (en) 2018-08-17 2018-08-17 Question-answering method and device based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810942612.0A CN109145099B (en) 2018-08-17 2018-08-17 Question-answering method and device based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN109145099A CN109145099A (en) 2019-01-04
CN109145099B true CN109145099B (en) 2021-02-23

Family

ID=64789972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810942612.0A Active CN109145099B (en) 2018-08-17 2018-08-17 Question-answering method and device based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN109145099B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162604B (en) * 2019-01-24 2023-09-12 腾讯科技(深圳)有限公司 Statement generation method, device, equipment and storage medium
CN109933647A (en) * 2019-02-12 2019-06-25 北京百度网讯科技有限公司 Determine method, apparatus, electronic equipment and the computer storage medium of description information
CN110415828B (en) * 2019-06-21 2023-03-31 深圳壹账通智能科技有限公司 Pre-detection information interaction method based on data analysis and related equipment
CN110457440B (en) * 2019-08-09 2022-08-16 宝宝树(北京)信息技术有限公司 Answer feedback method, device, equipment and medium
CN110825283A (en) * 2019-09-18 2020-02-21 云知声智能科技股份有限公司 Defect document display method and device
CN110781662B (en) * 2019-10-21 2022-02-01 腾讯科技(深圳)有限公司 Method for determining point-to-point mutual information and related equipment
CN110825852B (en) * 2019-11-07 2022-06-14 四川长虹电器股份有限公司 Long text-oriented semantic matching method and system
CN111159344A (en) * 2019-12-27 2020-05-15 京东数字科技控股有限公司 Robot response method, device, equipment and storage medium
CN111191034B (en) * 2019-12-30 2023-01-17 科大讯飞股份有限公司 Human-computer interaction method, related device and readable storage medium
CN113268572A (en) * 2020-02-14 2021-08-17 华为技术有限公司 Question answering method and device
CN111476669A (en) * 2020-03-26 2020-07-31 杭州十尾网络科技有限公司 Data analysis method and device
CN111611361B (en) * 2020-04-01 2022-06-14 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine
CN111552787B (en) * 2020-04-23 2023-06-30 支付宝(杭州)信息技术有限公司 Question-answering processing method, device, equipment and storage medium
CN111444320B (en) * 2020-06-16 2020-09-08 太平金融科技服务(上海)有限公司 Text retrieval method and device, computer equipment and storage medium
CN111797204A (en) * 2020-07-01 2020-10-20 北京三快在线科技有限公司 Text matching method and device, computer equipment and storage medium
CN112328741B (en) * 2020-11-03 2022-02-18 平安科技(深圳)有限公司 Intelligent association reply method and device based on artificial intelligence and computer equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649612A (en) * 2016-11-29 2017-05-10 中国银联股份有限公司 Method and device for matching automatic question and answer template
CN106815311A (en) * 2016-12-21 2017-06-09 杭州朗和科技有限公司 A kind of problem matching process and device
CN106844512A (en) * 2016-12-28 2017-06-13 竹间智能科技(上海)有限公司 Intelligent answer method and system
CN107220380A (en) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 Question and answer based on artificial intelligence recommend method, device and computer equipment
CN107862000A (en) * 2017-10-22 2018-03-30 北京市农林科学院 A kind of agricultural technology seeks advice from interactive method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227740A (en) * 2016-07-12 2016-12-14 北京光年无限科技有限公司 A kind of data processing method towards conversational system and device
US10536579B2 (en) * 2016-10-24 2020-01-14 Sriram Venkataramanan Iyer System, method and marketplace for real-time interactive video/voice services using artificial intelligence

Also Published As

Publication number Publication date
CN109145099A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN109145099B (en) Question-answering method and device based on artificial intelligence
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
WO2021093755A1 (en) Matching method and apparatus for questions, and reply method and apparatus for questions
CN109710841B (en) Comment recommendation method and device
KR102033388B1 (en) Apparatus and method for question answering
US9536444B2 (en) Evaluating expert opinions in a question and answer system
US20120259801A1 (en) Transfer of learning for query classification
Lotfian et al. Formulating emotion perception as a probabilistic model with application to categorical emotion classification
US10664755B2 (en) Searching method and system based on multi-round inputs, and terminal
CN109408821B (en) Corpus generation method and device, computing equipment and storage medium
US8620837B2 (en) Determination of a basis for a new domain model based on a plurality of learned models
CN110019729B (en) Intelligent question-answering method, storage medium and terminal
US11416534B2 (en) Classification of electronic documents
JP2023076413A (en) Method, computer device, and computer program for providing dialogue dedicated to domain by using language model
KR102117287B1 (en) Method and apparatus of dialog scenario database constructing for dialog system
US20150269162A1 (en) Information processing device, information processing method, and computer program product
US10838880B2 (en) Information processing apparatus, information processing method, and recording medium that provide information for promoting discussion
JP6983729B2 (en) Extractor, evaluation device, extraction method and extraction program
CN109241249B (en) Method and device for determining burst problem
CN113704422A (en) Text recommendation method and device, computer equipment and storage medium
CN115438158A (en) Intelligent dialogue method, device, equipment and storage medium
JP7057229B2 (en) Evaluation device, evaluation method and evaluation program
JP7013329B2 (en) Learning equipment, learning methods and learning programs
CN113704623A (en) Data recommendation method, device, equipment and storage medium
JP7160571B2 (en) Evaluation device, evaluation method and evaluation program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant