CN110020163B - Search method and device based on man-machine interaction, computer equipment and storage medium

Info

Publication number
CN110020163B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711350393.9A
Other languages
Chinese (zh)
Other versions
CN110020163A (en
Inventor
姚源林
薛璐影
李远肇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The invention provides a search method and device based on human-computer interaction, computer equipment and a storage medium. The method comprises: extracting entity words from a user question to obtain a target search word; determining a plurality of candidate words having a co-occurrence relation with the target search word; determining, according to a first category to which the target search word belongs, a second category having a co-occurrence relation with the first category; selecting at least two supplementary search words belonging to the second category from the plurality of candidate words to generate a guide question; and searching according to the supplementary search word selected by the user and the target search word to obtain a reply to the question. By extracting the target search word from the user's question, determining candidate words that co-occur with it, and offering at least two supplementary search words in a guide question for the user to choose from, the method refines a fuzzy question and solves the technical problem in the prior art that only a search list or a general answer is given in response to a user question, making the answer inaccurate.

Description

Search method and device based on man-machine interaction, computer equipment and storage medium
Technical Field
The invention relates to the technical field of internet, in particular to a search method and device based on human-computer interaction, computer equipment and a storage medium.
Background
With the development of artificial intelligence, intelligent question answering has become an important application in the field, and many intelligent question-answering applications have appeared on the market, such as Baidu Duer, Microsoft XiaoIce and Apple Siri. However, a question posed by a user to an intelligent question-answering system may be fuzzy or overly general. For example, when the user asks "how to write an application", the system cannot tell whether the user means an application to join the Party, a resignation application, or something else.
In the prior art, when a user question is fuzzy, either all possible answers are provided by way of a search, or some general answer is given directly. The answer the user wants is therefore not given clearly, and the reply is inaccurate.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to provide a search method based on human-computer interaction, which extracts a target search word from a question to be answered that is posed by a user, determines a plurality of candidate words having a co-occurrence relation with the target search word, and selects at least two supplementary search words from the candidate words to generate a guide question for the user to choose from, thereby refining a fuzzy question and solving the problem in the prior art that only a search list or a general answer is given in response to a user question, making the answer inaccurate.
The second purpose of the invention is to provide a search device based on human-computer interaction.
A third object of the invention is to propose a computer device.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a search method based on human-computer interaction, including:
acquiring a question to be answered that is posed by a user;
extracting entity words from the question to be answered to obtain a target search word;
querying co-occurrence relations between entity words according to the target search word, and determining a plurality of candidate words having a co-occurrence relation with the target search word;
determining, according to a first category to which the target search word belongs, a second category having a co-occurrence relation with the first category, and selecting at least two supplementary search words belonging to the second category from the plurality of candidate words;
generating a guide question according to the at least two supplementary search words so that the user can select from the at least two supplementary search words;
and searching according to the supplementary search word selected by the user and the target search word to obtain a reply to the question.
In the search method based on human-computer interaction according to the embodiment of the present invention, a question to be answered that is posed by a user is acquired; entity words are extracted to obtain a target search word; co-occurrence relations between entity words are queried to determine a plurality of candidate words having a co-occurrence relation with the target search word; a second category having a co-occurrence relation with the first category is determined according to the first category to which the target search word belongs; at least two supplementary search words belonging to the second category are selected from the candidate words; a guide question is generated according to the at least two supplementary search words; and a reply to the question is obtained by searching according to the supplementary search word selected by the user and the target search word. Because the user chooses among supplementary search words that co-occur with the target search word, a fuzzy question is refined, which solves the technical problem in the prior art that only a search list or a general answer is given in response to a user question, making the answer inaccurate.
In order to achieve the above object, an embodiment of a second aspect of the present invention provides a search device based on human-computer interaction, including:
an acquisition module, configured to acquire a question to be answered that is posed by a user;
an extraction module, configured to extract entity words from the question to be answered to obtain a target search word;
a query module, configured to query co-occurrence relations between entity words according to the target search word and determine a plurality of candidate words having a co-occurrence relation with the target search word;
a first determining module, configured to determine, according to a first category to which the target search word belongs, a second category having a co-occurrence relation with the first category, and to select at least two supplementary search words belonging to the second category from the plurality of candidate words;
a generating module, configured to generate a guide question according to the at least two supplementary search words so that the user can select from the at least two supplementary search words;
and a reply module, configured to search according to the supplementary search word selected by the user and the target search word to obtain a reply to the question.
In the search device based on human-computer interaction according to the embodiment of the present invention, the acquisition module acquires a question to be answered that is posed by a user; the extraction module extracts entity words from the question to obtain a target search word; the query module queries co-occurrence relations between entity words and determines a plurality of candidate words having a co-occurrence relation with the target search word; the first determining module determines, according to the first category to which the target search word belongs, a second category having a co-occurrence relation with the first category, and selects at least two supplementary search words belonging to the second category from the plurality of candidate words; the generating module generates a guide question according to the at least two supplementary search words; and the reply module searches according to the supplementary search word selected by the user and the target search word to obtain a reply to the question. Because the user chooses among supplementary search words that co-occur with the target search word, a fuzzy question is refined, which solves the technical problem in the prior art that only a search list or a general answer is given in response to a user question, making the answer inaccurate.
To achieve the above object, an embodiment of a third aspect of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the search method based on human-computer interaction according to the first aspect.
To achieve the above object, an embodiment of a fourth aspect of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the search method based on human-computer interaction according to the first aspect.
To achieve the above object, an embodiment of a fifth aspect of the present invention provides a computer program product, wherein when the instructions of the computer program product are executed by a processor, the human-computer interaction based search method according to the first aspect is performed.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flowchart of a search method based on human-computer interaction according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a user's question and response provided by an embodiment of the present invention;
FIG. 3 is a flowchart illustrating another search method based on human-computer interaction according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart illustrating a further search method based on human-computer interaction according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a search apparatus based on human-computer interaction according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another search apparatus based on human-computer interaction according to an embodiment of the present invention; and
FIG. 7 illustrates a block diagram of an exemplary computer device suitable for use to implement embodiments of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar function. The embodiments described below with reference to the drawings are exemplary and intended to explain the invention, and are not to be construed as limiting the invention.
A search method, apparatus, computer device, and storage medium based on human-computer interaction according to embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a search method based on human-computer interaction according to an embodiment of the present invention.
As shown in FIG. 1, the method includes:
Step 101, a question to be answered that is posed by a user is acquired.
Specifically, the question posed by the user may be captured through a microphone or entered through a keyboard.
FIG. 2 is a schematic diagram of a user question and a reply provided in the embodiment of the present invention. As shown in FIG. 2, the acquired user question is: "How do I use the pen tool?"
Step 102, entity words are extracted from the question to be answered to obtain a target search word.
Optionally, the acquired question is segmented into words such as nouns, verbs and auxiliary words; the most frequent function words (for example, auxiliary words and the copula "is") are removed; and unsuitable words such as adverbs are filtered out according to part-of-speech tagging, so that only content words, that is, words with actual meaning such as nouns and numerals, remain. The extracted entity words are the target search words. This embodiment does not limit the extraction method.
For example, for the user question shown in FIG. 2, the target search word obtained by content-word extraction is "pen tool".
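The filtering described in step 102 can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list, the part-of-speech labels and the function name are hypothetical, and the input is assumed to be already segmented and tagged by some external tool, which the patent does not specify.

```python
# Hypothetical stop-word list and part-of-speech labels; the patent fixes neither.
STOP_WORDS = {"how", "do", "i", "use", "the", "is", "a"}
CONTENT_POS = {"noun", "numeral"}


def extract_target_search_words(tagged_tokens):
    """Keep content words (nouns, numerals) that are not stop words.

    tagged_tokens: list of (word, part_of_speech) pairs produced by any
    segmenter and tagger; segmentation itself is outside this sketch.
    """
    return [
        word
        for word, pos in tagged_tokens
        if word.lower() not in STOP_WORDS and pos in CONTENT_POS
    ]


# "How do I use the pen tool?" after (assumed) segmentation and tagging.
question = [
    ("How", "adverb"), ("do", "verb"), ("I", "pronoun"),
    ("use", "verb"), ("the", "particle"), ("pen tool", "noun"),
]
target_words = extract_target_search_words(question)
```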
Step 103, co-occurrence relations between entity words are queried according to the target search word, and a plurality of candidate words having a co-occurrence relation with the target search word are determined.
Specifically, the target search word is the extracted entity word. The co-occurrence relations between this entity word and other entity words are queried to determine a plurality of candidate words that co-occur with the target search word, which narrows down the words that may be used to supplement the search.
For example, as shown in FIG. 2, the determined target search word is "pen tool", and the candidate words obtained by the query as having a co-occurrence relation with "pen tool" are: Photoshop, Illustrator, CorelDRAW, cutout, and the like.
As a possible implementation, before the co-occurrence relations of the entity word corresponding to the target search word are queried, an entity word list may be established in advance for each field, and the co-occurrence relations between entity words may be determined from that list.
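As a minimal sketch of the query in step 103, the pre-built co-occurrence relations can be held in a dictionary that maps each entity word to the words it co-occurs with. The dictionary layout and the function name are assumptions made for illustration; the entries come from the FIG. 2 example.

```python
# Assumed storage layout: entity word -> set of co-occurring entity words.
CO_OCCURRENCE = {
    "pen tool": {"Photoshop", "Illustrator", "CorelDRAW", "cutout"},
    "Photoshop": {"pen tool", "cutout"},
}


def find_candidate_words(target_word, table=CO_OCCURRENCE):
    """Return the entity words that co-occur with the target search word."""
    return sorted(table.get(target_word, set()))


candidates = find_candidate_words("pen tool")
```

An unknown target word simply yields an empty candidate list, in which case no guide question can be built from it.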
Step 104, a second category having a co-occurrence relation with the first category is determined according to the first category to which the target search word belongs, and at least two supplementary search words belonging to the second category are selected from the plurality of candidate words.
Each field includes a plurality of categories, such as a software category and a tool category, and contains many entity words. Different entity words may belong to different categories, and different categories may have a co-occurrence relation with one another.
Specifically, the target search word and the candidate words that co-occur with it may belong to different categories in the field. For ease of distinction, the category of the target search word identified from the user's question is called the first category, and a category having a co-occurrence relation with the first category is called the second category. The second category is determined according to the first category to which the target search word belongs, and at least two supplementary search words belonging to the second category are selected from the candidate words. In this way, when the user's question contains entity words of only one category, detailed options from the co-occurring category can be offered for the user to choose from, which further improves the accuracy of the supplementary search words.
For example, as shown in FIG. 2, the target search word is "pen tool", which belongs to the tool category, and the tool category has a co-occurrence relation with the software category. Therefore, from the entity words that co-occur with "pen tool" (Photoshop, Illustrator, CorelDRAW, cutout, and the like), at least two entity words belonging to the software category are selected: Photoshop, Illustrator and CorelDRAW.
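The category-based selection of step 104 might look like the sketch below. The two lookup tables and the function name are illustrative assumptions, and the category given to "cutout" ("operation") is invented here only so that one candidate falls outside the software category.

```python
# Hypothetical category labels; "operation" for "cutout" is an assumption.
ENTITY_CATEGORY = {
    "pen tool": "tool",
    "Photoshop": "software",
    "Illustrator": "software",
    "CorelDRAW": "software",
    "cutout": "operation",
}
# Assumed layout: category -> set of categories it co-occurs with.
CATEGORY_CO_OCCURRENCE = {"tool": {"software"}, "software": {"tool"}}


def select_supplementary_words(target_word, candidates):
    """Keep candidates whose category co-occurs with the target's category."""
    first_category = ENTITY_CATEGORY[target_word]
    second_categories = CATEGORY_CO_OCCURRENCE.get(first_category, set())
    selected = [
        word for word in candidates
        if ENTITY_CATEGORY.get(word) in second_categories
    ]
    # The method needs at least two options to build a guide question.
    return selected if len(selected) >= 2 else []


supplementary = select_supplementary_words(
    "pen tool", ["Photoshop", "Illustrator", "CorelDRAW", "cutout"]
)
```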
Step 105, a guide question is generated according to the at least two supplementary search words so that the user can select from them.
Specifically, a guide question addressed to the user is generated from the at least two supplementary search words, guiding the user to select from among them.
For example, in FIG. 2, the generated guide question is: "Regarding your question, I am not yet sure which aspect you are asking about. Photoshop, Illustrator or CorelDRAW?"
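One simple way to produce such a guide question is a fixed template filled with the supplementary search words. The template wording and the function name below are assumptions; the patent does not prescribe how the sentence is composed.

```python
def build_guide_question(supplementary_words):
    """Fill a fixed (hypothetical) template with the options to choose from."""
    if len(supplementary_words) < 2:
        raise ValueError("a guide question needs at least two options")
    options = ", ".join(supplementary_words[:-1]) + " or " + supplementary_words[-1]
    return (
        "Regarding your question, I am not yet sure which aspect "
        f"you are asking about. {options}?"
    )


guide = build_guide_question(["Photoshop", "Illustrator", "CorelDRAW"])
```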
Step 106, a search is performed according to the supplementary search word selected by the user and the target search word to obtain a reply to the question.
Specifically, the supplementary search word selected by the user and the target search word are searched together to obtain a reply to the user's question, for example the final reply shown in FIG. 2.
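The combined search of step 106 can be illustrated with a toy answer store that is matched on both words at once. The store, its placeholder answer texts and the function name are hypothetical; a real system would call its own search backend here.

```python
# Placeholder answer store: (entity words covered by the answer, answer text).
# Both the structure and the texts are hypothetical.
ANSWER_STORE = [
    ({"pen tool", "Photoshop"}, "<answer about the pen tool in Photoshop>"),
    ({"pen tool", "Illustrator"}, "<answer about the pen tool in Illustrator>"),
    ({"pen tool", "CorelDRAW"}, "<answer about the pen tool in CorelDRAW>"),
]


def search_reply(target_word, supplementary_word, answer_store=ANSWER_STORE):
    """Return the first answer that covers both search words, or None."""
    query = {target_word, supplementary_word}
    for keywords, answer in answer_store:
        if query <= keywords:
            return answer
    return None


reply = search_reply("pen tool", "Illustrator")
```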
In the search method based on human-computer interaction of this embodiment, a target search word is extracted from the question to be answered, a plurality of candidate words having a co-occurrence relation with the target search word are determined, a second category having a co-occurrence relation with the first category of the target search word is determined, at least two supplementary search words belonging to the second category are selected from the candidate words to generate a guide question, and the reply is obtained by searching according to the supplementary search word selected by the user and the target search word. The fuzzy question is thereby refined, which solves the technical problem in the prior art that only a search list or a general answer is given in response to a user question, making the answer inaccurate.
Based on the above embodiment, before the co-occurrence relations between entity words are queried according to the target search word, an entity word list needs to be generated offline in advance for the specific field, and the co-occurrence relations between entity words, the category to which each entity word belongs, and the co-occurrence relations between categories need to be determined. To this end, an embodiment of the present invention provides another possible implementation of the search method based on human-computer interaction. FIG. 3 is a schematic flowchart of another search method based on human-computer interaction provided by the embodiment of the present invention. As shown in FIG. 3, the method includes the following steps:
Step 301, entity words appearing in the question-and-answer corpus of each field are obtained offline in advance to establish an entity word list.
Specifically, for each field, the question-and-answer corpus of that field is obtained. As a possible implementation, an initial question-and-answer corpus for the field can be obtained by searching and crawling Baidu Zhidao (Baidu Knows). The obtained corpus is segmented into words, stop words are removed, and the remaining words are tagged by part of speech; words whose part of speech is noun and/or verb are retained as entity words. The word frequency of each entity word is then counted, for example with a term frequency (TF) algorithm, and the entity words whose word frequency is higher than a word-frequency threshold are taken as the entity words of the field to establish the entity word list.
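The offline construction in step 301 can be sketched with a plain frequency count. The stop-word list, the tag names, the threshold value and the tiny corpus are assumptions for illustration, and the corpus is assumed to be segmented and tagged already.

```python
from collections import Counter

STOP_WORDS = {"how", "the", "a", "to"}   # hypothetical stop-word list
ENTITY_POS = {"noun", "verb"}            # parts of speech kept as entity words


def build_entity_word_list(tagged_corpus, tf_threshold):
    """Count raw term frequency and keep words above the threshold."""
    counts = Counter(
        word
        for sentence in tagged_corpus
        for word, pos in sentence
        if word.lower() not in STOP_WORDS and pos in ENTITY_POS
    )
    return {word for word, n in counts.items() if n > tf_threshold}


# Two already-tagged questions standing in for a field's Q&A corpus.
corpus = [
    [("how", "adverb"), ("use", "verb"), ("the", "particle"),
     ("Photoshop", "noun"), ("pen tool", "noun")],
    [("how", "adverb"), ("cutout", "noun"), ("the", "particle"),
     ("Photoshop", "noun"), ("pen tool", "noun")],
]
entity_words = build_entity_word_list(corpus, tf_threshold=1)
```

With a threshold of 1, only words seen at least twice survive, so "use" and "cutout" are dropped in this toy corpus.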
Step 302, the number of co-occurrences of different entity words is counted, and the co-occurrence relations between entity words are determined according to the number of co-occurrences.
Specifically, the number of co-occurrences of two different entity words is the number of times they appear together in the same question or the same answer. As a possible implementation, a threshold count is set: when the number of co-occurrences of two entity words is greater than the threshold, a co-occurrence relation is determined to exist between them; otherwise, no co-occurrence relation is determined to exist.
For example, in the field of UE intelligent teaching assistance, suppose the user question-and-answer corpus is: "How do I use the Photoshop pen tool?" and "How do I make a cutout with the Photoshop pen tool?" Photoshop and "pen tool" appear together in the same question twice, so their number of co-occurrences is 2. If the threshold count is 1, Photoshop and "pen tool" are considered to have a co-occurrence relation.
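The counting rule of step 302 can be checked on the example above with a short sketch. The function names and the representation of each question as a list of its entity words are assumptions; the threshold comparison follows the "greater than" rule stated in the text.

```python
from collections import Counter
from itertools import combinations


def count_co_occurrences(texts_entities):
    """Count, for each unordered pair, how many texts contain both words."""
    counts = Counter()
    for entities in texts_entities:
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def co_occurrence_pairs(counts, threshold):
    """A pair co-occurs when its count is strictly greater than the threshold."""
    return {pair for pair, n in counts.items() if n > threshold}


# Entity words of the two example questions.
texts = [["Photoshop", "pen tool"], ["Photoshop", "pen tool", "cutout"]]
counts = count_co_occurrences(texts)
pairs = co_occurrence_pairs(counts, threshold=1)
```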
Step 303, performing semantic recognition on the entity words in the entity word list, and merging the co-occurrence relations of the entity words with the same semantics.
Specifically, semantic recognition is performed on each entity word in the entity word list to identify entity words with the same semantics, and the co-occurrence relations of such entity words are merged. For example, Photoshop and PS have the same semantics, since both denote the same software. Photoshop has a co-occurrence relation with "pen tool", and PS also has a co-occurrence relation with "pen tool"; these identical relations are merged so that only one is retained, for example the relation between Photoshop and "pen tool". In practical applications this can improve query speed.
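Assuming the semantic recognition has already produced a mapping from each alias to a canonical entity word, the merge in step 303 reduces to rewriting pairs and de-duplicating them. The mapping and the function name below are hypothetical.

```python
# Assumed output of semantic recognition: alias -> canonical entity word.
CANONICAL = {"PS": "Photoshop"}


def merge_synonym_relations(pairs, canonical=CANONICAL):
    """Rewrite aliases to their canonical word and drop duplicate pairs."""
    merged = set()
    for a, b in pairs:
        a, b = canonical.get(a, a), canonical.get(b, b)
        if a != b:  # a word does not co-occur with its own alias
            merged.add(tuple(sorted((a, b))))
    return merged


relations = merge_synonym_relations(
    {("Photoshop", "pen tool"), ("PS", "pen tool")}
)
```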
Step 304, determining the category of each entity word in the entity word list.
Specifically, each field has its own categories, which are obtained by dividing the entity words according to their application scenarios, and the category to which each entity word in the entity word list belongs is determined with respect to these categories. As a possible implementation, the category of each entity word may be identified and labeled manually. As another possible implementation, a machine learning model is established for the field and trained on selected labeled samples, and the entity words are then classified by the trained model.
Step 305, the number of co-occurrences of different categories is counted, and co-occurrence relations between categories are established according to these counts.
Specifically, the number of co-occurrences of two different categories is the number of times entity words belonging to those categories appear together in the same question or the same answer. As a possible implementation, a threshold count is set: if this number is greater than the threshold, the two categories are determined to have a co-occurrence relation, and that relation is established.
For example, suppose the threshold count is 1, Photoshop belongs to the software category, "pen tool" belongs to the tool category, and the user questions are: "How do I use the Photoshop pen tool?" and "How do I make a cutout with the Photoshop pen tool?" In these questions the entity words Photoshop and "pen tool" appear together 2 times, which is greater than the threshold count of 1. The software category of Photoshop and the tool category of "pen tool" therefore have a co-occurrence relation, and the co-occurrence relation between the software category and the tool category is established.
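The same counting can be lifted from entity words to categories, as in this sketch of step 305. The category table and the function name are assumptions; counting each category pair once per question is one reading of the text, not something the patent spells out.

```python
from collections import Counter
from itertools import combinations

ENTITY_CATEGORY = {"Photoshop": "software", "pen tool": "tool"}


def category_co_occurrence(texts_entities, entity_category, threshold):
    """Pairs of categories whose entity words share a text more than `threshold` times."""
    counts = Counter()
    for entities in texts_entities:
        categories = sorted({entity_category[e] for e in entities if e in entity_category})
        for pair in combinations(categories, 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n > threshold}


texts = [["Photoshop", "pen tool"], ["Photoshop", "pen tool"]]
category_pairs = category_co_occurrence(texts, ENTITY_CATEGORY, threshold=1)
```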
It should be noted that step 302 and step 303 may be executed before step 304 and step 305, may be executed after step 304 and step 305, and may be executed in parallel with step 304 and step 305.
In the search method based on human-computer interaction of this embodiment, an entity word list is established in advance for each field, the co-occurrence relations between the entity words in the list are determined, the category to which each entity word belongs is determined, the number of co-occurrences of different categories is counted, and the co-occurrence relations between categories are determined. With the entity word list, the co-occurrence relations between entity words and the co-occurrence relations between their categories prepared in advance, supplementary search words can be determined once the target search word has been extracted from the question to be answered, and the reply is obtained by searching with the target search word and the supplementary search word. This refines a fuzzy question and solves the technical problem in the prior art that only a search list or a general answer is given in response to a user question, making the answer inaccurate.
Based on the foregoing embodiment, the present invention further provides a possible implementation of the search method based on human-computer interaction. FIG. 4 is a schematic flowchart of a further search method based on human-computer interaction according to an embodiment of the present invention. As shown in FIG. 4, the following step may also be included after step 102:
Step 401, replies are searched for according to the target search word, and it is determined that a plurality of candidate replies are obtained and that the semantic similarity between different candidate replies is lower than a threshold.
Specifically, replies are searched for according to the extracted target search word. If a plurality of candidate replies are obtained, the semantic similarity between different candidate replies is calculated through semantic analysis and compared with a preset threshold. If the semantic similarity is lower than the threshold, the candidate replies are considered to differ substantially, so the reply to the question cannot be determined from the target search word alone, and supplementary search keywords are needed to complete the reply; that is, steps 103 to 106 need to be executed. Otherwise, the reply is given to the user directly.
For example, many software applications, such as Photoshop and Illustrator, have a pen tool. If the user only asks "How do I use the pen tool?", the determined target search word is "pen tool". The candidate replies found with this target search word describe how to use the pen tool in different software, so the semantic similarity between them is low and an accurate reply cannot be given. Supplementary search keywords are therefore needed to complete the reply to the user's question.
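The decision in step 401 can be sketched as a pairwise similarity check. The patent does not name a similarity measure, so word-set Jaccard overlap is used here purely as a stand-in for semantic similarity; the rule that a single dissimilar pair triggers the guide question, the threshold value and the placeholder replies are likewise assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity; a stand-in for a real semantic similarity model."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def needs_guide_question(candidate_replies, threshold):
    """True when several candidate replies exist and some pair is too dissimilar."""
    if len(candidate_replies) < 2:
        return False
    return any(
        jaccard(a, b) < threshold
        for a, b in combinations(candidate_replies, 2)
    )


# Placeholder candidate replies for the target search word "pen tool".
replies = ["pen tool steps for Photoshop", "pen tool steps for Illustrator"]
ask_user = needs_guide_question(replies, threshold=0.8)
```

When the check returns False, the single or mutually consistent reply can be given directly without running steps 103 to 106.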
In the search method based on human-computer interaction of this embodiment, a question to be replied, which is posed by a user, is obtained; entity words are extracted from it to obtain a target search word; the co-occurrence relation between entity words is queried, and a plurality of candidate words having a co-occurrence relation with the target search word are determined; a second category having a co-occurrence relation with the first category is determined according to the first category to which the target search word belongs; at least two supplementary search words belonging to the second category are selected from the candidate words; a guide question is generated according to the at least two supplementary search words; and a question reply is obtained according to the supplementary search word selected by the user and the target search word. By extracting a target search word from the question to be replied, determining a plurality of candidate words that have a co-occurrence relation with it, and selecting at least two supplementary search words from the candidate words to generate a guide question for the user to choose from, the method refines a fuzzy question and solves the technical problem in the prior art that only a search list or a generic answer is given, so that the answer is inaccurate.
Meanwhile, after the target search word is extracted, whether the semantic similarity between the candidate question replies obtained by searching with the target search word is lower than the threshold is used to judge whether the user's question can be answered directly. If the semantic similarity is not lower than the threshold, the reply is given to the user directly, which speeds up the reply while keeping it accurate. If the semantic similarity is lower than the threshold, related search words need to be further supplemented to refine the fuzzy question, which improves the accuracy of the question reply.
In order to implement the above embodiment, the present invention further provides a search device based on human-computer interaction.
Fig. 5 is a schematic structural diagram of a search apparatus based on human-computer interaction according to an embodiment of the present invention.
As shown in fig. 5, the apparatus includes: an acquisition module 51, an extraction module 52, a query module 53, a first determination module 54, a generation module 55 and a reply module 56.
The acquisition module 51 is configured to obtain a question to be replied, which is posed by a user.
The extraction module 52 is configured to perform entity word extraction on the question to be replied to obtain the target search word.
The query module 53 is configured to query the co-occurrence relationship between entity words according to the target search word, and determine a plurality of candidate words having a co-occurrence relationship with the target search word.
The first determination module 54 is configured to determine, according to the first category to which the target search word belongs, a second category having a co-occurrence relationship with the first category, and select at least two supplementary search words belonging to the second category from the plurality of candidate words.
The generation module 55 is configured to generate a guide question according to the at least two supplementary search words, so that the user selects from the at least two supplementary search words.
The reply module 56 is configured to search according to the supplementary search word selected by the user and the target search word to obtain a question reply.
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and is not repeated herein.
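As a rough illustration of how modules 52 to 56 could fit together, the following Python sketch walks the "pen tool" example through the pipeline. All data values, function names, and the guide-question wording are hypothetical, and picking the single category that co-occurs most often with the first category is an assumption; the description only requires that the second category have a co-occurrence relationship with the first.

```python
from collections import Counter

# Hypothetical offline data; every value below is made up for illustration.
WORD_COOCCURRENCE = {
    "pen tool": Counter({"Photoshop": 12, "Illustrator": 9,
                         "path": 7, "anchor point": 5}),
}
WORD_CATEGORY = {
    "pen tool": "tool",
    "Photoshop": "software",
    "Illustrator": "software",
    "path": "concept",
    "anchor point": "concept",
}
CATEGORY_COOCCURRENCE = {"tool": Counter({"software": 21, "concept": 12})}


def extract_target_word(question):
    """Extraction module: first known entity word found in the question."""
    lowered = question.lower()
    for word in WORD_COOCCURRENCE:
        if word.lower() in lowered:
            return word
    return None


def select_supplementary_words(target, limit=3):
    """Query module plus first determination module."""
    candidates = WORD_COOCCURRENCE.get(target, Counter())
    category_counts = CATEGORY_COOCCURRENCE.get(WORD_CATEGORY.get(target),
                                                Counter())
    if not candidates or not category_counts:
        return []
    # Assumption: use the category that co-occurs most with the first one.
    second_category = category_counts.most_common(1)[0][0]
    ranked = [word for word, _ in candidates.most_common()
              if WORD_CATEGORY.get(word) == second_category]
    chosen = ranked[:limit]
    return chosen if len(chosen) >= 2 else []  # at least two are required


def generate_guide_question(target, supplementary):
    """Generation module: offer the supplementary words as choices."""
    options = ", ".join(supplementary)
    return f"Which of these do you mean for '{target}': {options}?"


def build_query(target, selected):
    """Reply module input: combine the user's choice with the target word."""
    return f"{selected} {target}"


target = extract_target_word("How do I use the pen tool?")
options = select_supplementary_words(target)
print(generate_guide_question(target, options))
print(build_query(target, options[0]))
```

The final query string would then be passed to whatever search backend produces the question reply; that backend is outside the scope of this sketch.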
In the search device based on human-computer interaction of this embodiment, the acquisition module obtains a question to be replied, which is posed by a user; the extraction module extracts entity words from the question to be replied to obtain a target search word; the query module queries the co-occurrence relation between entity words and determines a plurality of candidate words having a co-occurrence relation with the target search word; the first determination module determines, according to the first category to which the target search word belongs, a second category having a co-occurrence relation with the first category, and selects at least two supplementary search words belonging to the second category from the plurality of candidate words; the generation module generates a guide question according to the at least two supplementary search words; and the reply module searches according to the supplementary search word selected by the user and the target search word to obtain a question reply. By extracting a target search word from the question to be replied, determining a plurality of candidate words that have a co-occurrence relation with it, and selecting at least two supplementary search words from the candidate words to generate a guide question for the user to choose from, the device refines a fuzzy question and solves the technical problem in the prior art that only a search list or a generic answer is given, so that the answer is inaccurate.
Based on the foregoing embodiment, the embodiment of the present invention further provides another possible implementation of the search apparatus based on human-computer interaction. Fig. 6 is a schematic structural diagram of another search apparatus based on human-computer interaction according to the embodiment of the present invention. On the basis of the foregoing embodiment, as shown in Fig. 6, the apparatus further includes: an establishing module 57, a statistics determination module 58, a second determination module 59, a statistics establishing module 60 and a third determination module 61.
The establishing module 57 is configured to acquire, offline and in advance for each field, the entity words appearing in the question-answer corpus of that field, so as to establish an entity word list.
The statistics determination module 58 is configured to count the number of co-occurrences of different entity words, where the number of co-occurrences of different entity words refers to the number of times that the different entity words appear in the same question or the same answer, and to determine the co-occurrence relation between the entity words according to the number of co-occurrences of different entity words.
The second determination module 59 is configured to determine the category to which each entity word in the entity word list belongs, where the categories are obtained by dividing the entity words according to their application scenarios.
The statistics establishing module 60 is configured to, for each field, count the number of co-occurrences of different categories according to the question-answer corpus of the field, where the number of co-occurrences of different categories refers to the number of times that entity words belonging to different categories appear in the same question or the same answer, and to establish a co-occurrence relationship between the categories according to the number of co-occurrences of different categories.
The third determination module 61 is configured to search for question replies according to the target search word, and to determine that multiple candidate question replies are obtained and that the semantic similarity between different candidate question replies is lower than a threshold.
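The counting performed by modules 58 and 60 can be sketched as follows. This is only an illustration under stated assumptions: each question and each answer is treated as one text, entity words are matched by simple substring search, each pair is counted at most once per text, and the function name `count_cooccurrences` and the sample corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(texts, entity_words, category_of):
    """Count entity-word pairs and category pairs sharing one text.

    Each element of `texts` is a single question or a single answer.
    """
    word_pairs = Counter()
    category_pairs = Counter()
    for text in texts:
        lowered = text.lower()
        # Naive substring matching; a real system would match on tokens.
        present = sorted(w for w in entity_words if w.lower() in lowered)
        for pair in combinations(present, 2):
            word_pairs[pair] += 1
        # Only pairs of *different* categories are counted.
        categories = sorted({category_of[w] for w in present})
        for pair in combinations(categories, 2):
            category_pairs[pair] += 1
    return word_pairs, category_pairs


# Hypothetical question-answer corpus for one field.
texts = [
    "How do I use the pen tool in Photoshop?",
    "In Illustrator, the pen tool draws a path.",
    "Photoshop can export a path.",
]
entity_words = ["pen tool", "Photoshop", "Illustrator", "path"]
category_of = {"pen tool": "tool", "Photoshop": "software",
               "Illustrator": "software", "path": "concept"}

word_pairs, category_pairs = count_cooccurrences(texts, entity_words, category_of)
print(word_pairs.most_common(3))
print(category_pairs.most_common(3))
```

A co-occurrence relation could then be declared for every pair whose count reaches some minimum; the description does not fix such a cutoff, so none is applied here.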
As a possible implementation manner, the establishing module 57 is specifically configured to:
segment the question-answer corpus into words, remove stop words, perform part-of-speech tagging on the words remaining after the stop words are removed, keep the words whose part of speech is noun and/or verb as entity words, and establish an entity word list according to the word frequency of each entity word.
Then, perform semantic recognition on the entity words in the entity word list, determine the entity words with the same semantics in the entity word list, and merge the co-occurrence relations of the entity words with the same semantics.
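The two operations of the establishing module can be sketched in Python as below. Several simplifications are assumed and should not be read as the described implementation: whitespace splitting stands in for word segmentation, a tiny hand-written lexicon (`POS_LEXICON`) stands in for a part-of-speech tagger, "according to the word frequency" is interpreted as ordering by frequency with an optional minimum, and the synonym mapping (`canonical`) is supplied by hand rather than obtained through semantic recognition.

```python
from collections import Counter

# Hypothetical resources; a real system would use a segmenter and a tagger.
STOP_WORDS = {"the", "a", "in", "how", "do", "i", "to"}
POS_LEXICON = {"use": "verb", "draw": "verb", "pen": "noun", "tool": "noun",
               "photoshop": "noun", "quickly": "adverb"}


def build_entity_word_list(corpus, min_frequency=1):
    """Segment, drop stop words, keep nouns and verbs, order by frequency."""
    counts = Counter()
    for text in corpus:
        tokens = [t.strip("?.,!").lower() for t in text.split()]
        kept = [t for t in tokens if t and t not in STOP_WORDS]
        counts.update(t for t in kept
                      if POS_LEXICON.get(t) in {"noun", "verb"})
    return [word for word, n in counts.most_common() if n >= min_frequency]


def merge_synonym_cooccurrences(cooccurrence, canonical):
    """Fold the co-occurrence counts of synonyms into one canonical word."""
    merged = {}
    for word, neighbours in cooccurrence.items():
        key = canonical.get(word, word)
        target = merged.setdefault(key, Counter())
        for other, n in neighbours.items():
            other_key = canonical.get(other, other)
            if other_key != key:  # skip a word co-occurring with its synonym
                target[other_key] += n
    return merged


corpus = ["How do I use the pen tool in Photoshop?",
          "Use the pen tool to draw quickly."]
print(build_entity_word_list(corpus, min_frequency=2))

cooccurrence = {"Photoshop": Counter({"pen tool": 12}),
                "PS": Counter({"pen tool": 4, "Photoshop": 1})}
print(merge_synonym_cooccurrences(cooccurrence, {"PS": "Photoshop"}))
```

In this toy run, "PS" is treated as a synonym of "Photoshop", so their counts with "pen tool" are added together under a single entry.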
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and is not repeated herein.
In order to implement the foregoing embodiments, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the computer device implements the human-computer interaction based search method according to the foregoing method embodiments.
In order to implement the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the human-computer interaction based search method as described in the foregoing method embodiments.
In order to implement the above embodiments, the present invention further provides a computer program product, wherein when the instructions in the computer program product are executed by a processor, the human-computer interaction based search method as described in the foregoing method embodiments is implemented.
FIG. 7 illustrates a block diagram of an exemplary computer device suitable for use to implement embodiments of the present application. The computer device 12 shown in fig. 7 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present application.
As shown in FIG. 7, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a memory 28, and a bus 18 that couples various system components including the memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. These architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, to name a few.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, and commonly referred to as a "hard drive"). Although not shown in FIG. 7, a disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described herein.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the memory 28, for example, implementing the methods mentioned in the foregoing embodiments.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (9)

1. A search method based on human-computer interaction is characterized by comprising the following steps:
acquiring a question to be replied, which is provided by a user;
extracting entity words from the question to be replied to obtain target search words;
inquiring the co-occurrence relation between entity words according to the target search word, and determining a plurality of candidate words having the co-occurrence relation with the target search word;
determining a second category having a co-occurrence relationship with the first category according to the first category to which the target search word belongs, and selecting at least two supplementary search words belonging to the second category from the plurality of candidate words; wherein the co-occurrence relationship is determined according to the co-occurrence times of the first category and the second category;
generating a guide question according to the at least two supplementary search terms so that a user can select from the at least two supplementary search terms;
and searching according to the supplementary search words selected by the user and the target search words to obtain a question reply.
2. The searching method according to claim 1, wherein before the querying for the co-occurrence relationship between the entity words according to the target search word and determining the plurality of candidate words having the co-occurrence relationship with the target search word, further comprising:
acquiring entity words appearing in question and answer linguistic data of each field in an off-line mode aiming at each field in advance to establish an entity word list;
counting the co-occurrence times of different entity words; the co-occurrence times of different entity words refer to the times of the different entity words appearing in the same question or the same answer;
and determining the co-occurrence relation between the entity words according to the co-occurrence times of different entity words.
3. The searching method according to claim 2, wherein the obtaining of the entity words appearing in the corpus of questions and answers in the field to establish an entity word list comprises:
performing word segmentation on the question and answer corpus, and removing stop words;
performing part-of-speech tagging on the words reserved after the stop words are removed, and reserving the words with the parts-of-speech being nouns and/or verbs as entity words;
and establishing an entity word list according to the word frequency of each entity word.
4. The method according to claim 2, wherein after determining the co-occurrence relationship between the entity words, the method further comprises:
performing semantic recognition on the entity words in the entity word list;
determining entity words with the same semantics in the entity word list;
and merging the co-occurrence relations of the entity words with the same semantics.
5. The searching method according to claim 2, wherein the obtaining entity words appearing in the question-answer corpus of each field in an offline manner in advance for each field to establish an entity word list further comprises:
determining the category of each entity word in the entity word list; the categories are obtained by dividing according to application scenes of the entity words;
counting the co-occurrence times of different categories according to question and answer corpora of the fields aiming at each field; the co-occurrence times of different categories refer to the times of occurrence of entity words belonging to different categories in the same question or the same answer;
and establishing a co-occurrence relation among the categories according to the co-occurrence times of different categories.
6. The search method according to any one of claims 1 to 5, wherein after the extracting the entity word from the question to be replied to obtain the target search word, the method further comprises:
searching question reply according to the target search word;
and determining that the searched candidate question replies are multiple, and the semantic similarity degree of different candidate question replies is lower than a threshold value.
7. A search device based on human-computer interaction is characterized by comprising:
the acquisition module is used for acquiring the problem to be replied, which is proposed by the user;
the extraction module is used for extracting the entity words of the question to be replied to obtain target search words;
the query module is used for querying the co-occurrence relation among the entity words according to the target search words and determining a plurality of candidate words which have the co-occurrence relation with the target search words;
the first determining module is used for determining a second category which has a co-occurrence relation with the first category according to the first category to which the target search word belongs, and selecting at least two supplementary search words belonging to the second category from the plurality of candidate words; wherein the co-occurrence relationship is determined according to the co-occurrence times of the first category and the second category;
the generating module is used for generating a guide question according to the at least two supplementary search terms so that a user can select from the at least two supplementary search terms;
and the reply module is used for searching according to the supplementary search words selected by the user and the target search words to obtain question replies.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the human-computer interaction based search method according to any one of claims 1 to 6 when executing the computer program.
9. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the human-machine interaction based search method according to any one of claims 1 to 6.
CN201711350393.9A 2017-12-15 2017-12-15 Search method and device based on man-machine interaction, computer equipment and storage medium Active CN110020163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711350393.9A CN110020163B (en) 2017-12-15 2017-12-15 Search method and device based on man-machine interaction, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711350393.9A CN110020163B (en) 2017-12-15 2017-12-15 Search method and device based on man-machine interaction, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110020163A CN110020163A (en) 2019-07-16
CN110020163B true CN110020163B (en) 2021-08-17

Family

ID=67186989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711350393.9A Active CN110020163B (en) 2017-12-15 2017-12-15 Search method and device based on man-machine interaction, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110020163B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749328B (en) * 2020-04-21 2024-01-05 腾讯科技(深圳)有限公司 Searching method, searching device and computer equipment
CN112287086A (en) * 2020-11-13 2021-01-29 北京京东尚科信息技术有限公司 Intelligent response method, device, server and medium
CN112749266B (en) * 2021-01-19 2023-03-21 海尔数字科技(青岛)有限公司 Industrial question and answer method, device, system, equipment and storage medium
CN113486071B (en) * 2021-07-27 2022-04-26 掌阅科技股份有限公司 Searching method, server, client and system based on electronic book

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101097573A (en) * 2006-06-28 2008-01-02 腾讯科技(深圳)有限公司 Automatically request-answering system and method
CN101593206A (en) * 2009-06-25 2009-12-02 腾讯科技(深圳)有限公司 Searching method and device based on answer in the question and answer interaction platform
CN101676909A (en) * 2008-09-16 2010-03-24 联想(北京)有限公司 Method and computer for providing self-service for users
CN102708100A (en) * 2011-03-28 2012-10-03 北京百度网讯科技有限公司 Method and device for digging relation keyword of relevant entity word and application thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006252382A (en) * 2005-03-14 2006-09-21 Fuji Xerox Co Ltd Question answering system, data retrieval method and computer program

Also Published As

Publication number Publication date
CN110020163A (en) 2019-07-16

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant