CN110795542A - Dialogue method and related device and equipment - Google Patents

Publication number
CN110795542A
Authority
CN
China
Prior art keywords
question
questions
word
server
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910806215.5A
Other languages
Chinese (zh)
Other versions
CN110795542B (en)
Inventor
潘伟洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910806215.5A
Publication of CN110795542A
Application granted
Publication of CN110795542B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a dialogue method and a related device and equipment. In the dialogue method, first response information corresponding to set questions is preset in a question-answer database. After the server receives a dialogue request for a first question sent by a first terminal, it first tries to match a question similar to the first question from the question-answer database and returns the first response information corresponding to the matched question to the first terminal. When no question similar to the first question is matched, a QA system is used to generate response information for the first question. The method optimizes the chat dialogue process and improves the accuracy of the response information.

Description

Dialogue method and related device and equipment
Technical Field
The embodiments of the application relate to the technical field of artificial intelligence, and in particular to a dialogue method and a related device and equipment.
Background
With the continuous development of artificial intelligence, people interact with smart devices more and more frequently, and advances in smart devices continue to enrich people's lives. The processing and feedback of information is a key aspect of artificial intelligence technology.
At present, in the field of chat conversation robots, processing of acquired information mainly depends on a parameterized evaluation system, and appropriate information is matched in a knowledge base according to objective and parameterized analysis and comparison of input information, so that a user can obtain corresponding feedback. However, in the field of chat conversation, it is difficult for a question answering system (QA system) to provide reply information with high accuracy to a question with strong subjectivity through a parameterized evaluation system, so that improvement of the accuracy of the reply information in the field of chat conversation with strong subjectivity is a technical problem to be solved urgently at present.
Disclosure of Invention
The embodiment of the invention discloses a dialogue method and a related device and equipment, which can solve the prior-art technical problem of low accuracy of reply information for highly subjective questions, so as to optimize chat dialogue and improve the accuracy of responses.
In a first aspect, an embodiment of the present application provides a dialog method, including:
a server receives a dialogue request sent by a first terminal, wherein the dialogue request is used for requesting, from the server, response information corresponding to a first question;
the server searches N questions of a question-answer database for a second question matching the first question, wherein the question-answer database comprises the N questions and first response information corresponding to each of the N questions, and N is a positive integer;
when the second question is found, the server sends the first response information corresponding to the second question in the question-answer database to the first terminal;
when the second question is not found, the server generates response information for the first question through a question answering system (QA system);
and the server sends the generated response information to the first terminal.
As a possible implementation, the server searching N questions of a question-answer database for a second question matching the first question specifically includes:
determining a maximum similarity in a first similarity set according to the similarity between the first question and each of the N questions, wherein the first similarity set comprises the similarity between the first question and each of the N questions;
if the maximum similarity is greater than a first threshold, determining that the second question is the question corresponding to the maximum similarity;
and if the maximum similarity is smaller than the first threshold, determining that the second question is not found in the question-answer database.
As a possible implementation, the third question is any one of the N questions, and the method further includes:
and calculating the cosine similarity of the word frequency vector of the first question and the word frequency vector of the third question to obtain the similarity of the first question and the third question.
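The lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the whitespace tokenizer, the dictionary layout of the question-answer database, and the names `term_vector`, `cosine`, `respond`, and `qa_system` are all assumptions introduced here.

```python
import math
from collections import Counter


def term_vector(text):
    # Assumption: whitespace tokenization; the source does not fix a tokenizer.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity of two word frequency vectors stored as Counters.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def respond(first_question, qa_db, first_threshold, qa_system):
    """qa_db maps each stored question to its first response information."""
    query = term_vector(first_question)
    best_question, best_similarity = None, 0.0
    for stored_question in qa_db:
        similarity = cosine(query, term_vector(stored_question))
        if similarity > best_similarity:
            best_question, best_similarity = stored_question, similarity
    if best_question is not None and best_similarity > first_threshold:
        return qa_db[best_question]      # hit: second question found
    return qa_system(first_question)     # miss: fall back to the QA system
```

Passing the QA system as a callable keeps the sketch independent of any particular answer generator; the threshold value itself is left to the caller because the source does not specify one.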
As a possible implementation, the method further comprises:
the server acquires a first question set, wherein the first question set is a set of questions historically responded by the server;
the server groups the questions in the first question set to obtain L question groups;
and the server selects, from the L question groups, K question groups whose total frequency is greater than a second threshold according to the total frequency of each of the L question groups, wherein the total frequency of a first question group is the sum of the frequencies with which the server has historically responded to each question in the first question group, and the first question group is any one of the L question groups.
As a possible implementation, grouping the questions in the first question set to obtain L question groups includes:
calculating the similarity between a fourth question and each question in a second question set, wherein the second question set is a set of questions which are not grouped currently in the first question set, and the fourth question is one question in the second question set;
and dividing the questions in the second question set, the similarity of which to the fourth question is greater than a third threshold value, into a question group.
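A minimal sketch of the grouping and frequency screening steps, under stated assumptions: the fourth question is taken to be the first ungrouped question (the source only says it is one question in the second question set), the similarity function is supplied by the caller, and the names `group_questions` and `high_frequency_groups` are hypothetical.

```python
def group_questions(first_question_set, similarity, third_threshold):
    # `ungrouped` plays the role of the second question set.
    ungrouped = list(first_question_set)
    groups = []
    while ungrouped:
        fourth = ungrouped[0]  # assumption: pick the first ungrouped question
        group = [
            q for q in ungrouped
            if q == fourth or similarity(fourth, q) > third_threshold
        ]
        groups.append(group)
        ungrouped = [q for q in ungrouped if q not in group]
    return groups


def high_frequency_groups(groups, frequency, second_threshold):
    # Keep the groups whose total historical response frequency exceeds the threshold.
    return [
        group for group in groups
        if sum(frequency[q] for q in group) > second_threshold
    ]
```

Each pass removes at least the fourth question from the ungrouped set, so the loop always terminates and every question ends up in exactly one group.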
As a possible implementation manner, the fifth question is any one question in the second question set, and the calculating the similarity between the fourth question and each question in the second question set specifically includes:
determining keywords of the fourth question according to the weight of each word in the fourth question, wherein the weight of a first word represents the contribution of the first word to the semantics of the fourth question, and the first word is any word in the fourth question;
determining keywords of the fifth question according to the weight of each word in the fifth question, wherein the weight of a second word represents the contribution of the second word to the semantics of the fifth question, and the second word is any word in the fifth question;
determining a word frequency vector of the fourth question and a word frequency vector of the fifth question according to the keywords of the fourth question and the keywords of the fifth question;
and calculating the cosine similarity of the word frequency vector of the fourth question and the word frequency vector of the fifth question to obtain the similarity of the fourth question and the fifth question.
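The keyword-based similarity can be sketched as below. The source does not say how many keywords are kept or how the vector dimensions are chosen, so two assumptions are made here: the `top_k` highest-weighted words of each question are its keywords, and the union of both keyword sets forms the vector dimensions. The names `keywords` and `keyword_similarity` are hypothetical, and the per-word weights are supplied by the caller.

```python
import math
from collections import Counter


def keywords(tokens, weight, top_k):
    # Keep the top_k distinct words with the highest weight (ties broken alphabetically).
    ranked = sorted(set(tokens), key=lambda w: (-weight.get(w, 0.0), w))
    return set(ranked[:top_k])


def keyword_similarity(tokens_4, tokens_5, weight_4, weight_5, top_k=3):
    # Assumption: vector dimensions are the union of both questions' keywords.
    vocab = sorted(
        keywords(tokens_4, weight_4, top_k) | keywords(tokens_5, weight_5, top_k)
    )
    counts_4, counts_5 = Counter(tokens_4), Counter(tokens_5)
    v4 = [counts_4[w] for w in vocab]
    v5 = [counts_5[w] for w in vocab]
    dot = sum(a * b for a, b in zip(v4, v5))
    norm = math.sqrt(sum(a * a for a in v4)) * math.sqrt(sum(b * b for b in v5))
    return dot / norm if norm else 0.0
```

Restricting the vectors to keywords means that low-weight words no longer influence the similarity, which is the apparent purpose of the keyword step.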
As a possible implementation, the method further comprises:
the server generates at least one response message aiming at a sixth question, wherein the sixth question is any one question in K question groups;
the server sends the sixth question and the at least one response message to a second terminal, so that the second terminal receives and displays the sixth question and the at least one response message and sends a first response message corresponding to the sixth question to the server, wherein the first response message corresponding to the sixth question is a selected response message in the at least one response message received by the second terminal or a response message input by the second terminal aiming at the sixth question;
and the server receives the first response information corresponding to the sixth question and updates the sixth question and the first response information corresponding to the sixth question to the question-answer database.
In a second aspect, an embodiment of the present application further provides a dialog method, including:
a first terminal generates a dialogue request according to an input first question, wherein the dialogue request is used for requesting, from a server, response information corresponding to the first question;
the first terminal sends the dialogue request to the server, so that after receiving the dialogue request, the server searches N questions of a question-answer database for a second question matching the first question and, when the second question is found, sends the first response information corresponding to the second question in the question-answer database to the first terminal, wherein the question-answer database comprises the N questions and first response information corresponding to each of the N questions, and N is a positive integer;
and the first terminal receives and outputs the first response information.
As a possible implementation, the method further comprises:
receiving and displaying a sixth question sent by the server and at least one piece of response information corresponding to the sixth question;
receiving first response information corresponding to the sixth question, wherein the first response information corresponding to the sixth question is selected response information in at least one response information corresponding to the sixth question or response information input aiming at the sixth question;
and sending the sixth question and first response information corresponding to the sixth question to the server, so that the server updates the sixth question and the first response information corresponding to the sixth question to the question-answer database.
A third aspect discloses a dialogue apparatus, including:
a receiving unit, configured to receive a dialogue request sent by a first terminal, where the dialogue request is used to request, from the server, response information corresponding to a first question;
a search unit, configured to search N questions of a question-answer database for a second question matching the first question, where the question-answer database comprises the N questions and first response information corresponding to each of the N questions, and N is a positive integer;
a sending unit, configured to send, to the first terminal, the first response information corresponding to the second question in the question-answer database when the second question is found;
a first generating unit configured to generate response information for the first question by a question answering system (QA system) when the second question is not found;
the sending unit is further configured to send the generated response information to the first terminal.
It should be noted that the apparatus may further include other units for performing the dialogue method disclosed in the first aspect or any embodiment of the first aspect.
A fourth aspect discloses a dialogue apparatus, including:
a generating unit, configured to generate a dialogue request according to an input first question, where the dialogue request is used to request, from a server, response information corresponding to the first question;
a sending unit, configured to send the dialogue request to the server, so that after receiving the dialogue request, the server searches N questions of a question-answer database for a second question matching the first question, sends the first response information corresponding to the second question in the question-answer database to the first terminal when the second question is found, and generates response information for the first question through a QA system and sends the generated response information to the first terminal when the second question is not found, where the question-answer database comprises the N questions and first response information corresponding to each of the N questions, and N is a positive integer;
and a receiving unit, configured to receive and output the received response information.
It should be noted that the apparatus may further include other units for executing the dialog method disclosed in the second aspect or any embodiment of the second aspect.
A fifth aspect discloses a dialog device comprising a processor and a memory, the processor being connected to the memory, wherein the memory is configured to store program code, and the processor is configured to invoke the program code to implement a dialog method as disclosed in the first aspect or any embodiment of the first aspect.
A sixth aspect discloses a dialog device comprising a processor and a memory, the processor being connected to the memory, wherein the memory is adapted to store program code and the processor is adapted to invoke the program code to implement a dialog method as disclosed in the second aspect or any of the embodiments of the second aspect.
A seventh aspect discloses a computer readable storage medium storing a computer program or computer instructions which, when executed, implement a dialog method as disclosed in the first aspect or any of its embodiments.
An eighth aspect discloses a computer readable storage medium storing a computer program or computer instructions which, when executed, implement a dialog method as disclosed in the second aspect or any of its embodiments.
In the embodiment of the invention, a server receives a dialogue request sent by a first terminal, wherein the dialogue request is used for requesting, from the server, response information corresponding to a first question. The server searches N questions of a question-answer database for a second question matching the first question, wherein the question-answer database comprises the N questions and first response information corresponding to each of the N questions. When the second question is found, the server sends the first response information corresponding to the second question in the question-answer database to the first terminal; when the second question is not found, the server generates response information for the first question through a QA system and sends the generated response information to the first terminal. By implementing the embodiment of the invention, first response information corresponding to set questions is preset in the question-answer database, so that after receiving the dialogue request the server can first try to match a question similar to the first question from the question-answer database and return the first response information corresponding to the matched question to the first terminal, thereby optimizing the chat dialogue and improving the accuracy of the response information.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of the first server 102 responding to a dialogue request according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a graphical user interface according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a dialog method according to an embodiment of the present invention;
FIG. 5 illustrates a method for calculating the similarity between a first question and a third question according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart illustrating a method for updating a question-answer database according to an embodiment of the present invention;
FIG. 7 illustrates a method for grouping a first set of questions in accordance with an embodiment of the present invention;
FIG. 8 is a flowchart of a method for computing the similarity between two questions during grouping, according to an embodiment of the present invention;
FIG. 9a is a schematic structural diagram of a dialogue device according to an embodiment of the present invention;
FIG. 9b is a schematic structural diagram of another dialogue device disclosed in the embodiment of the invention;
FIG. 10 is a schematic structural diagram of another dialogue device disclosed in the embodiment of the invention;
FIG. 11 is a schematic structural diagram of another dialogue device disclosed in the embodiment of the invention;
FIG. 12 is a schematic structural diagram of another dialogue device disclosed in the embodiment of the invention.
Detailed Description
The embodiment of the invention discloses a dialogue method and a dialogue device, which are used for optimizing chatting dialogue information. The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be understood that the terminology used in the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
First, concepts and terms related to the embodiments of the present application will be briefly described.
(1) Question
The questions referred to in the embodiments of the present application may be interrogative, declarative, exclamatory, or imperative sentences, or other sentence patterns. For example, "how is the weather today?"; as another example, "you look nice."; as another example, "the sea is beautiful!"; this is not limited herein.
(2) Response message
The response information referred to in the embodiments of the present application may be an image, audio, text, or another type. For example, for the question "what is eaten to whiten", the response information may be the text "lemon"; as another example, for the question "recommend a pure music", the response information may be audio data, such as the pure music "wish to enjoy the moon and flow you"; as another example, for the question "picture of the discotheque princess", the response information may be an image, such as one or more pictures of the discotheque princess.
(3) QA system
A question answering system (QA system) is an advanced form of information retrieval system. It combines knowledge representation, information retrieval, natural language processing and other technologies so that users can query information by asking questions in natural language instead of combining keywords. The QA system can analyze an input question, automatically find the most probable response information from various content sources (e.g., electronic documents), and return that response information.
The process by which a question answering system understands natural language is not simple character matching: the semantics or intention of a question can be identified only by deeply understanding the question, after which a large-scale knowledge base can be searched for the needed response information. In addition, the "situation" of the question needs to be considered. For example, for "nearby restaurant", the system must not only recognize that the question belongs to the geographic information category but also obtain the geographic location of the questioner. In some cases, questions expressed in natural language have the same meaning but very different wording. For example, with "nearby restaurant" and "I want to eat", the user's intention in both cases is to find a nearby restaurant, but the expressions differ greatly.
(4)TF-IDF
TF-IDF (term frequency-inverse document frequency) is a statistical weighting technique commonly used in information retrieval and data mining. For a question, the importance of a word increases with its frequency of occurrence in the question, but decreases as its frequency of occurrence across a set of multiple questions increases. For a word in a question, the TF-IDF of the word can be obtained by the following formula:
TF-IDF=TF×IDF
where TF is the term frequency, indicating how frequently the word appears in the question, and IDF is the inverse document frequency. The main idea of IDF is that the fewer questions in a question set contain a word t, the larger the IDF of the word t, which indicates that the word t distinguishes categories well.
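As a rough illustration of the TF-IDF weighting, the sketch below treats each question as a list of tokens. The source gives only TF-IDF = TF × IDF, so the specific choices here are assumptions: TF is the word count divided by the question length, and IDF is the natural logarithm of the number of questions divided by the number of questions containing the word. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(question_tokens, question_set_tokens):
    """Return a dict mapping each word of one question to its TF-IDF weight."""
    counts = Counter(question_tokens)
    total_words = len(question_tokens)
    num_questions = len(question_set_tokens)
    weights = {}
    for word, count in counts.items():
        tf = count / total_words
        # Number of questions in the set that contain the word (at least 1 to avoid division by zero).
        containing = sum(1 for tokens in question_set_tokens if word in tokens)
        idf = math.log(num_questions / max(containing, 1))  # assumed IDF form
        weights[word] = tf * idf
    return weights
```

With this form, a word that appears in every question of the set receives a weight of zero, which matches the idea that such a word does not help distinguish one question from another.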
(5) Word frequency vector
The word frequency vector referred to in the embodiments of the present application is the word frequency vector of a question, that is, a question is represented by converting it into vector form. A question set, as a collection of questions, may include a plurality of words that form a word set, in which case a question in the question set can be represented as a vector in the multidimensional space formed by the word set. For example, the question set includes a first question and a third question, where:
The first question is: this box is expensive and that price is suitable.
The third question is: this box is not inexpensive and is more suitable.
The word set formed by the question set is {this, box, price, expensive, that, proper, not, cheap, more}.
The word frequencies of the words in the word set in the first question are {this 1, box 1, price 2, expensive 1, that 1, proper 1, not 0, cheap 0, more 0}, so the word frequency vector of the first question can be represented as (1,1,2,1,1,1,0,0,0).
The word frequencies of the words in the word set in the third question are {this 1, box 1, price 1, expensive 0, that 1, proper 1, not 1, cheap 1, more 1}, so the word frequency vector of the third question can be represented as (1,1,1,0,1,1,1,1,1).
(6) Cosine similarity
Cosine similarity measures the difference between two individuals by the cosine of the angle between two vectors in a vector space. The closer the cosine is to 1, the closer the angle is to 0 degrees, that is, the more similar the two vectors are. In the present application, the similarity of two questions can be calculated as the cosine similarity of their word frequency vectors; the closer the result is to 1, the higher the similarity between the two questions.
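As a worked example, the sketch below applies cosine similarity to the per-word counts listed for the first and third questions in the word frequency vector example, in the order of the word set. The function name `cosine_similarity` is an assumption introduced for illustration.

```python
import math


def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (|u| * |v|); returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Counts in word-set order: this, box, price, expensive, that, proper, not, cheap, more
first = (1, 1, 2, 1, 1, 1, 0, 0, 0)
third = (1, 1, 1, 0, 1, 1, 1, 1, 1)

similarity = cosine_similarity(first, third)  # 6 / (3 * sqrt(8)), about 0.707
```

The dot product is 6, the first vector has length 3, and the third has length sqrt(8), so the two questions score about 0.707 on a scale where 1 means identical direction.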
(7)TextRank
The TextRank algorithm is a graph-based ranking algorithm for text. Its basic idea is derived from Google's PageRank algorithm: a text is divided into a plurality of constituent units (words, sentences), a graph model is built, and the important components of the text are ranked by a voting mechanism, so that keyword extraction and summarization can be achieved using only the information of a single document.
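A simplified TextRank-style keyword ranking might look like the sketch below. It is an illustration only: the co-occurrence window, damping factor, fixed iteration count, and the function name `textrank_keywords` are assumptions, and edges are unweighted.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=30):
    # Build an undirected co-occurrence graph: words at most `window` positions apart are linked.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    words = sorted(set(tokens))
    score = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        new_score = {}
        for w in words:
            # Each neighbor "votes" by sharing its score equally among its own neighbors.
            vote = sum(score[n] / len(neighbors[n]) for n in neighbors[w])
            new_score[w] = (1 - damping) / len(words) + damping * vote
        score = new_score
    # Highest score first; ties are broken alphabetically.
    return sorted(words, key=lambda w: (-score[w], w))
```

Words that co-occur with many other words collect more votes and rise to the top of the ranking, which is how keywords can be extracted from a single question without a reference corpus.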
Referring to FIG. 1, FIG. 1 is a schematic diagram of a dialogue system architecture according to an embodiment of the present invention. As shown in FIG. 1, the system architecture may include a first terminal 101, a first server 102, a second server 103, a question-answer database 104, and a second terminal 105. The first terminal 101 can receive a user's question through a client and send a dialogue request containing the question to the first server 102; the question may be text converted from the user's voice, text converted from an image sent by the user, or text converted from another form. The first terminal 101 may be a robot, a computer, a mobile phone, a smart speaker, or another smart device capable of receiving a question input by a user, which is not limited herein. The first terminal 101 is further configured to receive the response information corresponding to the first question returned by the first server 102.
The question-answer database 104 includes a plurality of questions and first answer information corresponding to each question in the plurality of questions, and the questions in the question-answer database 104 may be subjective questions, may be knowledge-based objective questions, or may be other types of questions. The first response message may be text, voice, video, or other message.
The first server 102 is used to provide services for clients. Specifically, the first server 102 may receive a dialogue request sent by the first terminal 101, search for a second question matching the first question, and respond to the dialogue request. Referring to FIG. 2, FIG. 2 is a schematic illustration of the first server 102 responding to a dialogue request. The first server 102 may parse the dialogue request to obtain a first question and search the question-answer database 104 for response information corresponding to the first question. If the response information is found, this is considered a hit, and the first server 102 may return the found response information to the first terminal 101, such as "answer A" shown in FIG. 2. If the response information is not found, this is considered a miss; the first server 102 generates response information for the first question through the QA system and may return the generated response information to the first terminal 101, such as "answer B" shown in FIG. 2.
In some embodiments, the first server 102 is further configured to obtain questions historically answered by the first server 102 or questions in other databases, such as questions in the QQ chat records of multiple users. The first server 102 may deduplicate the historically answered questions to obtain a first question set, group the questions in the first question set, and filter the question groups based on the frequency of each question group to obtain high-frequency question groups. The first server 102 may also generate, through the QA system, at least one response message for each question in the high-frequency question groups, and send each question in the high-frequency question groups and the at least one response message corresponding to each question to the second server 103.
The second server 103 may be a response information calibration platform, on which the response information of received questions can be manually calibrated. After obtaining each question in the high-frequency question groups and the at least one response message corresponding to each question, the second server 103 may send one or more of the questions and the at least one response message corresponding to each question to the second terminal 105. A user of the second terminal 105 may manually calibrate the response information for each received question to obtain the first response information of each question, and the second terminal 105 may then send each question and its corresponding first response information to the second server 103.
As shown in fig. 3, fig. 3 exemplarily shows a Graphical User Interface (GUI), also called a calibration interface, which is displayed on the second terminal 105. The calibration interface comprises a calibration area 301, a calibration progress control 302, an input control 303, a save control 304, a delete control 305 and an export data control 306. The calibration area 301 displays a question to be calibrated and at least one response message corresponding to the question; depending on the input operation, the question may be modified or the first response message corresponding to the question may be determined. For example, double-clicking the question in the calibration interface allows the question to be modified. For another example, when the second terminal 105 detects a selection operation on one of the at least one response message, the selected response message is used as the calibration result of the question, that is, the first response message corresponding to the question. Through the calibration progress control 302, the second terminal 105 displays the total number of questions in the second server 103 that require calibration and the number of questions already calibrated, and this progress information is continuously updated according to the calibration operations performed in the calibration area 301. When the second terminal 105 detects information entered in the input control 303, the entered information is used as the calibration result of the question in the calibration area 301, that is, the first response information corresponding to the question.
When the second terminal 105 detects a user operation on the save control 304, it saves the calibration result of the question in the current calibration interface and switches to the next calibration interface. When the second terminal 105 detects a user operation on the delete control 305, it enters the next calibration interface. When the second terminal 105 detects a user operation on the export data control 306, all calibrated questions and the corresponding first response information are exported. The user operation may include, but is not limited to, clicking, double-clicking, selecting, etc. The second terminal 105 may send the calibration results of the questions, that is, each question and the first response information corresponding to each question, to the second server 103. The calibration interface is not limited to the one shown in fig. 3; the interface may also take other designs, which is not limited in the embodiments of the present application.
After receiving the questions and the first response information corresponding to the questions, the second server 103 may send them to the first server 102, and the first server 102 may update the question and answer database 104 with the received questions and the corresponding first response information. The second terminal 105 may be a response information calibration terminal, and may be a robot, a computer, a mobile phone, a smart speaker, or another intelligent device capable of receiving a question input by a user, which is not limited herein.
In an alternative embodiment, the second server 103 interacts with the second terminal 105 in the following way:
The second server 103 sends a question and at least one response message corresponding to the question to the second terminal 105, and the second terminal 105 displays, based on the client, a calibration interface as shown in fig. 3. When the second terminal 105 detects an input operation on the save control 304, it saves the question of the calibration interface and the first response message corresponding to the question in a temporary cache, sends the question and the first response message in the temporary cache to the second server 103, and requests the calibration interface of the next question from the second server 103. The question and the first response information stored in the temporary cache are obtained by detecting input operations on the calibration area 301 and on the input control 303. For example, as shown in fig. 3, if the response message "No, but I have a pair of big watery eyes" is detected as selected in the calibration area 301 and no input is detected in the input control 303, the information stored in the temporary cache is "Do you have eyes" and "No, but I have a pair of big watery eyes". For another example, if no selected response information is detected in the calibration area 301 and the input "I can see you" is detected in the input control 303, the information stored in the temporary cache is "Do you have eyes" and "I can see you". For another example, if the response message "No, but I have a pair of big watery eyes" is detected as selected in the calibration area 301 and the input "I can see you" is also detected in the input control 303, the information stored in the temporary cache is "Do you have eyes" and "I can see you"; that is, the entered information takes precedence over the selected response message.
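The three cache examples above follow a simple precedence rule, which can be sketched as follows. This is only an illustrative sketch: the function name `resolve_calibration` and its parameters are hypothetical, and the behaviour when neither a selection nor an input is present is an assumption, since the text does not describe that case.

```python
from typing import Optional, Tuple


def resolve_calibration(question: str,
                        selected_response: Optional[str],
                        typed_response: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return the (question, first response) pair to store in the temporary cache.

    Text entered in the input control takes precedence over a response
    selected in the calibration area, as in the three examples.
    """
    if typed_response:        # information entered in input control 303
        return question, typed_response
    if selected_response:     # response selected in calibration area 301
        return question, selected_response
    return None               # nothing calibrated yet (assumed behaviour)
```

With this rule, a question that has both a selected candidate and typed text is stored with the typed text as its first response information.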
Alternatively, the first terminal 101 and the second terminal 105 may be the same device.
Alternatively, the first server 102 and the second server 103 may be the same device.
Alternatively, the question-answer database 104 may be located in the first server 102 or may exist separately. The dialogue system provided in the embodiment of the present application is not limited to the system architecture shown in fig. 1 and may further include other devices, for example, a third-party server, which may be a server for detecting whether a question is suspected of containing illegal content. By interacting with the third-party server, the first server 102 may provide third-party data and third-party functions for the dialogue system, so as to further guarantee the services that the dialogue system platform can provide.
Referring to fig. 4, fig. 4 is a flowchart illustrating a dialog method according to an embodiment of the present invention. As shown in fig. 4, the dialog method may be implemented by the dialog system shown in fig. 1, where the first terminal may be the first terminal 101, the first server may be the first server 102, and the question-answer database may be the question-answer database 104, and the implementation of the dialog method may include the following steps.
S101, the first terminal receives a first question input by a user based on the client.
The client may be an APP on the mobile terminal, a browser on a computer, or another program capable of providing services for the user, which is not limited herein. The first question may be voice received by the first terminal through a voice sensor (e.g., a microphone), text received by the first terminal through an input device (e.g., a touch panel or a keyboard), or input in another format, such as an image, which is not limited herein.
S102, the first terminal generates a dialogue request according to the input first question.
The dialogue request is used for requesting response information corresponding to the first question from the first server. When the first question is not text, the first terminal may convert the first question into text, for example, by converting input voice into text through a voice recognition algorithm, or by recognizing the question in an input image through an image recognition algorithm.
S103, the first terminal sends the generated dialogue request to the first server.
S104, the first server receives the dialogue request.
The first server, upon receiving the dialog request, may parse the dialog request, extracting a first question from the dialog request.
S105, the first server searches the question-answer database for a second question matched with the first question.
The first server searches the question-answer database for a second question matching the first question. The question-answer database may include N questions and first response information corresponding to each of the N questions, where N is a positive integer. When the second question matching the first question is found, S106 is executed, and the first server sends the first response information corresponding to the second question to the first terminal; when the second question matching the first question is not found, S107 is executed, and the first server generates response information for the first question through the QA system and sends the response information to the first terminal.
The search by the first server of the question-answer database for the second question matching the first question may include the following three implementation manners:
Implementation mode (1):
The first server may calculate the similarity between the first question and each of the N questions to obtain a first similarity set, where the first similarity set comprises the similarity between the first question and each of the N questions. The first server may then determine the maximum similarity in the first similarity set and take the question corresponding to the maximum similarity as the question among the N questions that matches the first question, that is, the second question.
Implementation mode (2):
After obtaining the first similarity set, the first server may determine whether the maximum similarity is greater than a first threshold. If the maximum similarity is greater than the first threshold, the first server takes the question corresponding to the maximum similarity as the question among the N questions that matches the first question, that is, the second question; this indicates that the second question is found, that is, the first question is hit, and the first server may execute S106. If the maximum similarity is not greater than the first threshold, the second question is not found, and the first server may execute S107.
Implementation mode (3):
After obtaining the first similarity set, the first server may screen out, from the N questions, the questions whose similarity is greater than a preset threshold to obtain a similar question set containing one or more questions matching the first question, and select one of them as the second question matching the first question; if the similar question set is empty, it is determined that the second question is not found in the question-answer database.
Alternatively, when the first server receives the same first question sent by the first terminal multiple times, the first server may return first response information corresponding to any one of one or more questions matched with the first question to the first terminal, so that different response information may be given for the same question.
In practical applications, the first threshold or the preset threshold may be determined according to the required accuracy of question matching.
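The three implementation manners differ only in how the first similarity set is turned into a match. A minimal sketch, assuming the similarity function is supplied by the caller and that the list of questions is non-empty (N is a positive integer); the function names and the use of `random.choice` for picking one of several matches are illustrative assumptions, not part of the described method:

```python
import random
from typing import Callable, Optional, Sequence

Similarity = Callable[[str, str], float]


def match_mode_one(first_question: str, questions: Sequence[str],
                   similarity: Similarity) -> str:
    """Mode (1): always return the question with the maximum similarity."""
    return max(questions, key=lambda q: similarity(first_question, q))


def match_mode_two(first_question: str, questions: Sequence[str],
                   similarity: Similarity, first_threshold: float) -> Optional[str]:
    """Mode (2): return the best question only if it exceeds the first threshold."""
    best = match_mode_one(first_question, questions, similarity)
    if similarity(first_question, best) > first_threshold:
        return best      # second question found, proceed to S106
    return None          # not found, proceed to S107


def match_mode_three(first_question: str, questions: Sequence[str],
                     similarity: Similarity, preset_threshold: float) -> Optional[str]:
    """Mode (3): pick any one question whose similarity exceeds the preset threshold."""
    candidates = [q for q in questions
                  if similarity(first_question, q) > preset_threshold]
    return random.choice(candidates) if candidates else None
```

Mode (3) is the variant that can return different response information when the same first question is received several times.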
In the above three implementation manners, the embodiment of the present application takes calculating the similarity between the first question and the third question as an example to describe a calculation manner of the similarity between the first question and any one of the N questions in the question and answer database. Wherein the third question may be any one of the N questions. Referring to fig. 5, fig. 5 is a method for calculating the similarity between the first question and the third question, which can be performed in the first server 102 shown in fig. 1, and the method for calculating the similarity between the first question and the third question includes the following steps.
S1051, carrying out word segmentation processing on the first question and the third question to obtain a first word set.
The word segmentation processing is to split the first question and the third question into one or more independent words, and the words may be nouns, verbs, adjectives, or words of any other part of speech. The first word set refers to a set of words obtained by splitting the first question and the third question, wherein the first word set does not include repeated words.
For example, the first question is: this box is expensive in price, and that price is appropriate. Word segmentation of the first question gives {this, box, price, expensive, that, price, appropriate}. The third question is: this box is not cheap in price, and that one is more appropriate. Word segmentation of the third question gives {this, box, price, not, cheap, that, more, appropriate}. The first word set is then {this, box, price, expensive, that, appropriate, not, cheap, more}.
The word frequencies of the words of the first word set in the first question are {this: 1, box: 1, price: 2, expensive: 1, that: 1, appropriate: 1, not: 0, cheap: 0, more: 0}, so the word frequency vector of the first question can be represented as (1, 1, 2, 1, 1, 1, 0, 0, 0).
The word frequencies of the words of the first word set in the third question are {this: 1, box: 1, price: 1, expensive: 0, that: 1, appropriate: 1, not: 1, cheap: 1, more: 1}, so the word frequency vector of the third question can be represented as (1, 1, 1, 0, 1, 1, 1, 1, 1).
S1052, representing the word frequency vector of the first question and the word frequency vector of the third question in the space formed by the first word set.
In one implementation of the embodiment of the present application, the first server may calculate the word frequency of each word of the first word set in the first question, and the word frequency of each word of the first word set in the third question.
The word frequency of a word of the first word set in the first question is the number of times the word appears in the first question. For example, the word frequencies of the words of the first word set in the first question are {this: 1, box: 1, price: 2, expensive: 1, that: 1, appropriate: 1, not: 0, cheap: 0, more: 0}, and the word frequency vector of the first question is (1, 1, 2, 1, 1, 1, 0, 0, 0).
The word frequency of a word of the first word set in the third question is the number of times the word appears in the third question. For example, the word frequencies of the words of the first word set in the third question are {this: 1, box: 1, price: 1, expensive: 0, that: 1, appropriate: 1, not: 1, cheap: 1, more: 1}, and the word frequency vector of the third question is (1, 1, 1, 0, 1, 1, 1, 1, 1).
S1053, calculating the similarity between the first question and the third question.
The similarity represents the degree of similarity between two questions. Optionally, the similarity between the first question and the third question may be calculated by the cosine similarity formula, that is, the cosine similarity between the word frequency vector of the first question and the word frequency vector of the third question is calculated to obtain the similarity between the first question and the third question. The cosine similarity formula is:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
where $x_i$ denotes the i-th component of the word frequency vector of the first question and $y_i$ denotes the i-th component of the word frequency vector of the third question; i and n are positive integers with 1 <= i <= n, and n is the length of the word frequency vectors of the first question and the third question used for calculating the similarity. For example, for the word frequency vectors (1, 1, 2, 1, 1, 1, 0, 0, 0) and (1, 1, 1, 0, 1, 1, 1, 1, 1) of the first question and the third question, the calculation gives cos θ = 6 / (3 × √8) ≈ 0.71.
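Steps S1051 to S1053 can be sketched in a few lines. The sketch below assumes that word segmentation has already been done and uses hand-written English token lists as stand-ins for the segmented words of the two example questions; the function names are illustrative only.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def word_frequency_vectors(tokens_a: Sequence[str],
                           tokens_b: Sequence[str]) -> Tuple[List[int], List[int]]:
    """Build the shared word set (no repeats) and count each word in both questions."""
    vocabulary = list(dict.fromkeys(list(tokens_a) + list(tokens_b)))
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    return ([count_a[w] for w in vocabulary],
            [count_b[w] for w in vocabulary])


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


# Assumed token lists standing in for the segmented first and third questions.
first = ["this", "box", "price", "expensive", "that", "price", "appropriate"]
third = ["this", "box", "price", "not", "cheap", "that", "more", "appropriate"]
x, y = word_frequency_vectors(first, third)
similarity = cosine_similarity(x, y)  # 6 / (3 * sqrt(8)), about 0.707
```

Here `x` is (1, 1, 2, 1, 1, 1, 0, 0, 0) and `y` is (1, 1, 1, 0, 1, 1, 1, 1, 1), matching the word frequency vectors of the example.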
Optionally, the similarity may also be obtained by other methods for calculating the similarity, for example, the Jaccard distance and the Dice coefficient, that is, the similarity between two questions may be calculated according to the number of the same words, which is not limited herein.
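As a rough illustration of these set-based alternatives, the sketch below computes the Jaccard similarity and the Dice coefficient from the sets of words shared by two questions. The function names are illustrative; note that the Jaccard distance is 1 minus the Jaccard similarity computed here.

```python
from typing import Iterable


def jaccard_similarity(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> float:
    """Size of the intersection of the two word sets divided by the size of their union."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice_coefficient(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> float:
    """Twice the number of shared words divided by the total size of both word sets."""
    a, b = set(tokens_a), set(tokens_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0
```

Both measures ignore how often a word occurs, so they are cheaper than the word frequency vector approach but less sensitive to repeated words.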
By the above method, the similarity of the first question and each of the N questions can be obtained.
S106, the first server sends first response information corresponding to the second question in the question-answer database to the first terminal.
S107, the first server generates response information aiming at the first question through the QA system.
In some possible implementations, the generation of response information for the first question by the QA system includes question understanding, document retrieval, and answer extraction. First, question understanding refers to classifying the question, understanding its semantics, and generating a corresponding question concept graph. Second, document retrieval is performed according to the understanding of the question; document retrieval refers to retrieving content related to the question from a network knowledge base. Finally, answer extraction is performed according to the retrieval result; answer extraction refers to clustering the concept graphs and ranking the answers so as to obtain the response information corresponding to the question.
Optionally, the manner in which the QA system generates the corresponding response information for the question may also be other manners, which is not limited herein.
S108, the first server sends response information generated by the QA system to the first terminal.
S109, the first terminal receives the response information sent by the first server.
The received response information may be first response information corresponding to the second question in the question-answer database, or may be response information generated by the QA system for the first question.
S110, the first terminal outputs the received response information.
The manner in which the first terminal outputs the response information may include, but is not limited to, output via a display, output via a speaker, and output in other manners; the output form of the first terminal is not limited herein.
It should be noted that, in some embodiments of the present application, steps S107 to S108 are optional and may be omitted.
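The server-side part of the flow (S104 to S108) can be summarised as a lookup with an optional fallback. The following is a minimal sketch under stated assumptions: the question-answer database is modelled as a plain dictionary, and `find_match` and `qa_system` are hypothetical callables supplied by the caller, not names from the described system.

```python
from typing import Callable, Dict, List, Optional


def handle_dialogue_request(first_question: str,
                            qa_database: Dict[str, str],
                            find_match: Callable[[str, List[str]], Optional[str]],
                            qa_system: Optional[Callable[[str], str]] = None
                            ) -> Optional[str]:
    """Return response information for the first question.

    qa_database maps each stored question to its first response information.
    """
    second_question = find_match(first_question, list(qa_database))  # S105
    if second_question is not None:
        return qa_database[second_question]      # S106: calibrated response
    if qa_system is not None:
        return qa_system(first_question)         # S107 to S108: generated response
    return None                                  # S107 to S108 omitted
```

Passing `qa_system=None` corresponds to the embodiments in which steps S107 to S108 are not included.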
In this embodiment of the application, the N questions in the question-and-answer database and the first answer information corresponding to the N questions may be pre-stored by the first server, or may be a database generated by continuously updating the questions and the first answer information corresponding to the questions. One implementation of the question-answer database update or generation is described below.
As shown in fig. 6, fig. 6 is a schematic flowchart of a process of updating or generating a question-answer database according to an embodiment of the present invention. The method for updating the question-answer database may be implemented by the dialog system shown in fig. 1, where the first server may be the first server 102, the second server may be the second server 103, and the first server and the second server may also be the same device; the question-answer database may be question-answer database 104; the second terminal may be the second terminal 105, and the second terminal may be the same device as the first terminal. As shown in fig. 6, the method may include the following steps.
S601, the first server acquires a first problem set.
In some embodiments, the first server 102 may obtain the questions it has historically responded to, or questions from another database, such as questions in the QQ chat records of multiple users, and may perform a deduplication operation on these questions to obtain the first question set.
It should be understood that the first set of questions is a set of questions that the first server has historically answered, the first set of questions not including duplicate questions. Repeated questions may be included in the questions of the historical responses.
S602, the first server divides the problems in the first problem set into L problem groups, wherein L is a positive integer. S602 may include, but is not limited to, the following implementation manner, as shown in fig. 7, where fig. 7 is a method for grouping a first question set, and an implementation manner of S602 may include the following steps:
S6021, calculating the similarity between the fourth question and each question in the second question set.
The second question set is the set of questions in the first question set that are not yet grouped, and the fourth question is one question in the second question set.
It should be understood that when the first question set has just been obtained and no grouping has been performed, the second question set is the first question set, and the fourth question is any one question in the first question set.
S6022, dividing the questions in the second question set whose similarity with the fourth question is greater than a third threshold into one question group.
S6023, judging whether the number of questions in the second question set is less than M.
M is a positive integer, such as 1, 2, 3, 4, or 6. When the number of questions in the second question set is not less than M, the first server continues to calculate the similarity between a fourth question and each question in the second question set; when the number of questions in the second question set is less than M, grouping stops, and all the groups obtained are the L question groups. The third threshold used in S602 is set according to the required degree of similarity between questions: the higher the required degree of similarity, the closer the third threshold is set to 1.
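The grouping loop of S6021 to S6023 can be sketched as follows. This is an illustrative sketch only: taking the first ungrouped question as the fourth question is an assumption (the text only requires it to be one question of the second question set), the similarity function is supplied by the caller, and questions left over when fewer than M remain are simply not grouped.

```python
from typing import Callable, List, Sequence


def group_questions(first_question_set: Sequence[str],
                    similarity: Callable[[str, str], float],
                    third_threshold: float,
                    m: int = 1) -> List[List[str]]:
    """Greedily split deduplicated questions into groups of similar questions."""
    ungrouped = list(first_question_set)      # the second question set
    groups: List[List[str]] = []
    while ungrouped and len(ungrouped) >= m:  # S6023: stop when fewer than M remain
        fourth, rest = ungrouped[0], ungrouped[1:]
        similar, remaining = [], []
        for question in rest:                 # S6021: compare with every other question
            if similarity(fourth, question) > third_threshold:
                similar.append(question)
            else:
                remaining.append(question)
        groups.append([fourth] + similar)     # S6022: one new question group
        ungrouped = remaining
    return groups
```

Each pass removes at least the fourth question from the ungrouped set, so the loop always terminates.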
The similarity between two questions in S6021 may be calculated by, but is not limited to, the following method. As shown in fig. 8, fig. 8 is a method for calculating the similarity between two questions in the grouping process. The method may be implemented in the first server 102 shown in fig. 1 and includes some or all of the following steps.
S60211, screening a plurality of keywords from the fourth question and the fifth question.
In one implementation of the embodiment of the present application, the first server may determine the keyword of the fourth question according to the weight of each word in the fourth question, and determine the keyword of the fifth question according to the weight of each word in the fifth question. In a specific implementation, the keywords of the fourth question may be S keywords with the highest weight among all the words in the fourth question. The keyword of the fifth question may be S keywords having the highest weight among all the words in the fifth question. Wherein, S is a positive integer, and the value of S depends on the accuracy of the similarity required by the fourth problem and the fifth problem.
The combination of the keyword of the fourth question and the keyword of the fifth question is the plurality of keywords. Wherein the weight of the first term in the fourth question represents the contribution of the first term to the semantics of the fourth question, the first term being any one of the terms in the fourth question; the weight of the second term in the fifth question represents the contribution of the second term to the semantics of the fifth question, the second term being any one of the terms in the fifth question.
The weight $W_{R1,q4}$ of the first word R1 in the fourth question q4 may be represented by the TF-IDF of the first word R1 in the fourth question q4:
$$W_{R1,q4} = TF_{R1,q4} \times IDF_{R1,Q}$$
where $TF_{R1,q4}$ denotes the word frequency of the first word R1 in the fourth question q4 and indicates how important the first word R1 is to the semantics of the fourth question q4;
$IDF_{R1,Q}$ denotes the inverse document frequency of the first word R1 in the first question set Q and indicates the category-distinguishing capability of the first word R1 in the first question set Q.
It should be understood that the calculation manner of the weight of the second word in the fifth question may refer to the related description of the calculation manner of the weight of the first word in the fourth question, and will not be described herein again.
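The keyword screening described above can be sketched as a TF-IDF ranking followed by taking the S highest-weighted words. The exact IDF formula is not given in the text; the smoothed form `log((1 + N) / (1 + df)) + 1` used below is one common variant and is an assumption, as are the function names.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf_weights(question_tokens: Sequence[str],
                   question_set_tokens: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Weight of each word in one question: word frequency times inverse document frequency."""
    n = len(question_set_tokens)
    weights: Dict[str, float] = {}
    for word, tf in Counter(question_tokens).items():
        # Number of questions in the question set that contain the word.
        df = sum(1 for tokens in question_set_tokens if word in tokens)
        idf = math.log((1 + n) / (1 + df)) + 1.0   # assumed smoothed IDF variant
        weights[word] = tf * idf
    return weights


def top_keywords(question_tokens: Sequence[str],
                 question_set_tokens: Sequence[Sequence[str]],
                 s: int) -> List[str]:
    """Return the S words with the highest weight (ties keep first-seen order)."""
    weights = tf_idf_weights(question_tokens, question_set_tokens)
    return sorted(weights, key=lambda w: weights[w], reverse=True)[:s]
```

Words that occur in many questions of the set receive a low IDF and therefore tend to drop out of the keyword list.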
For example, the fourth question is: I like listening to music. The fifth question is: I do not like writing brushes, and I do not like pens. The words in the fourth question and in the fifth question are each sorted by weight in descending order, and the top three words in each question are selected as keywords. For example, the keywords of the fourth question are {I, like, music}, and the keywords of the fifth question are {writing brush, pen, not}. The plurality of keywords obtained is then {I, like, music, writing brush, pen, not}.
S60212, representing the word frequency vector of the fourth question and the word frequency vector of the fifth question in the vector space formed by the plurality of keywords.
The word frequency vector of the fourth question is the vector expression of the fourth question in the space formed by the plurality of keywords; similarly, the word frequency vector of the fifth question is the vector expression of the fifth question in that space. The word frequency vector of the fourth question is obtained from the number of times each of the plurality of keywords appears in the fourth question, and the word frequency vector of the fifth question is obtained from the number of times each of the plurality of keywords appears in the fifth question.
For example, the keyword "I" appears once in the fourth question, so its frequency is 1. The frequencies of the other keywords in the fourth question are obtained in the same way, so the word frequency vector of the fourth question is (1, 1, 1, 0, 0, 0). Similarly, the word frequency vector of the fifth question is (1, 2, 0, 1, 1, 2).
S60213, calculating the cosine similarity of the word frequency vector of the fourth question and the word frequency vector of the fifth question to obtain the similarity between the fourth question and the fifth question.
The cosine similarity may be calculated as described above with reference to step S704. For example, calculating the similarity between the fourth question and the fifth question from the word frequency vector (1, 1, 1, 0, 0, 0) of the fourth question and the word frequency vector (1, 2, 0, 1, 1, 2) of the fifth question gives cos θ ≈ 0.52.
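The example value can be checked with a short sketch that counts each keyword in the two questions and then applies the cosine formula. The token lists are assumptions: the fifth-question tokens are chosen so that their counts reproduce the vector (1, 2, 0, 1, 1, 2) given above, and the helper names are illustrative.

```python
import math
from typing import List, Sequence


def keyword_vector(tokens: Sequence[str], keywords: Sequence[str]) -> List[int]:
    """Count how many times each keyword appears in a segmented question."""
    return [list(tokens).count(keyword) for keyword in keywords]


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


keywords = ["I", "like", "music", "writing brush", "pen", "not"]
# Assumed segmentations of the fourth and fifth questions.
fourth_tokens = ["I", "like", "listening", "music"]
fifth_tokens = ["I", "not", "like", "writing brush", "not", "like", "pen"]

fourth_vector = keyword_vector(fourth_tokens, keywords)   # (1, 1, 1, 0, 0, 0)
fifth_vector = keyword_vector(fifth_tokens, keywords)     # (1, 2, 0, 1, 1, 2)
similarity = cosine_similarity(fourth_vector, fifth_vector)  # 3 / sqrt(33), about 0.52
```

Words that are not keywords, such as "listening", do not contribute to either vector.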
Optionally, the similarity between the two questions may also be calculated based on TextRank keyword extraction, or by other similarity calculation methods, which is not limited herein.
S603, the first server screens K problem groups from the L problem groups.
According to the total frequency of each of the L question groups, the first server screens out, from the L question groups, K question groups whose total frequency is greater than a second threshold. The total frequency of a first question group is the sum of the frequencies with which the first server has historically responded to each question in the first question group, where the first question group is any one of the L question groups; that is, the total frequency of a question group is the total number of questions in the group when repeated historical questions are counted. The second threshold is adjusted according to the number of similar questions required in a question group; for example, the second threshold is 4, 5, 6, or 7. K is a positive integer, and its value depends on the second threshold.
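The screening of high-frequency question groups can be sketched as below, assuming the historically responded questions (with repeats) are available as a list so that per-question frequencies can be counted. The function and parameter names are illustrative.

```python
from collections import Counter
from typing import List, Sequence


def screen_high_frequency_groups(groups: Sequence[Sequence[str]],
                                 historical_questions: Sequence[str],
                                 second_threshold: int) -> List[List[str]]:
    """Keep the question groups whose total historical frequency exceeds the threshold."""
    frequency = Counter(historical_questions)   # times each question was responded to
    return [list(group) for group in groups
            if sum(frequency[q] for q in group) > second_threshold]
```

The number of groups returned is K; raising the second threshold can only reduce K.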
S604, the first server generates at least one corresponding response message for each question in the K question groups. The at least one response message generated for each question may be generated by the QA system, or may be generated by another dialogue robot, which is not limited herein.
S605, the first server sends the questions in the K question groups and at least one response message corresponding to each question to the second server.
S606, the second server receives the questions in the K question groups and at least one piece of response information corresponding to each question.
S607, the second server sends the sixth question and at least one response message corresponding to the sixth question to the second terminal.
Wherein the sixth question is any one of the K question groups.
S608, the second terminal receives the sixth question and the at least one response message corresponding to the sixth question, and displays them in a calibration interface.
In some embodiments of the application, the second terminal displays the calibration interface as shown in fig. 3, which may specifically refer to the related description in fig. 3, and details are not repeated here. It should be noted that the display of the sixth question and at least one piece of response information corresponding to the sixth question by the second terminal is not limited to the display mode shown in fig. 3.
S609, the second terminal receives the sixth question and the first response information corresponding to the sixth question.
Here, referring to fig. 3, the first response information corresponding to the sixth question may be selected response information in at least one response information corresponding to the sixth question or response information input for the sixth question.
S610, the second terminal sends the sixth question and the first response message corresponding to the sixth question to the second server.
S611, the second server receives the sixth question sent by the second terminal and the first response information corresponding to the sixth question.
S612, the second server sends the sixth question and the first response information corresponding to the sixth question to the first server.
S613, the first server receives the sixth question and the first response information corresponding to the sixth question.
And the first server updates the received sixth question and the first response information corresponding to the sixth question to the question-answer database.
It should be noted that, in some embodiments of the present application, the first server and the second server may be the same device, and steps S605, S606, and S612 may not be executed. Alternatively, the second terminal and the first terminal may be the same device.
The following describes apparatuses and devices according to embodiments of the present application.
Fig. 9a is a schematic structural diagram of a dialog device according to an embodiment of the present invention. As shown in fig. 9a, the dialog apparatus 900a may be applied to the first server in the corresponding embodiment of fig. 4 or fig. 6, and the apparatus 900a may include:
a receiving unit 901, configured to receive a dialogue request sent by a first terminal, where the dialogue request is used to request response information corresponding to a first question from the server;
a searching unit 902, configured to search a second question matched with the first question in N questions of a question and answer database, where the question and answer database includes N questions and first answer information corresponding to each question of the N questions, and N is a positive integer;
a sending unit 903, configured to send, to the first terminal, first response information corresponding to the second question in the question-and-answer database when the second question is found;
a first generating unit 904, configured to generate response information for the first question through a question answering system (QA system) when the second question is not found;
the sending unit 903 is further configured to send the generated response information to the first terminal.
In an implementation of the embodiment of the present application, the searching unit 902 is specifically configured to:
determining a maximum similarity in a first similarity set according to the similarity between the first question and each question in the N questions, wherein the first similarity set comprises the similarity between the first question and each question in the N questions;
if the maximum similarity is larger than a first threshold, determining that the second problem is the problem corresponding to the maximum similarity;
and if the maximum similarity is not greater than the first threshold, determining that the second question is not found in the question-answer database.
A third question is any one of the N questions, the method further comprising:
calculating the cosine similarity of the word frequency vector of the first question and the word frequency vector of the third question to obtain the similarity of the first question and the third question.
Fig. 9b is a schematic structural diagram of another dialog apparatus according to an embodiment of the present invention. As shown in fig. 9b, in an implementation of the embodiment of the present application, the apparatus 900b may further include, in addition to the various units in the apparatus 900a described above:
an obtaining unit 905, configured to obtain a first question set, where the first question set is a set of questions to which the server has historically responded;
a grouping unit 906, configured to group the questions in the first question set to obtain L question groups;
a screening unit 907, configured to screen, from the L question groups, K question groups whose total frequency is greater than a second threshold according to the total frequency of each of the L question groups, where the total frequency of a first question group is the sum of the frequencies with which the server has historically responded to each question in the first question group, and the first question group is any one of the L question groups.
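The screening step can be sketched as below. The function name `screen_question_groups` and the dictionary `response_counts` (question text mapped to the number of times the server has responded to it) are hypothetical names introduced for illustration; the text does not prescribe a data structure.

```python
from typing import Dict, List


def screen_question_groups(
    groups: List[List[str]],
    response_counts: Dict[str, int],
    second_threshold: int,
) -> List[List[str]]:
    """Keep the groups whose total response frequency exceeds the threshold."""
    selected = []
    for group in groups:
        # Total frequency of a group = sum of the frequencies of its questions.
        total = sum(response_counts.get(question, 0) for question in group)
        if total > second_threshold:
            selected.append(group)
    return selected
```

The returned list plays the role of the K question groups; questions missing from `response_counts` are counted as zero, which is also an assumption.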
In an implementation of the embodiment of the present application, the grouping unit 906 is specifically configured to:
calculating the similarity between a fourth question and each question in a second question set, wherein the second question set is a set of questions which are not grouped currently in the first question set, and the fourth question is one question in the second question set;
and dividing the questions in the second question set, the similarity of which to the fourth question is greater than a third threshold value, into a question group.
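One way to read the grouping step is as a greedy loop, sketched below under stated assumptions: the fourth question is taken to be the first question that is still ungrouped, the loop repeats until no ungrouped question remains, and the seed question always joins its own group. The `similarity` callable and the name `group_questions` are hypothetical.

```python
from typing import Callable, List


def group_questions(
    questions: List[str],
    similarity: Callable[[str, str], float],
    third_threshold: float,
) -> List[List[str]]:
    """Greedily partition questions into groups of mutually similar questions."""
    ungrouped = list(questions)  # the "second question set"
    groups: List[List[str]] = []
    while ungrouped:
        seed = ungrouped[0]  # the "fourth question" (choice of seed is an assumption)
        group = [seed]
        remaining = []
        for other in ungrouped[1:]:
            # Questions similar enough to the seed join its group.
            if similarity(seed, other) > third_threshold:
                group.append(other)
            else:
                remaining.append(other)
        groups.append(group)
        ungrouped = remaining  # only still-ungrouped questions are revisited
    return groups
```

The number of lists returned corresponds to L; because each question is compared only against the seed of a group, the result can depend on the order of the input.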
In an implementation of the embodiment of the present application, the fifth question is any one question in the second question set, and when the grouping unit 906 is configured to calculate the similarity between the fourth question and each question in the second question set, it is specifically configured to:
determining keywords of the fourth question according to the weight of each word in the fourth question, wherein the weight of a first word represents the contribution of the first word to the semantics of the fourth question, and the first word is any word in the fourth question;
determining keywords of the fifth question according to the weight of each word in the fifth question, wherein the weight of a second word represents the contribution of the second word to the semantics of the fifth question, and the second word is any word in the fifth question;
determining a word frequency vector of the fourth question and a word frequency vector of the fifth question according to the keywords of the fourth question and the keywords of the fifth question;
and calculating the cosine similarity of the word frequency vector of the fourth question and the word frequency vector of the fifth question to obtain the similarity of the fourth question and the fifth question.
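The four steps above can be sketched as follows. Several details are assumptions made for illustration: the questions are assumed to be already segmented into word lists, the per-word weights are supplied in a hypothetical `weights` dictionary (the text only says a weight represents a word's contribution to the semantics), the top `k` words by weight are taken as keywords, and the word frequency vectors are built over the union of both questions' keywords.

```python
import math
from collections import Counter
from typing import Dict, List


def top_keywords(words: List[str], weights: Dict[str, float], k: int) -> List[str]:
    """Pick the k distinct words with the highest weight (ties broken alphabetically)."""
    distinct = sorted(set(words), key=lambda w: (-weights.get(w, 0.0), w))
    return distinct[:k]


def keyword_similarity(
    words_a: List[str], words_b: List[str], weights: Dict[str, float], k: int = 3
) -> float:
    """Cosine similarity of word frequency vectors built over the shared keywords."""
    # Shared vocabulary: the union of both questions' keywords.
    vocab = sorted(
        set(top_keywords(words_a, weights, k)) | set(top_keywords(words_b, weights, k))
    )
    count_a, count_b = Counter(words_a), Counter(words_b)
    # Word frequency vectors: how often each keyword occurs in each question.
    vec_a = [count_a[w] for w in vocab]
    vec_b = [count_b[w] for w in vocab]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
    # An all-zero vector has no direction; treat the similarity as 0.
    return dot / norm if norm else 0.0
```

Restricting the vectors to keywords keeps low-weight words such as "how" or "my" from influencing the score, which appears to be the point of the weighting step.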
In one implementation of the embodiment of the present application, the apparatus 900a or the apparatus 900b may further include:
a second generating unit 908, configured to generate at least one response message for a sixth question, where the sixth question is any one of the K question groups;
the sending unit 903 is further configured to send the sixth question and the at least one response message to a second terminal, so that the second terminal receives and displays the sixth question and the at least one response message, and sends a first response message corresponding to the sixth question to the server, where the first response message corresponding to the sixth question is either a response message selected, at the second terminal, from the at least one response message, or a response message input at the second terminal for the sixth question;
the receiving unit 901 is further configured to receive first response information corresponding to the sixth question, and update the sixth question and the first response information corresponding to the sixth question to the question-answer database.
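The review loop described by these units can be sketched as two small helpers. The names `resolve_first_response` and `update_qa_database`, the in-memory dictionary standing in for the question-answer database, and the rule that a typed response takes precedence over a selected candidate are all assumptions for illustration.

```python
from typing import Dict, List, Optional


def resolve_first_response(
    candidates: List[str],
    selected_index: Optional[int] = None,
    typed_response: Optional[str] = None,
) -> str:
    """Second-terminal side: return the typed response, else the selected candidate."""
    if typed_response:
        return typed_response
    if selected_index is None:
        raise ValueError("either select a candidate or type a response")
    return candidates[selected_index]


def update_qa_database(qa_db: Dict[str, str], question: str, response: str) -> None:
    """Server side: store (or overwrite) the confirmed response for the question."""
    qa_db[question] = response
```

After the update, a later dialogue request that matches the sixth question would be answered from the database instead of being generated again.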
It should be understood that, for specific functional implementation manners of the above-mentioned functional units, reference may be made to the related description in the corresponding embodiment of fig. 4 or fig. 6, and details are not described here again.
Fig. 10 is a schematic structural diagram of a dialog apparatus according to an embodiment of the present invention. As shown in fig. 10, the dialog apparatus 1000 may be applied to the first terminal in the corresponding embodiment of fig. 4 or fig. 6, and the apparatus 1000 may include:
a generating unit 1001, configured to generate a dialogue request according to an input first question, where the dialogue request is used to request a server for response information corresponding to the first question;
a sending unit 1002, configured to send the dialogue request to the server, so that after receiving the dialogue request, the server searches for a second question matched with the first question in N questions in a question-and-answer database, sends first response information corresponding to the second question in the question-and-answer database to the first terminal when the second question is found, and, when the second question is not found, generates response information for the first question through a QA system and sends the generated response information to the first terminal; the question-and-answer database comprises N questions and first answer information corresponding to each question in the N questions, wherein N is a positive integer;
a receiving unit 1003, configured to receive and output the received response information.
In one implementation of the embodiment of the present application, the receiving unit 1003 is further configured to receive and display a sixth question sent by the server and at least one piece of response information corresponding to the sixth question;
the receiving unit 1003 is further configured to receive first response information corresponding to the sixth question, where the first response information corresponding to the sixth question is selected response information in the first response information set or response information input for the sixth question;
the sending unit 1002 is further configured to send the sixth question and the first response information corresponding to the sixth question to the server, so that the server updates the sixth question and the first response information corresponding to the sixth question to the question and answer database.
It should be understood that, for specific functional implementation manners of the above-mentioned functional units, reference may be made to the related description in the corresponding embodiment of fig. 4 or fig. 6, and details are not described here again.
Fig. 11 is a schematic structural diagram of another dialog apparatus 1100 according to an embodiment of the present invention. The dialog apparatus 1100 may specifically be the first server 102 in fig. 1, and may include: a processor 1101, a communication bus 1102, a user interface 1103, a network interface 1104, and a memory 1105. The communication bus 1102 is used to enable connection and communication between these components. The user interface 1103 may optionally include a display screen and a keyboard. The network interface 1104 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). As shown in fig. 11, the memory 1105, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program, which may be executed when the apparatus 1100 is running. In the dialog apparatus 1100 shown in fig. 11, the network interface 1104 can provide network communication functions, and the processor 1101 may be configured to invoke the device control application program stored in the memory 1105 to implement:
receiving a dialogue request sent by a first terminal through a network interface 1104, wherein the dialogue request is used for requesting response information corresponding to a first question from the server;
searching a second question matched with the first question in N questions of a question-and-answer database, wherein the question-and-answer database comprises N questions and first answer information corresponding to each question in the N questions, and N is a positive integer;
under the condition that the second question is found, sending first response information corresponding to the second question in the question-answer database to the first terminal through a network interface 1104;
when the second question is not found, response information for the first question is generated by a question answering system (QA system) and transmitted to the first terminal through the network interface 1104.
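The two branches above amount to a lookup with a fallback, sketched below. The name `answer_dialogue_request`, the dictionary used as the question-answer database, and the two callables (`find_match` for the database search and `qa_system` for the generative fallback) are hypothetical placeholders; network transmission is left out.

```python
from typing import Callable, Dict, Optional


def answer_dialogue_request(
    first_question: str,
    qa_db: Dict[str, str],
    find_match: Callable[[str], Optional[str]],
    qa_system: Callable[[str], str],
) -> str:
    """Return the stored answer when a match exists, else a generated one."""
    second_question = find_match(first_question)
    if second_question is not None:
        # Second question found: reuse the first answer information from the database.
        return qa_db[second_question]
    # Second question not found: fall back to the QA system.
    return qa_system(first_question)
```

Keeping the matcher and the QA system as parameters mirrors the separation between the searching unit and the first generating unit described earlier.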
In an implementation of the embodiment of the present application, when the third question is any one of the N questions, the processor 1101 is further configured to:
and calculating the cosine similarity of the word frequency vector of the first question and the word frequency vector of the third question to obtain the similarity of the first question and the third question.
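For reference, cosine similarity has the standard form below. The symbols are notation introduced here for illustration and do not appear in the original text: a and b denote the word frequency vectors of the first and third questions over a shared vocabulary of n words.

```latex
\[
\mathrm{sim}(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}} \, \sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
```

As a worked example, for a = (1, 1, 0) and b = (1, 0, 1) the dot product is 1 and each norm is the square root of 2, so the similarity is 1/2.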
In one implementation of the embodiment of the present application, the processor 1101 is further configured to:
acquiring a first question set, wherein the first question set is a set of questions historically responded by the server;
grouping the questions in the first question set to obtain L question groups;
and screening, from the L question groups, K question groups whose total frequency is greater than a second threshold according to the total frequency of each question group in the L question groups, wherein the total frequency of a first question group is the sum of the frequencies with which the server has historically responded to each question in the first question group, and the first question group is any one question group in the L question groups.
In an implementation of the embodiment of the present application, the processor 1101 is further configured to:
generating at least one response message for a sixth question, where the sixth question is any one question in the K question groups;
sending the sixth question and the at least one response message to a second terminal, so that the second terminal receives and displays the sixth question and the at least one response message and sends a first response message corresponding to the sixth question to the server, wherein the first response message corresponding to the sixth question is a selected response message in the at least one response message received by the second terminal or a response message input aiming at the sixth question received by the second terminal;
receiving the first response information corresponding to the sixth question through a network interface 1104, and updating the sixth question and the first response information corresponding to the sixth question to the question-answer database.
It should be noted that the receiving unit 901, the sending unit 903, and the obtaining unit 905 in fig. 9a or 9b may be implemented by the network interface 1104 in fig. 11, and the searching unit 902, the first generating unit 904, the grouping unit 906, the screening unit 907, and the second generating unit 908 in fig. 9a or 9b may be implemented by the processor 1101 in fig. 11.
It should be understood that the dialog apparatus 1100 described in the embodiment of the present invention can perform the dialog method described in the embodiment corresponding to fig. 4 or fig. 6, and details are not described here again. Likewise, the beneficial effects, which are the same as those of the method, are not described again.
Referring to fig. 12, fig. 12 is a schematic structural diagram of another dialog apparatus 1200 according to an embodiment of the present invention. As shown in fig. 12, the dialog apparatus 1200 may correspond to the first terminal 101 in the embodiment corresponding to fig. 1, and the dialog apparatus 1200 may include: a processor 1201, a network interface 1204, and a memory 1205; the dialog apparatus 1200 may further include a user interface 1203 and at least one communication bus 1202. The communication bus 1202 is used to enable connection and communication between these components. The user interface 1203 may include a display screen (Display) and a keyboard (Keyboard), and optionally, the user interface 1203 may also include a standard wired interface and a standard wireless interface. The network interface 1204 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1205 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 1205 may also optionally be at least one storage device located remotely from the processor 1201. As shown in fig. 12, the memory 1205, which is a type of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the dialog apparatus 1200 shown in fig. 12, the network interface 1204 may provide a network communication function, the user interface 1203 is mainly used to provide an input interface for a user, and the processor 1201 may be configured to invoke the device control application program stored in the memory 1205 to perform:
generating a dialogue request according to an input first question, wherein the dialogue request is used for requesting response information corresponding to the first question from the server;
a dialogue request is sent to a server, so that after the server receives the dialogue request, the server searches a second question matched with the first question in N questions in a question-answer database, and sends first answer information corresponding to the second question in the question-answer database to the first terminal under the condition that the second question is found, wherein the question-answer database comprises N questions and first answer information corresponding to each question in the N questions, and N is a positive integer;
and receiving and outputting the first response information.
It should be noted that the receiving unit 1003 and the sending unit 1002 in fig. 10 may be implemented by the network interface 1204 in fig. 12, and the generating unit 1001 in fig. 10 may be implemented by the processor 1201 in fig. 12.
It should be understood that the dialog apparatus 1200 described in the embodiment of the present invention can perform the dialog method described in the embodiment corresponding to fig. 4 or fig. 6, and details are not described here again. Likewise, the beneficial effects, which are the same as those of the method, are not described again.
Further, it should be noted that an embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores the aforementioned computer program executed by the dialog apparatus 900a, 900b, or 1100, and the computer program includes program instructions; when the processor executes the program instructions, the method executed by the first server in the embodiment corresponding to fig. 4 or fig. 6 can be performed, which will not be described here again.
Further, here, it is to be noted that: an embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores the aforementioned computer program executed by the dialog apparatus 1000 or 1200, and the computer program includes program instructions, and when the processor executes the program instructions, the method executed by the first terminal in the embodiment corresponding to fig. 4 or fig. 6 can be executed, which will not be described again here.
In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer storage medium to which the present invention relates, reference is made to the description of the method embodiments of the present invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention and should not be understood as limiting the scope of the claims of the present invention.

Claims (10)

1. A conversation method, applied to a server, comprising:
receiving a conversation request sent by a first terminal, wherein the conversation request is used for requesting response information corresponding to a first question from the server;
searching a second question matched with the first question in N questions of a question-and-answer database, wherein the question-and-answer database comprises N questions and first answer information corresponding to each question in the N questions, and N is a positive integer;
under the condition that the second question is found, sending first response information corresponding to the second question in the question-answer database to the first terminal;
when the second question is not found, generating response information for the first question by a question answering system (QA system) and transmitting the generated response information to the first terminal.
2. The method according to claim 1, wherein said searching for a second question that matches said first question among N questions in a question-and-answer database comprises:
determining a maximum similarity in a first similarity set according to the similarity between the first question and each question in the N questions, wherein the first similarity set comprises the similarity between the first question and each question in the N questions;
if the maximum similarity is greater than a first threshold, determining that the second question is the question corresponding to the maximum similarity;
and if the maximum similarity is smaller than the first threshold, determining that the second question is not found in the question-answer database.
3. The method of claim 1 or 2, wherein the third question is any one of the N questions, and the method further comprises:
and calculating the cosine similarity of the word frequency vector of the first question and the word frequency vector of the third question to obtain the similarity of the first question and the third question.
4. The method of any of claims 1-3, further comprising:
acquiring a first question set, wherein the first question set is a set of questions historically responded by the server;
grouping the questions in the first question set to obtain L question groups;
and screening, from the L question groups, K question groups whose total frequency is greater than a second threshold according to the total frequency of each question group in the L question groups, wherein the total frequency of a first question group is the sum of the frequencies with which the server has historically responded to each question in the first question group, and the first question group is any one question group in the L question groups.
5. The method of claim 4, wherein grouping the questions in the first question set to obtain L question groups comprises:
calculating the similarity between a fourth question and each question in a second question set, wherein the second question set is a set of questions which are not grouped currently in the first question set, and the fourth question is one question in the second question set;
and dividing the questions in the second question set, the similarity of which to the fourth question is greater than a third threshold value, into a question group.
6. The method according to claim 5, wherein the fifth question is any one question in the second question set, and the calculating the similarity between the fourth question and each question in the second question set specifically comprises:
determining keywords of the fourth question according to the weight of each word in the fourth question, wherein the weight of a first word represents the contribution of the first word to the semantics of the fourth question, and the first word is any word in the fourth question;
determining keywords of the fifth question according to the weight of each word in the fifth question, wherein the weight of a second word represents the contribution of the second word to the semantics of the fifth question, and the second word is any word in the fifth question;
determining a word frequency vector of the fourth question and a word frequency vector of the fifth question according to the keywords of the fourth question and the keywords of the fifth question;
and calculating the cosine similarity of the word frequency vector of the fourth question and the word frequency vector of the fifth question to obtain the similarity of the fourth question and the fifth question.
7. The method of claim 4, further comprising:
generating at least one response message for a sixth question, wherein the sixth question is any one question in the K question groups;
sending the sixth question and the at least one response message to a second terminal, so that the second terminal receives and displays the sixth question and the at least one response message, and sends a first response message corresponding to the sixth question to the server, wherein the first response message corresponding to the sixth question is a selected response message in the at least one response message received by the second terminal or a response message input by the second terminal for the sixth question;
and receiving first response information corresponding to the sixth question, and updating the sixth question and the first response information corresponding to the sixth question to the question-answer database.
8. A conversation device, applied to a server, comprising:
a receiving unit, configured to receive a conversation request sent by a first terminal, wherein the conversation request is used for requesting response information corresponding to a first question;
a searching unit, configured to search for a second question matched with the first question in N questions of a question-and-answer database, wherein the question-and-answer database comprises N questions and first answer information corresponding to each question in the N questions, and N is a positive integer;
a sending unit, configured to send, to the first terminal, first answer information corresponding to the second question in the question-answer database when the searching unit finds the second question;
a first generating unit configured to generate response information for the first question by a question answering system (QA system) when the second question is not found;
the sending unit is further configured to send the generated response information to the first terminal.
9. A dialog device comprising a processor and a memory, the processor being coupled to the memory, wherein the memory is adapted to store program code and the processor is adapted to call the program code to implement the method of any of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program or computer instructions are stored which, when executed, implement the method according to any one of claims 1 to 7.
CN201910806215.5A 2019-08-28 2019-08-28 Dialogue method, related device and equipment Active CN110795542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910806215.5A CN110795542B (en) 2019-08-28 2019-08-28 Dialogue method, related device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910806215.5A CN110795542B (en) 2019-08-28 2019-08-28 Dialogue method, related device and equipment

Publications (2)

Publication Number Publication Date
CN110795542A true CN110795542A (en) 2020-02-14
CN110795542B CN110795542B (en) 2024-03-15

Family

ID=69427065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910806215.5A Active CN110795542B (en) 2019-08-28 2019-08-28 Dialogue method, related device and equipment

Country Status (1)

Country Link
CN (1) CN110795542B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217515A (en) * 2008-01-03 2008-07-09 腾讯科技(深圳)有限公司 A system and method based on question sorting and push
CN102789496A (en) * 2012-07-13 2012-11-21 携程计算机技术(上海)有限公司 Method and system for implementing intelligent response
US20140358890A1 (en) * 2013-06-04 2014-12-04 Sap Ag Question answering framework
CN104216913A (en) * 2013-06-04 2014-12-17 Sap欧洲公司 Problem answering frame
CN108170792A (en) * 2017-12-27 2018-06-15 北京百度网讯科技有限公司 Question and answer bootstrap technique, device and computer equipment based on artificial intelligence
CN108491433A (en) * 2018-02-09 2018-09-04 平安科技(深圳)有限公司 Chat answer method, electronic device and storage medium

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339282A (en) * 2020-03-27 2020-06-26 中国建设银行股份有限公司 Intelligent online response method and intelligent customer service system
CN111537968A (en) * 2020-05-12 2020-08-14 江铃汽车股份有限公司 Angle radar calibration method and system
CN111694941B (en) * 2020-05-22 2024-01-05 腾讯科技(深圳)有限公司 Reply information determining method and device, storage medium and electronic equipment
CN111694941A (en) * 2020-05-22 2020-09-22 腾讯科技(深圳)有限公司 Reply information determining method and device, storage medium and electronic equipment
CN112632239A (en) * 2020-12-11 2021-04-09 南京三眼精灵信息技术有限公司 Brain-like question-answering system based on artificial intelligence technology
CN112818225A (en) * 2021-01-27 2021-05-18 上海明略人工智能(集团)有限公司 Display method and device of pushed data
CN112800209A (en) * 2021-01-28 2021-05-14 上海明略人工智能(集团)有限公司 Conversation corpus recommendation method and device, storage medium and electronic equipment
CN113283238A (en) * 2021-05-19 2021-08-20 上海明略人工智能(集团)有限公司 Text data processing method and device, electronic equipment and storage medium
CN113283238B (en) * 2021-05-19 2023-12-22 上海明略人工智能(集团)有限公司 Text data processing method and device, electronic equipment and storage medium
CN113360626B (en) * 2021-07-02 2022-02-11 北京容联七陌科技有限公司 Multi-scene mixed question-answer recommendation method for intelligent customer service robot
CN113360626A (en) * 2021-07-02 2021-09-07 北京容联七陌科技有限公司 Multi-scene mixed question-answer recommendation method for intelligent customer service robot
CN116860951A (en) * 2023-09-04 2023-10-10 贵州中昂科技有限公司 Information consultation service management method and management system based on artificial intelligence
CN116860951B (en) * 2023-09-04 2023-11-14 贵州中昂科技有限公司 Information consultation service management method and management system based on artificial intelligence

Also Published As

Publication number Publication date
CN110795542B (en) 2024-03-15

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40020199; Country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant