CN112115244B - Dialogue interaction method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112115244B
Authority
CN
China
Prior art keywords
interaction
user
voice data
preset
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010847014.2A
Other languages
Chinese (zh)
Other versions
CN112115244A (en)
Inventor
杨振宇 (Yang Zhenyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd, Shenzhen Huantai Technology Co Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010847014.2A
Publication of CN112115244A
Application granted
Publication of CN112115244B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a dialogue interaction method and apparatus, a storage medium, and an electronic device, belonging to the field of computer technology. The method is applied to an electronic device with a built-in preset dialogue model. The electronic device analyzes the interaction satisfaction of a first user according to a first interaction result, where the first interaction result is the response of the preset dialogue model to voice data input by the first user. When the interaction satisfaction is less than or equal to a preset threshold, the voice data is sent to a server, and an interaction instruction corresponding to the voice data is received from the server. The interaction instruction and the voice data are then used as sample data to retrain the preset dialogue model. Because the new sample data for retraining is obtained by having the server search for answer data corresponding to the voice data, manual participation is reduced and the retraining of the preset dialogue model becomes more intelligent.

Description

Dialogue interaction method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for dialogue interaction, a storage medium, and an electronic device.
Background
With the development of artificial intelligence technology, man-machine dialogue has found application in fields such as smart home, intelligent navigation, and intelligent assistants. A man-machine dialogue mainly proceeds as follows: the user interacts with a dialogue model using natural language, and the dialogue model parses the user's natural-language input and produces a corresponding output. In the related art, a dialogue model for man-machine dialogue can be obtained by training on a large amount of sample data. However, while the model is in use, it may encounter voice data input by the user that it cannot recognize. New sample data can then be produced by manually labeling the interaction between the user and the dialogue model, and the model can be retrained on this new data. Because the new sample data is obtained through manual labeling, this approach incurs a high labor cost.
Disclosure of Invention
The embodiments of the application provide a dialogue interaction method and apparatus, a storage medium, and an electronic device, which can address the high labor cost of acquiring sample data for retraining a man-machine dialogue model in the related art. The technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a dialogue interaction method, where the method includes:
Analyzing the interaction satisfaction degree of the first user according to the first interaction result; the first interaction result is an interaction result of a preset dialogue model on voice data input by the first user;
when the interaction satisfaction is less than or equal to a preset threshold value, sending the voice data to a server, and receiving an interaction instruction corresponding to the voice data returned by the server;
And carrying out optimization training on the preset dialogue model by taking the interaction instruction and the voice data as sample data.
In a second aspect, an embodiment of the present application provides a dialogue interaction apparatus, where the apparatus includes:
the analysis module is used for analyzing the interaction satisfaction degree of the first user according to the first interaction result; the first interaction result is an interaction result of a preset dialogue model on voice data input by the first user;
the processing module is used for sending the voice data to a server and receiving an interaction instruction corresponding to the voice data returned by the server when the interaction satisfaction is less than or equal to a preset threshold value;
and the training module is used for optimally training the preset dialogue model by taking the interaction instruction and the voice data as sample data.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: the device comprises a processor, a memory and a display screen; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiments of the application has the beneficial effects that at least:
When the scheme of the embodiments of the application is executed, the electronic device analyzes the interaction satisfaction of the first user according to the first interaction result, where the first interaction result is the output of the preset dialogue model for the voice data input by the first user. When the interaction satisfaction is less than or equal to the preset threshold, the voice data is sent to the server, and the interaction instruction corresponding to the voice data is received from the server. The interaction instruction and the voice data are then used as sample data to retrain the preset dialogue model. Because the new sample data for retraining is obtained by having the server search for answer data corresponding to the voice data, the workload of manual participation is reduced and the retraining of the preset dialogue model becomes more intelligent.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system architecture for conversational interaction according to an embodiment of the application;
FIG. 2 is a schematic flow chart of a dialogue interaction method according to an embodiment of the present application;
FIG. 3 is another flow chart of a dialogue interaction method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a dialogue interaction device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application as detailed in the accompanying claims.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meaning of these terms will be understood by those of ordinary skill in the art according to the specific context. "And/or" describes an association between objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship.
Referring to fig. 1, a schematic system architecture of a dialogue interaction provided by an embodiment of the present application includes a user 101, a terminal device 102, and a server 103. The terminal device 102 is internally provided with a preset dialogue model for man-machine dialogue, the terminal device 102 can perform dialogue interaction with the user 101 based on the preset dialogue model, the terminal device 102 can also send query data related to the user dialogue interaction to the server 103, and correspondingly, the server 103 can send a query result to the terminal device 102 after querying/searching the query data sent by the terminal device 102.
The terminal device 102 may be any of various electronic devices with a speaker, and has an already-trained preset dialogue model built in, so that the user 101 can perform dialogue interaction with the terminal device 102. Terminal devices 102 include, but are not limited to, smart speakers, smart phones, tablet computers, portable computers, and desktop computers with intelligent voice capabilities. The terminal device 102 may be hardware or software. When the terminal device 102 is software, it may be installed in the above-mentioned electronic devices and implemented either as multiple pieces of software or software modules or as a single piece of software or software module; this is not specifically limited herein. When the terminal device 102 is hardware, a display device may further be installed on it. The display device may be any device capable of display, such as a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink screen, a liquid crystal display (LCD), or a plasma display panel (PDP). The terminal device 102 may also exchange data with the server 103: the terminal device 102 may send query data to the server 103 and receive the query result data that the server 103 returns after querying/searching on that data.
The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 103 is software, it may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not particularly limited herein.
It should be understood that the number of users, terminal devices and servers in fig. 1 is merely illustrative. Any number of users, terminal devices and servers may be used according to the actual needs.
The following describes in detail the dialogue interaction method provided in the embodiment of the present application with reference to fig. 2 to 3.
Referring to fig. 2, a flow chart of a dialogue interaction method is provided in an embodiment of the application. The present embodiment is exemplified by a dialogue interaction method applied to an electronic device (terminal), and the dialogue interaction method may include the steps of:
s201, analyzing the interaction satisfaction degree of the first user according to the first interaction result.
The first interaction result is the response of the preset dialogue model to voice data input by the first user, that is, dialogue content generated by the preset dialogue model. The first user refers to the user who first inputs the voice data to the preset dialogue model; "first" distinguishes this user from users who subsequently input the same voice data. Accordingly, a second user refers to a user who later inputs voice data identical or similar to that voice data. Interaction satisfaction refers to how satisfied the user is with the answer the preset dialogue model gives for the user's voice data; here it is a prediction the terminal makes about the model's answer, not the user's actual satisfaction.
In general, the terminal may train a preset dialogue model on preset sample data and then conduct real-time dialogue interaction with the first user based on the trained model. The preset sample data is the sample data from which the existing preset dialogue model was trained, namely text data of known dialogue inputs and outputs; the larger the amount of sample data, the more accurate the man-machine dialogue of the model trained from it. After the first user inputs voice data to the terminal, the terminal answers the voice data using the existing preset dialogue model, yielding a first interaction result corresponding to the voice data input by the first user. The terminal then performs semantic analysis on the first interaction result to obtain its semantic information, determines the similarity between that semantic information and preset semantics, and determines the interaction satisfaction of the first user from the similarity, that is, predicts whether the user is satisfied with this dialogue interaction.
For example: the first user interacts with a terminal that has a built-in preset dialogue model and inputs the voice data "help me open Xiaodu" (the voice data input by the first user; "Xiaodu" renders the Chinese 小度). The terminal analyzes the voice data with the existing preset dialogue model, cannot recognize the instruction it corresponds to, and returns voice data to the effect of "sorry, I don't understand what you mean" (the first interaction result). The terminal then performs semantic analysis on the voice data returned to the first user and finds that the semantic information of the first interaction result is a negative semantic containing keywords and/or key phrases such as "don't understand", while the preset semantics are negative semantics containing such keywords and/or key phrases. The similarity between the semantic information of the first interaction result and the preset semantics is analyzed to be 80%, so the interaction satisfaction of the first user is determined to be 20%; since the preset threshold of interaction satisfaction is 60%, the first user is determined to be unsatisfied with the first interaction result.
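The satisfaction check in this example can be sketched in a few lines. This is a hypothetical illustration, not the patent's implementation: the negative-keyword list, the keyword-overlap similarity measure, and the 0.6 threshold are all assumptions standing in for whatever semantic-analysis component the terminal actually uses.

```python
# Hypothetical sketch of the terminal-side satisfaction check (S201/S202 trigger).
# NEGATIVE_KEYWORDS, the similarity measure, and the threshold are assumptions.
NEGATIVE_KEYWORDS = {"sorry", "don't understand", "cannot", "unable"}

def semantic_similarity(reply: str) -> float:
    """Fraction of the preset negative keywords that appear in the model's reply."""
    reply = reply.lower()
    hits = sum(1 for kw in NEGATIVE_KEYWORDS if kw in reply)
    return hits / len(NEGATIVE_KEYWORDS)

def interaction_satisfaction(reply: str) -> float:
    """Satisfaction is inversely related to similarity with the negative semantics."""
    return 1.0 - semantic_similarity(reply)

def needs_server_lookup(reply: str, threshold: float = 0.6) -> bool:
    """True when predicted satisfaction is at or below the preset threshold."""
    return interaction_satisfaction(reply) <= threshold

# A reply signalling failure yields low satisfaction and triggers the server lookup.
reply = "Sorry, I don't understand what you mean and cannot help."
```

In this sketch the failure reply matches three of the four negative keywords, so the predicted satisfaction falls below the threshold and the voice data would be forwarded to the server.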
S202, when the interaction satisfaction is less than or equal to a preset threshold value, send the voice data to the server, and receive the interaction instruction, corresponding to the voice data, returned by the server.
The preset threshold is the criterion for determining whether the user is satisfied with the interaction result, that is, the minimum interaction satisfaction the user must reach to be judged satisfied. The voice data refers to the dialogue voice data the first user input to the terminal. The interaction instruction refers to interaction data that analyzes and/or resolves the voice data.
In general, when the terminal determines that the interaction satisfaction of the first user is less than or equal to the preset threshold, indicating that the first user is not satisfied with the current interaction result, the terminal analyzes the voice data input by the first user to obtain the corresponding keywords and/or key phrases, generates first query data corresponding to the voice data based on them, and sends the first query data to the server. The first query data is text data generated from the voice data input by the first user and containing its key information. The first query data can instruct the server to search preset websites on the Internet for answer data related to it. A preset website is a website that provides a search service or a community service; it can be set arbitrarily by the user or default to the server's choice. After receiving the first query data from the terminal, the server searches the preset websites for the corresponding answer data, selects from the search results the answer data with the highest matching degree or confidence as the interaction instruction corresponding to the voice data input by the first user, and sends the interaction instruction to the terminal so that the terminal can acquire it.
For example: as above, the first user inputs the voice data "help me open Xiaodu"; the terminal cannot recognize the corresponding instruction, returns a first interaction result to the effect of "sorry, I don't understand what you mean", and, from the 80% similarity between that result's semantics and the preset negative semantics, determines the interaction satisfaction of the first user to be 20%, below the 60% preset threshold, so the first user is unsatisfied with the first interaction result. The terminal then analyzes the voice data "help me open Xiaodu" to obtain keywords such as "open" and "Xiaodu", and based on these keywords generates the text data "what does 'open Xiaodu' mean" for querying the voice data input by the first user (the first query data).
The terminal sends the first query data to the server. The server searches the preset websites for answer data corresponding to the first query data, selects, according to the confidence of each search result, the answer data with the highest confidence, "open the Baidu application", and uses it as the interaction instruction for the voice data "help me open Xiaodu" input by the first user.
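The server-side selection step can be sketched as follows. The candidate list, its scores, and the tuple shape are illustrative assumptions; the patent only requires picking the answer with the highest matching degree or confidence as the interaction instruction.

```python
# Hypothetical sketch of the server-side step: among the answer data found on the
# preset websites, return the candidate with the highest confidence score.
def select_interaction_instruction(candidates):
    """candidates: list of (answer_text, confidence) pairs from preset websites."""
    if not candidates:
        return None  # nothing found; no interaction instruction to return
    answer, _score = max(candidates, key=lambda pair: pair[1])
    return answer

# Illustrative candidates for the query "what does 'open Xiaodu' mean".
candidates = [
    ("open the Baidu application", 0.92),
    ("a small unit of measurement", 0.31),
]
```

Here `max` over the confidence field implements the "highest confidence wins" selection; a real server would also have to handle ties and low-confidence results, which the patent text does not specify.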
And S203, performing optimization training on a preset dialogue model by taking the interaction instruction and the voice data as sample data.
The sample data is data used to train a dialogue model capable of man-machine dialogue; it may be text and/or voice sample data and comprises dialogue data from both the user's side and the dialogue model's side. The preset dialogue model is a model that can conduct man-machine interaction with a user; it can be trained on a large amount of sample data using a preset dialogue algorithm, and a user can conduct real-time dialogue interaction with a terminal in which the preset dialogue model is built.
Generally, after the server finds answer data on a preset website that matches the voice data input by the first user, it sends the answer data to the terminal as the interaction instruction corresponding to the voice data. After receiving the interaction instruction, the terminal associates it with the voice data and stores the pair in a database as sample data, thereby expanding and updating the sample data used to train the preset dialogue model. The preset dialogue model can then be retrained on this sample data to obtain an optimized preset dialogue model. The next time the optimized model encounters voice data similar to that input by the first user, it answers and/or executes the corresponding processing operation based on the interaction instruction previously found by the server.
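The associate-and-store step above can be sketched as a small sample store. The storage format and the `SampleStore` class are assumptions for illustration; the patent only requires that the (voice data, interaction instruction) pair extend the sample set used to retrain the model.

```python
# Hypothetical sketch of S203's bookkeeping: associate the interaction instruction
# with the original voice data and keep the pair as a new training sample.
from dataclasses import dataclass, field

@dataclass
class SampleStore:
    samples: list = field(default_factory=list)

    def add(self, voice_text: str, instruction: str) -> None:
        """Store the (voice data, interaction instruction) association."""
        self.samples.append({"input": voice_text, "label": instruction})

    def training_pairs(self):
        """Expose the expanded sample set for the retraining job."""
        return [(s["input"], s["label"]) for s in self.samples]

store = SampleStore()
store.add("help me open Xiaodu", "open the Baidu application")
# A retraining job would later consume store.training_pairs() to fine-tune the
# preset dialogue model on the expanded sample set.
```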
As can be seen from the above, in the dialogue interaction method provided by the embodiments of the present application, the electronic device analyzes the interaction satisfaction of the first user according to the first interaction result, where the first interaction result is the output of the preset dialogue model for the voice data input by the first user. When the interaction satisfaction is less than or equal to the preset threshold, the electronic device sends the voice data to the server, receives the interaction instruction corresponding to the voice data from the server, and retrains the preset dialogue model using the interaction instruction and the voice data as sample data. Because the new sample data for retraining is obtained by having the server search for answer data corresponding to the voice data, the workload of manual participation is reduced and the retraining of the preset dialogue model becomes more intelligent.
Referring to fig. 3, another flow chart of a dialogue interaction method is provided in an embodiment of the application. The present embodiment is exemplified by a dialogue interaction method applied to an electronic device (terminal). The dialogue interaction method may include the steps of:
s301, acquiring a first interaction result.
The first interaction result is an interaction result of the preset dialogue model on voice data input by the first user, namely dialogue content data generated by the preset dialogue model. The first interaction result may be text data or voice data.
Generally, the terminal may train a preset dialogue model on preset sample data and conduct real-time dialogue interaction with the first user based on the trained model. After the first user inputs voice data to the terminal, the terminal answers the voice data using the existing preset dialogue model, obtaining a first interaction result corresponding to the voice data input by the first user. The terminal may then analyze, based on the first interaction result, whether the user is satisfied with the dialogue interaction.
S302, carrying out semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result.
The first semantic information refers to the semantics contained in the first interaction result, that is, the semantics of the answer the terminal gives, based on the existing preset dialogue model, to the voice data input by the first user during dialogue interaction.
In general, semantic analysis derives, from the syntactic structure of a sentence and the meaning of each of its words, a formal representation that reflects the meaning of the sentence, converting human-understandable natural language into a computer-understandable formal language. The first interaction result may be voice data or text data. If it is voice data, it is first converted to text, and semantic analysis is then performed on the text to obtain the semantic information corresponding to the first interaction result.
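The branch logic of this step can be sketched minimally. Both helpers are placeholders of my own: a real terminal would call its ASR engine for the voice-to-text conversion and a proper semantic parser for the formal representation.

```python
# Minimal sketch of S302's branch: if the first interaction result is voice data,
# convert it to text first, then run semantic analysis on the text.
class VoiceResult:
    """Stand-in for a voice-form first interaction result carrying a transcript."""
    def __init__(self, transcript: str):
        self.transcript = transcript

def to_text(result):
    # Text results pass through; voice results are "converted" via their transcript.
    return result if isinstance(result, str) else result.transcript

def semantic_info(result) -> str:
    # Placeholder formal representation: a normalized string stands in for the
    # output of an actual semantic parser.
    return to_text(result).strip().lower()
```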
S303, determining the interaction satisfaction degree of the first user based on the first semantic information.
The first user refers to the user who first inputs the voice data to the preset dialogue model; "first" distinguishes this user from users who subsequently input the same voice data. Accordingly, a second user refers to a user who later inputs voice data identical or similar to that voice data. Interaction satisfaction refers to how satisfied the user is with the answer the preset dialogue model gives for the user's voice data; it is a prediction the terminal makes from the model's answer, not the user's actual satisfaction.
Generally, after performing semantic analysis on the first interaction result, the terminal obtains the first semantic information corresponding to it. Further analysis of the first semantic information then determines the interaction satisfaction of the first user. The terminal has preset semantics built in, which usually carry negative meaning; by analyzing the similarity between the first semantic information and the preset semantics, the interaction satisfaction corresponding to that similarity, that is, the predicted interaction satisfaction of the first user, can be obtained. In general, the user's interaction satisfaction is inversely related to the semantic similarity: the higher the similarity, the lower the interaction satisfaction.
For example: when the similarity between the first semantic information and the preset semantics is 0-20%, the corresponding interaction satisfaction is 80%-100%; when the similarity is 21%-50%, the corresponding interaction satisfaction is 60%-79%; when the similarity is 51%-80%, the corresponding interaction satisfaction is 20%-59%; and when the similarity is 81%-100%, the corresponding interaction satisfaction is 0%-19%.
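The similarity-to-satisfaction bands above can be written as a lookup table. The band boundaries follow the example in the text; mapping each band to its upper satisfaction bound is an illustrative assumption, since the text only gives ranges.

```python
# Hypothetical band lookup for the example mapping in S303.
def satisfaction_from_similarity(similarity: float) -> float:
    """Map semantic similarity (0..1) to a predicted satisfaction (0..1)."""
    bands = [
        (0.20, 1.00),  # similarity 0-20%   -> satisfaction 80%-100%
        (0.50, 0.79),  # similarity 21-50%  -> satisfaction 60%-79%
        (0.80, 0.59),  # similarity 51-80%  -> satisfaction 20%-59%
        (1.00, 0.19),  # similarity 81-100% -> satisfaction 0%-19%
    ]
    for upper_similarity, satisfaction in bands:
        if similarity <= upper_similarity:
            return satisfaction
    return 0.19  # out-of-range input falls into the lowest band
```

With this table, the 80% similarity from the earlier example falls in the third band, giving a satisfaction below the 60% threshold, consistent with the worked example.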
S304, when the interaction satisfaction is less than or equal to a preset threshold, analyzing the voice data to obtain keywords and/or key phrases corresponding to the voice data.
Wherein the preset threshold is the criterion for judging whether the user is satisfied with the interaction result, that is, the minimum interaction satisfaction the user must reach for the user to be considered satisfied with the interaction result. The voice data here refers to the dialogue voice data the first user input to the terminal, i.e., the voice data input by the first user whose interaction satisfaction is less than or equal to the preset threshold.
Generally, when the first user's interaction satisfaction is less than or equal to the preset threshold, it can be determined that the first user is not satisfied with the dialogue interaction. The terminal then analyzes the voice data input by the first user to obtain the corresponding keywords and/or key phrases, which refine and reflect the key content of that voice data; query data corresponding to the voice data can then be generated based on these keywords and/or key phrases.
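A minimal sketch of such keyword/key-phrase extraction is shown below. The stop-word list is a hypothetical stand-in; an actual terminal would use a proper tokenizer, TF-IDF weighting, or a trained extractor:

```python
import re

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"please", "the", "a", "an", "me", "to", "for", "help", "with"}

def extract_keywords(utterance: str) -> list[str]:
    """Return the content words of a (transcribed) user utterance;
    these refine and reflect the key content of the voice data."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

For the utterance "Help me open Xiaodu" this yields the keywords "open" and "xiaodu", matching the worked example below.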
S305, generating first query data corresponding to the voice data based on the keywords and/or key phrases, sending the first query data to a server, and receiving an interaction instruction corresponding to the voice data sent by the server.
The voice data refers to the dialogue voice data the first user input to the terminal, namely the voice data input by the first user whose interaction satisfaction is less than or equal to the preset threshold. The first query data refers to query data generated by the terminal based on the voice data input by the first user; it instructs the server to search, via the Internet, a preset website for answer data related to the first query data. The interaction instruction refers to interaction data for analyzing and/or resolving the voice data.
Generally, when the terminal determines that the first user's interaction satisfaction is less than or equal to the preset threshold, this indicates that the first user is not satisfied with the current interaction result. The terminal then analyzes the voice data input by the first user to obtain the keywords and/or key phrases corresponding to the voice data, generates first query data corresponding to the voice data based on them, and sends the first query data to a server. The first query data is text data generated from the voice data input by the first user and containing the key information of that voice data; it instructs the server to search, via the Internet, a preset website for answer data related to the first query data. The preset website refers to a website that can provide a search service or a community service; it may be set arbitrarily by the user, or be the server's default website. After receiving the first query data sent by the terminal, the server searches the preset website via the Internet for answer data corresponding to the first query data, selects from the multiple answers found the answer data with the highest matching degree or highest confidence with respect to the first query data as the interaction instruction corresponding to the voice data input by the first user, and sends the interaction instruction to the terminal so that the terminal obtains it.
For example: the first user performs dialogue interaction with a terminal in which the preset dialogue model is built, and inputs the voice data "Help me open 'Xiaodu'" to the terminal (the voice data input by the first user). After analyzing the voice data with the existing preset dialogue model, the terminal cannot recognize the instruction corresponding to the voice data input by the first user, and accordingly returns voice data containing "Sorry, I don't understand what you mean" to the first user (i.e., the first interaction result). Meanwhile, the terminal performs semantic analysis on the voice data returned to the first user and finds that the semantic information corresponding to the first interaction result is a negative semantic containing keywords and/or key phrases such as "sorry" and "don't understand", while the preset semantics are negative semantics containing keywords and/or key phrases such as "sorry" and "cannot". The similarity between the semantic information corresponding to the first interaction result and the preset semantics is analyzed to be 80%; based on this similarity, the first user's interaction satisfaction is determined to be 20%. Since the preset threshold of interaction satisfaction is 60%, it can be determined that the first user is not satisfied with the first interaction result. Further, having determined that the first user is not satisfied with the first interaction result, the terminal analyzes the voice data "Help me open 'Xiaodu'" input by the first user to obtain keywords such as "open" and "Xiaodu", and based on these keywords generates text data "What does 'Xiaodu' mean?" for querying about the voice data input by the first user (the first query data).
The terminal sends the first query data to the server. Based on the first query data, the server searches a preset website for answer data corresponding to the first query data, selects, based on the confidences of the answer data found, the answer data with the highest confidence, "open the Baidu application", and uses it as the interaction instruction for the voice data "Help me open 'Xiaodu'" input by the first user.
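The query-generation and answer-selection steps in the example above might be sketched as follows. The question template and the "confidence" field are illustrative assumptions, not the embodiment's actual format:

```python
def build_query(keywords: list[str]) -> str:
    """Terminal side: turn the extracted keywords into first query data,
    a natural-language question the server can search or post on a
    preset website."""
    return "What does '{}' mean?".format(" ".join(keywords))

def pick_answer(candidates: list[dict]) -> str:
    """Server side: among the answer data found on the preset website,
    return the one with the highest confidence/matching degree as the
    interaction instruction."""
    best = max(candidates, key=lambda c: c["confidence"])
    return best["answer"]
```

The terminal would send `build_query(...)` to the server, and the server would reply with `pick_answer(...)` over the answers it retrieved.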
S306, performing optimization training on the preset dialogue model using the interaction instruction and the voice data as sample data.
Specifically, reference may be made to the step S203, which is not repeated here.
S307, analyzing the voice data input by the second user by adopting the optimized preset dialogue model, and outputting a second interaction result.
The second user refers to a user who inputs voice data identical or similar to the voice data input by the first user to the optimized preset dialogue model. The second user may be the first user inputting the same or similar voice data a second time, or may be a user different from the first user inputting the same or similar voice data. The second interaction result refers to the interaction result of the optimized preset dialogue model for the voice data input by the second user, that is, the dialogue content data the optimized model generates in response to that voice data; the second interaction result may be text data or voice data.
Generally, after the terminal has performed optimization training on the preset dialogue model based on steps S301 to S306, it can perform dialogue interaction with a new user, or a new dialogue interaction with the first user, based on the optimized preset dialogue model; the user who interacts with the optimized model is referred to as the second user. During such interaction, the optimized model may encounter voice data identical or similar to the voice data input by the first user. The optimized preset dialogue model analyzes the voice data input by the second user and outputs the corresponding second interaction result. Because the second interaction result is based on the interaction instruction that the server previously retrieved from the preset website via the Internet for the voice data input by the first user, the dialogue content data generated in response to the second user's voice data is more intelligent, and the accuracy with which the preset model recognizes user instructions can be improved without manual participation. If the second user is still not satisfied with the dialogue content data, the terminal can send the voice data currently input by the second user to the server, so that the server publishes question data corresponding to the voice data on a preset website and collects other users' answer data for the question data in a crowd-sourced manner.
S308, carrying out semantic analysis on the second interaction result to obtain second semantic information corresponding to the second interaction result.
The second semantic information refers to the semantics contained in the second interaction result, that is, the semantic information of the answer the terminal gives, based on the optimized preset dialogue model, for the voice data input by the second user during the dialogue interaction with the second user.
In general, semantic analysis derives, from the syntactic structure of a sentence and the meaning of each word in it, a formal representation that reflects the sentence's meaning, converting human-understandable natural language into computer-understandable formal language. The second interaction result may be voice data or text data. If the second interaction result is voice data, the terminal first converts it into corresponding text data, and then performs semantic analysis on the text data to obtain the semantic information corresponding to the second interaction result.
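As a rough illustration of this semantic analysis step, one could flag preset negative markers in the (text-converted) interaction result and score similarity as the fraction of markers matched. The marker set and the scoring rule here are assumptions for the sketch, not the embodiment's actual algorithm:

```python
# Hypothetical preset negative semantics, stored on the terminal.
NEGATIVE_MARKERS = {"sorry", "cannot", "don't understand", "not found"}

def semantic_info(interaction_result: str) -> dict:
    """Simplified semantic analysis of a text-converted interaction
    result: report which preset negative markers it contains and a
    similarity score (fraction of markers matched)."""
    text = interaction_result.lower()
    hits = {m for m in NEGATIVE_MARKERS if m in text}
    return {"negative_markers": sorted(hits),
            "similarity": len(hits) / len(NEGATIVE_MARKERS)}
```

The resulting similarity could then be fed into the satisfaction mapping described earlier in this section.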
S309, determining the interaction satisfaction of the second user based on the second semantic information.
The second user's interaction satisfaction refers to the degree to which the second user is satisfied with the dialogue interaction performed with the optimized dialogue model. This interaction satisfaction is, again, a prediction the terminal makes about the user's satisfaction based on the answer content of the preset dialogue model; it is not the user's actual satisfaction.
Generally, after performing semantic analysis on the second interaction result, the terminal obtains the second semantic information corresponding to the second interaction result, and further analysis of the second semantic information can predict the second user's interaction satisfaction. The terminal stores preset semantics, which typically carry a negative meaning; by analyzing the similarity between the second semantic information and the preset semantics, the terminal can predict the interaction satisfaction corresponding to that similarity, i.e., the interaction satisfaction of the second user. In general, the user's interaction satisfaction is inversely related to the semantic similarity: the higher the similarity, the lower the user's interaction satisfaction.
And S310, when the interaction satisfaction degree of the second user is smaller than or equal to a preset threshold value, sending second query data to the server, and receiving a target query result sent by the server.
The second query data is query data generated based on the keywords and/or key phrases corresponding to the voice data input by the second user; it instructs the server to publish it on a preset website, so that the server can solicit other users' answer data for the query data there. The target query result is the query result corresponding to the second query data that the server retrieves from the preset website, that is, the answer data the server selects as best matching the published question data. The second query data may correspond to multiple query results with different confidences; the target query result is the query result with the highest confidence among them, where the confidence derives from input by multiple users on the preset website, such as the approval/like rate other website users give the answer data. The preset website is a website that can provide a search service or a community service; it may be set arbitrarily by the user, or be the server's default website.
Generally, when the terminal determines that the second user's interaction satisfaction is less than or equal to the preset threshold, this indicates that the second user is not satisfied with the current interaction result. The terminal analyzes the voice data input by the second user to obtain the corresponding keywords and/or key phrases, generates the second query data corresponding to the voice data based on them (i.e., query data instructing the server to publish it on a preset website), and sends the second query data to the server, so that the server can solicit other users' answer data for the query data on the preset website; the preset website may be set arbitrarily by the user, or be the server's default website. Because the server obtains the corresponding answer data by actively publishing question data on the preset website, a certain waiting time is needed for other website users to answer the question data; the server therefore queries the answer data corresponding to the published question data at a preset time interval. The question data may correspond to multiple answer data, that is, the server may retrieve multiple query results, and different query results have different confidences; each time it queries, the server selects the query result with the highest confidence among those retrievable as the target query result and sends it to the terminal. After receiving the target query result sent by the server, the terminal can use the target query result and the voice data input by the second user as new sample data to re-optimize the optimized preset dialogue model.
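The server-side polling and selection of the target query result could be sketched as below, where each polling round yields the answers accumulated so far on the preset website and "confidence" stands for the approval/like rate from other website users (the data shapes are assumptions for illustration):

```python
def target_query_result(query_rounds):
    """Return the highest-confidence answer once any answers exist.

    query_rounds: one list of answer dicts per polling interval (e.g.
    collected every few days after the question data was published on
    the preset website). Each answer dict carries a 'confidence', the
    approval/like rate given by other website users."""
    for answers in query_rounds:
        if answers:
            return max(answers, key=lambda a: a["confidence"])
    return None  # no answers within the observed polling rounds
```

In practice the polling would run on a timer; here the rounds are passed in directly so the selection logic stays self-contained.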
For example: the first user performs dialogue interaction with a terminal in which the preset dialogue model is built, and inputs the voice data "Help me open 'Xiaodu'" to the terminal (the voice data input by the first user). After analyzing the voice data with the existing preset dialogue model, the terminal cannot recognize the instruction corresponding to the voice data input by the first user, and accordingly returns voice data containing "Sorry, I don't understand what you mean" to the first user (i.e., the first interaction result). Meanwhile, the terminal performs semantic analysis on the voice data returned to the first user and finds that the semantic information corresponding to the first interaction result is a negative semantic containing keywords and/or key phrases such as "sorry" and "don't understand", while the preset semantics are negative semantics containing keywords and/or key phrases such as "sorry" and "cannot". The similarity between the semantic information corresponding to the first interaction result and the preset semantics is analyzed to be 80%; based on this similarity, the first user's interaction satisfaction is determined to be 20%. Since the preset threshold of interaction satisfaction is 60%, it can be determined that the first user is not satisfied with the first interaction result. Further, having determined that the first user is not satisfied with the first interaction result, the terminal analyzes the voice data "Help me open 'Xiaodu'" input by the first user to obtain keywords such as "open" and "Xiaodu", and based on these keywords generates text data "What does 'Xiaodu' mean?" for querying about the voice data input by the first user (the first query data).
The terminal sends the first query data to the server. Based on the first query data, the server searches a preset website for answer data corresponding to the first query data, selects, based on the confidences of the answer data found, the answer data with the highest confidence, "open the Baidu application", and uses it as the interaction instruction for the voice data input by the first user. The terminal then associates the interaction instruction with the voice data input by the first user and uses them as sample data to perform optimization training on the existing preset dialogue model, obtaining the optimized preset dialogue model.
Afterwards, a second user inputs to the optimized preset dialogue model the voice data "Navigate with 'Xiaodu'", which is similar to the voice data input by the first user. After analyzing the voice data with the optimized preset dialogue model, the terminal still cannot recognize the instruction corresponding to the voice data input by the second user, and returns voice data containing "Sorry, I could not find a 'Xiaodu' that can navigate" to the second user (i.e., the second interaction result). Meanwhile, the terminal performs semantic analysis on the second interaction result and obtains semantic information that is a negative semantic containing keywords and/or key phrases such as "sorry" and "not found". The similarity between this semantic information and the preset semantics is calculated to be 80%, and based on this similarity the second user's interaction satisfaction is determined to be 20%; since the preset threshold of interaction satisfaction is 60%, it can be determined that the second user is not satisfied with the second interaction result. Further, having determined this, the terminal analyzes the voice data "Navigate with 'Xiaodu'" input by the second user to obtain keywords such as "Xiaodu" and "navigate", and based on these keywords generates text data "What is used to navigate with 'Xiaodu'? How does one navigate with 'Xiaodu'?" for querying about the voice data input by the second user (the second query data), that is, the question data to instruct the server to publish on a preset website. The terminal sends the second query data to the server.
The server publishes the question data "What is used to navigate with 'Xiaodu'? How does one navigate with 'Xiaodu'?" on the preset website and waits for other website users to answer it. The server queries the preset website for answer data corresponding to the published question data at an interval of, for example, every three days, and finds 10 answer data (query results) corresponding to the question data. The 10 query results have different confidences, that is, different approval/like rates from other website users. The server selects from them the query result with the highest confidence, "'Xiaodu' navigates with Baidu Maps; after opening Baidu Maps, you can enter a destination in the search bar, and Baidu Maps will plan a navigation route to the destination according to your current position", as the target query result and sends it to the terminal. The terminal associates the target query result with the voice data input by the second user and uses them as new sample data to perform optimization training on the optimized preset dialogue model again.
S311, performing optimization training on the optimized preset dialogue model using the target query result and the voice data as sample data.
Generally, after receiving the target query result sent by the server, the terminal associates the target query result with the voice data input by the second user and stores them as new sample data in a database, thereby expanding and updating the sample data used for training the preset dialogue model. The optimized preset dialogue model can then be re-optimized based on the new sample data, so that when the re-optimized model again recognizes voice data similar or identical to the voice data input by the second user, it can give the corresponding answer and/or execute the corresponding processing operation based on the target query result.
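The accumulation of new sample data and the retraining hook might look like the following sketch; `model_update` is a hypothetical stand-in for the actual optimization-training routine of the preset dialogue model:

```python
class SampleStore:
    """Associates target query results with the triggering voice data
    and stores them as new sample data, expanding the training
    database for the preset dialogue model."""

    def __init__(self):
        self.samples = []

    def add(self, voice_text, answer):
        # Associate the (transcribed) voice data with its answer data.
        self.samples.append({"input": voice_text, "label": answer})

    def retrain(self, model_update):
        # model_update is a hypothetical callback standing in for the
        # real optimization-training routine; it receives the full,
        # expanded sample set.
        return model_update(self.samples)
```

Each dissatisfaction event thus grows the sample set, and retraining always runs over the expanded data rather than only the newest pair.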
According to the dialogue interaction method provided by this scheme, the electronic device obtains the first interaction result and performs semantic analysis on it to obtain the first semantic information corresponding to the first interaction result; determines the first user's interaction satisfaction based on the first semantic information; when the interaction satisfaction is less than or equal to a preset threshold, analyzes the voice data to obtain the keywords and/or key phrases corresponding to the voice data; generates first query data corresponding to the voice data based on the keywords and/or key phrases, sends the first query data to the server, and receives the interaction instruction corresponding to the voice data sent by the server; performs optimization training on the preset dialogue model using the interaction instruction and the voice data as sample data; analyzes voice data input by a second user with the optimized preset dialogue model and outputs a second interaction result; performs semantic analysis on the second interaction result to obtain the corresponding second semantic information and determines the second user's interaction satisfaction based on it; when the second user's interaction satisfaction is less than or equal to the preset threshold, sends second query data to the server and receives the target query result sent by the server; and performs optimization training on the optimized preset dialogue model using the target query result and the voice data as sample data. In this way, the accuracy with which the preset dialogue model recognizes user instructions can be continuously improved without manual participation.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Referring to fig. 4, a schematic structural diagram of a dialogue interaction device according to an exemplary embodiment of the present application is shown, which is hereinafter referred to as device 4. The apparatus 4 may be implemented as all or part of an electronic device by software, hardware or a combination of both. The device 4 comprises:
an analysis module 401, configured to analyze the interaction satisfaction of the first user according to the first interaction result; the first interaction result is an interaction result of a preset dialogue model on voice data input by the first user;
a processing module 402, configured to send the voice data to a server and receive an interaction instruction corresponding to the voice data sent by the server when the interaction satisfaction is less than or equal to a preset threshold;
And the training module 403 is configured to perform optimization training on the preset dialogue model using the interaction instruction and the voice data as sample data.
Optionally, the analysis module 401 includes:
the acquisition unit is used for acquiring the first interaction result;
The first analysis unit is used for carrying out semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result;
And the first determining unit is used for determining the interaction satisfaction degree of the first user based on the first semantic information.
Optionally, the processing module 402 includes:
The second analysis unit is used for analyzing the voice data to obtain keywords and/or key phrases corresponding to the voice data;
and the first processing unit is used for generating first query data corresponding to the voice data based on the keywords and/or key phrases and sending the first query data to the server.
Optionally, the training module 403 further includes:
The second processing unit is used for analyzing the voice data input by the second user by adopting the optimized preset dialogue model and outputting a second interaction result;
a third analysis unit, configured to analyze the interaction satisfaction of the second user according to the second interaction result;
The third processing unit is used for sending second query data to the server and receiving a target query result sent by the server when the interaction satisfaction of the second user is less than or equal to the preset threshold; the target query result is a query result corresponding to the second query data, queried by the server on a preset website; the second query data is obtained according to keywords and/or key phrases corresponding to the voice data;
and the training unit is used for optimally training the optimized preset dialogue model by taking the target query result and the voice data as sample data.
Optionally, the training module 403 further includes:
the fourth analysis unit is used for carrying out semantic analysis on the second interaction result to obtain second semantic information corresponding to the second interaction result;
And a second determining unit configured to determine the interaction satisfaction of the second user based on the second semantic information.
It should be noted that when the dialogue interaction device provided in the above embodiment performs the dialogue interaction method, the division into the above functional modules is only used as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the dialogue interaction device provided in the above embodiment and the embodiments of the dialogue interaction method belong to the same concept; the detailed implementation process is embodied in the method embodiments and is not repeated here.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The embodiment of the present application further provides a computer storage medium that may store a plurality of instructions adapted to be loaded by a processor to execute the method steps described above; for the specific execution process, refer to the description of the embodiments shown in fig. 2 to 3, which is not repeated here.
The application also provides an electronic device, which comprises a processor, a memory and a display screen; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
Referring to fig. 5, a schematic structural diagram of an electronic device is provided in an embodiment of the present application. As shown in fig. 5, the electronic device 500 may include: at least one processor 501, at least one network interface 504, a user interface 503, a memory 505, at least one communication bus 502.
Wherein a communication bus 502 is used to enable connected communications between these components.
The user interface 503 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 503 may further include a standard wired interface and a standard wireless interface.
The network interface 504 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the processor 501 may include one or more processing cores. The processor 501 connects various parts of the electronic device 500 using various interfaces and lines, and performs various functions of the electronic device 500 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 505 and invoking data stored in the memory 505. Alternatively, the processor 501 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 501 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used to render and draw the content to be displayed by the display screen; the modem is used to handle wireless communication. It can be understood that the modem may also not be integrated into the processor 501 and may instead be implemented by a separate chip.
The memory 505 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 505 includes a non-transitory computer-readable storage medium. The memory 505 may be used to store instructions, programs, code sets, or instruction sets. The memory 505 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above method embodiments, and the like; the stored data area may store the data involved in the above method embodiments, and the like. Optionally, the memory 505 may also be at least one storage device located remotely from the aforementioned processor 501. As shown in fig. 5, the memory 505, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a dialogue interaction application.
In the electronic device 500 shown in fig. 5, the user interface 503 is mainly used for providing an input interface for a user, and acquiring data input by the user; and the processor 501 may be configured to invoke the dialog interaction application stored in the memory 505 and specifically perform the following operations:
Analyzing the interaction satisfaction degree of the first user according to the first interaction result; the first interaction result is an interaction result of a preset dialogue model on voice data input by the first user;
when the interaction satisfaction is smaller than or equal to a preset threshold value, sending the voice data to a server, and receiving an interaction instruction corresponding to the voice data sent by the server;
And carrying out optimization training on the preset dialogue model by taking the interaction instruction and the voice data as sample data.
In one embodiment, the processor 501, when executing the analysis of the interaction satisfaction of the first user according to the first interaction result, specifically performs the following operations:
Acquiring the first interaction result;
Carrying out semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result;
and determining the interaction satisfaction degree of the first user based on the first semantic information.
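The semantic-analysis steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the negative-phrase list, the whitespace tokenizer, and the Jaccard overlap measure are all assumptions standing in for the unspecified semantic-analysis model.

```python
# Minimal sketch of the interaction-satisfaction analysis. The negative-phrase
# list, tokenizer, and Jaccard similarity are illustrative assumptions; the
# patent does not fix a concrete semantic-analysis algorithm.
NEGATIVE_PHRASES = [
    "that is not what i asked",
    "i do not understand",
    "wrong answer",
]

def jaccard(a: set, b: set) -> float:
    """Token-overlap similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def interaction_satisfaction(interaction_result: str) -> float:
    """Satisfaction is negatively correlated with similarity to negative semantics."""
    tokens = set(interaction_result.lower().split())
    max_similarity = max(jaccard(tokens, set(p.split())) for p in NEGATIVE_PHRASES)
    return 1.0 - max_similarity
```

A device would then compare this score against the preset threshold to decide whether to fall back to the server.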
In one embodiment, the processor 501, when executing the sending of the voice data to a server, further performs the following operations:
analyzing the voice data to obtain keywords and/or key phrases corresponding to the voice data;
generating first query data corresponding to the voice data based on the keywords and/or the key phrases, and sending the first query data to the server.
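The keyword extraction and query construction above can be sketched as follows; the stopword list and the dictionary-shaped query payload are assumptions, since the patent specifies neither a concrete extraction algorithm nor a wire format.

```python
# Illustrative sketch of building first query data from recognized voice text.
# The stopword list and payload shape are assumptions, not the patented design.
STOPWORDS = {"the", "a", "an", "is", "what", "please", "me", "of"}

def extract_keywords(recognized_text: str) -> list:
    """Keep content-bearing tokens as the keywords of the utterance."""
    return [w for w in recognized_text.lower().split() if w not in STOPWORDS]

def build_first_query(recognized_text: str) -> dict:
    """First query data: the extracted keywords plus the raw utterance."""
    return {"keywords": extract_keywords(recognized_text), "raw": recognized_text}
```

The server would use the keyword field to look up a matching interaction instruction for the utterance.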
In one embodiment, after performing the optimization training on the preset dialog model using the interaction instruction and the voice data as sample data, the processor 501 further performs the following operations:
Analyzing voice data input by a second user by using the optimized preset dialogue model, and outputting a second interaction result;
analyzing the interaction satisfaction degree of the second user according to the second interaction result;
When the interaction satisfaction of the second user is less than or equal to the preset threshold value, sending second query data to the server, and receiving a target query result sent by the server; the target query result is a query result, corresponding to the second query data, that the server retrieves from a preset website; and the second query data is obtained according to the keywords and/or key phrases corresponding to the voice data;
And carrying out optimization training on the optimized preset dialogue model by taking the target query result and the voice data as sample data.
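The overall respond-check-retrain loop described above can be sketched end to end. This is a toy illustration: `PresetDialogModel`, whose "retraining" simply memorizes sample pairs, and the string-based satisfaction check are stand-ins for the patent's dialogue model and semantic analysis.

```python
# Toy sketch of the optimization loop: respond, estimate satisfaction, and when
# satisfaction is low, fetch the server's highest-confidence answer and retrain.
class PresetDialogModel:
    """Stand-in model; 'retraining' memorizes (utterance, answer) sample pairs."""
    def __init__(self):
        self.answers = {}

    def respond(self, utterance: str) -> str:
        return self.answers.get(utterance, "sorry, I do not understand")

    def retrain(self, samples: list) -> None:
        self.answers.update(dict(samples))

def interaction_round(model, query_server, utterance, samples, threshold=0.5):
    result = model.respond(utterance)
    # Crude satisfaction proxy: a fallback reply predicts low satisfaction.
    satisfaction = 0.0 if "do not understand" in result else 1.0
    if satisfaction <= threshold:
        target = query_server(utterance)      # highest-confidence query result
        samples.append((utterance, target))   # new sample pair for training
        model.retrain(samples)
    return result
```

After one low-satisfaction round, the same utterance is answered from the newly learned sample, which is the behavior the iterative optimization aims at.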
In one embodiment, the processor 501, when executing the analysis of the interaction satisfaction of the second user according to the second interaction result, further executes the following operations:
carrying out semantic analysis on the second interaction result to obtain second semantic information corresponding to the second interaction result;
and determining the interaction satisfaction of the second user based on the second semantic information.
In the embodiments of the present application, the electronic device analyzes the interaction satisfaction of the first user according to the first interaction result, where the first interaction result is the interaction result of the preset dialogue model on the voice data input by the first user. When the interaction satisfaction is less than or equal to the preset threshold value, the electronic device sends the voice data to the server, receives the interaction instruction corresponding to the voice data sent by the server, and performs optimization training on the preset dialogue model by using the interaction instruction and the voice data as sample data. Because the server retrieves the answer data corresponding to the voice data, new sample data for optimization training of the preset dialogue model can be acquired automatically, which reduces the workload of manual participation and makes the process of optimization training of the preset dialogue model more intelligent.
Those skilled in the art will appreciate that all or part of the procedures of the above method embodiments may be implemented by a computer program stored on a computer-readable storage medium, which, when executed, may include the procedures of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
The above descriptions are merely preferred embodiments of the present application and are not intended to limit the present application; various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (9)

1. A dialogue interaction method, the method comprising:
Acquiring a first interaction result; the first interaction result is an interaction result of a preset dialogue model on voice data input by a first user;
carrying out semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result;
Determining the interaction satisfaction of the first user based on a similarity between the first semantic information and preset semantics; the interaction satisfaction is a predicted satisfaction of the first user with the first interaction result, the preset semantics include semantics with a negative meaning, and the interaction satisfaction is negatively correlated with the similarity;
when the interaction satisfaction is less than or equal to a preset threshold value, sending the voice data to a server, and receiving an interaction instruction corresponding to the voice data sent by the server;
And carrying out optimization training on the preset dialogue model by taking the interaction instruction and the voice data as sample data.
2. The method of claim 1, wherein the sending the voice data to a server comprises:
analyzing the voice data to obtain keywords and/or key phrases corresponding to the voice data;
generating first query data corresponding to the voice data based on the keywords and/or the key phrases, and sending the first query data to the server.
3. The method of claim 1, wherein after optimally training the preset dialog model using the interaction instruction and the voice data as sample data, further comprising:
Analyzing voice data input by a second user by using the optimized preset dialogue model, and outputting a second interaction result;
analyzing the interaction satisfaction degree of the second user according to the second interaction result;
When the interaction satisfaction of the second user is less than or equal to the preset threshold value, sending second query data to the server, and receiving a target query result sent by the server; the target query result is a query result, corresponding to the second query data, that the server retrieves from a preset website; and the second query data is obtained according to the keywords and/or key phrases corresponding to the voice data;
And carrying out optimization training on the optimized preset dialogue model by taking the target query result and the voice data as sample data.
4. A method according to claim 3, wherein said analyzing the interaction satisfaction of the second user based on the second interaction result comprises:
carrying out semantic analysis on the second interaction result to obtain second semantic information corresponding to the second interaction result;
and determining the interaction satisfaction of the second user based on the second semantic information.
5. The method of claim 3, wherein the second query data corresponds to a plurality of query results, different query results have different confidence levels, and the target query result is the query result with the highest confidence level among the plurality of query results.
6. The method of claim 5, wherein the confidence levels are input by a plurality of users on the preset website.
7. A dialogue interaction device, the device comprising:
the acquisition module is used for acquiring a first interaction result; the first interaction result is an interaction result of a preset dialogue model on voice data input by a first user;
The analysis module is used for carrying out semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result;
The determining module is used for determining the interaction satisfaction of the first user based on a similarity between the first semantic information and preset semantics; the interaction satisfaction is a predicted satisfaction of the first user with the first interaction result, the preset semantics include semantics with a negative meaning, and the interaction satisfaction is negatively correlated with the similarity;
the processing module is used for sending the voice data to a server and receiving an interaction instruction corresponding to the voice data sent by the server when the interaction satisfaction is less than or equal to a preset threshold value;
and the training module is used for optimally training the preset dialogue model by taking the interaction instruction and the voice data as sample data.
8. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 6.
9. An electronic device, comprising: a processor, a memory, and a display screen; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any one of claims 1 to 6.
CN202010847014.2A 2020-08-21 2020-08-21 Dialogue interaction method and device, storage medium and electronic equipment Active CN112115244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010847014.2A CN112115244B (en) 2020-08-21 2020-08-21 Dialogue interaction method and device, storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN112115244A CN112115244A (en) 2020-12-22
CN112115244B true CN112115244B (en) 2024-05-03

Family

ID=73805366


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112711656A (en) * 2021-01-06 2021-04-27 北京中科深智科技有限公司 Chat robot construction method and system
CN113327612A (en) * 2021-05-27 2021-08-31 广州广电运通智能科技有限公司 Voice response optimization method, system, device and medium based on intelligent comment
CN114491136A (en) * 2022-01-14 2022-05-13 杭州盈兴科技有限公司 Hotel consultation content processing method and device and electronic equipment

Citations (9)

Publication number Priority date Publication date Assignee Title
CN104050256A (en) * 2014-06-13 2014-09-17 西安蒜泥电子科技有限责任公司 Initiative study-based questioning and answering method and questioning and answering system adopting initiative study-based questioning and answering method
CN104809197A (en) * 2015-04-24 2015-07-29 同程网络科技股份有限公司 On-line question and answer method based on intelligent robot
CN106203052A (en) * 2016-08-19 2016-12-07 乔中力 Intelligent LED exchange method and device
CN106295792A (en) * 2016-08-05 2017-01-04 北京光年无限科技有限公司 Dialogue data interaction processing method based on multi-model output and device
CN107609101A (en) * 2017-09-11 2018-01-19 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN107797984A (en) * 2017-09-11 2018-03-13 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN109460459A (en) * 2018-10-31 2019-03-12 神思电子技术股份有限公司 A kind of conversational system automatic optimization method based on log study
CN110209791A (en) * 2019-06-12 2019-09-06 百融云创科技股份有限公司 It is a kind of to take turns dialogue intelligent speech interactive system and device more
CN111444329A (en) * 2020-06-12 2020-07-24 支付宝(杭州)信息技术有限公司 Intelligent conversation method and device and electronic equipment

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20180052842A1 (en) * 2016-08-16 2018-02-22 Ebay Inc. Intelligent online personal assistant with natural language understanding




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant