CN114443821A

CN114443821A - Robot conversation method and device, electronic equipment and storage medium

Info

Publication number: CN114443821A
Application number: CN202111444986.8A
Authority: CN
Inventors: 詹明捷; 梁鼎
Original assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Current assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2022-05-06

Abstract

The disclosure provides a robot conversation method, a robot conversation device, an electronic device and a storage medium. The method comprises: acquiring first sentence information input by a target user and historical dialogue information generated between the target user and the robot during a historical period; merging the first sentence information and the historical dialogue information to obtain initial merged information; and performing sentence prediction on the initial merged information by using a target neural network to obtain and output second sentence information in response to the first sentence information. Because the prediction takes into account the semantic features of both the first sentence information and the historical dialogue information during the conversation between the robot and the target user, the predicted second sentence information better matches the target user's conversational intent. The target user is therefore more willing to keep interacting with the robot, and long-term companion-style interaction service can be provided.

Description

Robot conversation method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of robotics, and in particular, to a method and an apparatus for robot conversation, an electronic device, and a storage medium.

Background

At present, robots based on artificial intelligence technology keep emerging and are widely applied across industries. Such robots can take over part of the mechanized, process-driven manual services and greatly improve working efficiency. However, most robots are designed to quickly address the low efficiency of manual work; their interaction time with the user is limited, and they cannot provide long-term companion-style interaction.

Disclosure of Invention

Embodiments of the present disclosure provide at least a robot conversation method, a robot conversation device, an electronic device and a storage medium.

In a first aspect, an embodiment of the present disclosure provides a method for robot conversation, including:

acquiring first sentence information input by a target user and historical dialogue information generated between the target user and a robot in a historical period;

merging the first sentence information and the historical dialogue information to obtain initial merged information;

and performing sentence prediction on the initial merged information by using a target neural network to obtain and output second sentence information for the first sentence information.

With this robot conversation method, the first sentence information and the historical dialogue information are merged into initial merged information, so that when the target neural network performs sentence prediction on it, second sentence information responding to the first sentence information is obtained. During the conversation with the target user, the robot therefore predicts sentences by considering the semantic features of both the first sentence information and the historical dialogue information. As a result, the predicted second sentence information better matches the target user's conversational intent, human-machine interaction is more fluent, the target user is more willing to keep interacting with the robot, and long-term companion-style interaction service can be provided more easily.

In an optional embodiment, performing sentence prediction on the initial merged information by using the target neural network to obtain the second sentence information for the first sentence information includes:

in response to the sentence data volume indicated by the initial merged information being larger than a preset sentence data volume, pruning the initial merged information to obtain processed target merged information, wherein the sentence data volume indicated by the target merged information is smaller than or equal to the preset sentence data volume;

and performing sentence prediction on the target merged information by using the target neural network to obtain the second sentence information.

In this embodiment, when the sentence data volume indicated by the initial merged information is relatively large, pruning the initial merged information keeps the amount of data processed by the target neural network within a small range, thereby reducing the computational burden on the target neural network.

In an optional implementation, pruning the initial merged information to obtain the processed target merged information includes:

deleting historical dialogue information included in the initial merged information based on the generation timing of each piece of sentence information in the historical dialogue information to obtain the target merged information, wherein the deleted sentence information is contiguous, and every deleted sentence was generated before every sentence remaining in the target merged information.

In this embodiment, the historical dialogue information in the initial merged information, particularly the part generated long before the current input of the first sentence information, may reduce the accuracy of the sentence prediction result to some extent. This part is therefore deleted first, which further improves the accuracy of the sentence prediction performed by the target neural network.
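The pruning rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `Utterance` record, the use of total character count as the "sentence data volume", and the names `data_volume` and `prune_oldest` are all hypothetical. It drops the oldest history sentences one at a time, so the deleted run is always contiguous and earlier than everything kept, and it never deletes the newest entry (the first sentence information).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Utterance:
    timestamp: float  # generation time in seconds (hypothetical representation)
    speaker: str      # e.g. "user" or "robot"
    text: str


def data_volume(utterances: List[Utterance]) -> int:
    # Assumption: the "sentence data volume" is the total number of characters.
    return sum(len(u.text) for u in utterances)


def prune_oldest(merged: List[Utterance], max_volume: int) -> List[Utterance]:
    """Delete a contiguous run of the oldest sentences until the merged
    information fits within max_volume. The newest entry (the target user's
    first sentence information) is never deleted."""
    ordered = sorted(merged, key=lambda u: u.timestamp)
    start = 0
    while start < len(ordered) - 1 and data_volume(ordered[start:]) > max_volume:
        start += 1  # drop the earliest remaining history sentence
    return ordered[start:]
```

Recomputing `data_volume` on every iteration keeps the sketch short; a longer history would favor subtracting sentence lengths incrementally.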

In an optional embodiment, performing sentence prediction on the initial merged information by using the target neural network includes:

performing semantic feature extraction on the initial merged information to determine semantic feature information representing the conversational intent of the target user;

determining a conversation scene type corresponding to the conversational intent of the target user;

and performing sentence prediction on the initial merged information by using the target neural network matching the conversation scene type.

In this embodiment, different conversational intents correspond to different conversation scenes, and a separate target neural network can be trained in advance for each scene. Once the conversation scene type of the target user is determined, sentence prediction is performed with the matching target neural network, so that the prediction result fits the conversation scene better and the fluency of human-computer interaction is further improved.

In an optional embodiment, after determining the conversation scene type corresponding to the conversational intent of the target user, the method further includes:

in the case that the conversation scene type corresponding to the conversational intent of the target user includes a psychological counseling scene, generating and outputting a psychological comforting sentence corresponding to the psychological counseling scene.

In this implementation, psychological comforting sentences are generated and output for the target user in the psychological counseling scene, so that the target user can be reassured in time and guided to keep interacting with the robot, which facilitates continuous companion service.
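The scene-dependent routing and the counseling branch described above might be wired together as in the following Python sketch. The keyword table in `SCENE_KEYWORDS` is a hypothetical stand-in for the semantic feature extraction step, the per-scene target neural networks are represented by plain callables, and the fixed comforting line is an assumption; none of these names come from the disclosure.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical keyword table standing in for semantic feature extraction.
SCENE_KEYWORDS: Dict[str, List[str]] = {
    "food": ["eat", "food", "noodle", "duck"],
    "psychological_counseling": ["sad", "stressed", "lonely", "anxious"],
}


def classify_scene(merged_text: str, default: str = "chitchat") -> str:
    """Map the merged information to a conversation scene type (first match wins)."""
    text = merged_text.lower()
    for scene, words in SCENE_KEYWORDS.items():
        if any(word in text for word in words):
            return scene
    return default


def respond(merged_text: str,
            networks: Dict[str, Callable[[str], str]],
            comfort_line: Optional[str] = "I'm here with you.") -> List[str]:
    """Route to the network matching the scene; `networks` must contain a
    "chitchat" fallback. In the counseling scene, a comforting sentence is
    output before the network's prediction."""
    scene = classify_scene(merged_text)
    model = networks.get(scene, networks["chitchat"])
    outputs: List[str] = []
    if scene == "psychological_counseling" and comfort_line:
        outputs.append(comfort_line)
    outputs.append(model(merged_text))
    return outputs
```

Keeping one callable per scene mirrors the idea of training a separate target neural network for each conversation scene; a real system would replace the keyword lookup with a learned intent classifier.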

In an optional implementation, in the case that there are multiple target users, the method further includes:

in response to a multi-person conversation instruction, detecting whether any of the multiple target users inputs sentence information within a preset duration;

and in response to none of the multiple target users inputting sentence information within the preset duration, obtaining the sentence information to be output currently based on the acquired historical dialogue information.

In this embodiment, once it is determined during a multi-person conversation that none of the target users has input sentence information within the preset duration, the current conversation has likely fallen into an awkward silence. At this point, the historical dialogue information can be used to generate the sentence information to be output, so that the robot joins the multi-person conversation in time, the lull in the overall conversation is avoided, and the multi-person conversation is helped to continue.

In an optional implementation, detecting whether any of the multiple target users inputs sentence information within the preset duration includes:

in the case that a first target user among the multiple target users initiates session information, detecting whether the multiple target users input sentence information responding to the session information within the preset duration.

In this embodiment, during a multi-person conversation, once the session information initiated by the first target user receives no response, it is determined that the target users have not input sentence information within the preset duration, and the robot can join the conversation in time to respond to the session information, thereby preventing a single user from being left unanswered in the conversation.
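The silence check for a multi-person conversation, including the variant keyed to the first target user's session information, can be sketched in Python as below. It is a hedged illustration: the `(timestamp, speaker, text)` message tuples, the `initiator` parameter, and the `generate` callable standing in for the target neural network are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

# (timestamp in seconds, speaker id, text); a hypothetical message format
Message = Tuple[float, str, str]


def robot_should_speak(messages: List[Message], now: float, timeout: float,
                       initiator: Optional[str] = None) -> bool:
    """Return True if no target user has spoken within `timeout` seconds.

    With `initiator` set, only replies from other users after the initiator's
    latest message count, matching the first-target-user variant."""
    if initiator is not None:
        opened = [t for t, who, _ in messages if who == initiator]
        if not opened:
            return False  # the initiator has not started a session yet
        start = max(opened)
        replied = any(t > start and who != initiator for t, who, _ in messages)
        return not replied and now - start >= timeout
    if not messages:
        return False  # nothing to build a reply from
    return now - max(t for t, _, _ in messages) >= timeout


def maybe_fill_silence(messages: List[Message], now: float, timeout: float,
                       generate: Callable[[str], str],
                       initiator: Optional[str] = None) -> Optional[str]:
    """Generate a sentence from the dialogue history when the room goes quiet."""
    if not robot_should_speak(messages, now, timeout, initiator):
        return None
    history = " ".join(text for _, _, text in sorted(messages))
    return generate(history)
```

In a deployed robot, `now` would come from a clock and the check would run periodically; passing it as an argument keeps the sketch deterministic and easy to test.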

In an optional embodiment, the target neural network is trained as follows:

acquiring multiple rounds of dialogue samples, wherein each round includes a query sentence and an answer sentence to the query sentence;

splitting the multiple rounds of dialogue samples into a plurality of dialogue sample groups;

for a target dialogue sample group among the plurality of dialogue sample groups, taking the sentences in the target dialogue sample group, according to their generation order, as input sentences and output supervision sentences of the target neural network to be trained, and performing at least one round of network training on the target neural network to obtain the trained target neural network;

wherein the sentence input to the target neural network to be trained is generated before the output supervision sentence of the target neural network to be trained.

In this embodiment, considering the order of the sentences in the multi-round dialogue samples, an earlier sentence and a later sentence are used respectively as the input and the output of the target neural network to be trained. The network thus learns both the ordering relationship between sentences and the content that can be output, which makes the trained target neural network more accurate in sentence prediction.

In a second aspect, an embodiment of the present disclosure further provides an apparatus for robot conversation, including:

an acquisition module, configured to acquire first sentence information input by a target user and historical dialogue information generated between the target user and the robot in a historical period;

a merging module, configured to merge the first sentence information and the historical dialogue information to obtain initial merged information;

and a prediction module, configured to perform sentence prediction on the initial merged information by using a target neural network to obtain and output second sentence information for the first sentence information.

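In an optional embodiment, when performing sentence prediction on the initial merged information by using the target neural network to obtain the second sentence information for the first sentence information, the prediction module is configured to: in response to the sentence data volume indicated by the initial merged information being larger than a preset sentence data volume, prune the initial merged information to obtain processed target merged information whose sentence data volume is smaller than or equal to the preset sentence data volume; and perform sentence prediction on the target merged information by using the target neural network to obtain the second sentence information.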

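In an optional implementation, when pruning the initial merged information to obtain the processed target merged information, the prediction module is configured to: delete the historical dialogue information included in the initial merged information based on the generation timing of each piece of sentence information in the historical dialogue information to obtain the target merged information, wherein the deleted sentence information is contiguous and was generated before every sentence in the target merged information.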

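In an optional embodiment, when performing sentence prediction on the initial merged information by using the target neural network, the prediction module is configured to: perform semantic feature extraction on the initial merged information to determine semantic feature information representing the conversational intent of the target user; determine a conversation scene type corresponding to the conversational intent of the target user; and perform sentence prediction on the initial merged information by using the target neural network matching the conversation scene type.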

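In an optional embodiment, after determining the conversation scene type corresponding to the conversational intent of the target user, the prediction module is further configured to: in the case that the conversation scene type includes a psychological counseling scene, generate and output a psychological comforting sentence corresponding to the psychological counseling scene.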

In an optional embodiment, in the case that there are multiple target users, the apparatus further includes:

a detection module, configured to detect, in response to a multi-person conversation instruction, whether any of the multiple target users inputs sentence information within a preset duration;

and an output module, configured to obtain, in response to none of the multiple target users inputting sentence information within the preset duration, the sentence information to be output currently based on the acquired historical dialogue information.

In an optional implementation, when detecting whether any of the multiple target users inputs sentence information within the preset duration, the detection module is configured to:

in the case that a first target user among the multiple target users initiates session information, detect whether the target users input sentence information responding to the session information within the preset duration.

In an optional embodiment, the target neural network is trained as follows:

acquiring multiple rounds of dialogue samples, wherein each round includes a query sentence and an answer sentence to the query sentence, and splitting the multiple rounds of dialogue samples into a plurality of dialogue sample groups;

for a target dialogue sample group among the plurality of dialogue sample groups, taking the sentences in the target dialogue sample group, according to their generation order, as input sentences and output supervision sentences of the target neural network to be trained, and performing at least one round of network training on the target neural network to obtain the trained target neural network;

wherein the sentence input to the target neural network to be trained is generated before the output supervision sentence of the target neural network to be trained.

In a third aspect, an optional implementation of the present disclosure further provides an electronic device, including a processor and a memory storing machine-readable instructions executable by the processor. The processor is configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, perform the steps of the first aspect or of any possible implementation of the first aspect.

In a fourth aspect, the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed, performs the steps of the first aspect or of any possible implementation of the first aspect.

For the effects of the above apparatus, electronic device and computer-readable storage medium for robot conversation, refer to the description of the above robot conversation method; they are not repeated here.

To make the above objects, features and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Drawings

To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required in the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings depict only certain embodiments of the disclosure and should therefore not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.

Fig. 1 is a flowchart of a robot conversation method provided by an embodiment of the present disclosure;

Fig. 2 is a flowchart of a specific method for training a target neural network in the robot conversation method provided by an embodiment of the present disclosure;

Fig. 3 is a flowchart of a specific method of sentence prediction in the robot conversation method provided by an embodiment of the present disclosure;

Fig. 4 is a flowchart of a specific method of robot conversation in a multi-person conversation scenario provided by an embodiment of the present disclosure;

Fig. 5 is a flowchart of another specific method of robot conversation in a multi-person conversation scenario provided by an embodiment of the present disclosure;

Fig. 6 is a schematic diagram of an apparatus for robot conversation provided by an embodiment of the present disclosure;

Fig. 7 is a schematic diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated herein, may be arranged and designed in a wide variety of configurations. Thus, the following detailed description is not intended to limit the claimed scope of the disclosure but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the disclosure.

Research shows that robots based on artificial intelligence technology keep emerging and are widely applied across industries. Such robots can take over part of the mechanized, process-driven manual services, greatly improving working efficiency. However, most robots are designed to quickly address the low efficiency of manual work; their interaction time with the user is limited, and they cannot provide long-term companion-style interaction.

Based on this research, the present disclosure provides a robot conversation method, a robot conversation device, an electronic device and a storage medium that improve the fluency of human-computer interaction, so that the target user is more willing to keep interacting with the robot and long-term companion-style interaction service can be provided more easily.

The above drawbacks were identified by the inventor through practice and careful study; therefore, both the discovery of these problems and the solutions proposed by the present disclosure should be regarded as the inventor's contribution to the present disclosure.

It should be noted that similar reference numerals and letters denote similar items in the following figures; once an item is defined in one figure, it need not be further defined or explained in subsequent figures.

To facilitate understanding of this embodiment, the robot conversation method disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the method is generally an electronic device with certain computing capability, for example a terminal device, a server or another processing device. The terminal device may be User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device and the like; in particular, it may be an intelligent robot such as a chatbot, or another robot with a chat function. In some possible implementations, the robot conversation method may be implemented by a processor invoking computer-readable instructions stored in a memory.

The robot conversation method provided by an embodiment of the present disclosure is described below, taking a robot as the execution subject.

Referring to Fig. 1, which is a flowchart of a robot conversation method provided by an embodiment of the present disclosure, the method includes steps S101 to S103:

S101: acquiring first sentence information input by a target user and historical dialogue information generated between the target user and the robot in a historical period;

S102: merging the first sentence information and the historical dialogue information to obtain initial merged information;

S103: performing sentence prediction on the initial merged information by using a target neural network to obtain and output second sentence information for the first sentence information.

To facilitate understanding of the robot conversation method provided by the embodiments of the present disclosure, its application scenarios are briefly described first. The method can be applied in the field of chatbots, either to robots with a physical terminal or to robots with a virtual avatar, and suits a variety of scenarios such as food, news, entertainment, education, personal assistants and psychological counseling.

Most existing robots perform mechanical, process-driven work and are mainly used in scenarios such as logistics transportation and assembly lines. Existing chatbots are mostly used for customer service and personal assistants; their application scenarios are narrow, their interaction time with the user is limited, and they cannot provide long-term companion-style interaction.

To solve the above problem, embodiments of the present disclosure provide a robot conversation method that performs sentence prediction with a target neural network, so as to improve the service quality of human-computer interaction.

The first sentence information in the embodiments of the present disclosure refers to sentence information input to the robot by the target user, either by voice or by text. A session can likewise be started in different ways, for example by a voice command such as "little x classmate" or "xx assistant", or by clicking a button such as a "start chat" button provided on the robot.

The historical period refers to a period before the target user inputs the first sentence information. Its duration may be 10 minutes, half an hour or another length, and can be set according to the application scenario and/or the processing capability of the device. For example, a shorter duration may be set for scenarios where chat topics are dense, and a longer duration for scenarios where chat topics are sparse, so that the key historical dialogue information is captured adaptively. The second sentence information may be answer information generated by the robot, which the robot may present to the user by voice or text.
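Selecting the historical dialogue information for a configurable historical period could look like the following Python sketch. The `(timestamp, text)` tuples and the `WINDOW_BY_DENSITY` table are hypothetical; assigning 10 minutes to dense topics and 30 minutes to sparse topics is an assumption that merely reuses the example durations mentioned above.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from topic density to history-window length (seconds).
WINDOW_BY_DENSITY: Dict[str, float] = {"dense": 10 * 60, "sparse": 30 * 60}


def history_window(dialogue: List[Tuple[float, str]], now: float,
                   window_seconds: float) -> List[Tuple[float, str]]:
    """Keep utterances generated in [now - window_seconds, now), oldest first."""
    cutoff = now - window_seconds
    return sorted(u for u in dialogue if cutoff <= u[0] < now)
```

The half-open interval excludes anything stamped at `now`, so the first sentence information itself is not counted as history.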

The following example illustrates this. Suppose the target user has chatted with the robot for 10 minutes, all about food, mentioning dishes such as noodles, roast duck and mutton buns. At the current moment the target user inputs "What is your favorite food?" and the robot answers "I like roast duck"; this answer takes into account the food discussed during those 10 minutes. In this example, the 10 minutes of continuous chatting are the historical period, the chat content between the user and the robot during those 10 minutes is the historical dialogue information, "What is your favorite food?" input at the current moment is the first sentence information, and the robot's answer "I like roast duck" is the second sentence information generated and output by the target neural network for the first sentence information.

The second sentence information may be a response to the first sentence information, content in the same category as the first sentence information, and the like; the forms of the two are not limited here. For example, the first sentence information may be a question and the second sentence information a statement made in response to it, or the first sentence information may be a statement and the second sentence information another statement in response to it. The forms of the first and second sentence information may be the same or different.

After acquiring the first sentence information input by the target user, the robot may merge it with the historical dialogue information to obtain initial merged information and perform sentence prediction on the initial merged information by using the target neural network. The target neural network thus considers the semantic features of both the first sentence information and the historical dialogue information during prediction, producing an answer that better meets the target user's expectation. The merging method is not unique; for example, the first sentence information and the historical dialogue information may be concatenated in the order of their generation timing to obtain the initial merged information.
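The timing-ordered concatenation mentioned above, followed by a call to the prediction network, might be sketched in Python like this. The `(timestamp, speaker, text)` tuples, the speaker-prefixed newline-separated serialization, and the `network` callable are assumptions for illustration; the disclosure does not specify how the merged information is encoded for the target neural network.

```python
from typing import Callable, List, Tuple

Utterance = Tuple[float, str, str]  # (generation time, speaker, text); hypothetical


def merge(first: Utterance, history: List[Utterance]) -> List[Utterance]:
    """Splice the first sentence and the history together by generation time."""
    return sorted(history + [first], key=lambda u: u[0])


def to_model_input(merged: List[Utterance], sep: str = "\n") -> str:
    # Assumed serialization: one "speaker: text" line per sentence.
    return sep.join(f"{speaker}: {text}" for _, speaker, text in merged)


def reply(first: Utterance, history: List[Utterance],
          network: Callable[[str], str]) -> str:
    """S102 and S103: merge, then let the (stand-in) network predict a reply."""
    return network(to_model_input(merge(first, history)))
```

Sorting by timestamp makes the result independent of the order in which history records happen to be stored.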

The target neural network in the embodiments of the present disclosure may be trained on the correspondence between sentences, for example between an earlier-generated sentence and the later-generated sentence that follows it, so that whatever sentence is input to the target neural network, a corresponding answer sentence can be output.

Referring to Fig. 2, a flowchart for training the target neural network provided by an embodiment of the present disclosure includes the following steps:

S201: acquiring multiple rounds of dialogue samples, wherein each round includes a query sentence and an answer sentence to the query sentence;

S202: splitting the multiple rounds of dialogue samples into a plurality of dialogue sample groups;

S203: for a target dialogue sample group among the plurality of dialogue sample groups, taking the sentences in the target dialogue sample group, according to their generation order, as input sentences and output supervision sentences of the target neural network to be trained, and performing at least one round of network training on the target neural network to obtain the trained target neural network; wherein the sentence input to the target neural network to be trained is generated before the output supervision sentence of the target neural network to be trained.

Here, the query sentence in each dialogue sample may be a specific question or a statement without interrogative meaning, such as "I'm in a good mood today" or "Work went smoothly today".

In practical applications, the target neural network in the embodiments of the present disclosure may be trained in advance for each scenario; for a given scenario, the dialogue samples used may cover content related to that scenario.

To facilitate training of the target neural network, the multiple rounds of dialogue samples are split into a plurality of dialogue sample groups, as the following example shows. Take a 10-round dialogue between 2 persons, i.e., 20 sentences in continuous question-answer order, as the multi-round dialogue sample. Splitting it according to the question-answer order yields 19 dialogue sample groups: the first 2 sentences form group 1, the first 3 sentences form group 2, the first 4 sentences form group 3, and so on, until group 19 contains all 20 sentences; each group except group 1 contains all sentences of the previous group. Alternatively, the 10-round dialogue sample can be split into 10 dialogue sample groups according to the question-answer order, each containing one round of dialogue, i.e., one question and one answer.
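Both splitting schemes in the example above can be reproduced in a few lines of Python. The function names are hypothetical, and `training_pairs` encodes one reading of the training rule (all earlier sentences of a group as the input, the group's last sentence as the output supervision sentence), which is an assumption rather than something the disclosure fixes.

```python
from typing import List, Tuple


def prefix_groups(dialogue: List[str]) -> List[List[str]]:
    """Group k (1-based) holds the first k + 1 sentences: [s1, s2], [s1, s2, s3], ..."""
    return [dialogue[:end] for end in range(2, len(dialogue) + 1)]


def round_groups(dialogue: List[str]) -> List[List[str]]:
    """One group per question-answer round; a trailing unanswered sentence is dropped."""
    return [dialogue[i:i + 2] for i in range(0, len(dialogue) - 1, 2)]


def training_pairs(groups: List[List[str]]) -> List[Tuple[List[str], str]]:
    # Assumption: earlier sentences are the input and the last one is the
    # supervision target, so every input is generated before its target.
    return [(group[:-1], group[-1]) for group in groups]
```

For the 20-sentence example, `prefix_groups` yields 19 groups and `round_groups` yields 10, matching the two splits described in the text.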

The target dialogue sample group may be any one of the plurality of dialogue sample groups, a specified group among them, or each of them.

It should be noted that, in practical applications, besides training the target neural network on dialogue samples consisting of a query sentence and its corresponding answer sentence, multiple time-ordered sentences on the same topic from different users/subjects may also be used as dialogue samples. Given how widely question-answer interaction is used, combinations of query sentences and answer sentences may be chosen as the dialogue samples.

During training, the sentences in the target dialogue sample group are used, in time order, as the input sentence and the output supervision sentence of the target neural network to be trained. The answer sentence actually output by the target neural network is compared with the output supervision sentence, and the network is adjusted according to the comparison result so that its output comes closer to the supervision sentence.

An example follows. To train a target neural network for the food scenario, a 10-round dialogue between 2 persons discussing food is used as the dialogue sample, and a dialogue sample group concerning food is selected from the split groups as the target dialogue sample group. Suppose the target dialogue sample group contains the two sentences "Which kind of noodle dish do you like?" and "I like noodles with fried bean sauce". "Which kind of noodle dish do you like?" is input to the target neural network, and "I like noodles with fried bean sauce" serves as the output supervision sentence. The answer sentence output by the target neural network is compared with "I like noodles with fried bean sauce"; when they are inconsistent, a loss function value is computed from the discrepancy, the target neural network is adjusted according to that loss value, and the next round of training is performed with the adjusted network. This repeats until a training cutoff condition is reached, for example the number of training rounds reaching a preset number or the loss function value falling below a preset threshold. After training, the trained target neural network can be used for robot conversation in that specific scenario.
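The two training cutoff conditions named above (a preset number of rounds, or a loss below a preset threshold) can be expressed as a small driver loop. In this hedged Python sketch, `train_step` is a hypothetical callable that would run one round (predict answers, compare them with the output supervision sentences, compute the loss and update the network) and return that round's loss; the toy lambda in the usage line stands in for a real network.

```python
from typing import Callable, Tuple


def train_until_cutoff(train_step: Callable[[int], float],
                       max_rounds: int,
                       loss_threshold: float) -> Tuple[int, float]:
    """Run training rounds until either cutoff condition holds.

    Returns (rounds_run, last_loss)."""
    rounds, loss = 0, float("inf")
    while rounds < max_rounds:          # cutoff 1: preset number of rounds
        rounds += 1
        loss = train_step(rounds)       # one round: predict, compare, update
        if loss < loss_threshold:       # cutoff 2: loss below preset threshold
            break
    return rounds, loss


# Toy usage: a stand-in whose loss halves each round stops after 4 rounds.
print(train_until_cutoff(lambda r: 0.5 ** r, max_rounds=100, loss_threshold=0.1))
# prints (4, 0.0625)
```

Separating the stopping logic from `train_step` lets the same loop drive whatever model and optimizer an implementation actually uses.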

In a specific scene, the trained target neural network may need to process a large amount of data. To reduce this data processing load, when the statement data amount of the initial merging information is larger than a preset statement data amount, the initial merging information is pruned to obtain processed target merging information, and the target neural network then performs statement prediction on the target merging information to obtain the second statement information. Here, the pruning deletes historical dialogue information contained in the initial merging information according to the generation order of each piece of sentence information in the historical dialogue information: the deleted sentence information is contiguous, and every deleted sentence was generated before every sentence remaining in the target merging information.
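The pruning rule above deletes a contiguous run of the oldest history sentences until the merged information fits the preset data amount. The sketch below assumes the data amount is measured in characters and that the first statement (the latest user input) is always kept. Both are assumptions, and `prune_merged` is a hypothetical name.

```python
from typing import List


def prune_merged(history: List[str], first_statement: str, max_chars: int) -> List[str]:
    """Drop the oldest history sentences until the merged text fits max_chars.

    history is assumed to be in generation order (oldest first). If the first
    statement alone exceeds max_chars, only the first statement is returned.
    """
    size = sum(len(s) for s in history) + len(first_statement)
    start = 0
    # Advancing `start` deletes a contiguous prefix, i.e. the earliest sentences.
    while start < len(history) and size > max_chars:
        size -= len(history[start])
        start += 1
    # Target merging information: remaining history followed by the new input.
    return history[start:] + [first_statement]


print(prune_merged(["ab", "cde", "f"], "gh", max_chars=6))
```

In the ball-game example, the earlier ball-game sentences sit at the front of `history`, so they are the first to be removed.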

Here, the description is given with a specific example. Within 1 hour, the user chats with the robot first about a ball game and then about sporting goods. Because the ball game was discussed first, the first sentence information deleted when the initial merging information is pruned is the earlier ball-game content.

The pruning method is not unique. The initial merging information may be pruned based on the number of characters of the historical dialogue information, deleting historical dialogue information amounting to a preset number of characters. It may also be pruned based on the number of dialogue turns of the historical dialogue information, deleting a preset number of turns. For example, when the historical dialogue has 10 turns and the number of turns permitted to be deleted is set to 2, the first 2 turns in time order can be deleted; when the historical dialogue has 50 turns and the number of turns permitted to be deleted is set to 10, the first 10 turns in time order can be deleted.
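Turn-based pruning can be sketched as removing the earliest k turns. Both examples in the text (10 turns to 2, 50 turns to 10) happen to equal 20% of the history, so the sketch uses a hypothetical 20% rule for choosing k. The text does not state such a rule; it is only one way to reproduce those two numbers. Integer arithmetic avoids floating-point rounding.

```python
from typing import List, Tuple

# One dialogue turn: (user sentence, robot sentence). Representation is assumed.
Turn = Tuple[str, str]


def turns_allowed_to_delete(num_turns: int, percent: int = 20) -> int:
    """Hypothetical rule: permit deleting `percent`% of the turns, rounded down."""
    return num_turns * percent // 100


def prune_turns(history: List[Turn], turns_to_delete: int) -> List[Turn]:
    """Delete the earliest `turns_to_delete` turns in time order."""
    if turns_to_delete <= 0:
        return list(history)
    return history[turns_to_delete:]


history = [(f"user {i}", f"robot {i}") for i in range(10)]
kept = prune_turns(history, turns_allowed_to_delete(len(history)))
print(len(kept), kept[0])
```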

In order to make the predicted second statement information more consistent with the conversation intention of the target user, when performing statement prediction on the initial merging information, the following steps as shown in fig. 3 may be performed:

S301: perform semantic feature extraction on the initial merging information, and determine semantic feature information representing the conversation intention of the target user;

S302: determine a session scene type corresponding to the conversation intention of the target user;

S303: perform statement prediction on the initial merging information by using a target neural network matched with the session scene type.
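Steps S301 to S303 amount to "classify the merged dialogue into a scene, then route it to that scene's network". The sketch below replaces real semantic feature extraction with a hypothetical keyword count (`SCENE_KEYWORDS`), and models each scene-specific network as a plain callable. It illustrates only the routing structure, not how the disclosure extracts semantic features.

```python
from typing import Callable, Dict, List

# Hypothetical keyword lists standing in for semantic feature extraction (S301).
SCENE_KEYWORDS: Dict[str, List[str]] = {
    "travel": ["scenic", "spot", "trip"],
    "food": ["restaurant", "noodles", "dish"],
    "psychological_counseling": ["mood", "afraid", "anxious"],
}


def detect_scene(merged_text: str, default: str = "general") -> str:
    """S301 + S302: map the merged text to a session scene type (keyword stub)."""
    text = merged_text.lower()
    # Score each scene by how many keyword occurrences appear in the text.
    scores = {scene: sum(text.count(k) for k in kws) for scene, kws in SCENE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def predict(merged_text: str, networks: Dict[str, Callable[[str], str]]) -> str:
    """S303: run the network matched to the detected scene, else the general one."""
    scene = detect_scene(merged_text)
    model = networks.get(scene, networks["general"])
    return model(merged_text)


nets = {"general": lambda t: "general reply", "travel": lambda t: "AA and BB"}
print(predict("What fun scenic spots are nearby?", nets))
```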

Here, the semantic feature information representing the conversation intention of the target user may include parts of speech, sentence structures, paragraphs, and the like. The conversation intention of the target user is determined from the relationships between words, between sentences, and between paragraphs.

The conversation intention of the target user corresponds to a session scene, as the following examples show. If the target user asks "What fun scenic spots are nearby?", the conversation intention is to have the robot answer with the names of some nearby scenic spots, and the corresponding session scene may be travel. If the target user says "Recommend a popular local Chinese restaurant", the conversation intention is to have the robot list the names of popular local Chinese restaurants, and the corresponding session scene may be food.

Here, the session scene may be of various types, including general session scenes and professional session scenes. General session scenes may include news, food, travel, and the like, and professional session scenes may include professional skills, psychological counseling, and the like.

The target neural network can adopt different strategies for different session scene types. In a general session scene, the target neural network can directly generate an answer sentence for the question sentence. In a professional session scene, it first searches the knowledge base corresponding to that professional scene and then generates an answer sentence by combining the sentences retrieved from the knowledge base.

For example, in a travel scene, when the target user inputs "What fun scenic spots are there in Beijing?", the answer directly generated by the target neural network may be the names of scenic spots such as AA and BB. In a psychological counseling scene, when the target user inputs sentences describing emotions, such as "I'm in a really bad mood today" or "I often feel afraid recently", the target neural network may first search a knowledge base in the psychological counseling field for psychological counseling sentences, and then select a sentence matching the target user's input as the answer to comfort the target user.
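The two strategies can be sketched as a branch: generate directly in a general scene, or retrieve a matching knowledge-base sentence first in a professional scene. In this sketch, retrieval uses simple word overlap as a swapped-in stand-in for whatever matching the disclosure uses. The `[KB]` separator and the `generate` callable that stands in for the target neural network are both hypothetical.

```python
from typing import Callable, List, Set

PUNCT = ".,!?;:\"'"


def tokens(text: str) -> Set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)}


def retrieve(query: str, knowledge_base: List[str]) -> str:
    """Return the knowledge-base sentence sharing the most words with the query."""
    q = tokens(query)
    return max(knowledge_base, key=lambda s: len(q & tokens(s)))


def answer(query: str, professional: bool, knowledge_base: List[str],
           generate: Callable[[str], str]) -> str:
    if not professional or not knowledge_base:
        # General scene (or no knowledge base): generate the answer directly.
        return generate(query)
    # Professional scene: search the knowledge base first, then condition on the hit.
    evidence = retrieve(query, knowledge_base)
    return generate(query + " [KB] " + evidence)


kb = ["Feeling afraid is a normal reaction; talking about it can help.",
      "A short walk can lift a low mood."]
print(answer("I often feel afraid recently", True, kb, lambda x: x))
```

A production system would more likely use embedding-based retrieval, but the control flow (retrieve, then generate) is the same.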

The embodiments of the present disclosure can also be used in multi-person conversation scenes, mainly to avoid awkward silences (the conversation "going cold") in a multi-person conversation. Referring to fig. 4, a flowchart of a method for robot conversation with a plurality of target users provided by an embodiment of the present disclosure includes the following steps:

S401: in response to a multi-person conversation instruction, detect whether any of the plurality of target users inputs sentence information within a preset duration;

S402: in response to none of the plurality of target users inputting sentence information within the preset duration, obtain the sentence information to be output currently based on the acquired historical dialogue information.
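Steps S401 and S402 can be sketched as a timer check over the latest input from any target user. The sketch assumes the robot tracks a hypothetical per-user-ID map of last input times in seconds, and uses `generate` as a stand-in for the target neural network working from the historical dialogue.

```python
from typing import Callable, Dict, List, Optional


def check_silence(
    last_input_time: Dict[str, float],
    session_start: float,
    now: float,
    preset_duration: float,
    history: List[str],
    generate: Callable[[List[str]], str],
) -> Optional[str]:
    """Return a robot sentence if no target user has spoken for preset_duration."""
    # Most recent input from any user (users are told apart by ID);
    # if nobody has spoken yet, measure from the start of the session.
    latest = max(last_input_time.values(), default=session_start)
    if now - latest >= preset_duration:
        # S402: nobody spoke within the window, so output based on history.
        return generate(history)
    # S401 still waiting: someone spoke recently, so stay quiet.
    return None


print(check_silence({"u1": 0.0, "u2": 10.0}, 0.0, 75.0, 60.0, ["..."], lambda h: "new topic"))
```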

Here, the multi-person conversation instruction can be recognized in more than one way: a plurality of target users may be identified by voice recognition, or a multi-person conversation may be triggered by clicking a specific button. Different characteristic voices correspond to different target users, and the specific button for triggering a multi-person conversation may be an "initiate group chat" button. In a multi-person conversation scene, each target user corresponds to a different identity (ID), and the robot can distinguish the target users by their IDs.

The preset duration can be set according to the chat intensity, which can be judged by counting the sentences input within a period of time. For example, the chat intensity can be divided into three levels (high, medium, and low) by counting the sentences input to the robot by the target users within 1 minute. When 50 or more sentences are input within 1 minute, the chat intensity is high and the preset duration can be set to 1 minute. When more than 30 but fewer than 50 sentences are input within 1 minute, the chat intensity is medium and the preset duration can be set to 40 seconds. When 30 or fewer sentences are input within 1 minute, the chat intensity is low and the preset duration can be set to 20 seconds.
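The three-level mapping from chat intensity to preset duration translates directly into a threshold function. The boundaries below follow the example in the text (50 or more is high, more than 30 and fewer than 50 is medium, 30 or fewer is low). The function name is hypothetical.

```python
def preset_duration_seconds(sentences_last_minute: int) -> int:
    """Map chat intensity (sentences input in the last minute) to a silence window."""
    if sentences_last_minute >= 50:   # high intensity
        return 60
    if sentences_last_minute > 30:    # medium: more than 30 and fewer than 50
        return 40
    return 20                         # low: 30 or fewer


print([preset_duration_seconds(n) for n in (55, 50, 49, 31, 30, 0)])
```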

For example, in a multi-person conversation with high chat intensity, if none of the target users inputs sentence information to the robot within 1 minute, the target neural network outputs sentence information based on the historical dialogue information to avoid an awkward silence.

In a multi-person conversation scene, an awkward silence may also occur even after a target user has provided input. To address this, this embodiment further provides a method for robot conversation with a plurality of target users; as shown in fig. 5, the specific steps are:

S501: when a first target user among the plurality of target users initiates session information, detect whether any of the plurality of target users inputs sentence information responding to the session information within a preset duration;

S502: in response to none of the plurality of target users inputting such sentence information within the preset duration, obtain the sentence information to be output currently based on the acquired historical dialogue information and the session information initiated by the first target user.

Here, the first target user refers to any one of the plurality of target users. The session information initiated by the first target user may be a specific question, or a declarative sentence that is not a question, such as "I'm in a good mood today" or "Work went well today". It is not a sentence indicating the end of the session, such as "Thank you", "Never mind", or "Okay". The sentence information responding to the session information may answer its content, or may belong to the same topic as the session information or to a different topic; for example, it may be content on the same topic as the session information or on a related topic, which is not limited herein. A related topic is a topic of the same category as the topic of the session information (for example, both concern diet), or a topic connected to it in time, space, or the like. For example, if the session information concerns travel, the response may concern transportation, hotels, or other trip-related content.
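Steps S501 and S502, together with the rule that closing sentences do not count as initiated session information, can be sketched as follows. `CLOSING_PHRASES` holds only the three examples from the text and is an assumption rather than a complete list. Reply timestamps, seconds as the unit, and the `generate` stand-in for the target neural network are also assumptions.

```python
from typing import Callable, List, Optional

# The closing examples from the text; a real system would need a fuller list.
CLOSING_PHRASES = {"thank you", "never mind", "okay"}


def is_session_opener(message: str) -> bool:
    """Questions and declarative sentences count; pure closing phrases do not."""
    return message.strip().strip(".!").lower() not in CLOSING_PHRASES


def respond_if_ignored(
    opener: str,
    opener_time: float,
    reply_times: List[float],
    now: float,
    preset_duration: float,
    history: List[str],
    generate: Callable[[List[str]], str],
) -> Optional[str]:
    if not is_session_opener(opener):
        return None
    # S501: did any target user respond within the window after the opener?
    answered = any(opener_time < t <= opener_time + preset_duration for t in reply_times)
    if answered or now - opener_time < preset_duration:
        return None  # answered, or still inside the waiting window
    # S502: use both the history and the unanswered session information.
    return generate(history + [opener])


g = lambda msgs: "robot: " + msgs[-1]
print(respond_if_ignored("What fun places in City A can you recommend?", 0.0, [], 61.0, 60.0, [], g))
```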

The multi-person conversation method is described here using a travel scene as an example. In a multi-person conversation, a target user asks "What fun places in City A can you recommend?" and no other target user answers within 1 minute. To avoid an awkward silence, the robot can output an answer sentence using the target neural network, and the answer includes the names of some scenic spots.

It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.

Based on the same inventive concept, the embodiment of the present disclosure further provides a device for robot conversation corresponding to the method for robot conversation, and since the principle of solving the problem of the device in the embodiment of the present disclosure is similar to the method for robot conversation described above in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not described again.

Referring to fig. 6, a schematic diagram of an apparatus for robot conversation provided in an embodiment of the present disclosure is shown. The apparatus includes an acquisition module 601, a merging module 602, and a prediction module 603, wherein:

the acquisition module 601 is used for acquiring first statement information input by a target user and historical dialogue information generated by the target user and the robot in a historical time period;

a merging module 602, configured to merge the first statement information and the historical dialog information to obtain initial merging information;

the prediction module 603 is configured to perform statement prediction on the initial merging information by using the target neural network, and to obtain and output second statement information for the first statement information.
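The division into modules 601 to 603 can be pictured as three small components wired in sequence. This is a minimal sketch under stated assumptions. The class names are hypothetical. The acquisition module is assumed to keep the history in memory, and the target neural network is any callable that maps the merged sentence list to a reply.

```python
from typing import Callable, List, Tuple


class AcquisitionModule:
    """Module 601: supplies the new user input and the stored dialogue history."""

    def __init__(self) -> None:
        self.history: List[str] = []  # sentences in generation order (assumed storage)

    def acquire(self, first_statement: str) -> Tuple[str, List[str]]:
        return first_statement, list(self.history)


class MergingModule:
    """Module 602: merges the history and the first statement into one sequence."""

    def merge(self, first_statement: str, history: List[str]) -> List[str]:
        return history + [first_statement]


class PredictionModule:
    """Module 603: wraps the target neural network (any callable here)."""

    def __init__(self, network: Callable[[List[str]], str]) -> None:
        self.network = network

    def predict(self, merged: List[str]) -> str:
        return self.network(merged)


class RobotConversationApparatus:
    """Wires modules 601 to 603 together for one conversation."""

    def __init__(self, network: Callable[[List[str]], str]) -> None:
        self.acquisition = AcquisitionModule()
        self.merging = MergingModule()
        self.prediction = PredictionModule(network)

    def reply(self, first_statement: str) -> str:
        statement, history = self.acquisition.acquire(first_statement)
        merged = self.merging.merge(statement, history)
        second_statement = self.prediction.predict(merged)
        # Store both sides so later turns see them as historical dialogue.
        self.acquisition.history.extend([statement, second_statement])
        return second_statement


app = RobotConversationApparatus(lambda m: f"reply to {m[-1]} ({len(m)} items)")
print(app.reply("hi"))
print(app.reply("again"))
```

Keeping the network behind a callable keeps the modules independent of any particular model framework.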

The apparatus for robot conversation merges the first statement information and the historical dialogue information into initial merging information, so that when the target neural network performs statement prediction on the initial merging information, second statement information for the first statement information is obtained. Thus, during the conversation with the target user, the robot performs sentence prediction taking into account the semantic features of both the first statement information and the historical dialogue information. As a result, the predicted second statement information better matches the conversation intention of the target user and human-machine interaction is more fluent. The target user is therefore more willing to keep interacting with the robot, which makes it easier to provide long-term companion-style interactive service to the target user.

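In an optional embodiment, the prediction module 603, when performing statement prediction on the initial merging information by using the target neural network to obtain second statement information for the first statement information, is configured to:

in response to the statement data amount indicated by the initial merging information being larger than a preset statement data amount, perform pruning processing on the initial merging information to obtain processed target merging information, wherein the statement data amount indicated by the target merging information is smaller than or equal to the preset statement data amount; and

perform statement prediction on the target merging information by using the target neural network to obtain the second statement information.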

In an optional embodiment, when performing pruning processing on the initial merging information to obtain processed target merging information, the predicting module 603 is configured to:

delete historical dialogue information included in the initial merging information based on the generation order of each piece of statement information in the historical dialogue information, to obtain the target merging information, wherein the deleted statement information is contiguous and was generated before every piece of statement information in the target merging information.

In an optional embodiment, the prediction module 603, when performing statement prediction on the initial merging information by using the target neural network, is configured to:

perform semantic feature extraction on the initial merging information, and determine semantic feature information representing the conversation intention of the target user;

determine a session scene type corresponding to the conversation intention of the target user; and

perform statement prediction on the initial merging information by using a target neural network matched with the session scene type.

In an optional embodiment, after determining the session scene type corresponding to the conversation intention of the target user, the prediction module 603 is further configured to:

in the case that the session scene type corresponding to the conversation intention of the target user includes a psychological counseling scene, generate and output a psychological counseling sentence corresponding to the psychological counseling scene.

In an optional embodiment, when there are a plurality of target users, the apparatus further includes:

a detection module 604, configured to detect, in response to a multi-person conversation instruction, whether any of the plurality of target users inputs statement information within a preset duration;

an output module 605, configured to obtain, in response to none of the target users inputting statement information within the preset duration, the statement information to be output currently based on the acquired historical dialogue information.

In an alternative embodiment, the detecting module 604, when detecting whether statement information is input by multiple target users within a preset time period, is configured to:

In an alternative embodiment, the prediction module 603 is configured to train the target neural network according to the following steps:

acquiring multiple rounds of dialogue samples, wherein each round of dialogue samples comprises a question statement and an answer statement for the question statement;

for a target dialogue sample group among a plurality of dialogue sample groups, using each sentence in the target dialogue sample group, based on its generation order, as an input sentence and as an output supervision sentence of the target neural network to be trained, and performing at least one round of network training on the target neural network to obtain the trained target neural network.

The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.

The embodiment of the present disclosure also provides an electronic device. Referring to fig. 7, a schematic structural diagram of an electronic device 700 provided in the embodiment of the present disclosure, the device includes a processor 71, a memory 72, and a bus 73. The memory 72 stores execution instructions and includes a memory 721 and an external memory 722. The memory 721, also referred to as internal memory, temporarily stores operation data of the processor 71 and data exchanged with the external memory 722, such as a hard disk; the processor 71 exchanges data with the external memory 722 through the memory 721. When the electronic device 700 runs, the processor 71 communicates with the memory 72 through the bus 73, so that the processor 71 executes the following instructions:

acquiring first statement information input by a target user and historical dialogue information generated by the target user and the robot in a historical period;

For the specific execution process of the instructions, reference may be made to the steps of the method for robot conversation described in the embodiments of the present disclosure, and details are not described here.

The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the method for robot conversation described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.

The embodiments of the present disclosure also provide a computer program product carrying program code, and the instructions included in the program code may be used to execute the steps of the method for robot conversation in the foregoing method embodiments. For details, refer to the foregoing method embodiments, which are not described herein again.

The computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes and substitutions do not depart from the spirit and scope of the embodiments disclosed herein, and they should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A method of robotic conversation, comprising:

2. The method of claim 1, wherein performing statement prediction on the initial merging information by using a target neural network to obtain second statement information for the first statement information comprises:

in response to the statement data amount indicated by the initial merging information being larger than a preset statement data amount, performing pruning processing on the initial merging information to obtain processed target merging information, wherein the statement data amount indicated by the target merging information is smaller than or equal to the preset statement data amount;

3. The method of claim 2, wherein performing the pruning processing on the initial merging information to obtain the processed target merging information comprises:

4. The method of any one of claims 1 to 3, wherein performing statement prediction on the initial merging information by using a target neural network comprises:

5. The method of claim 4, wherein after determining the session scene type corresponding to the session intention of the target user, the method further comprises:

6. The method according to any one of claims 1 to 5, wherein in case that there are a plurality of target users, the method further comprises:

7. The method according to claim 6, wherein the detecting whether statement information is input by a plurality of target users within a preset time period comprises:

8. The method of any one of claims 1 to 7, wherein the target neural network is trained by:

9. An apparatus for robotic conversation, comprising:

10. An electronic device, comprising a processor and a memory storing machine-readable instructions executable by the processor, wherein the processor is configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, cause the processor to perform the steps of the method of robotic conversation of any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by an electronic device, performs the steps of the method of robotic conversation as claimed in any one of claims 1 to 8.