CN114328867A

CN114328867A - Intelligent interruption method and device in man-machine conversation

Info

Publication number: CN114328867A
Application number: CN202111600658.2A
Authority: CN
Inventors: 余文芳; 曾文佳; 陈新月; 宋成业; 冯梦盈; 梁鹏斌; 李航; 韩亚昕
Original assignee: Lingxi Beijing Technology Co Ltd
Current assignee: Lingxi Beijing Technology Co Ltd
Priority date: 2021-12-24
Filing date: 2021-12-24
Publication date: 2022-04-12

Abstract

The application provides a method and a device for intelligent interruption in man-machine conversation. The method comprises: during playback of a dialog, acquiring a voice signal of a user and converting the voice signal into a text; determining whether the text satisfies an interruption rule of the playing dialog; if the text does not satisfy the interruption rule, inputting the text into a pre-trained interruption model to obtain the type name of the text; and determining whether to interrupt the dialog according to the type name of the text. This improves the conversation quality in the man-machine conversation process.

Description

Intelligent interruption method and device in man-machine conversation

Technical Field

The application relates to the field of artificial intelligence technology, in particular to a method and a device for intelligent interruption in man-machine conversation.

Background

With the rapid development of intelligent voice technology for robots, man-machine conversation is widely used in various fields.

However, at present, the user can ask or answer questions about the content of a dialog script only after the intelligent dialogue robot has finished broadcasting the current script, and the robot then responds to the user's questions or answers. This wastes much time in the man-machine conversation and makes the user's call experience very poor.

Therefore, how to improve the quality of the call in the man-machine conversation process becomes a technical problem which needs to be solved urgently.

Disclosure of Invention

The embodiment of the application aims to provide a method for intelligently interrupting a man-machine conversation, and the effect of improving the conversation quality in the man-machine conversation process can be achieved through the technical scheme of the embodiment of the application.

In a first aspect, the present application provides a method for intelligent interruption in a man-machine conversation, comprising: during playback of a dialog, acquiring a voice signal of a user and converting the voice signal into a text; determining whether the text satisfies an interruption rule of the playing dialog; if the text does not satisfy the interruption rule, inputting the text into a pre-trained interruption model to obtain the type name of the text; and determining whether to interrupt the dialog according to the type name of the text.

In this process, the voice signal of the user is recognized and converted into text, and the interruption rule is applied first to judge whether to interrupt the dialog. If the rule does not trigger an interruption, the text is input into the interruption model, and the type output by the model is used to judge again whether to interrupt the dialog. Combining the recognized text with the judgments of the interruption rule and the interruption model achieves real-time and accurate interruption of the man-machine conversation and thereby improves the conversation quality.

Optionally, determining whether to interrupt the dialog according to the type name of the text includes:

determining whether the type name of the text exists in an interruption list, wherein the interruption list contains the type name of the text corresponding to the dialog needing to be interrupted;

if the type name of the text is present in the interrupt list, the dialog is interrupted.

In this process, whether the type name exists in the interruption list is determined, and the dialog is interrupted if it does. This simple check saves time across the whole process and achieves quick and accurate interruption of the man-machine conversation.

Optionally, after determining whether the text satisfies the interruption rule of the playing dialog, in a case where the text satisfies the interruption rule, the method further includes:

generating a reply voice signal according to the text;

and sending the reply voice signal to the user.

In the process, after the conversation is interrupted by the interruption rule, the voice signal corresponding to the text is directly generated and fed back to the user, so that the user can obtain the desired answer information in time.

Optionally, determining whether the text satisfies an interruption rule of the playing session includes:

determining whether keywords and/or keyword groups exist in the text;

when determining that the keywords and/or the keyword groups exist in the text, determining that the text meets an interruption rule;

and when neither the keywords nor the keyword groups exist in the text, determining that the text does not meet the interruption rule.

In this process, the interruption rule can take the form of keywords and/or keyword groups, and the dialog is interrupted if the rule is met. Searching the text for the keywords and/or keyword groups allows the dialog to be interrupted quickly.
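The keyword check described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name, the sample keywords, and the substring-matching strategy are assumptions, and a keyword group is assumed to hit only when all of its members appear in the text at the same time.

```python
from typing import Iterable, Sequence


def satisfies_interruption_rule(
    text: str,
    keywords: Iterable[str],
    keyword_groups: Iterable[Sequence[str]],
) -> bool:
    """Return True if any keyword, or every member of any keyword group, occurs in text."""
    normalized = text.lower()
    # A single keyword is enough to satisfy the rule.
    if any(keyword.lower() in normalized for keyword in keywords):
        return True
    # Assumption: a (non-empty) keyword group hits only if all members co-occur.
    return any(
        len(group) > 0 and all(word.lower() in normalized for word in group)
        for group in keyword_groups
    )


# Hypothetical rule contents, chosen only for illustration.
KEYWORDS = ["stop", "wait a moment"]
KEYWORD_GROUPS = [("do not", "need")]
```

Plain substring matching is used here because it also works for unsegmented text; a production rule set would likely need tokenization to avoid accidental matches inside longer words.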

Optionally, before acquiring the speech signal of the user and converting the speech signal into text, the method further includes:

acquiring a plurality of texts in a corpus and type names corresponding to the texts;

preprocessing a plurality of texts and type names corresponding to the texts, and combining each text and the type name corresponding to the text into a plurality of samples according to a set format;

and training an existing base model on the plurality of samples with a deep learning training algorithm to obtain the interruption model.

In this process, the base model is trained on the samples to obtain the corresponding interruption model. Using deep learning algorithms during training makes the trained model more accurate, so that playback of the intelligent robot's recording can be interrupted accurately in use.

Optionally, in a case where it is determined not to interrupt the dialog according to the type name of the text, the method further includes:

when no voice signal of the user is received within a preset duration, sending the user a signal prompting the user to speak or indicating that the signal is weak, and playing the recording again;

if, after the recording is played again, still no voice signal of the user is received within a preset duration, ending the conversation.

In the process, the user voice signal is received in real time, so that the silent state of the user can be processed and the prompt effect is achieved.
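The silence handling above can be sketched as a small function. This is an illustrative sketch only: `wait_for_speech` and `prompt_and_replay` are hypothetical callbacks standing in for the robot's audio input and playback, and the 5-second default is an assumed value, since the application only speaks of a preset duration.

```python
from typing import Callable, Optional


def handle_user_silence(
    wait_for_speech: Callable[[float], Optional[str]],
    prompt_and_replay: Callable[[], None],
    timeout_s: float = 5.0,
) -> Optional[str]:
    """Return the user's utterance, or None if the conversation should end."""
    # First wait: listen for the user for the preset duration.
    utterance = wait_for_speech(timeout_s)
    if utterance is not None:
        return utterance
    # Nothing heard: prompt the user (speak up / weak signal) and replay the recording.
    prompt_and_replay()
    # Second wait: if the user is still silent, the caller ends the conversation.
    return wait_for_speech(timeout_s)
```

Returning `None` rather than hanging up inside the function leaves the decision of how to end the conversation to the caller.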

Optionally, the method further includes:

under the condition of determining to interrupt the conversation according to the type name of the text, generating a reply voice signal according to the text;

and sending the reply voice signal to the user.

In the process, after the dialog is determined to be interrupted, the corresponding reply voice signal is generated according to the recognized text and is sent to the client, so that the user can obtain the desired information in real time, and the user experience is facilitated.

In a second aspect, the present application provides a robot comprising:

the acquisition module is used for acquiring a voice signal of a user and converting the voice signal into a text in the process of playing the conversation;

the first determining module is used for determining whether the text meets the interruption rule of the playing dialogue;

the input module is used for inputting the text into an interrupt model trained in advance under the condition that the text does not meet the interrupt rule, so as to obtain the type name of the text;

and the second determining module is used for determining whether to interrupt the conversation according to the type name of the text.

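Optionally, the second determining module is specifically configured to:

determine whether the type name of the text exists in an interruption list, wherein the interruption list contains the type name of the text corresponding to the dialog needing to be interrupted;

if the type name of the text exists in the interruption list, interrupt the dialog.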

Optionally, the apparatus further comprises:

the first sending module is used for generating a reply voice signal according to the text after the second determining module interrupts the conversation if the type name of the text exists in the interruption list;

and sending the reply voice signal to the user.

Optionally, the first determining module is specifically configured to:

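determining whether keywords and/or keyword groups exist in the text;

when determining that the keywords and/or the keyword groups exist in the text, determining that the text meets the interruption rule;

and when neither the keywords nor the keyword groups exist in the text, determining that the text does not meet the interruption rule.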

Optionally, the apparatus further comprises:

the training module is used for acquiring a plurality of texts in the corpus and type names corresponding to the texts before the acquisition module acquires the voice signal of the user and converts the voice signal into the text;

Optionally, the apparatus further comprises:

the receiving module is used for, in the case where the second determining module determines not to interrupt the conversation according to the type name of the text, sending the user a signal prompting the user to speak or indicating that the signal is weak and playing the recording again when no voice signal of the user is received within a preset duration;

Optionally, the apparatus further comprises:

the second sending module is used for generating a reply voice signal according to the text under the condition that the first determining module determines that the text meets the interruption rule of the playing dialogue;

and sending the reply voice signal to the user.

In a third aspect, the present application provides a robot comprising:

a processor and a memory, said memory storing computer readable instructions which, when executed by said processor, perform the steps of the method as provided in the first aspect above.

In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps in the method as provided in the first aspect.

Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.

FIG. 1 is a diagram of interaction between a robot and a user provided in an embodiment of the present application;

FIG. 2 is a flowchart of a method for intelligent interruption in human-machine conversation according to an embodiment of the present disclosure;

FIG. 3 is a schematic block diagram of a robot provided in an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a robot according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

The method and device of the present application apply to intelligent interruption scenarios in man-machine conversation. Specifically, while the intelligent dialogue robot is playing a recording, it receives a voice signal of the user and judges according to the voice signal whether to interrupt the recording currently being played.

However, at present, when the user speaks while the intelligent dialogue robot is broadcasting its script, the robot cannot stop to listen to the user and cannot reply in time to the question raised by the user. The user cannot obtain the answer to the question, and the conversation experience is poor. For example, in intelligent telephone customer service for product sales, if the user speaks while the intelligent dialogue robot is broadcasting its script, the robot cannot stop to listen and cannot reply in time, so the user does not obtain the desired answer.

In the present application, the intelligent dialogue robot establishes a dialogue channel with an RPA (robotic process automation) client through a third-party outbound call platform. The RPA client receives the user's speech signal through dual sound cards (an input sound card and an output sound card) with an internal recording function, and ASR (automatic speech recognition) converts the user's speech stream into text. The text is input into the DM (dialogue management service), which passes it to the NLU (semantic understanding service) to recognize the intention, entities, attitude and emotion of the text content. The DM performs intention selection and dialogue flow transitions according to the output of the NLU and the designed dialogue flow, and inputs the acquired information into the RPA to select the corresponding reply script. The RPA obtains and downloads the recording file of the reply script from the recording service according to the recording number in the DM's reply script, and then writes the recording to the output sound card as a voice stream. The intelligent robot plays the recording, thereby realizing the conversation between the person and the intelligent robot.

When the robot receives a voice signal of the user again while it is playing a recording during the conversation, the text is recognized through the functions described above, and the robot judges whether the text hits the interruption rule. If the text hits the interruption rule, the recording being played is interrupted directly, the obtained text information is input into the RPA to select the corresponding reply script, and the RPA downloads and plays the corresponding recording file according to the recording number in the DM's reply script. If the text does not hit the interruption rule, the text is input into the interruption model; if the output result exists in the interruption list, the recording being played is interrupted and the corresponding reply recording file is played. Finally, the reply signal is sent to the client.

The method for intelligent interruption in man-machine conversation according to the embodiment of the present application is described in detail below with reference to fig. 1.

Referring to fig. 1, fig. 1 is an interaction diagram of a robot and a user according to an embodiment of the present application, and a method for intelligent interruption in a human-machine conversation shown in fig. 1 includes:

The user 110 sends a voice signal to the intelligent dialogue robot 120, the intelligent dialogue robot 120 generates a corresponding reply voice signal by recognizing the voice signal, and the intelligent dialogue robot 120 sends the reply signal to the user 110.

By receiving the user's voice signal and judging it intelligently, the recording played by the robot can be interrupted in real time. This supports real-time interruption by the user while the intelligent dialogue robot is broadcasting its script, improves the user experience, and reduces the complaint rate.

The intelligent dialogue robot can acquire the voice signal directly from the user face to face, for example: in a conversation with a hotel robot, the voice signal is obtained directly from the user's speech. Alternatively, the voice signal can be acquired by a user terminal, and the intelligent dialogue robot obtains the user's voice signal through the user terminal, for example: in a conversation with an intelligent voice customer service, the user terminal acquires the user's voice signal.

The method for intelligent interruption in man-machine conversation according to the embodiment of the present application is described in detail below with reference to fig. 2.

Fig. 2 is a flowchart of a method for intelligent interruption in a human-machine conversation according to an embodiment of the present application, where the method for intelligent interruption in a human-machine conversation shown in fig. 2 is applied to a robot, and includes:

210: During playback of the dialog, acquire a voice signal of the user and convert the voice signal into a text.

In the process, the voice signal is converted into the text, so that the text information can be identified and judged better in the subsequent steps.

The intelligent dialogue scenario may be a conversation with an intelligent voice customer service, such as an intelligent telephone customer service for product sales. It may also be a conversation with an intelligent robot, such as a hotel robot or an intelligent delivery robot used by a hotel, or it may involve other software with a related intelligent voice program. These are not described in detail in the present application.

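Optionally, before acquiring the speech signal of the user and converting the speech signal into text, the method shown in fig. 2 may further include:

acquiring a plurality of texts in a corpus and type names corresponding to the texts;

preprocessing the plurality of texts and the type names corresponding to the texts, and combining each text and the type name corresponding to the text into a plurality of samples according to a set format;

and training an existing base model on the plurality of samples with a deep learning training algorithm to obtain the interruption model.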

The corpus may be a corpus accumulated and classified in the NLU, a corpus in a database, or another library or system from which relevant corpora can be acquired; these are not described in detail here. Each corpus entry may be a combination of a text and its text type, or may be a text only, in which case the text type is labeled manually.

The base model may be a fastText (a text classification tool) classification model. After training, inputting a text into the model yields the probability of each type for that text, and the type with the highest probability is selected as the type name of the text. Alternatively, the model may be a basic three-layer DNN (deep neural network), which likewise outputs the probabilities of the categories corresponding to the input texts.

When training the model, the texts are first preprocessed: content such as modal particles, symbols, and unrecognizable speech content can be removed. The assembled samples are then split into two parts, for example in a ratio of 1:1. A text is input into the model, the output type name is compared with the type name corresponding to the input text, and the parameters of the model are adjusted by a deep learning algorithm to train the model. A small number of samples are then used to verify whether the model meets the standard, for example: the model is considered to meet the standard if the probability that the output type name is the same as the type name corresponding to the text is greater than 95%.
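The sample preparation described above might look like the following sketch. It assumes the "set format" is fastText's supervised training format (`__label__<type> <text>`), which the application does not state; the filler-word list, the function names, and the type name `ask_price` are likewise hypothetical. The model training itself is omitted, so only preprocessing, sample assembly, the split, and the 95% acceptance check are shown.

```python
import random
import re
from typing import List, Sequence, Tuple

# Hypothetical filler words (modal particles) to drop during preprocessing.
FILLER_WORDS = {"um", "uh", "er"}


def preprocess(text: str) -> str:
    """Lowercase the text, replace symbols with spaces, and drop filler words."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in FILLER_WORDS)


def build_samples(pairs: Sequence[Tuple[str, str]]) -> List[str]:
    """Combine each (text, type_name) pair into one line; type names must not contain spaces."""
    return [f"__label__{type_name} {preprocess(text)}" for text, type_name in pairs]


def split_samples(samples: Sequence[str], ratio: float = 0.5, seed: int = 0):
    """Shuffle and split the samples into two parts (1:1 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


def meets_standard(predicted: Sequence[str], expected: Sequence[str], threshold: float = 0.95) -> bool:
    """Check that the share of matching type names is strictly greater than the threshold."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected) > threshold
```

The first part of the split would be written to a file and passed to the chosen trainer, and the second part would supply the `expected` type names for `meets_standard`.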

220: Determine whether the text satisfies an interruption rule of the playing dialog.

In this process, if the text satisfies the interruption rule, the dialog is interrupted directly; if the text does not satisfy the interruption rule, the text is input into the interruption model. This double judgment allows the interruption function of the dialog to be realized more accurately.

The interruption rule may be a rule based on words or phrases that satisfy a condition, for example: a positive or negative word appears in the text. It may also be a constraint on the text, for example: "I" and "do not need" appear at the same time. It may also be based on manually added labels and type names: if the text or words in the text hit a label, a type name, or another related name, the dialog can likewise be interrupted in real time.

Optionally, determining whether the text satisfies an interruption rule of the playing dialog includes:

determining whether keywords and/or keyword groups exist in the text;

when determining that the keywords and/or the keyword groups exist in the text, determining that the text meets the interruption rule;

and when neither the keywords nor the keyword groups exist in the text, determining that the text does not meet the interruption rule.

Further, this determination has two possible outcomes: the text satisfies the interruption rule of the playing dialog, or the text does not satisfy it.

Optionally, in a case that it is determined that the text satisfies the interruption rule of the playing session, the method shown in fig. 2 may further include:

generating a reply voice signal according to the text;

and sending the reply voice signal to the user.

230: If the text does not satisfy the interruption rule, input the text into a pre-trained interruption model to obtain the type name of the text.

In the process, when the interruption rule is not met, the text is input into the interruption model, the corresponding type name is output, and whether the man-machine conversation is interrupted or not can be directly judged subsequently according to the type name.

The type name of the text may be a name indicating an intention of the text, or a word in the text, and the name may be labeled manually or acquired by a system or a corpus.

240: Determine whether to interrupt the dialog according to the type name of the text.

In the process, whether the man-machine conversation is interrupted or not can be determined directly according to the type name of the text, and the effect of quickly realizing interruption is achieved.

The interruption list contains the type name of the text corresponding to the conversation to be interrupted, some words or sentences are stored in the list and used for representing the type, the intention and the like of the text, and whether the conversation is interrupted or not can be judged by judging whether the type name output by the interruption model hits the content in the interruption list or not.
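Steps 220 to 240 can be sketched together as a single decision function. This is a hedged illustration under stated assumptions: `rule_hit` and `classify` are hypothetical callables standing in for the interruption rule and the trained interruption model, the model is assumed to return a probability per type name, and the type names and list contents below are invented for the example.

```python
from typing import Callable, Dict, Set


def pick_type_name(probabilities: Dict[str, float]) -> str:
    """Return the type name with the highest probability."""
    return max(probabilities, key=probabilities.get)


def should_interrupt(
    text: str,
    rule_hit: Callable[[str], bool],
    classify: Callable[[str], Dict[str, float]],
    interruption_list: Set[str],
) -> bool:
    # Step 220: a rule hit interrupts directly, without calling the model.
    if rule_hit(text):
        return True
    # Step 230: otherwise ask the interruption model for the type name.
    type_name = pick_type_name(classify(text))
    # Step 240: interrupt only if the type name is in the interruption list.
    return type_name in interruption_list


# Hypothetical stand-ins for the rule check and the trained model.
INTERRUPTION_LIST = {"ask_price", "refuse"}


def demo_rule(text: str) -> bool:
    return "stop" in text.lower()


def demo_model(text: str) -> Dict[str, float]:
    if "how much" in text.lower():
        return {"ask_price": 0.9, "backchannel": 0.1}
    return {"ask_price": 0.2, "backchannel": 0.8}
```

Checking the rule first keeps the common, unambiguous cases cheap, and the model is consulted only for text the rule does not cover.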

Furthermore, after the interrupt model outputs the corresponding type name, there may be two results of interrupting and not interrupting the human-machine conversation.

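Optionally, in the case that it is determined not to interrupt the dialog according to the type name of the text, the method shown in fig. 2 may further include:

when no voice signal of the user is received within a preset duration, sending the user a signal prompting the user to speak or indicating that the signal is weak, and playing the recording again;

if, after the recording is played again, still no voice signal of the user is received within a preset duration, ending the conversation.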

Optionally, in the case that it is determined to interrupt the dialog according to the type name of the text, the method shown in fig. 2 may further include:

generating a reply voice signal according to the text;

and sending the reply voice signal to the user.

The method for intelligent interruption in a human-computer conversation is described in detail in the foregoing by means of fig. 2, and an intelligent conversation robot for intelligent interruption in a human-computer conversation is described below in conjunction with fig. 3-4.

Referring to fig. 3, a schematic block diagram of a robot 300 provided in the embodiment of the present application is shown. The robot 300 may be a module, a program segment, or code on an electronic device. The robot 300 corresponds to the method embodiment shown in fig. 2 and can perform the steps performed by the robot in that embodiment. For the specific functions of the robot 300, refer to the description above; a detailed description is omitted here as appropriate to avoid repetition.

Optionally, the robot 300 includes:

the obtaining module 310 is configured to obtain a voice signal of a user and convert the voice signal into a text in a session playing process;

a first determining module 320, configured to determine whether the text satisfies an interruption rule of the playing session;

the input module 330 is configured to input the text into an interrupt model trained in advance to obtain a type name of the text, when it is determined that the text does not satisfy the interrupt rule;

the second determining module 340 is configured to determine whether to interrupt the dialog according to the type name of the text.

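Optionally, the second determining module is specifically configured to:

determine whether the type name of the text exists in an interruption list, wherein the interruption list contains the type name of the text corresponding to the dialog needing to be interrupted;

if the type name of the text exists in the interruption list, interrupt the dialog.

Optionally, the apparatus further comprises:

the first sending module is used for generating a reply voice signal according to the text after the second determining module interrupts the dialog if the type name of the text exists in the interruption list;

and sending the reply voice signal to the user.

Optionally, the first determining module is specifically configured to:

determining whether keywords and/or keyword groups exist in the text;

when determining that the keywords and/or the keyword groups exist in the text, determining that the text meets the interruption rule;

and when neither the keywords nor the keyword groups exist in the text, determining that the text does not meet the interruption rule.

Optionally, the apparatus further comprises:

the training module is used for acquiring a plurality of texts in the corpus and type names corresponding to the texts before the acquisition module acquires the voice signal of the user and converts the voice signal into the text.

Optionally, the apparatus further comprises:

the receiving module is used for, in the case where the second determining module determines not to interrupt the dialog according to the type name of the text, sending the user a signal prompting the user to speak or indicating that the signal is weak and playing the recording again when no voice signal of the user is received within a preset duration;

and if, after the recording is played again, still no voice signal of the user is received within another preset duration, ending the conversation.

Optionally, the apparatus further comprises:

the second sending module is used for generating a reply voice signal according to the text under the condition that the first determining module determines that the text meets the interruption rule of the playing dialog;

and sending the reply voice signal to the user.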

Referring to fig. 4, a schematic structural diagram of a robot provided in an embodiment of the present application is shown. The robot may include a processor 410 and a memory 420. Optionally, the robot may further include a communication interface 430 and a communication bus 440. The robot corresponds to the method embodiment of fig. 2 and can perform the steps performed by the robot in that embodiment; for the specific functions of the robot, refer to the following description.

In particular, memory 420 is used to store computer readable instructions.

The processor 410 is used to execute the instructions stored in the memory 420, so that the robot can perform steps 210 to 240 of the method embodiment of fig. 2.

The communication interface 430 is used for the intelligent dialogue robot to exchange signaling or data with other node devices, for example with a server or a user terminal; the embodiments of the present application are not limited thereto.

The communication bus 440 is used to realize direct connection communication among the above components.

The memory 420 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 420 may optionally be at least one memory device located remotely from the aforementioned processor. The memory 420 stores computer readable instructions that, when executed by the processor 410, cause the robot to perform the method processes described above with respect to fig. 2. The processor 410 may be used in the robot 300 to perform the functions described herein. The processor 410 may be, for example, a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and the embodiments of the present application are not limited thereto.

Optionally, an embodiment of the present application provides a readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method process performed by the robot in the method embodiment shown in fig. 2.

In summary, the embodiments of the present application provide a method and an apparatus for intelligent interruption in a man-machine conversation. During playback of a dialog, a voice signal of a user is acquired and converted into a text; whether the text satisfies an interruption rule of the playing dialog is determined; if the text does not satisfy the interruption rule, the text is input into a pre-trained interruption model to obtain the type name of the text; and whether to interrupt the dialog is determined according to the type name of the text. This improves the conversation quality in the man-machine conversation process.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above description is only an example of the present application and is not intended to limit the scope of the present application; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that like reference numbers and letters refer to like items in the figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

The above description covers only specific embodiments of the present application, but the scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope of the present application shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims

1. A method for intelligent interruption in man-machine conversation, characterized in that the method is applied to an intelligent conversation robot and comprises the following steps:

in the process of playing a conversation, acquiring a voice signal of a user and converting the voice signal into a text;

determining whether the text meets an interruption rule of a playing dialogue;

under the condition that the text is determined not to meet the interruption rule, inputting the text into an interruption model trained in advance to obtain the type name of the text;

and determining whether to interrupt the conversation according to the type name of the text.

2. The method of claim 1, wherein said determining whether to interrupt said dialog based on a type name of said text comprises:

and interrupting the conversation if the type name of the text exists in an interruption list.
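The decision flow of claims 1 and 2 can be sketched roughly as follows. This is a minimal illustration only: the function name `should_interrupt`, the callables `meets_rule` and `classify`, and the example type names are hypothetical, and the sketch assumes that a rule match leads directly to interruption, which the claims do not state explicitly.

```python
from typing import Callable, Iterable


def should_interrupt(
    text: str,
    meets_rule: Callable[[str], bool],
    classify: Callable[[str], str],
    interruption_list: Iterable[str],
) -> bool:
    """Decide whether the dialog being played should be interrupted."""
    # Rule check first. Assumption: a rule match interrupts immediately,
    # so the model is not consulted for that text.
    if meets_rule(text):
        return True
    # Otherwise ask the pre-trained interruption model for a type name.
    type_name = classify(text)
    # Interrupt only when that type name is on the interruption list.
    return type_name in set(interruption_list)


# Hypothetical stand-ins for the interruption rule and the trained model.
def demo_rule(text: str) -> bool:
    return "stop" in text


def demo_model(text: str) -> str:
    return "question" if text.endswith("?") else "backchannel"


print(should_interrupt("what is the fee?", demo_rule, demo_model, ["question"]))
```

Passing the rule and the model in as callables keeps the two stages separate, so the rule can be replaced without retraining the model.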

3. The method of claim 2, wherein after interrupting the conversation if the type name of the text is present in the interruption list, the method further comprises:

generating a reply voice signal according to the text;

and sending the reply voice signal to the user.

4. The method of any of claims 1-3, wherein the determining whether the text meets an interruption rule of a playing dialogue comprises:

determining whether a keyword and/or a keyword group exists in the text;

determining that the text meets the interruption rule when determining that the keywords and/or the keyword groups exist in the text;

determining that the text does not meet the interruption rule when it is determined that neither the keyword nor the keyword group exists in the text.
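A keyword-based rule of the kind recited in claim 4 might look like the sketch below. The claims do not define how a keyword group matches; the sketch assumes that a group matches only when every keyword in it appears in the text. The function name `meets_interruption_rule` and the sample keywords are hypothetical.

```python
from typing import Iterable, Sequence


def meets_interruption_rule(
    text: str,
    keywords: Iterable[str],
    keyword_groups: Iterable[Sequence[str]],
) -> bool:
    """Return True when a keyword and/or a keyword group exists in the text."""
    # A single keyword anywhere in the text satisfies the rule.
    if any(kw in text for kw in keywords):
        return True
    # Assumption: a keyword group matches only when all of its members
    # appear in the text; empty groups never match.
    return any(
        len(group) > 0 and all(kw in text for kw in group)
        for group in keyword_groups
    )


print(meets_interruption_rule("how much does it cost", [], [("how", "cost")]))
```

Plain substring matching is used here for brevity; a real deployment would probably match on tokens produced by a word segmenter.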

5. The method of any of claims 1-3, wherein prior to said obtaining a user's speech signal and converting to text, the method further comprises:

obtaining texts in a corpus and type names corresponding to the texts;

preprocessing the texts and the type names corresponding to the texts, and combining each text and the type name corresponding to the text into a plurality of samples according to a set format;

and training an existing basic model according to the plurality of samples and a deep learning algorithm for model training, to obtain the interruption model.
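The sample-building step of claim 5 can be illustrated as follows. The claim does not specify the preprocessing or the "set format"; this sketch assumes whitespace normalization, removal of incomplete pairs, and a tab-separated `type_name<TAB>text` line per sample. The function name `build_samples` and the example type names are hypothetical.

```python
from typing import Iterable, List, Tuple


def build_samples(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Combine (text, type_name) pairs into one training sample per line."""
    samples = []
    for text, type_name in pairs:
        # Assumed preprocessing: collapse runs of whitespace and trim.
        text = " ".join(text.split())
        type_name = type_name.strip()
        # Skip pairs that are missing either the text or the type name.
        if not text or not type_name:
            continue
        # Assumed set format: type name, a tab, then the text.
        samples.append(f"{type_name}\t{text}")
    return samples


corpus = [("  how  much is it ", "question"), ("uh huh", " backchannel ")]
for line in build_samples(corpus):
    print(line)
```

The resulting lines could then be fed to whatever fine-tuning procedure is used on the basic model; that training step is outside the scope of this sketch.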

6. The method according to any one of claims 1 to 3, wherein in a case where it is determined not to interrupt the dialog based on the type name of the text, the method further comprises:

sending, to the user, a signal prompting the user to speak or indicating that the signal is weak, and playing the recording again;

and ending the conversation if, after the recording is played again, no voice signal of the user is received within another preset time length.

7. The method according to any one of claims 1 to 3, further comprising:

under the condition that the text is determined to meet the interruption rule of the playing conversation, generating a reply voice signal according to the text;

and sending the reply voice signal to the user.

8. A robot, comprising:

the acquisition module is used for acquiring a voice signal of a user and converting the voice signal into a text in the process of playing a conversation;

the first determining module is used for determining whether the text meets an interruption rule of a playing dialogue;

the input module is used for inputting the text into an interruption model which is trained in advance under the condition that the text is determined not to meet the interruption rule, so as to obtain the type name of the text;

9. A robot, comprising: a memory and a processor, the memory storing computer readable instructions which, when executed by the processor, perform the steps of the method of any one of claims 1 to 7.

10. A computer-readable storage medium, comprising:

a computer program which, when run on a computer, causes the computer to carry out the method according to any one of claims 1 to 7.