CN111508477B - Voice broadcasting method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN111508477B
Authority
CN
China
Prior art keywords
identifier
voice
paragraph
interruption
segmented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910712196.XA
Other languages
Chinese (zh)
Other versions
CN111508477A (en)
Inventor
李宽
杨春勇
权圣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN201910712196.XA
Publication of CN111508477A
Application granted
Publication of CN111508477B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Abstract

The application discloses a voice broadcasting method, apparatus, device and computer-readable storage medium. The voice broadcasting method comprises: obtaining a segmented dialogue script to be broadcast, wherein the script is segmented in sequence and each paragraph is marked with an interruption identifier, the types of interruption identifier comprising an interruptible identifier and a non-interruptible identifier; and broadcasting the segmented script in sequence while responding to the user's voice instructions based on the interruption identifier of each paragraph. This scheme can improve communication efficiency during voice broadcasting.

Description

Voice broadcasting method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of intelligent voice technologies, and in particular, to a voice broadcasting method, apparatus, device, and computer-readable storage medium.
Background
Intelligent chat robots are widely used across industries, particularly service industries, and have given rise to a range of commercial and consumer products, including intelligent customer service, smart speakers and entertainment products. As an advanced form of the intelligent chat robot, the intelligent voice robot is increasingly favored by the industry for its more natural and convenient voice interaction. In voice interaction, interruption is a high-frequency scenario: the user starts speaking in the middle of a voice broadcast in order to state an intention early and save interaction time.
However, if interruption during voice broadcasting is left entirely to the user, the user may miss important information in the broadcast and have to restart it, which lengthens the call and lowers communication efficiency. How to improve communication efficiency during voice broadcasting has therefore become an urgent problem.
Disclosure of Invention
The present application mainly addresses the problem that a user who interrupts a voice broadcast at will may miss important information in it and have to restart the broadcast, so that communication takes too long and communication efficiency is too low.
To solve the above problem, a first aspect of the present application provides a voice broadcasting method, comprising: obtaining a segmented dialogue script to be broadcast, wherein the script is segmented in sequence, each paragraph is marked with an interruption identifier, and the types of interruption identifier include an interruptible identifier and a non-interruptible identifier; and broadcasting the segmented script in sequence while responding to the user's voice instructions based on the interruption identifier of each paragraph.
To solve the above problem, a second aspect of the present application provides a voice broadcasting device, comprising an obtaining module and a broadcasting module. The obtaining module is configured to obtain a segmented dialogue script to be broadcast, wherein the script is segmented in sequence, each paragraph is marked with an interruption identifier, and the types of interruption identifier include an interruptible identifier and a non-interruptible identifier. The broadcasting module is configured to broadcast the segmented script in sequence and to respond to the user's voice instructions based on the interruption identifier of each paragraph.
In order to solve the above problem, a third aspect of the present application provides a voice broadcast device, which includes a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the voice broadcast method in the first aspect.
In order to solve the above problem, a fourth aspect of the present application provides a computer-readable storage medium storing program instructions executable by a processor, the program instructions being for implementing the voice broadcast method of the first aspect.
In the above scheme, a segmented dialogue script to be broadcast is obtained, in which the script is segmented in sequence and each paragraph is marked with an interruption identifier whose type is either interruptible or non-interruptible; the segmented script is broadcast in sequence, and the user's voice instructions are responded to based on the interruption identifier of each paragraph. Interruption during voice broadcasting is therefore not left entirely to the user but is governed by the interruption identifier of each paragraph. This makes it much less likely that the user misses important information in the broadcast, avoids broadcasting the same script repeatedly, shortens the call, and improves communication efficiency.
Drawings
Fig. 1 is a schematic flowchart of an embodiment of a voice broadcast method according to the present application;
Fig. 2 is a schematic flowchart of an embodiment of step S12 in Fig. 1;
Fig. 3 is a schematic flowchart of another embodiment of a voice broadcast method of the present application;
Fig. 4 is a schematic block diagram of an embodiment of a voice broadcast system based on the voice broadcast method of the present application;
Fig. 5 is a schematic flowchart of yet another embodiment of a voice broadcast method of the present application;
Fig. 6 is a schematic block diagram of an embodiment of a voice broadcast device according to the present application;
Fig. 7 is a schematic block diagram of an embodiment of a voice broadcast apparatus according to the present application;
Fig. 8 is a schematic block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it. Further, the term "plurality" herein means two or more.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an embodiment of the voice broadcasting method of the present application. Specifically, the method may include the following steps:
Step S11: Acquire the segmented dialogue script to be broadcast.
A dialogue script is the scripted wording used in each flow of a voice interaction. Taking an operator's customer service as an example: when a customer service call is connected, the script broadcast is typically "Respected customer, hello; for mobile service, please ……; for broadband service, please ……; for manual service, please ……; to end, please hang up"; when no agent is currently available, it is typically "All agents are busy, please wait"; and when the conversation ends, it is typically "Thank you for calling, please hang up". By broadcasting scripts, the user is guided step by step toward solving the problem or is provided with information.
In this embodiment, the script is segmented in sequence and each paragraph is marked with an interruption identifier, whose type is either an interruptible identifier or a non-interruptible identifier, so that it can be determined whether the user is allowed to interrupt during broadcasting. Still taking the script "Respected customer, hello; for mobile service, please ……; for broadband service, please ……; for manual service, please ……; to end, please hang up" as an example, segmenting it in sequence yields the paragraphs "Respected customer, hello", "for mobile service, please ……", "for broadband service, please ……", "for manual service, please ……" and "to end, please hang up", each of which is marked with an interruption identifier.
In one implementation scenario, a paragraph may be marked with an interruptible or a non-interruptible identifier according to its importance within the script. Still taking the above script as an example, the paragraphs "for mobile service, please ……", "for broadband service, please ……" and "for manual service, please ……" are the more important paragraphs of the script and may be marked as non-interruptible, while the paragraphs "Respected customer, hello" and "to end, please hang up" may be marked as interruptible. In another implementation scenario, a paragraph may be marked according to both its importance and its position within the script. Still taking the above script as an example, the paragraphs "for mobile service, please ……", "for broadband service, please ……" and "for manual service, please ……" are the more important paragraphs and are all located in the middle of the script, while the paragraph "Respected customer, hello" is a minor paragraph but is located at the head of the script. Therefore the paragraphs "Respected customer, hello", "for mobile service, please ……", "for broadband service, please ……" and "for manual service, please ……" may be marked as non-interruptible, and the paragraph "to end, please hang up" may be marked as interruptible. This embodiment imposes no limitation in this respect.
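The two labeling scenarios above can be sketched as follows. This is only an illustration under stated assumptions: the `Paragraph` type, the `important` flag and the rule "the head paragraph is also protected" are hypothetical names and a hypothetical reading of "position"; the embodiment does not prescribe a data structure or a concrete rule.

```python
from dataclasses import dataclass
from enum import Enum


class Interrupt(Enum):
    INTERRUPTIBLE = "interruptible"
    NON_INTERRUPTIBLE = "non_interruptible"


@dataclass(frozen=True)
class Paragraph:
    text: str
    important: bool  # hypothetical flag set by whoever edits the script


def label_by_importance(paragraphs):
    """First scenario: only important paragraphs are non-interruptible."""
    return [
        (p.text, Interrupt.NON_INTERRUPTIBLE if p.important else Interrupt.INTERRUPTIBLE)
        for p in paragraphs
    ]


def label_by_importance_and_position(paragraphs):
    """Second scenario (assumed rule): the head paragraph is protected too."""
    labels = []
    for index, p in enumerate(paragraphs):
        protected = p.important or index == 0
        kind = Interrupt.NON_INTERRUPTIBLE if protected else Interrupt.INTERRUPTIBLE
        labels.append((p.text, kind))
    return labels


SCRIPT = [
    Paragraph("Respected customer, hello", important=False),
    Paragraph("for mobile service, please ...", important=True),
    Paragraph("for broadband service, please ...", important=True),
    Paragraph("for manual service, please ...", important=True),
    Paragraph("to end, please hang up", important=False),
]

BY_IMPORTANCE = label_by_importance(SCRIPT)
BY_IMPORTANCE_AND_POSITION = label_by_importance_and_position(SCRIPT)
```

Under the first rule the greeting stays interruptible; under the second it becomes non-interruptible, while the closing paragraph is interruptible in both cases, matching the two scenarios described above.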
The script may be text information, which is broadcast by means of TTS (Text To Speech, i.e. speech synthesis). In an implementation scenario, the script may also be voice information that is recorded in advance or manually synthesized, which is not specifically limited in this embodiment.
In one implementation scenario, the step of acquiring the script to be broadcast is triggered only after the user dials the corresponding telephone number. For example, when the user dials 10086, the guidance script for call connection is triggered, such as the script "Respected customer, hello; for mobile service, please ……; for broadband service, please ……; for manual service, please ……; to end, please hang up" above. Other numbers are handled similarly and are not enumerated here.
Step S12: Broadcast the segmented script in sequence, and respond to the user's voice instructions based on the interruption identifier of each paragraph.
The segmented script is broadcast in sequence. Taking the above script as an example, the paragraphs "Respected customer, hello", "for mobile service, please ……", "for broadband service, please ……", "for manual service, please ……" and "to end, please hang up" may be broadcast in that order. In addition, while each paragraph is broadcast, the user's voice instructions are responded to based on that paragraph's interruption identifier. For example, while the paragraph "Respected customer, hello" is broadcast, the user's voice instructions are responded to based on the interruption identifier of that paragraph; while the paragraph "for mobile service, please ……" is broadcast, they are responded to based on the interruption identifier of that paragraph; the other paragraphs are handled likewise and are not enumerated here.
In the above scheme, a segmented dialogue script to be broadcast is obtained, in which the script is segmented in sequence and each paragraph is marked with an interruption identifier whose type is either interruptible or non-interruptible; the segmented script is broadcast in sequence, and the user's voice instructions are responded to based on the interruption identifier of each paragraph. Interruption during voice broadcasting is therefore not left entirely to the user but is governed by the interruption identifier of each paragraph. This makes it much less likely that the user misses important information in the broadcast, avoids broadcasting the same script repeatedly, shortens the call, and improves communication efficiency.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of an embodiment of step S12 in Fig. 1. Specifically, step S12 may include the following steps:
Step S121: Determine whether the type of the paragraph's interruption identifier is non-interruptible or interruptible. If it is non-interruptible, execute step S122; if it is interruptible, execute step S123.
Whether a paragraph can be interrupted during broadcasting is determined from its interruption identifier. Still taking the script "Respected customer, hello; for mobile service, please ……; for broadband service, please ……; for manual service, please ……; to end, please hang up" as an example, if the type of the interruption identifier of the paragraph "Respected customer, hello" is interruptible, the user can interrupt that paragraph while it is being broadcast; if it is non-interruptible, the user cannot.
Step S122: Mask the user's voice instructions while the paragraph is broadcast.
When step S121 determines that the type of the paragraph's interruption identifier is non-interruptible, the paragraph is broadcast and the user's voice instructions are masked.
Step S123: Monitor for and execute the user's voice instruction while the paragraph is broadcast.
When step S121 determines that the type of the paragraph's interruption identifier is interruptible, the user's voice instructions are monitored and executed while the paragraph is broadcast.
In the above scheme, because each paragraph is broadcast separately and in sequence, the positions at which the whole script can be interrupted can be controlled precisely, which refines the granularity of interruption. This improves the user experience while also meeting the enterprise's need for the important information in the script to reach the user.
In an embodiment, in step S122, if the type of the paragraph's interruption identifier is non-interruptible, then after the paragraph has been broadcast with the user's voice instructions masked, the next paragraph in the script is acquired and the step in S12 of responding to the user's voice instructions based on the paragraph's interruption identifier is executed again.
In another embodiment, in step S123, if the type of the paragraph's interruption identifier is interruptible and a voice instruction of the user is detected while the paragraph is being broadcast, broadcasting of the paragraph is stopped, and the voice instruction is recognized and executed.
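Steps S121 to S123 amount to a per-paragraph dispatch, which can be sketched as below. The callbacks `speak` and `speak_and_listen` are hypothetical stand-ins for the TTS and ASR calls, and returning the detected instruction to the caller is an assumption made for illustration; the embodiment does not fix these interfaces.

```python
def broadcast(paragraphs, speak, speak_and_listen):
    """Broadcast (text, interruptible) pairs in order.

    speak(text): plays the text and ignores anything the user says (S122).
    speak_and_listen(text): plays the text while listening; returns the
        detected voice instruction, or None if the user stayed silent (S123).
    Returns the instruction that interrupted the script, or None.
    """
    for text, interruptible in paragraphs:
        if not interruptible:          # S121 -> S122
            speak(text)
            continue                   # go on to the next paragraph
        instruction = speak_and_listen(text)  # S121 -> S123
        if instruction is not None:
            return instruction         # broadcasting stops here
    return None


# Simulated run: the user only gets heard during interruptible paragraphs.
LOG = []


def fake_speak(text):
    LOG.append(("masked", text))


def fake_speak_and_listen(text):
    LOG.append(("listening", text))
    return "manual service" if text.startswith("to end") else None


RESULT = broadcast(
    [
        ("Respected customer, hello", True),
        ("for mobile service, please ...", False),
        ("to end, please hang up", True),
    ],
    fake_speak,
    fake_speak_and_listen,
)
```

In this simulated run the middle paragraph is played with user speech masked, and the instruction is only picked up during the final, interruptible paragraph.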
Referring to Fig. 3, Fig. 3 is a schematic flowchart of another embodiment of the voice broadcasting method of the present application. Specifically, before step S11, the method may include the following steps:
Step S31: Mark interruption identifiers on different parts of the unsegmented script according to a preset rule.
As in the foregoing embodiment, the preset rule may be based on the importance of each part of the unsegmented script within the script, or on a combination of each part's position and importance within the whole script; refer to the foregoing embodiment for details, which are not repeated here.
In this embodiment, for different application scenarios, a preset rule matched to the scenario may be used to mark interruption identifiers on different parts of the unsegmented script; for example, one preset rule may be set for an operator customer service scenario and another for an express delivery customer service scenario. The interruption identifiers are marked at different positions in the script, so that the unsegmented script can subsequently be segmented directly by the interruption identifiers in step S32. In an implementation scenario, segmentation identifiers may be set in addition to the interruption identifiers and likewise marked at different positions in the unsegmented script, so that the script is segmented by the segmentation identifiers in step S32; this embodiment imposes no limitation in this respect.
Still taking the script "Respected customer, hello; for mobile service, please ……; for broadband service, please ……; for manual service, please ……; to end, please hang up" as an example, an interruption identifier may be marked on the first character of each part of the original-language script (rendered in literal translation as "respect", "move", "width", "person" and "knot"), so that the script is divided with each interruption identifier as a starting point and the next interruption identifier as an end point.
In an implementation scenario, for a specific application scenario, interruption identifiers may be marked in a single pass on all scripts that the scenario may involve, so that the script matching a voice instruction can later be obtained based on the recognition result of that instruction. For example, for an operator customer service scenario, the scripts involved may include: the guidance script for the initial menu; scripts for the functions mentioned in the guidance script, such as broadband services, mobile services and television services; and the script for ending the interaction.
Step S32: Segment the script at the marked positions of the interruption identifiers to obtain a sequentially arranged segmented script in which each paragraph is marked with an interruption identifier.
Still taking the script "Respected customer, hello; for mobile service, please ……; for broadband service, please ……; for manual service, please ……; to end, please hang up" as an example, when interruption identifiers are marked on the first character of each part (literally "respect", "move", "width", "person" and "knot"), the script is segmented with each interruption identifier as a starting point and the next interruption identifier as an end point, yielding the paragraphs "Respected customer, hello", "for mobile service, please ……", "for broadband service, please ……", "for manual service, please ……" and "to end, please hang up". The interruption identifier of the paragraph "Respected customer, hello" is the one marked on the character "respect", the interruption identifier of the paragraph "for mobile service, please ……" is the one marked on the character "move", and so on for the other paragraphs, which are not enumerated here.
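Segmentation by marker position (step S32) can be sketched as follows. The inline marker syntax `[I]` / `[N]` is purely hypothetical, chosen for illustration; the patent only says that an agreed identifier is marked at the start of each part and does not specify its form.

```python
import re

# Hypothetical marker syntax: [I] = interruptible, [N] = non-interruptible.
MARKER = re.compile(r"\[(I|N)\]")


def segment(marked_script):
    """Split a marked script into (text, interruptible) paragraphs.

    Each paragraph runs from one marker to the next marker (or to the end).
    Any text before the first marker carries no identifier and is dropped.
    """
    # With one capture group, re.split returns
    # [prefix, tag1, text1, tag2, text2, ...].
    parts = MARKER.split(marked_script)
    paragraphs = []
    for tag, text in zip(parts[1::2], parts[2::2]):
        text = text.strip(" ,;")
        if text:
            paragraphs.append((text, tag == "I"))
    return paragraphs


MARKED = (
    "[I]Respected customer, hello; "
    "[N]for mobile service, please ...; "
    "[N]for broadband service, please ...; "
    "[N]for manual service, please ...; "
    "[I]to end, please hang up"
)

PARAGRAPHS = segment(MARKED)
```

Because each marker doubles as a paragraph boundary, no separate segmentation identifier is needed in this variant; the alternative with dedicated segmentation identifiers would simply split on a second marker type.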
Step S33: Acquire the segmented dialogue script to be broadcast, wherein the script is segmented in sequence, each paragraph is marked with an interruption identifier, and the types of interruption identifier include an interruptible identifier and a non-interruptible identifier.
Please refer to step S11.
Step S34: Broadcast the segmented script in sequence, and respond to the user's voice instructions based on the interruption identifier of each paragraph.
Please refer to step S12.
In the above scheme, interruption identifiers can be marked on different parts of the script, so that an enterprise can mark them according to the importance of each part: relatively important parts are marked with non-interruptible identifiers and relatively minor parts with interruptible identifiers, which meets the enterprise's need for the important information in the script to reach the user.
Referring to Fig. 4, Fig. 4 is a schematic block diagram of an embodiment of a voice broadcasting system based on the voice broadcasting method of the present application. As shown in Fig. 4, the voice broadcasting system in this embodiment may be designed based on FreeSwitch or on other soft-switch software, such as Asterisk. The technical standards relating to FreeSwitch and Asterisk are prior art in the field and are not described here. The voice broadcasting system in this embodiment may include a flow engine, which is used to edit flows and scripts and to provide the corresponding services; the technical standards relating to the flow engine are prior art in the field and are not described here.
With continued reference to Fig. 4, operations staff may edit scripts in the flow engine. For example, for an operator customer service scenario, they may edit in a single pass the guidance script that starts the interaction, the scripts for the functions mentioned in the guidance script (such as broadband services and mobile services), and the script that ends the interaction; further examples are not enumerated here. They then mark different parts of the unsegmented script with the agreed interruption identifiers, so that a script file, such as a Lua script, can parse the unsegmented script according to the interruption identifiers and obtain the sequentially arranged segmented script marked with interruption identifiers. In another implementation scenario, operations staff edit the script in the flow engine and mark different parts of the unsegmented script with the agreed interruption identifiers, and the flow engine itself parses the unsegmented script to obtain the sequentially arranged segmented script marked with interruption identifiers and returns it to the script file, such as a Lua script; this embodiment imposes no limitation in this respect.
With continued reference to Fig. 4, based on the interruption identifier of the paragraph currently being broadcast, a script file such as a Lua script decides whether to call TTS for broadcasting through the speak command, or to call TTS for broadcasting through the play_and_detect_speech command while calling ASR (Automatic Speech Recognition) to monitor the user's speech. In one implementation scenario, when the segmented-interruption execution module in the Lua script file determines that the type of the paragraph's interruption identifier is interruptible, it calls TTS through the play_and_detect_speech command to broadcast the paragraph, calls ASR to monitor for the user's voice instruction while the paragraph is broadcast, and executes the user's voice instruction in combination with the flow engine. In another implementation scenario, when the segmented-interruption execution module in the Lua script file determines that the type of the paragraph's interruption identifier is non-interruptible, it calls TTS through the speak command to broadcast the paragraph, and the user's voice instructions are not monitored during that process.
In this embodiment, a script file such as a Lua script calls TTS to play the paragraph through the play_and_detect_speech command, and when ASR monitoring recognizes a voice instruction of the user, TTS is called to stop broadcasting the paragraph. In an implementation scenario, the underlying code of the FreeSwitch command play_and_detect_speech may instead be modified so that play_and_detect_speech itself parses the paragraph's interruption identifier: if the type of the interruption identifier is non-interruptible, ASR is not called, and if it is interruptible, ASR is called. This embodiment imposes no specific limitation in this respect.
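The command selection described above can be sketched in Python, which the scheme names alongside Lua as a possible scripting language. The sketch below is not a FreeSwitch integration: `RecordingSession` is a hypothetical stand-in that only records which application would be run, and the real argument strings (TTS engine, voice, ASR engine, grammar) are deployment-specific and deliberately omitted.

```python
def choose_application(interruptible):
    """Pick the soft-switch application for one paragraph.

    Interruptible paragraphs are played with speech detection enabled;
    non-interruptible paragraphs are played with plain TTS only.
    """
    return "play_and_detect_speech" if interruptible else "speak"


class RecordingSession:
    """Hypothetical stand-in for a call session; records instead of playing."""

    def __init__(self):
        self.calls = []

    def execute(self, application, data):
        # Assumption: a real session would run the named application here.
        self.calls.append((application, data))


def broadcast_paragraph(session, text, interruptible):
    # 'text' is a placeholder for the real, deployment-specific argument string.
    session.execute(choose_application(interruptible), text)


SESSION = RecordingSession()
broadcast_paragraph(SESSION, "Respected customer, hello", True)
broadcast_paragraph(SESSION, "for mobile service, please ...", False)
```

Keeping the decision in a small script-level function is what lets the scheme avoid changes to the soft switch itself; the variant that modifies play_and_detect_speech would move the same branch into the underlying code.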
In the above scheme, a script file written in a scripting language supported by the soft-switch platform, such as Lua or Python, is combined with speech synthesis (TTS) and automatic speech recognition (ASR) to respond to the user's voice instructions based on each paragraph's interruption identifier while the paragraph is broadcast. The scheme therefore relies on the logic in the script file and does not require modifying the underlying code of FreeSwitch, ASR or TTS, which preserves system stability. Nor does it require major changes to the FreeSwitch + ASR + TTS + flow engine architecture, which reduces development and training costs and gives the scheme high value for wider adoption.
Referring to Fig. 5, Fig. 5 is a schematic flowchart of yet another embodiment of the voice broadcasting method of the present application. Specifically, the method includes the following steps:
Step S501: Mark interruption identifiers on different parts of the unsegmented script according to a preset rule.
Please refer to step S31.
Step S502: Segment the script at the marked positions of the interruption identifiers to obtain a sequentially arranged segmented script in which each paragraph is marked with an interruption identifier.
Please refer to step S32.
Step S503: Acquire the segmented dialogue script to be broadcast, wherein the script is segmented in sequence, each paragraph is marked with an interruption identifier, and the types of interruption identifier include an interruptible identifier and a non-interruptible identifier.
Please refer to step S11.
Step S504: Determine whether the type of the paragraph's interruption identifier is non-interruptible or interruptible. If it is non-interruptible, execute step S505; if it is interruptible, execute step S508.
Please refer to step S121.
Step S505: Mask the user's voice instructions while the paragraph is broadcast.
Please refer to step S122.
In an implementation scenario, after step S505, the method further includes:
Step S506: Acquire the next paragraph in the script.
After a paragraph whose interruption identifier is of the non-interruptible type has been broadcast, the next paragraph in the script may be acquired. Still taking the script "Respected customer, hello; for mobile service, please ……; for broadband service, please ……; for manual service, please ……; to end, please hang up" as an example, after the paragraph "Respected customer, hello" has been broadcast, the next paragraph in the script, "for mobile service, please ……", is acquired.
Step S507: Re-execute step S504.
That is, determine whether the type of the interruption identifier of the newly acquired paragraph is interruptible or non-interruptible.
Step S508: and judging whether a voice instruction of the user is monitored. If yes, go to step S509, otherwise go to step S512.
Please refer to step S123.
Step S509: and stopping broadcasting the paragraph.
And when the type of the interruption identifier of the paragraph is an interruptible identifier and the voice instruction of the user is monitored, stopping broadcasting the paragraph.
Step S510: based on the recognition result of the voice instruction, requesting to acquire a speech technology matching the recognition result.
While stopping broadcasting the paragraphs, the step of "recognizing and executing the voice command" in the above embodiment may specifically include: based on the recognition result of the voice instruction, requesting to acquire a speech technology matching the recognition result. Still taking the "respected customer, hello, mobile service request … …, broadband service request … …, manual service request … …, and end hang-up" as an example, when a voice command of "please transfer broadband service" of the user is monitored in the broadcast paragraph "broadband service request … …", the broadcast paragraph "broadband service request … …" is stopped, and based on the voice command "please transfer broadband service", the matched speech is obtained, for example, "broadband service query request … …, broadband balance query request … …, and broadband coverage query request … …".
Step S511: step S504 is re-executed.
In one implementation scenario, if the type of the paragraph's interruption identifier is an interruptible identifier, then after the user's voice instruction has been detected and executed during the broadcast of the paragraph, broadcasting of the current script may be terminated directly and step S504 re-executed. In another implementation scenario, if the type of the paragraph's interruption identifier is an interruptible identifier, then after the user's voice instruction has been detected and executed during the broadcast of the paragraph, the paragraphs of the script that have not yet been broadcast may continue to be broadcast until the whole script has been broadcast, after which step S504 is re-executed. This embodiment imposes no limitation in this respect.
Step S512: acquire the next paragraph in the script.
When the type of the paragraph's interruption identifier is an interruptible identifier and no voice instruction of the user is detected, the next paragraph in the script is acquired.
Step S513: step S504 is re-executed.
That is, it is determined again whether the type of the interruption identifier of the acquired paragraph is an interruptible identifier or a non-interruptible identifier. Correspondingly, in the foregoing embodiment, when the type of the paragraph's interruption identifier is a non-interruptible identifier, after the user's voice instruction has been shielded while the paragraph is broadcast, the step of responding to the user's voice instruction based on the paragraph's interruption identifier while broadcasting the paragraph may likewise be re-executed.
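The flow of the steps described above (check the identifier, shield or monitor, stop on an instruction, fetch the matching script, start over) can be sketched as one loop. This is a simulation under stated assumptions, not the patented implementation: `Paragraph`, `broadcast_script`, the `listen` and `fetch_script` callbacks, and the `max_scripts` guard are hypothetical names, and audio playback is replaced by an event log.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Paragraph:
    text: str
    interruptible: bool  # True: interruptible identifier, False: non-interruptible


def broadcast_script(
    paragraphs: List[Paragraph],
    listen: Callable[[Paragraph], Optional[str]],
    fetch_script: Callable[[str], Optional[List[Paragraph]]],
    max_scripts: int = 10,
) -> List[str]:
    """Simulate the broadcast loop and return a log of what happened.

    listen(p) stands in for speech detection and recognition: it returns the
    recognized instruction if the user spoke during paragraph p, else None.
    fetch_script(instruction) stands in for requesting the matching script.
    """
    log: List[str] = []
    for _ in range(max_scripts):  # assumed guard against endless re-execution
        next_script = None
        for p in paragraphs:
            if not p.interruptible:
                # Non-interruptible: user speech is shielded, never consulted.
                log.append(f"play (shielded): {p.text}")
                continue
            instruction = listen(p)
            if instruction is None:
                log.append(f"play: {p.text}")  # no instruction, go to next paragraph
                continue
            log.append(f"stop: {p.text}")      # instruction detected, stop broadcasting
            next_script = fetch_script(instruction)
            break
        if not next_script:
            break  # script finished, or no script matched the instruction
        paragraphs = next_script  # start over with the newly acquired script
    return log
```

The sketch follows the first implementation scenario, in which the current script is abandoned as soon as an instruction is executed.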
Referring to fig. 6, fig. 6 is a schematic frame diagram of a voice broadcasting device 60 according to an embodiment of the present application. Specifically, the voice broadcasting device 60 includes an acquisition module 61 and a broadcast module 62. The acquisition module 61 is configured to acquire a segmented script to be broadcast, where the script is segmented in sequence, each paragraph is marked with an interruption identifier, and the types of interruption identifier include an interruptible identifier and a non-interruptible identifier. The broadcast module 62 is configured to broadcast the segmented script in sequence and to respond to a voice instruction of a user based on the interruption identifier of the paragraph.
According to the above scheme, a segmented script to be broadcast is acquired, where the script is segmented in sequence, each paragraph is marked with an interruption identifier, and the types of interruption identifier include an interruptible identifier and a non-interruptible identifier. The segmented script is broadcast in sequence, and the user's voice instruction is responded to based on the interruption identifier of each paragraph. Interruption during voice broadcasting is therefore not left entirely to the user but is coordinated with the interruption identifiers of the paragraphs. As a result, the user is very likely to be kept from missing important information during the broadcast, repeated broadcasting of the same script is avoided, communication time is shortened, and communication efficiency is improved.
In some embodiments, the broadcast module 62 includes a judgment submodule and an execution submodule. When the judgment submodule determines that the type of a paragraph's interruption identifier is a non-interruptible identifier, the user's voice instruction is shielded while the execution submodule broadcasts the paragraph. When the judgment submodule determines that the type of the paragraph's interruption identifier is an interruptible identifier, the execution submodule monitors and executes the user's voice instruction while broadcasting the paragraph.
In some embodiments, when the judgment submodule determines that the type of the paragraph's interruption identifier is an interruptible identifier and the execution submodule detects a voice instruction of the user while broadcasting the paragraph, the execution submodule stops broadcasting the paragraph and recognizes and executes the voice instruction.
In some embodiments, the execution submodule is specifically configured to request, based on the recognition result of the voice instruction, the script matching the recognition result, and to re-execute the step of acquiring the segmented script to be broadcast and the subsequent steps.
In some embodiments, if the judgment submodule determines that the type of the paragraph's interruption identifier is a non-interruptible identifier, then after the user's voice instruction has been shielded while the execution submodule broadcasts the paragraph, the execution submodule is further configured to acquire the next paragraph in the script and to re-execute the step of responding to the user's voice instruction based on the paragraph's interruption identifier.
In some embodiments, the voice broadcasting device 60 further includes a segmentation module 63. The segmentation module 63 is configured to mark interruption identifiers on different parts of the unsegmented script according to a preset rule, and is further configured to segment the script at the marked positions of the interruption identifiers, so as to obtain a segmented script whose paragraphs are marked with interruption identifiers and arranged in sequence.
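The two tasks of the segmentation module, marking and then splitting at the marks, can be sketched as below. The inline tags `<I>` and `<N>`, the sentence split on periods, and the caller-supplied rule `is_interruptible` are all assumptions made for illustration; the text only states that a preset rule is used and does not define it.

```python
import re
from typing import Callable, List, Tuple

INTERRUPTIBLE = "<I>"       # hypothetical inline tag for an interruptible part
NON_INTERRUPTIBLE = "<N>"   # hypothetical inline tag for a non-interruptible part


def mark_script(script: str, is_interruptible: Callable[[str], bool]) -> str:
    """Prefix each sentence of the unsegmented script with an identifier tag."""
    sentences = [s.strip() for s in script.split(".") if s.strip()]
    return "".join(
        (INTERRUPTIBLE if is_interruptible(s) else NON_INTERRUPTIBLE) + s + "."
        for s in sentences
    )


def segment_script(marked: str) -> List[Tuple[str, bool]]:
    """Split at the tag positions into ordered (text, interruptible) paragraphs."""
    pieces = re.findall(r"<([IN])>(.*?)(?=<[IN]>|$)", marked, flags=re.S)
    return [(text.strip(), tag == "I") for tag, text in pieces]
```

Keeping marking and splitting as separate functions mirrors the two-stage description: the rule can change without touching the splitter, as long as the tags stay the same.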
Referring to fig. 7, fig. 7 is a schematic frame diagram of a voice broadcasting device 70 according to an embodiment of the present application. The voice broadcasting device 70 includes a memory 71 and a processor 72 coupled to each other, and the processor 72 is configured to execute program instructions stored in the memory 71 to implement the steps in any of the above-described voice broadcasting method embodiments.
Specifically, the processor 72 is configured to control itself and the memory 71 to implement the steps in any of the above-described voice broadcasting method embodiments. The processor 72 may also be referred to as a CPU (Central Processing Unit). The processor 72 may be an integrated circuit chip having signal processing capabilities. The processor 72 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. Additionally, the processor 72 may be implemented jointly by a plurality of integrated circuit chips.
With the above scheme, the user is very likely to be kept from missing important information during voice broadcasting, repeated broadcasting of the same script is avoided, communication time is shortened, and communication efficiency is improved.
Referring to fig. 8, fig. 8 is a block diagram illustrating an embodiment of a computer readable storage medium 80 according to the present application. The computer-readable storage medium 80 stores program instructions 81 that can be executed by the processor, and the program instructions 81 are used to implement the voice broadcast method in any of the above embodiments.
With the above scheme, the user is very likely to be kept from missing important information during voice broadcasting, repeated broadcasting of the same script is avoided, communication time is shortened, and communication efficiency is improved.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules or units is only a logical division, and an actual implementation may divide them differently: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or of another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A voice broadcasting method, comprising:
acquiring a segmented script to be broadcast, wherein the script is segmented in sequence, each paragraph is marked with an interruption identifier, and the types of the interruption identifier comprise an interruptible identifier and a non-interruptible identifier;
and broadcasting the segmented script in sequence, and responding to a voice instruction of a user based on the interruption identifier of the paragraph.
2. The voice broadcasting method according to claim 1, wherein said broadcasting the segmented script in sequence and responding to a voice instruction of a user based on the interruption identifier of the paragraph comprises:
if the type of the interruption identifier of the paragraph is a non-interruptible identifier, shielding the voice instruction of the user while the paragraph is broadcast;
and if the type of the interruption identifier of the paragraph is an interruptible identifier, monitoring and executing the voice instruction of the user while the paragraph is broadcast.
3. The voice broadcasting method according to claim 2, wherein the step of monitoring and executing the voice instruction of the user while the paragraph is broadcast if the type of the interruption identifier of the paragraph is an interruptible identifier comprises:
if the type of the interruption identifier of the paragraph is an interruptible identifier and a voice instruction of the user is detected while the paragraph is broadcast, stopping broadcasting the paragraph, and recognizing and executing the voice instruction.
4. The voice broadcasting method according to claim 3, wherein the step of recognizing and executing the voice instruction comprises:
requesting, based on the recognition result of the voice instruction, the script matching the recognition result;
and re-executing the step of acquiring the segmented script to be broadcast and the subsequent steps.
5. The voice broadcasting method according to claim 2, wherein, after the step of shielding the voice instruction of the user while the paragraph is broadcast if the type of the interruption identifier of the paragraph is a non-interruptible identifier, the method further comprises:
acquiring the next paragraph in the script;
and re-executing the step of responding to the voice instruction of the user based on the interruption identifier of the paragraph.
6. The voice broadcasting method according to claim 1, wherein, before the step of acquiring the segmented script to be broadcast, the method comprises:
marking interruption identifiers on different parts of the unsegmented script according to a preset rule;
and segmenting the script at the marked positions of the interruption identifiers to obtain the segmented script, whose paragraphs are marked with the interruption identifiers and arranged in sequence.
7. A voice broadcasting device, comprising:
an acquisition module, configured to acquire a segmented script to be broadcast, wherein the script is segmented in sequence, each paragraph is marked with an interruption identifier, and the types of the interruption identifier comprise an interruptible identifier and a non-interruptible identifier;
and a broadcast module, configured to broadcast the segmented script in sequence and to respond to a voice instruction of a user based on the interruption identifier of the paragraph.
8. The voice broadcasting device of claim 7, further comprising:
a segmentation module, configured to mark interruption identifiers on different parts of the unsegmented script according to a preset rule, and to segment the script at the marked positions of the interruption identifiers to obtain the segmented script, whose paragraphs are marked with the interruption identifiers and arranged in sequence.
9. A voice broadcasting device comprising a memory and a processor coupled to each other;
the processor is configured to execute the program instructions stored by the memory to implement the method of any of claims 1 to 6.
10. A computer-readable storage medium, characterized in that program instructions are stored which can be executed by a processor for implementing the method of any one of claims 1 to 6.
CN201910712196.XA 2019-08-02 2019-08-02 Voice broadcasting method, device, equipment and computer readable storage medium Active CN111508477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910712196.XA CN111508477B (en) 2019-08-02 2019-08-02 Voice broadcasting method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111508477A CN111508477A (en) 2020-08-07
CN111508477B true CN111508477B (en) 2021-03-19

Family

ID=71877386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910712196.XA Active CN111508477B (en) 2019-08-02 2019-08-02 Voice broadcasting method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111508477B (en)

Also Published As

Publication number Publication date
CN111508477A (en) 2020-08-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant