CN114724561A - Voice interruption method and device, computer equipment and storage medium

Info

Publication number: CN114724561A
Application number: CN202210371472.2A
Authority: CN
Inventors: 王锁平; 周登宇; 乔磊; 石浩
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-04-11
Filing date: 2022-04-11
Publication date: 2022-07-08

Abstract

An embodiment of the application belongs to the field of artificial intelligence and relates to a voice interruption method, which comprises: obtaining a broadcast voice, extracting a broadcast attribute from the broadcast voice, and determining an interruption condition according to the broadcast attribute; while the broadcast voice is being played, receiving a client voice and extracting voice information from the client voice; and judging, according to the voice information and the interruption condition, whether to interrupt the broadcast voice. The application also provides a voice interruption device, computer equipment and a storage medium. In addition, the application relates to blockchain technology, and the broadcast voice and the client voice can be stored in a blockchain. Because the interruption condition is determined according to the broadcast attribute, different interruption modes are selected for different broadcast voices, so that the method adapts to different service scenarios and is widely applicable. Whether the interruption condition is met is judged according to the content expressed by the client in the voice information, which improves the accuracy of the interruption judgment, makes voice interruption more consistent with real human conversation, and effectively improves the user experience.

Description

Voice interruption method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for speech interruption, a computer device, and a storage medium.

Background

At present, in telephone or video-call service scenarios such as intelligent customer service or intelligent outbound calls, voice service is generally provided by an AI agent in place of a human agent. However, existing intelligent customer service has a single interruption mode, and the point at which a broadcast voice should be interrupted is hard to determine in practice. In some scenarios the intelligent customer service may therefore stop playing the broadcast voice immediately or at an arbitrary moment, so that the meaning of the current broadcast voice is conveyed incompletely; this does not match the way people converse and results in a poor customer experience.

Disclosure of Invention

An embodiment of the application aims to provide a voice interruption method, a voice interruption device, computer equipment and a storage medium, so as to solve the problems that a voice interruption mode is single and customer experience is poor in the prior art.

In order to solve the above technical problem, an embodiment of the present application provides a speech interruption method, which adopts the following technical solutions:

acquiring broadcast voice, extracting broadcast attributes from the broadcast voice, and determining interruption conditions according to the broadcast attributes;

when the broadcast voice is played, receiving client voice and extracting voice information from the client voice;

and judging whether the broadcast voice is interrupted or not according to the voice information and the interruption condition.

Further, the interruption condition comprises a word number interruption rule and a time interruption rule; the step of determining the interruption condition according to the broadcast attribute comprises the following steps:

if the broadcast attribute is a general attribute, determining that the interruption condition is the time interruption rule;

and if the broadcast attribute is an important attribute, determining that the interruption condition is either the word number interruption rule alone, or both the word number interruption rule and the time interruption rule.

Further, the step of determining that the interruption condition is a word number interruption rule or a mixed interruption rule includes:

matching client information corresponding to the client voice from a preset information base to obtain a matching result;

if the matching result is that the client information corresponding to the client voice is matched from the preset information base, determining that the interruption condition is the word number interruption rule;

and if the matching result is that the client information corresponding to the client voice is not matched from the preset information base, determining the interruption condition as the word number interruption rule and the time interruption rule.

Further, the voice information comprises voice characteristics and voice content; the step of judging whether to interrupt the broadcast voice according to the voice information and the interruption condition comprises the following steps:

extracting preset features and preset content from the interruption conditions;

when the voice feature meets the preset feature, judging whether the voice content meets the preset content;

interrupting the broadcast voice if the content of the client voice meets the preset content;

and if the content of the client voice does not meet the preset content, the broadcast voice is not interrupted.

Further, the step of interrupting the broadcast voice includes:

acquiring the current time when the content of the client voice meets the preset content;

and determining an interruption time according to the current time, and interrupting the broadcast voice according to the interruption time.

Further, when broadcasting the broadcast voice, the step of receiving the client voice includes:

acquiring preset voice receiving time;

and receiving the voice of the client after the time for playing the broadcast voice meets the voice receiving time.

Further, the step of extracting voice information from the customer voice comprises:

converting the customer speech into speech information through an ASR model;

the step of judging whether to interrupt the broadcast voice according to the voice information and the interruption condition comprises the following steps:

analyzing the voice information through an NLP model to obtain an analysis result;

and judging whether the broadcast voice is interrupted or not according to the analysis result and the interruption condition.

In order to solve the above technical problem, an embodiment of the present application further provides a speech interruption device, which adopts the following technical solutions:

the condition determining module is used for acquiring broadcast voice, extracting broadcast attributes from the broadcast voice and then determining interruption conditions according to the broadcast attributes;

the characteristic extraction module is used for receiving the voice of a client and extracting voice information from the voice of the client when the broadcast voice is played;

and the voice interruption module is used for judging whether the broadcast voice is interrupted or not according to the voice information and the interruption condition.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:

comprising a memory and a processor, the memory having computer readable instructions stored therein which, when executed by the processor, implement the steps of the speech interruption method as described above.

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:

the computer readable storage medium has stored thereon computer readable instructions which, when executed by a processor, implement the steps of the speech interruption method as described above.

Compared with the prior art, the embodiment of the application mainly has the following beneficial effects: after the broadcasting voice is obtained and the broadcasting attribute is extracted from the broadcasting voice, determining an interruption condition according to the broadcasting attribute; when the broadcast voice is played, receiving client voice and extracting voice information from the client voice; and judging whether the broadcast voice is interrupted or not according to the voice information and the interruption condition. In the application, the interruption condition is determined according to the broadcasting attribute, different interruption modes are selected according to different broadcasting voices, so that different service scenes are adapted, the applicability of the voice interruption method is improved, whether the interruption condition is met or not is judged according to the content expressed by the client in the voice information, the accuracy of voice interruption judgment is improved, the voice interruption is more in line with the mode of real person conversation, and the user experience is effectively improved.

Drawings

In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a speech interruption method according to the present application;

FIG. 3 is a schematic block diagram of one embodiment of a speech interruption device according to the present application;

FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The client may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 player, a laptop portable computer, a desktop computer, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that the speech interruption method provided in the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the speech interruption apparatus is generally disposed in the server/terminal device.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow diagram of one embodiment of a speech interruption method in accordance with the present application is shown. The voice interruption method comprises the following steps:

step S201, broadcast voice is obtained, broadcast attributes are extracted from the broadcast voice, and then interruption conditions are determined according to the broadcast attributes.

In this embodiment, the broadcast attribute characterizes the broadcast type of the broadcast voice and is either a general attribute or an important attribute. For example, in a financial service, if the broadcast voice is a core clause (such as a price or service content that needs to be confirmed by the customer), its broadcast attribute is an important attribute; if the broadcast voice is a general clause (such as a preamble or a company profile), its broadcast attribute is a general attribute.

It should be noted that, in a complete service flow, multiple broadcast voices are included, where broadcast attributes of the broadcast voices may be the same or different.

Because the broadcast attributes of the broadcast voices differ, different interruption conditions are set for different broadcast attributes. The interruption condition comprises at least one of a word number interruption rule and a time interruption rule, which improves the applicability of the interruption method and makes the man-machine conversation more consistent with real human conversation.

And step S202, receiving the customer voice when the broadcast voice is played, and extracting voice information from the customer voice.

In this embodiment, the broadcast voice is played by the service end and listened to by the client. The service end may remain in a receiving state throughout playback, in which case client voice sent by the client at any point during playback can be received by the service end. Alternatively, the service end may be in a receiving state for only part of the playback, that is, client voice sent by the client can be received only after a certain time has elapsed during playback; this specific embodiment is described below.

The voice information is characterized as voice content in the voice of the client.

It should be noted that the electronic device (for example, the server/terminal device shown in fig. 1) on which the speech interruption method operates may receive the client speech from a client (for example, a terminal with a call function such as a mobile phone or a tablet) through a wired or wireless connection. The wireless connection may include, but is not limited to, a 3G/4G/5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future.

And step S203, judging whether the broadcast voice is interrupted or not according to the voice information and the interruption condition.

In this embodiment, if the voice information satisfies the interruption condition, interrupting the broadcast voice; and if the voice information does not meet the interruption condition, the broadcast voice is not interrupted.

It should be noted that a complete service flow includes multiple segments of broadcast voice, where the broadcast voice includes service voice and response speech. A breaking point is recorded when judging, according to the voice information and the interruption condition, whether to interrupt the broadcast voice. If it is determined that the broadcast voice is to be interrupted, either the service voice in the segment following the current broadcast voice is called directly from the multiple segments and played, or a corresponding response speech is generated from the voice information and played as the answer (for example, if the voice information is 'policy content', the response speech generated after voice recognition may be 'Policy content differs between application items; please describe the application item'). After the next broadcast voice or the response speech has been played, steps S201 to S203 are repeated. If it is determined that the broadcast voice is not to be interrupted, the broadcast voice continues from the breaking point, and steps S201 to S203 are repeated.
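The branching described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `next_step`, the tuple return values, and the idea of passing the breaking point as a number of seconds are hypothetical and are not taken from the application.

```python
from typing import Optional, Tuple


def next_step(interrupted: bool, segment_index: int, breaking_point: float,
              response: Optional[str] = None) -> Tuple:
    """Pick what the service end does after the interruption judgment.

    All names and the tuple shapes are hypothetical.
    """
    if not interrupted:
        # Not interrupted: keep playing the same segment from the breaking point.
        return ("resume", segment_index, breaking_point)
    if response is not None:
        # Interrupted and a response speech was generated: play the response.
        return ("respond", response)
    # Interrupted without a response speech: call the next service voice.
    return ("play_next", segment_index + 1)
```

In each case steps S201 to S203 would then be repeated for whatever is played next.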

In the method, the interruption condition is determined according to the broadcasting attribute, different interruption modes are selected according to different broadcasting voices, so that different service scenes are adapted, the applicability of the voice interruption method is improved, whether the interruption condition is met is judged according to the content expressed by the client in the voice information, the accuracy of voice interruption judgment is improved, the voice interruption is more in line with the mode of real person conversation, and the user experience is effectively improved.

In some optional implementations, in step S201, the interruption condition includes a word number interruption rule and a time interruption rule; the step of determining the interruption condition according to the broadcast attribute comprises the following steps:

if the broadcast attribute is a general attribute, determining that the interruption condition is the time interruption rule;

and if the broadcast attribute is an important attribute, determining that the interruption condition is either the word number interruption rule alone, or both the word number interruption rule and the time interruption rule.

In this embodiment, the general attribute characterizes a non-core clause that does not require customer confirmation, and the important attribute characterizes a core clause that requires customer confirmation.

The time interruption rule has a preset response time: client voice is collected within the preset response time, and a subsequent step judges, from the collected client voice and the interruption condition, whether to interrupt the broadcast voice.

The word number interruption rule has a preset answer word number: when the number of words in the client's reply meets the preset answer word number, a subsequent step judges, from that client voice and the interruption condition, whether to interrupt the broadcast voice.
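The mapping from broadcast attribute to interruption condition can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the enum and function names are hypothetical, and the boolean flag `use_time_rule_too` is an assumed stand-in for whatever decides between the word number rule alone and the combined rules.

```python
from enum import Enum


class BroadcastAttribute(Enum):
    GENERAL = "general"      # non-core clause, no customer confirmation needed
    IMPORTANT = "important"  # core clause that the customer must confirm


class Rule(Enum):
    TIME = "time"
    WORD_NUMBER = "word_number"


def determine_interruption_condition(attribute, use_time_rule_too=False):
    """Return the set of rules that make up the interruption condition."""
    if attribute is BroadcastAttribute.GENERAL:
        return frozenset({Rule.TIME})
    # Important attribute: word number rule alone, or both rules together.
    if use_time_rule_too:
        return frozenset({Rule.WORD_NUMBER, Rule.TIME})
    return frozenset({Rule.WORD_NUMBER})
```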

In some optional implementations, the step of determining that the interruption condition is a word number interruption rule or a mixed interruption rule includes:

if the matching result is that the client information corresponding to the client voice is matched from the preset information base, determining that the word number interruption rule is the interruption condition;

In this embodiment, when the matching result is that the client information corresponding to the client voice is matched from the preset information base, this indicates that specific information about the client is available, so the word number interruption rule is adopted and the preset answer word number is set according to the client information.

When the matching result is that no client information corresponding to the client voice is matched from the preset information base, this indicates that no specific information about the client is available, so the word number interruption rule and the time interruption rule are combined, and a generic preset answer word number and a generic reply time for ordinary clients are set. For example, when setting the generic preset answer word number for a service in which the client's name needs to be confirmed, the value is set according to the common number of words in a name with a certain redundancy reserved: if the common number of words in a name is 3, the generic preset answer word number is 5.
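A short sketch of this selection, assuming the preset information base can be modelled as a dictionary keyed by a client identifier. The record type `ClientInfo`, the field names, and the 5-second generic reply time are hypothetical; only the 3-plus-redundancy-equals-5 figure comes from the example above.

```python
from dataclasses import dataclass
from typing import Dict, Optional

COMMON_NAME_WORD_NUMBER = 3    # common name length, from the example in the text
REDUNDANCY = 2                 # margin so that 3 becomes 5, as in the text
GENERIC_REPLY_SECONDS = 5.0    # hypothetical generic reply time


@dataclass(frozen=True)
class ClientInfo:
    name: str
    name_word_number: int      # number of words in the stored client name


@dataclass(frozen=True)
class InterruptionCondition:
    answer_word_number: int
    response_seconds: Optional[float] = None  # None means no time rule


def condition_for_important_broadcast(client_id: str,
                                      info_base: Dict[str, ClientInfo]):
    info = info_base.get(client_id)
    if info is not None:
        # Matched: word number rule only, sized from the stored client name.
        return InterruptionCondition(info.name_word_number)
    # Not matched: generic word number plus a generic reply time.
    return InterruptionCondition(COMMON_NAME_WORD_NUMBER + REDUNDANCY,
                                 GENERIC_REPLY_SECONDS)
```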

In some alternative implementations, the voice information includes voice characteristics and voice content; the step of judging whether to interrupt the broadcast voice according to the voice information and the interruption condition comprises the following steps:

extracting preset features and preset content from the interruption conditions;

In this embodiment, if the interruption condition is the time interruption rule, the preset feature is a preset response time and the voice feature is a voice time; once the voice time equals the preset response time, whether the voice content meets the preset content is judged. The preset response time can be set according to the reply times of historical clients, and the preset content can be words such as "good" or "skip", that is, words indicating that the currently broadcast voice can be interrupted and skipped.

If the interruption condition is the word number interruption rule, the preset feature is a preset answer word number and the voice feature is a voice word number; when the voice word number equals the preset answer word number, whether the voice content meets the preset content is judged. For example, when the broadcast attribute of the broadcast voice is an important attribute and the preset information base contains client information corresponding to the client voice, such as when the client's name needs to be confirmed in a service, the preset answer word number is determined according to the number of words in the client name prestored in the preset information base, and the preset content is determined according to the client name. When the voice word number in the voice feature equals the preset answer word number, whether the content replied in the voice content is the client name in the preset content is judged.

If the interruption condition is both the word number interruption rule and the time interruption rule, the preset features are the preset response time and the preset answer word number, and the voice features are the voice time and the voice word number, all as described above. For example, when the broadcast attribute of the broadcast voice is an important attribute and the preset information base contains client information corresponding to the client voice, either the preset response time or the preset answer word number may be met first. If the client's name needs to be confirmed in the service and the voice word number in the voice feature meets the preset answer word number first, whether the voice time meets the preset response time is not judged, and whether the content replied in the voice content is the client name in the preset content is judged directly.

Therefore, various interruption conditions are set according to the different broadcast attributes of the broadcast voice, namely the word number interruption rule, the time interruption rule, and the combination of the word number interruption rule and the time interruption rule, so as to adapt to different service scenarios with wide applicability.
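The two-stage check (voice feature first, voice content second) can be sketched as below. The sketch rests on several assumptions that are not in the application: a feature "meets" its preset when it is greater than or equal to it, the content check is an exact match after trimming whitespace, and the type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Condition:
    preset_content: FrozenSet[str]
    response_seconds: Optional[float] = None    # set when the time rule applies
    answer_word_number: Optional[int] = None    # set when the word number rule applies


@dataclass(frozen=True)
class VoiceInfo:
    content: str        # recognized text of the client voice
    seconds: float      # voice time
    word_number: int    # voice word number


def should_interrupt(condition: Condition, voice: VoiceInfo) -> bool:
    time_met = (condition.response_seconds is not None
                and voice.seconds >= condition.response_seconds)
    words_met = (condition.answer_word_number is not None
                 and voice.word_number >= condition.answer_word_number)
    # With both rules set, whichever feature is met first triggers the check.
    if not (time_met or words_met):
        return False
    return voice.content.strip() in condition.preset_content
```

For a combined condition, a reply that reaches the preset answer word number is checked against the preset content immediately, without waiting for the preset response time.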

In some optional implementations, the step of interrupting the broadcast voice includes the steps of:

In this embodiment, the duration of the conversation with the client is acquired continuously during the conversation. When the content of the client voice meets the preset content, the current duration is recorded as the current time and the interruption time is determined according to the current time. After the interruption time is determined, acquisition of the conversation duration is suspended, the semantics of the client voice are recognized by an NLP (natural language processing) model, and a reply to the client voice is generated. The reply is played from the interruption time, and acquisition of the conversation duration is restarted.

For example, when the broadcast voice is the notice of a financial contract and the client replies with a confirming utterance such as "good" or "I know", the current time at which the client gave that reply is acquired, and the interruption time is determined according to the current time so as to interrupt the currently broadcast voice. After the NLP model recognizes the client voice and a reply to the client is generated, the reply is played from the interruption time, and acquisition of the conversation duration is restarted.
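The pause-and-restart behaviour of the conversation duration can be sketched with a small clock object. This assumes the interruption time is simply the duration at the moment the preset content is met; the class name `ConversationClock` and its methods are hypothetical, and time is advanced manually so that the example stays deterministic.

```python
class ConversationClock:
    """Tracks the conversation duration and can be suspended and restarted."""

    def __init__(self):
        self.elapsed = 0.0
        self.running = True

    def advance(self, seconds: float) -> None:
        # Time only accumulates while duration acquisition is active.
        if self.running:
            self.elapsed += seconds

    def mark_interruption(self) -> float:
        # Record the current duration as the interruption time and suspend.
        self.running = False
        return self.elapsed

    def restart(self) -> None:
        # Called once the reply starts playing from the interruption time.
        self.running = True
```

While the reply is being generated the clock is suspended, so the time spent on recognition does not count towards the conversation duration.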

In some optional implementation manners, in step S202, when the broadcast voice is played, the step of receiving the client voice includes:

acquiring preset voice receiving time;

In this embodiment, timing starts when playback of the broadcast voice begins, and the client voice is received once the timed duration reaches the voice receiving time. If the broadcast voice contains matters to which the client needs to pay attention, the client voice is received only after those matters have been played, that is, after the playback time meets the voice receiving time, which ensures that the client has heard the matters requiring attention in the service.
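The gating on the voice receiving time can be sketched as a filter over timestamped audio chunks. The function name, the `(elapsed_seconds, audio)` chunk shape, and the reading of "meets" as greater than or equal to are assumptions made for the illustration.

```python
from typing import List, Tuple


def receivable_chunks(chunks: List[Tuple[float, str]],
                      voice_receiving_seconds: float) -> List[str]:
    """Keep only client audio that arrives once playback has reached the
    preset voice receiving time; earlier audio is ignored."""
    return [audio for elapsed, audio in chunks
            if elapsed >= voice_receiving_seconds]
```

Setting the voice receiving time to 0 corresponds to the case where the service end is in a receiving state for the whole playback.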

In some optional implementation manners, in step S202, the voice information is text information; the step of extracting voice information from the customer voice comprises:

converting the customer speech into speech information through an ASR model;

In this embodiment, the speech information includes a speech text, and the ASR model is an automatic speech recognition technology, and is used to convert a client speech into a speech text; the NLP model is a natural language processing technology and is used for recognizing the semantics of the voice text, namely, the voice information is analyzed through the NLP model to obtain the semantics of the voice text, and then whether the broadcast voice is interrupted or not is judged according to the recognized semantics of the voice text and interruption conditions.
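The ASR-then-NLP pipeline can be sketched with the two models passed in as plain callables. No real speech or language model is used here: `fake_asr` and `keyword_nlp` are hypothetical stand-ins so the example runs on its own, and the notion of an "intent" label as the analysis result is an assumption.

```python
from typing import Callable, FrozenSet


def judge_with_models(audio: bytes,
                      asr: Callable[[bytes], str],
                      nlp: Callable[[str], str],
                      interrupting_intents: FrozenSet[str]) -> bool:
    speech_text = asr(audio)      # ASR: client voice to speech text
    intent = nlp(speech_text)     # NLP: speech text to analysis result
    return intent in interrupting_intents


def fake_asr(audio: bytes) -> str:
    # Stand-in for an ASR model: treats the bytes as already-transcribed text.
    return audio.decode("utf-8")


def keyword_nlp(text: str) -> str:
    # Stand-in for an NLP model: a keyword lookup instead of semantic analysis.
    return "confirm" if text.strip().lower() in {"good", "skip"} else "other"
```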

It should be emphasized that, in order to further ensure the privacy and security of the broadcast voice and the client voice information, the broadcast voice and the client voice information may also be stored in a node of a block chain.

The blockchain referred to in the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, in which each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
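The idea of data blocks linked by cryptographic hashes can be illustrated with a toy hash chain. This is a single-process sketch only: it has no network, consensus, or encryption, and the block layout and function names are hypothetical. The records are assumed to be digests or identifiers of the stored voices rather than the audio itself.

```python
import hashlib
import json

GENESIS_PREVIOUS_HASH = "0" * 64


def make_block(records, previous_hash):
    """Build a block whose hash covers its records and the previous hash."""
    payload = json.dumps({"records": records, "previous_hash": previous_hash},
                         sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"records": records, "previous_hash": previous_hash, "hash": digest}


def chain_is_valid(chain):
    """Check that every block links to its predecessor and is unmodified."""
    previous_hash = GENESIS_PREVIOUS_HASH
    for block in chain:
        expected = make_block(block["records"], block["previous_hash"])["hash"]
        if block["previous_hash"] != previous_hash or block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True
```

Changing the records of any earlier block changes its recomputed hash, so the check fails for that block.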

The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer readable instructions controlling the relevant hardware. The instructions can be stored in a computer readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a speech interruption apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which is specifically applicable to various electronic devices.

As shown in fig. 3, the speech interruption device 300 according to the present embodiment includes: a condition determination module 301, a feature extraction module 302, and a speech interruption module 303. Wherein:

In this application, the interruption condition is determined according to the broadcast attribute, so that different interruption modes are selected for different broadcast voices. This adapts the method to different service scenarios and broadens its applicability. Whether the interruption condition is met is judged according to the content the client expresses in the voice information, which improves the accuracy of the interruption judgment, brings voice interruption closer to the way real people converse, and improves the user experience.

In some optional implementations, the condition determining module 301 includes a first determining submodule and a second determining submodule. Wherein:

In some optional implementations, the second determining submodule includes a matching unit, a first determining unit, and a second determining unit. Wherein:

the matching unit is used for matching the client information corresponding to the client voice from a preset information base to obtain a matching result;

a first determining unit, configured to determine that the interruption condition is the word number interruption rule if the matching result indicates that the client information corresponding to the client voice is matched from the preset information base;

a second determining unit, configured to determine that the interruption condition is the word number interruption rule and the time interruption rule if the matching result indicates that the client information corresponding to the client voice is not matched from the preset information base.
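The branching performed by the matching unit and the two determining units can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `Rule`, `determine_interruption_condition`, `client_id` and `info_base` are hypothetical, and the preset information base is assumed to be a simple dictionary keyed by a client identifier.

```python
from enum import Enum


class Rule(Enum):
    WORD_NUMBER = "word number interruption rule"
    TIME = "time interruption rule"


def determine_interruption_condition(client_id, info_base):
    """Pick the interruption condition from the matching result.

    `client_id` and `info_base` are hypothetical stand-ins for the identity
    behind the client voice and the preset information base.
    """
    matched = client_id in info_base  # matching unit: the matching result
    if matched:
        # First determining unit: client information was matched.
        return frozenset({Rule.WORD_NUMBER})
    # Second determining unit: no client information was matched.
    return frozenset({Rule.WORD_NUMBER, Rule.TIME})


info_base = {"client-001": {"name": "example"}}
known = determine_interruption_condition("client-001", info_base)
unknown = determine_interruption_condition("client-999", info_base)
```

Here `known` contains only the word number interruption rule, while `unknown` contains both rules.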

In some optional implementations, the voice interruption module 303 includes an extraction sub-module, a judgment sub-module, a first interruption sub-module, and a non-interruption sub-module. Wherein:

the extraction submodule is used for extracting preset characteristics and preset content from the interruption condition;

the judging submodule is used for judging whether the voice content meets the preset content when the voice features meet the preset features;

the first interruption submodule is used for interrupting the broadcast voice if the content of the client voice meets the preset content;

and the non-interruption submodule is used for not interrupting the broadcast voice if the content of the client voice does not meet the preset content.
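The two-stage check carried out by these submodules might look like the sketch below. The concrete form of the preset features and preset content is an assumption made for illustration only: a minimum word count stands in for the preset features and a set of trigger phrases stands in for the preset content. The sketch also assumes that the broadcast voice is not interrupted when the features are not met, a case this passage does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterruptionCondition:
    min_word_count: int          # hypothetical "preset feature"
    trigger_phrases: frozenset   # hypothetical "preset content"


def should_interrupt(word_count, text, condition):
    """Return True if the broadcast voice should be interrupted."""
    # Judging submodule: content is examined only once the features pass.
    if word_count < condition.min_word_count:
        return False  # assumption: unmet features mean no interruption
    # Interrupt only if the voice content meets the preset content.
    lowered = text.lower()
    return any(phrase in lowered for phrase in condition.trigger_phrases)


condition = InterruptionCondition(3, frozenset({"wait", "hold on"}))
```

With this condition, `should_interrupt(4, "Wait, I have a question", condition)` returns `True`, whereas a short acknowledgement that contains no trigger phrase returns `False`.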

In some optional implementations, the first interruption submodule further includes an obtaining unit and an interruption unit. Wherein:

the obtaining unit is used for obtaining the current time when the content of the client voice meets the preset content;

and the interruption unit is used for determining an interruption time according to the current time and interrupting the broadcast voice according to the interruption time.
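How the interruption time is derived from the current time is not specified in this passage. The sketch below therefore makes a loudly labelled assumption: the interruption time is the current time plus a short, configurable grace period. The names `determine_interruption_time`, `is_due` and `grace` are hypothetical.

```python
from datetime import datetime, timedelta


def determine_interruption_time(current_time, grace=timedelta(milliseconds=300)):
    """Interruption unit, first half: derive the interruption time.

    Assumption: a fixed grace period is added to the current time. The
    source only says the time is determined "according to the current time".
    """
    return current_time + grace


def is_due(now, interruption_time):
    """Interruption unit, second half: has the interruption time arrived?"""
    return now >= interruption_time


t0 = datetime(2022, 4, 11, 9, 0, 0)  # time at which the content check passed
interruption_time = determine_interruption_time(t0)
```

A playback loop could poll `is_due` and stop the broadcast voice once it returns `True`.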

In some optional implementations, the feature extraction module 302 includes an obtaining sub-module and a receiving sub-module, where:

the obtaining submodule is used for obtaining a preset voice receiving time;

and the receiving submodule is used for receiving the voice of the client after the time for playing the broadcast voice meets the voice receiving time.
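The gating performed by the receiving submodule can be illustrated with a small filter over timestamped audio frames. This is only a sketch: representing client audio as `(playback_offset_seconds, chunk)` pairs and the function name `frames_to_receive` are assumptions introduced here.

```python
def frames_to_receive(frames, voice_receiving_time):
    """Keep only client audio captured once playback has reached the
    preset voice receiving time.

    `frames` is a hypothetical iterable of (playback_offset_seconds, chunk)
    pairs, where the offset is measured from the start of the broadcast.
    """
    return [chunk for offset, chunk in frames if offset >= voice_receiving_time]


frames = [(0.5, b"a"), (2.0, b"b"), (3.1, b"c")]
received = frames_to_receive(frames, voice_receiving_time=2.0)
```

Audio arriving before the receiving time is dropped, so early noise or a greeting that overlaps the start of the broadcast does not reach the interruption logic.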

In some alternative implementations, the feature extraction module 302 described above includes a conversion sub-module. Wherein:

and the conversion sub-module is used for converting the client voice into voice information through an ASR model.

The voice interruption module comprises an analysis submodule and a second interruption submodule. Wherein:

the analysis submodule is used for analyzing the voice information through an NLP (natural language processing) model to obtain an analysis result;

and the second interruption submodule is used for judging whether the broadcasting voice is interrupted or not according to the analysis result and the interruption condition.
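The flow from client voice to interruption decision (ASR conversion, NLP analysis, comparison with the interruption condition) can be sketched as a small pipeline. The ASR and NLP models are passed in as callables because the source does not name specific models; `fake_asr`, `fake_nlp` and `meets_condition` are hypothetical stubs that exist only to make the example runnable.

```python
def decide_interruption(client_audio, asr, nlp, meets_condition):
    """Conversion submodule -> analysis submodule -> second interruption submodule."""
    text = asr(client_audio)           # voice information as text
    analysis = nlp(text)               # analysis result
    return meets_condition(analysis)   # compare with the interruption condition


# Hypothetical stand-ins for real models and for the interruption condition.
def fake_asr(audio):
    return audio.decode("utf-8")


def fake_nlp(text):
    intent = "objection" if "wait" in text.lower() else "acknowledgement"
    return {"text": text, "intent": intent}


def meets_condition(analysis):
    return analysis["intent"] == "objection"


interrupt = decide_interruption(b"Wait, that is wrong", fake_asr, fake_nlp, meets_condition)
keep_playing = decide_interruption(b"uh huh", fake_asr, fake_nlp, meets_condition)
```

Injecting the models as parameters keeps the decision logic independent of any particular ASR or NLP implementation.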

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42, and a network interface 43, which are communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it should be understood that not all of the shown components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device can be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device can carry out human-machine interaction with a client by means of a keyboard, a mouse, a remote controller, a touch panel, a voice control device, or the like.

The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device thereof. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various application software, such as computer readable instructions of a speech interruption method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 42 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the speech interruption method.

The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.

The present application provides yet another embodiment, which provides a computer-readable storage medium having stored thereon computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the speech interruption method as described above.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

It is to be understood that the above-described embodiments are merely some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or substitute equivalents for some of their features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims

1. A speech interruption method, comprising the steps of:

2. A speech interruption method according to claim 1, characterized in that the interruption conditions comprise word number interruption rules and time interruption rules; the step of determining the interruption condition according to the broadcast attribute comprises the following steps:

3. A speech interruption method according to claim 2, wherein said step of determining that said interruption condition is a word count interruption rule or a hybrid interruption rule comprises:

4. A speech interruption method according to claim 2 or 3, characterized in that the speech information comprises speech characteristics and speech content; the step of judging whether to interrupt the broadcast voice according to the voice information and the interruption condition comprises the following steps:

extracting preset features and preset content from the interruption conditions;

5. A speech interruption method according to claim 4, wherein said step of interrupting said broadcast voice comprises:

6. A speech interruption method according to any one of claims 1 to 3, wherein the step of receiving the customer voice when broadcasting the broadcast voice comprises:

acquiring preset voice receiving time;

7. A speech interruption method according to any one of claims 1 to 3, characterized in that the speech information is text information; the step of extracting voice information from the customer voice comprises:

converting the customer speech into speech information through an ASR model;

8. A speech interruption device, comprising:

and the voice interruption module is used for judging whether the broadcasting voice is interrupted or not according to the voice information and the interruption condition.

9. A computer device comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor, when executing the computer readable instructions, implements the steps of the speech interruption method of any of claims 1 to 7.

10. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the speech interruption method of any of claims 1 to 7.