CN112242135A - Voice data processing method and intelligent customer service device - Google Patents


Info

Publication number
CN112242135A
Authority
CN
China
Prior art keywords
user
emotion
speaking
content
preset
Prior art date 2019-07-18
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910650265.9A
Other languages
Chinese (zh)
Inventor
陈孝良 (Chen Xiaoliang)
祖拓 (Zu Tuo)
王江 (Wang Jiang)
冯大航 (Feng Dahang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date 2019-07-18
Publication date 2021-01-19
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN201910650265.9A priority Critical patent/CN112242135A/en
Publication of CN112242135A publication Critical patent/CN112242135A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/005 - Language recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/27 - characterised by the analysis technique
    • G10L 25/30 - using neural networks
    • G10L 25/48 - specially adapted for particular use
    • G10L 25/51 - for comparison or discrimination
    • G10L 25/63 - for estimating an emotional state
    • G10L 25/78 - Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a voice data processing method and an intelligent customer service device, wherein the method comprises the following steps: the intelligent customer service device collects, in real time, audio signals sent by user equipment while playing preset voice content to the user equipment; detects, in the audio signal, audio information indicating the user behavior type; interrupts the playing of the preset voice content if the audio information is determined to indicate that the user has a question; and reduces, within a preset time, the volume at which the preset voice content is played if the audio information is determined to indicate that the user is speaking. In this scheme, when the intelligent customer service device detects that the user is speaking while voice content is playing, it reduces the playback volume or interrupts the voice playing according to the behavior type of the user. The speech content of the user is collected and recognized, subsequent services are provided for the user, and the user experience is improved.

Description

Voice data processing method and intelligent customer service device
Technical Field
The invention relates to the technical field of voice data processing, in particular to a voice data processing method and an intelligent customer service device.
Background
With the continuous development of science and technology, artificial intelligence is being applied ever more widely. The intelligent customer service device is a common application of artificial intelligence for serving users.
An intelligent customer service device generally serves users by playing voice, and currently does so as follows: it introduces content such as services and activities to the user, identifies the user's questions from the audio signals sent by the user equipment, and answers those questions. However, current intelligent customer service devices cannot interrupt their own voice playing while introducing services and activities to the user or while answering user questions. In other words, while the device is playing voice, even if the user has a new question or does not want to hear the currently played content, the device still plays the current voice content to the end and only then re-identifies the audio signal of the user, which greatly degrades the user experience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a voice data processing method and an intelligent customer service device, so as to solve problems such as the poor user experience of existing intelligent customer service devices.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the first aspect of the embodiment of the invention discloses a method for processing voice data, which is suitable for an intelligent customer service device, and comprises the following steps:
the method comprises the steps that in the process that an intelligent customer service device plays preset voice contents to user equipment, the intelligent customer service device collects audio signals sent by the user equipment in real time;
detecting audio information in the audio signal indicating a user behavior type, wherein the user behavior type is that the user has a question or that the user is speaking;
if it is determined that the audio information indicates that the user has a question, interrupting the playing of the preset voice content;
and if it is determined that the audio information indicates that the user is speaking, reducing, within a preset time, the volume at which the preset voice content is played.
Preferably, if the audio information indicates that the user has a question, after the playing of the preset voice content is interrupted, the method further includes:
inquiring the user about the user's question and collecting an audio signal sent by the user equipment;
performing voice recognition and emotion recognition by using the audio signal, and determining speaking content of the user and an emotion tag for indicating the speaking emotion of the user;
and answering the questions of the user, switching to manual customer service or ending the call according to the speaking content and the emotion label.
Preferably, after determining that the audio information is used to indicate that the user is speaking, the method further includes:
after the preset time, if audio information indicating that the user is speaking is still detected in the audio signal, interrupting the playing of the preset voice content;
performing voice recognition and emotion recognition by using the audio signal, and determining speaking content of the user and an emotion tag for indicating the speaking emotion of the user;
and answering the questions of the user, switching to manual customer service or ending the call according to the speaking content and the emotion label.
Preferably, the answering the question of the user, switching to manual customer service, or ending the call according to the speaking content and the emotion label includes:
if the speaking content and/or the emotion label conform to a preset reply rule, asking the user about the question and replying to the question of the user;
if the speaking content and/or the emotion label accord with a preset switching rule, switching to manual customer service for the user;
if the speaking content and/or the emotion label accord with a preset hang-up rule, ending the conversation with the user equipment;
wherein a reply rule, switching rule, or hang-up rule matched according to the speaking content is executed with higher priority than one matched according to the emotion label.
Preferably, the voice recognition and emotion recognition by using the audio signal, and the determination of the speaking content of the user and the emotion label for indicating the speaking emotion of the user, include:
and simultaneously inputting the audio signal into a preset voice recognition model and a preset emotion recognition model for voice recognition and emotion recognition, and determining the speaking content of the user and an emotion label for indicating the speaking emotion of the user, wherein the voice recognition model and the emotion recognition model are obtained by training a neural network model based on audio sample data.
Preferably, before the performing voice recognition and emotion recognition using the audio signal and determining the speaking content of the user and an emotion label indicating the speaking emotion of the user, the method further includes:
and determining the age of the user based on the user information of the user, and selecting an emotion recognition model corresponding to the age, wherein emotion recognition models corresponding to different age groups are preset.
A second aspect of an embodiment of the present invention discloses an intelligent customer service device, including:
the acquisition unit is used for collecting audio signals sent by the user equipment in real time in the process that the intelligent customer service device plays preset voice content to the user equipment;
the determining unit is used for detecting audio information which is used for indicating a user behavior type in the audio signal, wherein the user behavior type is that a user has a question or the user is speaking;
the first interruption unit is used for interrupting the playing of the preset voice content if the audio information is determined to indicate that the user has a question;
and the adjusting unit is used for reducing, within a preset time, the volume at which the preset voice content is played if the audio information is determined to indicate that the user is speaking.
Preferably, the intelligent customer service device further comprises:
a second interruption unit, configured to, after the preset time, interrupt playing of the preset voice content if it is detected that audio information indicating that a user is speaking exists in the audio signal;
the recognition unit is used for carrying out voice recognition and emotion recognition by utilizing the audio signal, and determining speaking content of the user and an emotion label for indicating the speaking emotion of the user;
and the processing unit is used for answering the questions of the user, switching to manual customer service or ending the call according to the speaking content and the emotion label.
Preferably, the determining unit is specifically configured to: and simultaneously inputting the audio signal into a preset voice recognition model and a preset emotion recognition model for voice recognition and emotion recognition, and determining the speaking content of the user and an emotion label for indicating the speaking emotion of the user, wherein the voice recognition model and the emotion recognition model are obtained by training a neural network model based on audio sample data.
Preferably, the processing unit includes:
the reply module is used for asking the user about the question and replying to the question of the user if the speaking content and/or the emotion label conform to a preset reply rule;
the switching module is used for switching to the manual customer service for the user if the speaking content and/or the emotion label accord with a preset switching rule;
and the hang-up module is used for ending the call with the user if the speaking content and/or the emotion label conform to a preset hang-up rule;
wherein a reply rule, switching rule, or hang-up rule matched according to the speaking content is executed with higher priority than one matched according to the emotion label.
In the voice data processing method and intelligent customer service device provided by the above embodiments of the present invention, the method is: the intelligent customer service device collects, in real time, audio signals sent by user equipment while playing preset voice content to the user equipment; detects, in the audio signal, audio information indicating the user behavior type; interrupts the playing of the preset voice content if the audio information is determined to indicate that the user has a question; and reduces, within a preset time, the volume at which the preset voice content is played if the audio information is determined to indicate that the user is speaking. In this scheme, when the intelligent customer service device detects that the user is speaking while voice content is playing, it reduces the playback volume or interrupts the voice playing according to the behavior type of the user. The speech content of the user is collected and recognized, follow-up services such as answering questions, switching to manual customer service, or ending the call are provided for the user, and the user experience is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a flowchart of a voice data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a voice data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another voice data processing method according to an embodiment of the present invention;
Fig. 4 is a block diagram of an intelligent customer service device according to an embodiment of the present invention;
Fig. 5 is a block diagram of another intelligent customer service device according to an embodiment of the present invention;
Fig. 6 is a block diagram of another intelligent customer service device according to an embodiment of the present invention;
Fig. 7 is a block diagram of another intelligent customer service device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As can be seen from the background, current intelligent customer service devices cannot interrupt their own voice playing while introducing services and activities to the user or while answering user questions. While the device is playing voice, even if the user has a new question or does not want to hear the currently played content, the device still plays the current voice content in full and only then re-identifies the audio signal of the user, which greatly degrades the user experience.
Therefore, embodiments of the present invention provide a voice data processing method and an intelligent customer service device: during voice playing, when the user is detected to be speaking, the intelligent customer service device reduces the playback volume or interrupts the voice playing. The speech content of the user is collected and recognized, and subsequent services are provided for the user, improving the user experience.
Referring to fig. 1, a flowchart of a method for processing voice data, which is applicable to an intelligent customer service device and provided by an embodiment of the present invention, is shown, where the method includes the following steps:
step S101: the method comprises the steps that in the process that the intelligent customer service device plays preset voice contents to user equipment, the intelligent customer service device collects audio signals sent by the user equipment in real time.
In the specific implementation of step S101, when the user communicates with the intelligent customer service device through user equipment, the intelligent customer service device may play preset voice content to the user equipment. For example, a bank's intelligent customer service device, when communicating with a customer, introduces related products released by the bank by playing voice content. While playing the voice content, the intelligent customer service device collects the audio signal sent by the user equipment in real time.
Step S102: and the intelligent customer service device detects audio information which is used for indicating the behavior type of the user in the audio signal.
It should be noted that, when a user communicates with the intelligent customer service device through user equipment, the user equipment collects an audio signal and sends it to the intelligent customer service device. The user behavior type is determined according to the audio information in the audio signal, the user behavior type being that the user has a question or that the user is speaking.
In the specific implementation of step S102, voice activity detection (VAD) and tone judgment are performed on the audio signal simultaneously to determine whether the user is speaking or has a question. The tone of the audio signal is judged using a preset tone judgment model.
If it is determined that the audio information indicates that the user has a question, the playing of the preset voice content is interrupted, the user is asked about the question, and the audio signal sent by the user equipment is collected. For example: when interrogative tone words (interjections spoken with a questioning intonation, such as "Oh?") are detected in the user's audio signal, the currently played voice content is interrupted, the user is asked whether help is needed, and the audio signal collected by the user equipment after the inquiry is gathered. Voice recognition and emotion recognition are then performed using the audio signal, and the speaking content of the user and an emotion label indicating the speaking emotion of the user are determined. According to the speaking content and the emotion label, the question of the user is answered, the call is switched to manual customer service, or the call is ended.
If the VAD determines that the user is speaking, then, to further confirm this, the audio signal is used as the input of a preset VAD model to determine whether audio information indicating that the user is speaking exists in the audio signal; if the VAD model determines that such audio information exists, it is finally determined that the user is speaking.
It should be noted that the VAD model is obtained in advance by training a neural network model on audio sample data, and the tone judgment model is obtained in advance by training a neural network model on tone-word sample data.
Preferably, the VAD algorithm and the VAD model are used simultaneously to detect audio information in the audio signal indicating that the user is speaking, and only when both the VAD algorithm and the VAD model determine that such audio information exists is it finally determined that the user is speaking.
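As a minimal sketch of this two-stage check, the code below pairs a classic frame-energy VAD decision with a stand-in for the trained neural VAD model; the frame size, energy threshold, and single-logistic-unit "model" are illustrative assumptions, not the patent's actual implementation.

```python
import numpy as np

FRAME_SIZE = 320          # 20 ms frames at an assumed 16 kHz sample rate
ENERGY_THRESHOLD = 1e-3   # illustrative energy gate; tuned on real audio in practice

def energy_vad(frame: np.ndarray) -> bool:
    """Stand-in for the 'VAD algorithm': a simple frame-energy decision."""
    return float(np.mean(frame ** 2)) > ENERGY_THRESHOLD

class NeuralVAD:
    """Stand-in for the neural network VAD model trained on audio sample data.
    Here a single logistic unit over the raw frame; a real model would be a
    trained network with learned weights."""

    def __init__(self, weights: np.ndarray, bias: float = 0.0):
        self.weights = weights
        self.bias = bias

    def is_speech(self, frame: np.ndarray) -> bool:
        score = 1.0 / (1.0 + np.exp(-(float(frame @ self.weights) + self.bias)))
        return score > 0.5

def user_is_speaking(frame: np.ndarray, model: NeuralVAD) -> bool:
    """Finally determine that the user is speaking only when both the VAD
    algorithm and the VAD model agree, as in the preferred scheme above."""
    return energy_vad(frame) and model.is_speech(frame)
```

The conjunction mirrors the preferred scheme: the cheap energy gate filters out silence quickly, and the learned model confirms the decision, reducing false positives from background noise.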
Step S103: if it is determined that the audio information indicates that the user has a question, the playing of the preset voice content is interrupted.
Step S104: if it is determined that the audio information indicates that the user is speaking, the volume at which the preset voice content is played is reduced within a preset time.
It should be noted that, when the intelligent customer service device plays voice, the playback volume is usually high to ensure that the user hears the played content clearly. When the user has a question and needs to ask it, if the intelligent customer service device keeps playing the voice at high volume, the user experience is seriously affected.
In the specific implementation of step S104, when the intelligent customer service device determines that the user is speaking, it reduces the voice playback volume within a preset time to preserve the user experience.
Preferably, after the preset time, if audio information indicating that the user is speaking is still detected in the audio signal, the playing of the preset voice content is interrupted.
In a specific implementation, after the preset time, if audio information indicating that the user is speaking is still detected in the audio signal (that is, the user is still speaking after the volume has been reduced for the preset time), the playing of the preset voice content is interrupted.
Preferably, when it is determined that the user is speaking, the way in which the intelligent customer service device adjusts the voice playing includes, but is not limited to, the following three cases (a sketch combining them follows the list):
the first condition is as follows: and the intelligent customer service device reduces the volume of playing the preset voice content when the user speaks. Namely, when the user speaks, the intelligent voice reduces the volume of the played voice and keeps the volume of the played voice at a preset value in the whole process.
Case two: and the intelligent customer service device interrupts the preset voice content which is played. Namely, when the user speaks, the preset voice content being played is interrupted.
Case three: the volume at which the preset voice content is played is reduced within the preset time, and if the user has not stopped speaking after the preset time, the playing of the preset voice content is interrupted. For example: within 1 second of detecting that the user is speaking, the playback volume is first reduced; if the user is still speaking after 1 second, the playing of the current voice content is interrupted.
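A sketch combining the three cases, assuming a hypothetical `player` handle that exposes `set_volume()` and `stop()`; the volume levels and the 1-second preset time are the illustrative figures from the example above.

```python
import time

class PlaybackController:
    """Illustrative controller for the three cases above. `player` is a
    hypothetical playback handle with set_volume(level) and stop()."""

    def __init__(self, player, normal_volume=1.0, reduced_volume=0.3, preset_time=1.0):
        self.player = player
        self.normal_volume = normal_volume
        self.reduced_volume = reduced_volume  # volume kept at a preset value (case one)
        self.preset_time = preset_time        # e.g. 1 second, as in the example
        self._reduced_since = None            # when the volume was first reduced

    def on_user_speaking(self):
        now = time.monotonic()
        if self._reduced_since is None:
            # Cases one and three: first reduce the playback volume.
            self.player.set_volume(self.reduced_volume)
            self._reduced_since = now
        elif now - self._reduced_since >= self.preset_time:
            # Case three: the user is still speaking after the preset time,
            # so interrupt the playing of the preset voice content.
            self.player.stop()

    def on_user_question(self):
        # Case two: a question is detected, interrupt playback immediately.
        self.player.stop()

    def on_user_silent(self):
        # The user stopped speaking before the preset time: restore the volume.
        self.player.set_volume(self.normal_volume)
        self._reduced_since = None
```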
Preferably, after step S104 is executed, voice recognition and emotion recognition are performed using the audio signal, and the speaking content of the user and an emotion label indicating the speaking emotion of the user are determined. According to the speaking content and the emotion label, the question of the user is answered, the call is switched to manual customer service, or the call is ended.
In a further implementation, when the speaking content and the emotion label conform to a preset push rule, interface operation information including an interface operation website is pushed to the user equipment. For example: when the collected audio signal shows that the user is impatient or complains that the voice operation is slow, the operation the user wants to perform is determined according to the audio information. If the user equipment has a built-in operation interface, for example an app with an operation interface or a dedicated counter terminal, the operation interface is pushed to the user equipment directly. If the user equipment has no built-in operation interface, the website of the operation interface is pushed to the user equipment, and when the user clicks the link, the user equipment switches to the corresponding operation interface.
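A minimal sketch of this push rule, assuming a hypothetical `user_device` handle with a `has_builtin_interface` flag and `show_interface()`/`send_link()` methods:

```python
def push_interface_operation(user_device, operation_url: str) -> None:
    """Push interface operation information per the rule above. The device
    handle and its attributes are assumptions for illustration."""
    if user_device.has_builtin_interface:
        # Built-in operation interface (e.g. an app or a counter terminal):
        # push the operation interface directly.
        user_device.show_interface(operation_url)
    else:
        # No built-in interface: push the website; clicking it switches the
        # device to the corresponding operation interface.
        user_device.send_link(operation_url)
```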
In the voice recognition and emotion recognition process, the intelligent customer service device uploads the audio signal to a cloud server, where a voice recognition model and an emotion recognition model preset in the cloud server perform voice recognition and emotion recognition on the audio signal simultaneously, determining the speaking content of the user and an emotion label indicating the speaking emotion of the user. The voice recognition model and the emotion recognition model are obtained by training neural network models on audio sample data.
It should be noted that, because people of different ages express emotions differently, emotion recognition models corresponding to different age groups are preset. For example, emotion recognition models are set separately for six categories of users: young male, young female, middle-aged male, middle-aged female, elderly male, and elderly female. Before emotion recognition is performed on the audio signal of the user, the age of the user is determined according to the user information, and the emotion recognition model corresponding to that age is selected for emotion recognition.
Further, it should be noted that the division of emotion recognition models includes, but is not limited to, the above six categories.
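The sketch below illustrates selecting the age-appropriate emotion recognition model and then running speech recognition and emotion recognition on the same audio signal simultaneously; the age cut-offs, the (age band, gender) registry keys, and the `predict()` interface are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

class DummyModel:
    """Placeholder for a trained model exposing predict(audio) -> result."""
    def __init__(self, name):
        self.name = name
    def predict(self, audio):
        return f"{self.name}-result"

# One emotion model per (age band, gender) pair, mirroring the six categories above.
EMOTION_MODELS = {
    (band, gender): DummyModel(f"emotion-{band}-{gender}")
    for band in ("young", "middle-aged", "elderly")
    for gender in ("male", "female")
}

def select_emotion_model(age: int, gender: str) -> DummyModel:
    # Assumed age cut-offs, for illustration only.
    band = "young" if age < 30 else "middle-aged" if age < 60 else "elderly"
    return EMOTION_MODELS[(band, gender)]

def recognize(audio, asr_model, emotion_model):
    """Run the speech recognition model and the emotion recognition model on
    the same audio signal at the same time, returning
    (speaking content, emotion label)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text = pool.submit(asr_model.predict, audio)
        emotion = pool.submit(emotion_model.predict, audio)
        return text.result(), emotion.result()

# Example: pick the model for a 45-year-old female user, then recognize.
content, label = recognize(b"...", DummyModel("asr"), select_emotion_model(45, "female"))
```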
In a further implementation, the intelligent customer service device performs a corresponding operation according to the speaking content and the emotion label, including but not limited to: answering the question of the user, switching to manual customer service, or ending the call. The details are as follows:
and if the speaking content and/or the emotion label accord with a preset reply rule, inquiring the user about the user question and replying the user question. For example: when detecting that the user says "what you are saying" in a flat mood, the smart customer service device asks the user what questions are not clear, and replies to the questions asked by the user.
If the speaking content and/or the emotion label conform to a preset switching rule, the call is switched to manual customer service for the user. For example: when it is detected that the user says "I need to be switched to manual customer service", the call is switched to manual customer service for the user.
If the speaking content and/or the emotion label conform to a preset hang-up rule, the call with the user is ended. For example: when it is detected that the user says "I am not interested" in an angry tone, the intelligent customer service device ends the call with the user.
It should be noted that different types of emotion labels are preset, and when emotion recognition is performed on the audio signal, the emotion label indicating the speaking emotion of the user is determined from the audio signal.
It should be further noted that a reply rule, switching rule, or hang-up rule matched according to the speaking content is executed with higher priority than one matched according to the emotion label. For example: if the speaking content conforms to a reply rule and the emotion label conforms to a switching rule, the reply rule is executed, that is, the user is asked about the question and the question of the user is replied to. As another example: if the speaking content conforms to a hang-up rule and the emotion label conforms to a switching rule, the hang-up rule is executed, that is, the call with the user is ended.
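A sketch of this dispatch, under the assumption that each rule is a pair of content keywords and emotion labels; any match on the speaking content outranks any match on the emotion label, per the priority note above.

```python
def decide_action(speaking_content: str, emotion_label: str, rules: dict) -> str:
    """rules maps an action ('reply', 'switch', 'hangup') to a pair
    (content_keywords, emotion_labels). Content matches take priority
    over emotion-label matches."""
    # First pass: rules matched on the speaking content (higher priority).
    for action, (keywords, _) in rules.items():
        if any(k in speaking_content for k in keywords):
            return action
    # Second pass: fall back to rules matched on the emotion label.
    for action, (_, labels) in rules.items():
        if emotion_label in labels:
            return action
    return "reply"  # assumed default: keep serving the user

rules = {
    "reply": (["what are you saying"], ["calm"]),
    "switch": (["manual customer service"], []),
    "hangup": (["not interested"], ["angry"]),
}
# The content ("not interested") matches the hang-up rule, so the hang-up
# rule is executed even though the emotion label also matches a rule.
print(decide_action("i am not interested", "angry", rules))  # -> "hangup"
```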
In the embodiment of the present invention, during the playing of voice content, when the intelligent customer service device detects that the user is speaking, it detects the audio information in the audio signal indicating the user behavior type. According to the behavior type of the user, it either interrupts the playing of the preset voice content, asks the user about the need, and collects the audio signal, or reduces the playback volume within a preset time and interrupts the playing of the preset voice content if the user is still speaking after the preset time. From the collected audio signal of the user, the speaking content and speaking emotion of the user are recognized, and the corresponding operation is performed according to them, improving the user experience.
To better explain the steps shown in fig. 1, the voice data processing methods shown in fig. 2 and fig. 3 are described by way of example.
Referring to fig. 2, a schematic flow chart of a processing method of voice data provided by an embodiment of the present invention is shown, including the following steps:
step S201: the intelligent customer service device collects audio signals of the user side.
Step S202: based on the collected audio signal, the intelligent customer service device detects whether the user is speaking using the VAD algorithm; if so, step S203 is executed; if not, the process returns to step S201.
Step S203: based on the collected audio signal, the intelligent customer service device further confirms whether the user is speaking using the neural network VAD model; if so, it interrupts voice playing or reduces the playback volume and executes step S204; otherwise, the process returns to step S201.
Step S204: the intelligent customer service device performs voice recognition and emotion recognition on the audio signal and, according to the voice recognition result and the emotion recognition result, switches to manual customer service, answers the question, or hangs up.
Referring to fig. 3, a flow chart of a processing method of voice data provided by the embodiment of the invention is shown, which includes the following steps:
step S301: the intelligent customer service device collects audio signals of the user side.
Step S302: the intelligent customer service device uses the VAD algorithm and the neural network VAD model simultaneously to determine whether the user is speaking. If both the VAD algorithm and the neural network VAD model determine that the user is speaking, voice playing is interrupted or the playback volume is reduced, and step S303 is executed. If the VAD algorithm and/or the neural network VAD model determine that the user is not speaking, the process returns to step S301.
Step S303: the intelligent customer service device performs voice recognition and emotion recognition on the audio signal and, according to the voice recognition result and the emotion recognition result, switches to manual customer service, answers the question, or hangs up.
It should be noted that, for the execution principle of each step in fig. 2 and fig. 3, reference may be made to the content corresponding to each step in fig. 1 in the embodiment of the present invention, and details are not repeated here.
In the embodiment of the present invention, during the playing of voice content, when the intelligent customer service device detects that the user is speaking, the volume at which the preset voice content is played is reduced within a preset time, and if the user is still speaking after the preset time, the playing of the preset voice content is interrupted. From the collected audio signal of the user, the speaking content and speaking emotion of the user are recognized, and the corresponding operation is performed according to them, improving the user experience.
Corresponding to the voice data processing method provided in the foregoing embodiments of the present invention, referring to fig. 4, an embodiment of the present invention further provides a structural block diagram of an intelligent customer service device, which includes: an acquisition unit 401, a determining unit 402, a first interrupting unit 403, and an adjusting unit 404;
the acquisition unit 401 is configured to collect, in real time, an audio signal sent by the user equipment when the intelligent customer service device plays the preset voice content to the user equipment.
A determining unit 402, configured to detect audio information in the audio signal, where the audio information is used to indicate a user behavior type, where the user behavior type is that a user has a question or that the user is speaking. The process of determining the user behavior type is described in the above embodiment of the present invention, and reference is made to the corresponding contents in step S102 in fig. 1.
A first interrupting unit 403, configured to interrupt playing of the preset voice content if it is determined that the audio information is used to indicate that the user has a question.
An adjusting unit 404, configured to reduce the volume at which the preset voice content is played within a preset time if it is determined that the audio information indicates that the user is speaking.
In the embodiment of the present invention, during the playing of voice content, when the intelligent customer service device detects that the user is speaking, it detects the audio information in the audio signal indicating the user behavior type. According to the behavior type of the user, the playing of the preset voice content is interrupted, or the playback volume is reduced within a preset time and the playing of the preset voice content is interrupted if the user is still speaking after the preset time, improving the user experience.
Preferably, referring to fig. 5 in conjunction with fig. 4, which shows a structural block diagram of an intelligent customer service device provided in an embodiment of the present invention, the intelligent customer service device further includes:
a second interruption unit 405, configured to, after the preset time, interrupt playing of the preset voice content if it is detected that audio information indicating that the user is speaking exists in the audio signal.
In a specific implementation, after the adjusting unit 404 reduces the volume of playing the preset voice content within a preset time, the second interrupting unit 405 is executed.
A recognition unit 406, configured to perform speech recognition and emotion recognition by using the audio signal, and determine speaking content of the user and an emotion tag indicating a speaking emotion of the user.
Preferably, in a specific implementation, the identifying unit 406 is further configured to determine an age of the user based on the user information of the user, and select an emotion recognition model corresponding to the age, where emotion recognition models corresponding to different age groups are preset.
And the processing unit 407 is configured to answer the question of the user, switch to manual customer service, or end the call according to the speaking content and the emotion label.
Correspondingly, the determining unit 402 is specifically configured to: and simultaneously inputting the audio signal into a preset voice recognition model and a preset emotion recognition model for voice recognition and emotion recognition, and determining the speaking content of the user and an emotion label for indicating the speaking emotion of the user, wherein the voice recognition model and the emotion recognition model are obtained by training a neural network model based on audio sample data.
In the embodiment of the present invention, the speaking content and speaking emotion of the user are recognized from the collected audio signal of the user, and the corresponding operation is subsequently performed according to the speaking content and speaking emotion, improving the user experience.
Preferably, referring to fig. 6 in combination with fig. 5, which shows a structural block diagram of an intelligent customer service device provided in an embodiment of the present invention, after the first interrupting unit 403 is executed, the intelligent customer service device further includes:
an inquiry unit 408, configured to inquire the user about the question of the user, and collect an audio signal sent by the user equipment. The recognition unit 406 and the processing unit 407 are executed.
Preferably, referring to fig. 7 in conjunction with fig. 5, which shows a structural block diagram of an intelligent customer service device provided in an embodiment of the present invention, the processing unit 407 includes:
a replying module 4071, configured to ask the user a question and reply to the question if the speaking content and/or the emotion tag meet a preset replying rule.
A switching module 4072, configured to switch to an artificial customer service for the user if the speaking content and/or the emotion tag meet a preset switching rule.
A hang-up module 4073, configured to end the call with the user if the speaking content and/or the emotion label conform to a preset hang-up rule.
Wherein a reply rule, switching rule, or hang-up rule matched according to the speaking content is executed with higher priority than one matched according to the emotion label.
In the embodiment of the present invention, the intelligent customer service device collects and recognizes the speaking content and speaking emotion of the user, and provides follow-up services such as answering questions, switching to manual customer service, or ending the call according to the speaking content and speaking emotion, improving the user experience.
In summary, embodiments of the present invention provide a voice data processing method and an intelligent customer service device, the method comprising: the intelligent customer service device collects, in real time, audio signals sent by user equipment while playing preset voice content to the user equipment; detects, in the audio signal, audio information indicating the user behavior type; interrupts the playing of the preset voice content if the audio information is determined to indicate that the user has a question; and reduces, within a preset time, the volume at which the preset voice content is played if the audio information is determined to indicate that the user is speaking. In this scheme, when the intelligent customer service device detects that the user is speaking during voice playing, it reduces the playback volume or interrupts the voice playing according to the behavior type of the user. The speech content of the user is collected and recognized, follow-up services such as answering questions, switching to manual customer service, or ending the call are provided for the user, and the user experience is improved.
The embodiments in this specification are described in a progressive manner; the embodiments may refer to one another for identical or similar parts, and each embodiment focuses on its differences from the others. In particular, the device embodiments are substantially similar to the method embodiments and are therefore described relatively simply; for relevant points, refer to the description of the method embodiments. The device embodiments described above are only illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for processing voice data is suitable for an intelligent customer service device, and comprises the following steps:
the method comprises the steps that in the process that an intelligent customer service device plays preset voice contents to user equipment, the intelligent customer service device collects audio signals sent by the user equipment in real time;
detecting audio information in the audio signal indicating a user behavior type, wherein the user behavior type is that the user has a question or that the user is speaking;
if it is determined that the audio information indicates that the user has a question, interrupting the playing of the preset voice content;
and if it is determined that the audio information indicates that the user is speaking, reducing, within a preset time, the volume at which the preset voice content is played.
2. The method according to claim 1, wherein if the audio information indicates that the user has a question, after the playing of the preset voice content is interrupted, the method further comprises:
inquiring the user about the user's question and collecting an audio signal sent by the user equipment;
performing voice recognition and emotion recognition by using the audio signal, and determining speaking content of the user and an emotion tag for indicating the speaking emotion of the user;
and answering the questions of the user, switching to manual customer service or ending the call according to the speaking content and the emotion label.
3. The method of claim 1, wherein after determining that the audio information indicates that the user is speaking, further comprising:
after the preset time, if audio information indicating that the user is speaking is still detected in the audio signal, interrupting the playing of the preset voice content;
performing voice recognition and emotion recognition by using the audio signal, and determining speaking content of the user and an emotion tag for indicating the speaking emotion of the user;
and answering the questions of the user, switching to manual customer service or ending the call according to the speaking content and the emotion label.
4. The method of claim 2 or 3, wherein the answering the question of the user, switching to manual customer service, or ending the call according to the speaking content and the emotion label comprises:
if the speaking content and/or the emotion label conform to a preset reply rule, asking the user about the question and replying to the question of the user;
if the speaking content and/or the emotion label accord with a preset switching rule, switching to manual customer service for the user;
if the speaking content and/or the emotion label accord with a preset hang-up rule, ending the conversation with the user equipment;
wherein a reply rule, switching rule, or hang-up rule matched according to the speaking content is executed with higher priority than one matched according to the emotion label.
5. The method of claim 2 or 3, wherein the performing voice recognition and emotion recognition using the audio signal and determining the speaking content of the user and an emotion label indicating the speaking emotion of the user comprises:
and simultaneously inputting the audio signal into a preset voice recognition model and a preset emotion recognition model for voice recognition and emotion recognition, and determining the speaking content of the user and an emotion label for indicating the speaking emotion of the user, wherein the voice recognition model and the emotion recognition model are obtained by training a neural network model based on audio sample data.
6. The method of claim 5, wherein before the performing voice recognition and emotion recognition using the audio signal and determining the speaking content of the user and an emotion label indicating the speaking emotion of the user, the method further comprises:
and determining the age of the user based on the user information of the user, and selecting an emotion recognition model corresponding to the age, wherein emotion recognition models corresponding to different age groups are preset.
7. An intelligent customer service device, comprising:
the acquisition unit is used for collecting audio signals sent by the user equipment in real time in the process that the intelligent customer service device plays preset voice content to the user equipment;
the determining unit is used for detecting audio information which is used for indicating a user behavior type in the audio signal, wherein the user behavior type is that a user has a question or the user is speaking;
the first interruption unit is used for interrupting the playing of the preset voice content if the audio information is determined to indicate that the user has a question;
and the adjusting unit is used for reducing, within a preset time, the volume at which the preset voice content is played if the audio information is determined to indicate that the user is speaking.
8. The intelligent customer service device of claim 7 further comprising:
a second interruption unit, configured to, after the preset time, interrupt playing of the preset voice content if it is detected that audio information indicating that a user is speaking exists in the audio signal;
the recognition unit is used for carrying out voice recognition and emotion recognition by utilizing the audio signal, and determining speaking content of the user and an emotion label for indicating the speaking emotion of the user;
and the processing unit is used for answering the questions of the user, switching to manual customer service or ending the call according to the speaking content and the emotion label.
9. The intelligent customer service device according to claim 8, wherein the determining unit is specifically configured to: and simultaneously inputting the audio signal into a preset voice recognition model and a preset emotion recognition model for voice recognition and emotion recognition, and determining the speaking content of the user and an emotion label for indicating the speaking emotion of the user, wherein the voice recognition model and the emotion recognition model are obtained by training a neural network model based on audio sample data.
10. The intelligent customer service device of claim 8 wherein the processing unit comprises:
the reply module is used for asking the user about the question and replying to the question of the user if the speaking content and/or the emotion label conform to a preset reply rule;
the switching module is used for switching to the manual customer service for the user if the speaking content and/or the emotion label accord with a preset switching rule;
the hang-up module is used for ending the conversation with the user if the speaking content and/or the emotion label accord with a preset hang-up rule;
wherein a reply rule, switching rule, or hang-up rule matched according to the speaking content is executed with higher priority than one matched according to the emotion label.
CN201910650265.9A 2019-07-18 2019-07-18 Voice data processing method and intelligent customer service device Pending CN112242135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910650265.9A CN112242135A (en) 2019-07-18 2019-07-18 Voice data processing method and intelligent customer service device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910650265.9A CN112242135A (en) 2019-07-18 2019-07-18 Voice data processing method and intelligent customer service device

Publications (1)

Publication Number Publication Date
CN112242135A true CN112242135A (en) 2021-01-19

Family

ID=74168179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910650265.9A Pending CN112242135A (en) 2019-07-18 2019-07-18 Voice data processing method and intelligent customer service device

Country Status (1)

Country Link
CN (1) CN112242135A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096644A (en) * 2021-03-31 2021-07-09 闽江学院 Telephone voice processing system
CN113488024A (en) * 2021-05-31 2021-10-08 杭州摸象大数据科技有限公司 Semantic recognition-based telephone interruption recognition method and system
WO2023065633A1 (en) * 2021-10-22 2023-04-27 平安科技(深圳)有限公司 Abnormal semantic truncation detection method and apparatus, and device and medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100463706B1 (en) * 2004-04-27 2004-12-29 주식회사 엠포컴 A system and a method for analyzing human emotion based on voice recognition through wire or wireless network
US6882973B1 (en) * 1999-11-27 2005-04-19 International Business Machines Corporation Speech recognition system with barge-in capability
CN103269405A (en) * 2013-05-23 2013-08-28 深圳市中兴移动通信有限公司 Method and device for hinting friendlily
CN203912042U (en) * 2014-06-12 2014-10-29 国家电网公司 Automatic tone tuning customer service telephone
CN105070290A (en) * 2015-07-08 2015-11-18 苏州思必驰信息科技有限公司 Man-machine voice interaction method and system
CN105100356A (en) * 2015-07-07 2015-11-25 上海斐讯数据通信技术有限公司 Automatic volume adjustment method and system
CN105895101A (en) * 2016-06-08 2016-08-24 国网上海市电力公司 Speech processing equipment and processing method for power intelligent auxiliary service system
CN107580272A (en) * 2017-07-17 2018-01-12 成都华科威电子科技有限公司 A kind of vehicle audio broadcast sound volume Automatic adjustment method
CN107657017A (en) * 2017-09-26 2018-02-02 百度在线网络技术(北京)有限公司 Method and apparatus for providing voice service
CN108900726A (en) * 2018-06-28 2018-11-27 北京首汽智行科技有限公司 Artificial customer service forwarding method based on speech robot people
CN108961887A (en) * 2018-07-24 2018-12-07 广东小天才科技有限公司 Voice search control method and family education equipment
CN109040449A (en) * 2018-08-06 2018-12-18 维沃移动通信有限公司 A kind of volume adjusting method and terminal device
CN109509471A (en) * 2018-12-28 2019-03-22 浙江百应科技有限公司 A method of the dialogue of intelligent sound robot is interrupted based on vad algorithm
CN109767791A (en) * 2019-03-21 2019-05-17 中国—东盟信息港股份有限公司 A kind of voice mood identification and application system conversed for call center
CN110021308A (en) * 2019-05-16 2019-07-16 北京百度网讯科技有限公司 Voice mood recognition methods, device, computer equipment and storage medium


Similar Documents

Publication Publication Date Title
CN112365894B (en) AI-based composite voice interaction method and device and computer equipment
CN112242135A (en) Voice data processing method and intelligent customer service device
CN108874895B (en) Interactive information pushing method and device, computer equipment and storage medium
CN109065052B (en) Voice robot
WO2016194740A1 (en) Speech recognition device, speech recognition system, terminal used in said speech recognition system, and method for generating speaker identification model
CN112313930B (en) Method and apparatus for managing maintenance
CN110705309B (en) Service quality evaluation method and system
CN110995943B (en) Multi-user streaming voice recognition method, system, device and medium
CN115083434B (en) Emotion recognition method and device, computer equipment and storage medium
CN108074571A (en) Sound control method, system and the storage medium of augmented reality equipment
CN110335596A (en) Products Show method, apparatus, equipment and storage medium based on speech recognition
CN114297365B (en) Intelligent customer service system and method based on Internet
CN113840040B (en) Man-machine cooperation outbound method, device, equipment and storage medium
CN111768781A (en) Voice interruption processing method and device
CN109271503A (en) Intelligent answer method, apparatus, equipment and storage medium
CN113505272A (en) Behavior habit based control method and device, electronic equipment and storage medium
CN112767916A (en) Voice interaction method, device, equipment, medium and product of intelligent voice equipment
CN117253478A (en) Voice interaction method and related device
CN110489519B (en) Session method based on session prediction model and related products
CN111510563A (en) Intelligent outbound method and device, storage medium and electronic equipment
CN110086941A (en) Speech playing method, device and terminal device
CN112087726B (en) Method and system for identifying polyphonic ringtone, electronic equipment and storage medium
CN110047473B (en) Man-machine cooperative interaction method and system
CN114067842B (en) Customer satisfaction degree identification method and device, storage medium and electronic equipment
CN110765242A (en) Method, device and system for providing customer service information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination