CN112201277A - Voice response method, device and equipment and computer readable storage medium - Google Patents


Info

Publication number: CN112201277A (application CN202011052933.7A; granted publication CN112201277B)
Authority: CN (China)
Prior art keywords: voice, user, intonation, response, content
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 申亚坤
Original and current assignee: Bank of China Ltd (the listed assignees may be inaccurate)
Events: application filed by Bank of China Ltd; priority to CN202011052933.7A; publication of CN112201277A; application granted; publication of CN112201277B

Classifications

    • G: Physics
    • G10: Musical instruments; acoustics
    • G10L: Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use
    • G10L15/063: Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/08: Speech classification or search


Abstract

The application provides a voice response method, apparatus, device, and computer-readable storage medium. The method comprises: acquiring a user voice; determining the intonation type corresponding to the user voice from the voice features and voice content of the user voice; generating a response voice corresponding to the user voice based on that intonation type and the voice content; and finally broadcasting the response voice. Because the broadcast response voice is derived from both the intonation type and the content of the user voice, the response differs whenever the intonation type differs, achieving a personalized response to the user voice and thereby improving the user experience. In addition, because the intonation type is determined from two dimensions, the voice features and the voice content of the user voice, it is determined with higher accuracy, which in turn improves the accuracy of the broadcast response voice.

Description

Voice response method, device and equipment and computer readable storage medium
Technical Field
The present application relates to the field of voice processing, and in particular, to a method and an apparatus for voice response, an electronic device, and a computer-readable storage medium.
Background
In many service scenarios, intelligent voice response devices are deployed for voice interaction with users. However, many current devices respond in a single mode, for example with one uniform intonation, so they cannot personalize their responses to different user voices and cannot improve the user's service experience.
Disclosure of Invention
The application provides a voice response method and apparatus, an electronic device, and a computer-readable storage medium, aiming to enable a voice response device to respond to a user voice in a personalized way.
In order to achieve the above object, the present application provides the following technical solutions:
a method of voice response comprising:
acquiring user voice;
determining a tone type corresponding to the user voice according to the voice characteristics and the voice content of the user voice;
generating response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and broadcasting the response voice.
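As a minimal illustration only, the four steps above can be sketched in Python; every helper, threshold, type name, and reply string here is a hypothetical stand-in, not the patent's implementation:

```python
# Hypothetical end-to-end sketch of the four steps: acquire, classify,
# generate, broadcast. All values below are invented for illustration.

def classify_intonation(features, content):
    # Determine the intonation type from voice features and content (stub rule).
    return "cheerful" if features.get("pitch_var", 0.0) > 0.5 else "mild"

def generate_response(intonation, content):
    # The response depends on both the intonation type and the content.
    text = "Happy to help!" if intonation == "cheerful" else "How may I assist you?"
    return {"intonation": intonation, "text": text}

def voice_response(features, content):
    intonation = classify_intonation(features, content)  # determine intonation type
    response = generate_response(intonation, content)    # generate response voice
    return response  # the final step would broadcast this response voice

print(voice_response({"pitch_var": 0.8}, "hello"))
```

Different intonation types yield different responses even for the same content, which is the personalization the claim describes.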
Optionally, in the foregoing method, the intonation types include at least two specified intonation types, any one of which is preset according to the voice features and voice content of historical user voices;
the voice features include at least a pitch feature and an amplitude feature.
Optionally, the determining, according to the voice feature of the user voice and the voice content, a tone type corresponding to the user voice includes:
inputting the user voice into a pre-trained Bayes classification model, and enabling the Bayes classification model to determine the intonation type corresponding to the user voice according to the voice characteristics of the user voice;
recognizing and obtaining the voice content corresponding to the user voice;
inputting the voice content of the user voice into a pre-trained voice classification model; enabling the voice classification model to determine the intonation type corresponding to the user voice according to the voice content of the user voice;
respectively acquiring the intonation types corresponding to the user voice output by the Bayesian classification model and the voice classification model;
and if the intonation type output by the Bayes classification model and the intonation type output by the voice classification model are the same intonation type, taking the same intonation type as the intonation type corresponding to the voice of the user.
The above method, optionally, further includes:
and if the intonation types output by the Bayes classification model and the intonation types output by the voice classification model are different intonation types, determining the intonation type corresponding to the user voice as a preset default intonation type.
Optionally, in the method, the bayesian classification model is obtained by training according to a voice training sample, where the voice training sample carries the voice feature;
the Bayesian classification model determines the intonation type corresponding to the user voice as follows: and the Bayesian classification model calculates the probability that the user voice belongs to each intonation type respectively according to the voice characteristics of the user voice, and determines the intonation type corresponding to the maximum probability value as the intonation type corresponding to the user voice.
Optionally, in the foregoing method, the speech classification model is a GA-BP neural network model, and the GA-BP neural network model is a model obtained by optimizing an initial BP neural network model;
the number of input layer nodes of the initial BP neural network model is determined according to the voice content length of a voice training sample, the number of output layer nodes is determined according to the intonation type, and the number of hidden layer nodes is determined based on a trial and error method;
the optimization of the initial BP neural network model is as follows: and training and learning the initial weight and the threshold of each layer of the input layer, the hidden layer and the output layer of the initial BP neural network model according to preset sample data and a genetic algorithm, and determining the optimal initial weight and threshold of each layer to obtain the optimized BP neural network model.
Optionally, in the method, the generating, based on the intonation type corresponding to the user voice and the voice content, a response voice corresponding to the user voice includes:
determining responsive voice content based on the voice content;
and generating the response voice with the voice content being the response voice content and the tone type being the tone type corresponding to the user voice.
An apparatus for voice response, comprising:
an acquisition unit configured to acquire a user voice;
the determining unit is used for determining the tone type corresponding to the user voice according to the voice characteristics and the voice content of the user voice;
a generating unit, configured to generate a response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and the broadcasting unit is used for broadcasting the response voice.
A voice response apparatus comprising: a processor and a memory for storing a program; the processor is used for running the program to realize the voice response method.
A computer-readable storage medium having stored thereon instructions which, when executed on a computer, cause the computer to perform the above-described method of voice response.
The method and apparatus of the application comprise: acquiring a user voice, determining the intonation type corresponding to the user voice from its voice features and voice content, generating a response voice based on that intonation type and the voice content, and finally broadcasting the response voice. Because the broadcast response voice is derived from both the intonation type and the content of the user voice, the response differs whenever the intonation type differs, achieving a personalized response to the user voice and thereby improving the user experience.
In addition, because the intonation type is determined from two dimensions, the voice features and the voice content of the user voice, it is determined with higher accuracy, which in turn improves the accuracy of the broadcast response voice.
Drawings
To illustrate the embodiments of the present application or the prior-art solutions more clearly, the drawings needed for describing them are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for voice response provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for determining a type of intonation corresponding to a user's voice according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a voice response apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a voice response apparatus according to an embodiment of the present application.
Detailed Description
In many settings, intelligent voice broadcasting devices are used for voice interaction with users. However, many current intelligent voice response devices attend only to the content of the user voice and not to the intonation in which it is spoken, so they generally respond in one uniform intonation. As a result, they cannot personalize their responses to different user voices and cannot improve the user's service experience.
Therefore, the embodiment of the present application provides a voice response method, which aims to respond to a user by combining a user voice and a voice content of the user voice, so as to implement a personalized response according to different user voices.
In the present application, the speech content of the user speech refers to the speech text content corresponding to the user speech.
To make the above objects, features, and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present application.
The execution subject of this embodiment is an intelligent voice broadcasting device with a voice-processing function, such as an intelligent voice robot.
Fig. 1 is a method for responding to a voice according to an embodiment of the present application, and the method may include the following steps:
and S101, acquiring the voice of the user.
The user voice is the speech uttered by a user; while running, the intelligent voice broadcasting device collects any user voice within its pickup range.
S102, determining the tone type corresponding to the user voice according to the voice characteristics and the voice content of the user voice.
In this embodiment, the voice features are information describing the mood and emotion of the user voice, and include pitch features, amplitude features, timbre features, and the like.
The intonation types include at least two specified intonation types, preset according to the voice features and voice content of historical user voices; that is, a specified intonation type is defined from the mood and emotion information and the content of historical user voices. Specified types may include, for example, a cheerful, playful interactive intonation type and a mild, formal interactive intonation type. The cheerful, playful type may correspond to voices whose pitch or amplitude varies widely and whose content correlates weakly with the service inquiry; the mild, formal type may correspond to voices whose pitch or amplitude varies little and whose content correlates strongly with the service inquiry.
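The two example intonation types above (wide pitch or amplitude variation plus weak content relevance versus small variation plus strong relevance) can be sketched as a toy rule; the 20 Hz deviation threshold and the 0.5 relevance cutoff are invented for illustration:

```python
import statistics

def intonation_type(pitch_track_hz, relevance):
    # Toy rule for the two example intonation types: wide pitch swings plus
    # weak relevance to the service inquiry -> cheerful, playful type; small
    # swings plus strong relevance -> mild, formal type. Thresholds are
    # illustrative assumptions, not values from the patent.
    variation = statistics.pstdev(pitch_track_hz)
    if variation > 20 and relevance < 0.5:
        return "cheerful"
    if variation <= 20 and relevance >= 0.5:
        return "mild"
    return "default"

print(intonation_type([180, 240, 150, 260], 0.2))  # large swings, weak relevance
```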
The specific embodiment of this step can refer to the flowchart shown in fig. 2.
S103, generating response voice corresponding to the user voice based on the tone type and the voice content corresponding to the user voice.
The specific implementation mode of the step comprises a step A1 and a step A2:
step a1, based on the speech content of the user's speech, determines the responsive speech content.
The response voice content corresponding to the user voice is determined from its voice content; for example, it may be determined from keywords contained in the voice content.
Of course, in this step the response voice content may also be determined from both the voice content and the intonation type of the user voice. That is, the content of the response is related not only to the content of the user voice but also to its intonation type, so that even for identical voice content, different intonation types may yield different response content, giving the step better personalization.
Step a2, generating the response voice with the voice content being the response voice content and the tone type being the tone type corresponding to the user voice.
The tone type of the response voice is the same as that of the user voice, so that the personalized effect of the response voice can be enhanced.
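Steps A1 and A2 can be sketched as below; the keyword table and reply strings are made-up examples, not from the patent:

```python
# Illustrative sketch of steps A1/A2: pick response content from keywords in
# the recognized voice content, then pair it with the user's intonation type.
KEYWORD_REPLIES = {
    "balance": "Your balance inquiry is being processed.",
    "transfer": "Let's set up your transfer.",
}

def build_response(content, intonation):
    # Step A1: determine the response voice content from keywords.
    text = next((reply for kw, reply in KEYWORD_REPLIES.items() if kw in content),
                "How can I help you?")
    # Step A2: give the response the same intonation type as the user voice.
    return {"text": text, "intonation": intonation}

print(build_response("check my balance please", "mild"))
```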
And S104, broadcasting response voice.
For example, the intelligent voice broadcasting device calls a preset voice broadcaster to broadcast the response voice.
The method provided by this embodiment comprises: acquiring a user voice, determining the intonation type corresponding to the user voice from its voice features and voice content, generating a response voice based on that intonation type and the voice content, and finally broadcasting the response voice. Because the broadcast response voice is derived from both the intonation type and the content of the user voice, the response differs whenever the intonation type differs, achieving a personalized response to the user voice and thereby improving the user experience.
In addition, because the intonation type is determined from two dimensions, the voice features and the voice content of the user voice, it is determined with higher accuracy, which in turn improves the accuracy of the broadcast response voice.
Fig. 2 is a specific implementation manner of determining, by S102 according to the speech feature and the speech content of the user speech, the intonation type corresponding to the user speech, in the above embodiment, which may include the following steps:
s201, inputting the user voice into a pre-trained Bayes classification model, and enabling the Bayes classification model to determine the intonation type corresponding to the user voice according to the voice characteristics of the user voice.
In this step, the Bayesian classification model is trained on voice training samples, each carrying voice features; existing techniques may be used for the training procedure itself.
The pre-trained Bayesian classification model extracts the voice features of the user voice and determines the corresponding intonation type based on those features.
Specifically, the Bayesian classification model calculates, from the voice features of the user voice, the probability that the user voice belongs to each specified intonation type, and determines the specified intonation type with the maximum probability as the intonation type corresponding to the user voice.
For example, let X denote the set of all voice features of the user voice and Y1 denote the first intonation type; the probability that the user voice belongs to the first intonation type is then calculated by substituting all the voice features of the user voice into the probability formula.
Wherein, the probability formula is:

$$P(Y_1 \mid X) = \frac{P(Y_1)\,\prod_{i=1}^{n} P(A_i \mid Y_1)}{\prod_{i=1}^{n} P(A_i)}$$

where $P(Y_1 \mid X)$ is the probability that the user voice belongs to the first intonation type $Y_1$ given its feature set $X$; $A_i$ is the $i$-th feature in the feature set $X$ of the user voice; $n$ is the number of features in $X$; $P(Y_1)$ is the probability that an intonation type is the first intonation type $Y_1$; $P(A_i \mid Y_1)$ is the probability that a voice of intonation type $Y_1$ has feature $A_i$; and $P(A_i)$ is the probability that any voice has feature $A_i$.

$P(Y_1)$, $P(A_i \mid Y_1)$, and $P(A_i)$ are estimated in advance from feature sets $X$ whose intonation types are known; the larger the number of such feature sets, the more accurate their intonation labels and the estimated probabilities.
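Assuming all probabilities have been pre-estimated, the formula can be evaluated per intonation type and the maximum taken, as the description specifies. The feature names and probability values below are toy inventions; since the denominator is shared across types, only the ranking of the scores matters:

```python
import math

def intonation_scores(features, priors, likelihoods, evidence):
    # Score each intonation type Y as P(Y) * prod_i P(A_i|Y) / prod_i P(A_i),
    # per the formula above; the type with the maximum score is selected.
    den = math.prod(evidence[f] for f in features)
    return {y: priors[y] * math.prod(likelihoods[y][f] for f in features) / den
            for y in priors}

# Toy pre-estimated probabilities over two binary features.
priors = {"cheerful": 0.5, "mild": 0.5}
likelihoods = {
    "cheerful": {"high_pitch_var": 0.8, "weak_relevance": 0.7},
    "mild":     {"high_pitch_var": 0.2, "weak_relevance": 0.3},
}
evidence = {"high_pitch_var": 0.5, "weak_relevance": 0.5}

scores = intonation_scores(["high_pitch_var", "weak_relevance"],
                           priors, likelihoods, evidence)
print(max(scores, key=scores.get))
```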
S202, recognizing and obtaining the voice content corresponding to the user voice.
The voice content of the user voice can be obtained by adopting the existing voice recognition method.
S203, inputting the voice content of the user voice into a pre-trained voice classification model, and enabling the voice classification model to determine the tone type corresponding to the user voice according to the voice content of the user voice.
The Bayesian classification model determines the intonation type corresponding to the user voice from the voice features of the user voice, whereas the voice classification model determines it from the voice content of the user voice.
Optionally, the speech classification model is a GA-BP neural network model. And the GA-BP neural network model is obtained by optimizing the initial BP neural network model. The trained voice classification model can obtain the tone type corresponding to the input voice content.
The number of input layer nodes of the initial BP neural network model is determined according to the voice content length of a voice training sample, the number of output layer nodes is determined according to the tone type, and the number of hidden layer nodes is determined based on a trial and error method. The voice training sample is voice content of the historical user voice carrying tone types.
Optimizing the initial BP neural network model as follows: training and learning the initial weight and the threshold of each of the input layer, the hidden layer and the output layer of the initial BP neural network model according to preset sample data and a genetic algorithm, determining the optimal initial weight and threshold of each layer, and obtaining the optimized BP neural network model. For a specific optimization process, reference may be made to the prior art.
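A minimal sketch of the genetic-algorithm step, assuming a tiny 2-3-1 network without biases and invented toy data; in real GA-BP training the selected initial weights would then be fine-tuned with backpropagation:

```python
import math
import random

random.seed(0)  # deterministic for illustration

# Toy data: 2-feature content vectors mapped to one of two intonation classes.
DATA = [([0.9, 0.1], 1.0), ([0.8, 0.2], 1.0), ([0.1, 0.9], 0.0), ([0.2, 0.8], 0.0)]
N_IN, N_HID = 2, 3                 # hidden-node count would be set by trial and error
N_W = N_IN * N_HID + N_HID         # input->hidden plus hidden->output weights

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w, x):
    h = [sigmoid(sum(w[i * N_IN + j] * x[j] for j in range(N_IN)))
         for i in range(N_HID)]
    return sigmoid(sum(w[N_IN * N_HID + i] * h[i] for i in range(N_HID)))

def fitness(w):
    # Negative squared error on the toy data; higher is better.
    return -sum((forward(w, x) - y) ** 2 for x, y in DATA)

def evolve(pop_size=20, gens=30, mut=0.3):
    pop = [[random.uniform(-2, 2) for _ in range(N_W)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]               # selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_W)
            child = a[:cut] + b[cut:]                  # one-point crossover
            children.append([g + random.gauss(0, mut) for g in child])  # mutation
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(round(-fitness(best), 4))  # error of the GA-selected initial weights
```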
And S204, obtaining the intonation types corresponding to the user voice output by the Bayesian classification model and the voice classification model, respectively.
S205, judging whether the intonation types output by the Bayesian classification model and the voice classification model are the same. If they are the same, S206 is executed; if not, S207 is executed.
And S206, taking the same tone type as the tone type corresponding to the voice of the user.
If the Bayesian classification model and the voice classification model output the same intonation type for the user voice, the probability that this shared intonation type is the correct one for the user voice is very high.
And S207, determining the tone type corresponding to the voice of the user as a preset default tone type.
For example, the default intonation type may be preset as the mild, formal interactive intonation type; when the intonation types output by the Bayesian classification model and the voice classification model differ, the intonation type corresponding to the user voice is determined as this default type.
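Steps S204 to S207 reduce to a simple agreement check with a preset default; the type names here are illustrative:

```python
DEFAULT_INTONATION = "mild"  # the preset default (flat, formal) type

def resolve_intonation(bayes_type, content_type, default=DEFAULT_INTONATION):
    # S204-S207: keep the intonation type only when both models agree;
    # otherwise fall back to the preset default intonation type.
    return bayes_type if bayes_type == content_type else default

print(resolve_intonation("cheerful", "cheerful"))
print(resolve_intonation("cheerful", "mild"))
```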
In the method provided by this embodiment, the Bayesian classification model determines the intonation type corresponding to the user voice from its voice features, and the voice classification model determines it from its voice content, which amounts to determining the intonation type of the user voice from two different dimensions.
Fig. 3 is a schematic structural diagram of a voice response apparatus according to an embodiment of the present application, including: a processor 301 and a memory 302, the memory for storing a program and the processor for executing the program to implement the method of voice response provided herein.
The intelligent voice response device can be placed at service points to provide automatic voice response services to users. For example, deployed at a business outlet, it can improve the user's service experience by offering both cheerful, casual interaction and formal, business-like interaction.
For example, when the user voice is of a cheerful intonation type, the user likely wants informal, playful interaction with the intelligent voice device; when the user voice is of a mild intonation type, the user likely wants formal, business-like interaction.
Correspondingly, the intonation types of user voices are specified in advance as a cheerful type and a mild type, and the intelligent voice response device is preconfigured so that it responds in a cheerful, playful intonation when the user voice is determined to be of the cheerful type, and in a mild intonation when the user voice is of the mild type. By providing these two different interaction modes, the device improves the user's service experience.
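The two preconfigured interaction modes described above might be represented as a simple lookup that falls back to the mild, formal mode; all names and fields here are hypothetical:

```python
# Hypothetical configuration for the two pre-specified interaction modes.
RESPONSE_MODES = {
    "cheerful": {"voice_style": "cheerful", "register": "informal"},
    "mild":     {"voice_style": "mild", "register": "formal"},
}

def response_mode(intonation_type):
    # Fall back to the mild, formal mode for unrecognized types.
    return RESPONSE_MODES.get(intonation_type, RESPONSE_MODES["mild"])

print(response_mode("cheerful")["register"])
```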
Fig. 4 is a schematic structural diagram of a voice response apparatus according to an embodiment of the present application, including:
an obtaining unit 401, configured to obtain a user voice;
a determining unit 402, configured to determine, according to a voice feature and a voice content of the user voice, a tone type corresponding to the user voice;
a generating unit 403, configured to generate a response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and the broadcasting unit 404 is used for broadcasting the response voice.
The intonation types include at least two specified intonation types, any one of which is preset according to the voice features and voice content of historical user voices; the voice features include at least a pitch feature and an amplitude feature.
The specific implementation manner of determining the intonation type corresponding to the user voice by the determining unit 402 according to the voice feature and the voice content of the user voice is as follows:
inputting the user voice into a pre-trained Bayes classification model, and enabling the Bayes classification model to determine the intonation type corresponding to the user voice according to the voice characteristics of the user voice;
recognizing and obtaining the voice content corresponding to the user voice;
inputting the voice content of the user voice into a pre-trained voice classification model; enabling the voice classification model to determine the intonation type corresponding to the user voice according to the voice content of the user voice;
respectively acquiring the intonation types corresponding to the user voice output by the Bayesian classification model and the voice classification model;
if the intonation type output by the Bayes classification model and the intonation type output by the voice classification model are the same intonation type, taking the same intonation type as the intonation type corresponding to the voice of the user;
and if the intonation types output by the Bayes classification model and the intonation types output by the voice classification model are different intonation types, determining the intonation type corresponding to the user voice as a preset default intonation type.
Optionally, the bayesian classification model is obtained by training according to a voice training sample, where the voice training sample carries the voice feature; the Bayesian classification model determines the intonation type corresponding to the user voice as follows: and the Bayesian classification model calculates the probability that the user voice belongs to each intonation type respectively according to the voice characteristics of the user voice, and determines the intonation type corresponding to the maximum probability value as the intonation type corresponding to the user voice.
Optionally, the speech classification model is a GA-BP neural network model, and the GA-BP neural network model is a model obtained by optimizing an initial BP neural network model;
the number of input layer nodes of the initial BP neural network model is determined according to the voice content length of a voice training sample, the number of output layer nodes is determined according to the intonation type, and the number of hidden layer nodes is determined based on a trial and error method;
the optimization of the initial BP neural network model is as follows: and training and learning the initial weight and the threshold of each layer of the input layer, the hidden layer and the output layer of the initial BP neural network model according to preset sample data and a genetic algorithm, and determining the optimal initial weight and threshold of each layer to obtain the optimized BP neural network model.
Optionally, the specific implementation manner of generating, by the generating unit 403, the response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content is as follows:
determining responsive voice content based on the voice content;
and generating the response voice with the voice content as response voice content and the tone type as the tone type corresponding to the voice of the user.
The device provided by the embodiment of the present application acquires the user voice, determines the intonation type corresponding to the user voice according to the voice features and voice content of the user voice, generates the response voice corresponding to the user voice based on that intonation type and the voice content, and finally broadcasts the response voice. Because the broadcast response voice is derived from both the intonation type and the voice content of the user voice, user voices with different intonation types receive different responses; the response is thus personalized to the user voice, which improves the user experience.
In addition, the intonation type corresponding to the user voice is determined from two dimensions, the voice features and the voice content of the user voice, so the determined intonation type is more accurate, which in turn improves the accuracy of the broadcast response voice.
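The two-dimension check can be sketched as a simple agreement rule between the acoustic (Bayesian) prediction and the content-based prediction, with the fallback-to-default behavior described in claims 3 and 4. The intonation-type names and the default label here are assumed for illustration.

```python
# Sketch of combining the two model outputs (claims 3 and 4): keep the
# prediction only when both models agree, otherwise use a preset default.
DEFAULT_INTONATION = "default"  # assumed name for the preset default type

def resolve_intonation(acoustic_pred: str, content_pred: str) -> str:
    """Return the agreed intonation type, or the default on disagreement."""
    if acoustic_pred == content_pred:
        return acoustic_pred
    return DEFAULT_INTONATION

agreed = resolve_intonation("excited", "excited")
fallback = resolve_intonation("excited", "calm")
```

Requiring agreement between the acoustic and content dimensions is what gives the determined intonation type its higher accuracy.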
The present application also provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the voice response method of the present application, that is, to perform the following steps:
acquiring user voice;
determining an intonation type corresponding to the user voice according to the voice features and the voice content of the user voice;
generating response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and broadcasting the response voice.
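The four steps above can be sketched end to end. Every helper below is a stub invented for illustration; none of these function names or return shapes come from the application.

```python
# Stand-in sketch of the four steps of the voice response method.
def acquire_user_voice():
    return {"audio": b"", "text": "check my balance"}       # step 1: acquire

def determine_intonation(voice):
    return "neutral"                                        # step 2: classify (stub)

def generate_response(voice, intonation):                   # step 3: generate
    return {"text": "Here is your balance.", "intonation": intonation}

def broadcast(response):                                    # step 4: broadcast (stub)
    return f"[{response['intonation']}] {response['text']}"

voice = acquire_user_voice()
intonation = determine_intonation(voice)
response = generate_response(voice, intonation)
announcement = broadcast(response)
```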
If the functions described in the methods of the embodiments of the present application are implemented in the form of software functional units and sold or used as independent products, they may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include: USB flash drives, removable hard disks, Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical disks, and other media capable of storing program code.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of voice response, comprising:
acquiring user voice;
determining an intonation type corresponding to the user voice according to the voice features and the voice content of the user voice;
generating response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and broadcasting the response voice.
2. The method according to claim 1, wherein the intonation types include at least two specified intonation types, each preset according to the voice features and voice content of historical user voices;
the voice features include at least a tone feature and a pitch feature.
3. The method according to claim 2, wherein determining the intonation type corresponding to the user voice according to the voice features and the voice content of the user voice comprises:
inputting the user voice into a pre-trained Bayesian classification model, so that the Bayesian classification model determines the intonation type corresponding to the user voice according to the voice features of the user voice;
recognizing the voice content corresponding to the user voice;
inputting the voice content of the user voice into a pre-trained speech classification model, so that the speech classification model determines the intonation type corresponding to the user voice according to the voice content of the user voice;
acquiring the intonation types corresponding to the user voice output by the Bayesian classification model and by the speech classification model, respectively;
and if the intonation type output by the Bayesian classification model and the intonation type output by the speech classification model are the same intonation type, taking that intonation type as the intonation type corresponding to the user voice.
4. The method of claim 3, further comprising:
and if the intonation type output by the Bayesian classification model and the intonation type output by the speech classification model are different intonation types, determining the intonation type corresponding to the user voice to be a preset default intonation type.
5. The method according to claim 3, wherein the Bayesian classification model is trained on voice training samples carrying the voice features;
the Bayesian classification model determines the intonation type corresponding to the user voice as follows: it calculates, from the voice features of the user voice, the probability that the user voice belongs to each intonation type, and determines the intonation type with the maximum probability as the intonation type corresponding to the user voice.
6. The method according to claim 3, wherein the speech classification model is a GA-BP neural network model, obtained by optimizing an initial BP neural network model;
the number of input-layer nodes of the initial BP neural network model is determined by the voice content length of the voice training samples, the number of output-layer nodes is determined by the number of intonation types, and the number of hidden-layer nodes is determined by trial and error;
the initial BP neural network model is optimized as follows: the initial weights and thresholds of its input, hidden, and output layers are trained with a genetic algorithm on preset sample data, and the optimal initial weights and thresholds of each layer are determined to obtain the optimized BP neural network model.
7. The method according to claim 1, wherein generating the response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content comprises:
determining the response voice content based on the voice content;
and generating the response voice whose content is the response voice content and whose intonation type is the intonation type corresponding to the user voice.
8. An apparatus for voice response, comprising:
an acquisition unit configured to acquire a user voice;
a determining unit, configured to determine an intonation type corresponding to the user voice according to the voice features and the voice content of the user voice;
a generating unit, configured to generate a response voice corresponding to the user voice based on the intonation type corresponding to the user voice and the voice content;
and the broadcasting unit is used for broadcasting the response voice.
9. A voice response apparatus, characterized by comprising: a processor and a memory for storing a program; the processor is configured to execute the program to implement the method of voice response according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of voice response according to any one of claims 1 to 7.
CN202011052933.7A 2020-09-29 2020-09-29 Voice response method, device, equipment and computer readable storage medium Active CN112201277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011052933.7A CN112201277B (en) 2020-09-29 2020-09-29 Voice response method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112201277A true CN112201277A (en) 2021-01-08
CN112201277B CN112201277B (en) 2024-03-22

Family

ID=74008030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011052933.7A Active CN112201277B (en) 2020-09-29 2020-09-29 Voice response method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112201277B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334583A (en) * 2018-01-26 2018-07-27 上海智臻智能网络科技股份有限公司 Affective interaction method and device, computer readable storage medium, computer equipment
CN109447354A (en) * 2018-10-31 2019-03-08 中国银行股份有限公司 A kind of intelligent bank note distribution method and device based on GA-BP neural network
KR20190088126A (en) * 2018-01-05 2019-07-26 서울대학교산학협력단 Artificial intelligence speech synthesis method and apparatus in foreign language
CN110110169A (en) * 2018-01-26 2019-08-09 上海智臻智能网络科技股份有限公司 Man-machine interaction method and human-computer interaction device
CN110379445A (en) * 2019-06-20 2019-10-25 深圳壹账通智能科技有限公司 Method for processing business, device, equipment and storage medium based on mood analysis
CN111368538A (en) * 2020-02-29 2020-07-03 平安科技(深圳)有限公司 Voice interaction method, system, terminal and computer readable storage medium
CN111414754A (en) * 2020-03-19 2020-07-14 中国建设银行股份有限公司 Emotion analysis method and device of event, server and storage medium

Also Published As

Publication number Publication date
CN112201277B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
CN108428447B (en) Voice intention recognition method and device
CN111428010B (en) Man-machine intelligent question-answering method and device
CN111931513A (en) Text intention identification method and device
CN110019742B (en) Method and device for processing information
CN111858854B (en) Question-answer matching method and relevant device based on historical dialogue information
CN111191450A (en) Corpus cleaning method, corpus entry device and computer-readable storage medium
CN111161726B (en) Intelligent voice interaction method, device, medium and system
CN112686051B (en) Semantic recognition model training method, recognition method, electronic device and storage medium
CN111583906A (en) Role recognition method, device and terminal for voice conversation
CN112632242A (en) Intelligent conversation method and device and electronic equipment
CN110457454A (en) A kind of dialogue method, server, conversational system and storage medium
CN111639162A (en) Information interaction method and device, electronic equipment and storage medium
CN111625636B (en) Method, device, equipment and medium for rejecting man-machine conversation
CN115640398A (en) Comment generation model training method, comment generation device and storage medium
CN109961152B (en) Personalized interaction method and system of virtual idol, terminal equipment and storage medium
CN113889091A (en) Voice recognition method and device, computer readable storage medium and electronic equipment
CN110209768A (en) The problem of automatic question answering treating method and apparatus
WO2022141142A1 (en) Method and system for determining target audio and video
CN109271637B (en) Semantic understanding method and device
CN112201277B (en) Voice response method, device, equipment and computer readable storage medium
CN113035179B (en) Voice recognition method, device, equipment and computer readable storage medium
CN111984769B (en) Information processing method and device of response system
CN114049875A (en) TTS (text to speech) broadcasting method, device, equipment and storage medium
CN111723198A (en) Text emotion recognition method and device and storage medium
CN118173093B (en) Speech dialogue method and system based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant