CN108184032A - Service method and device for a customer service system - Google Patents
Service method and device for a customer service system
- Publication number
- CN108184032A (application CN201611116110.XA)
- Authority
- CN
- China
- Prior art keywords
- speech
- synthesized
- contact staff
- text
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/527—Centralised call answering arrangements not requiring operator intervention
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Abstract
The invention discloses a service method and device for a customer service system. The method includes: receiving a speech synthesis instruction; determining the text to be synthesized according to the received instruction; synthesizing speech for that text with the timbre characteristics of the customer service agent, using the text and a speech parameter model library built in advance from the timbre of the agent currently answering the call; and receiving an instruction from the agent and, according to that instruction, playing a sentence composed of the synthesized speech and/or the agent's live speech. Because what the user hears is a sentence composed of synthesized speech with the agent's timbre and/or the agent's live speech, the amount of speaking required of the agent during manual service is greatly reduced and the agent's fatigue is relieved. In addition, the user assumes that the agent is speaking throughout the conversation, which improves the service quality of the customer service system and the user experience.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a service method and device for a customer service system.
Background
At present, the customer service systems of the three major carriers, China Mobile, China Unicom and China Telecom, usually consist of machine customer service and human customer service. During a telephone service session, an incoming call from a user is first handled by machine customer service. When the user believes that machine customer service cannot solve the problem raised, the user manually selects human customer service and consults a human agent.
In such a customer service system, the speech of machine customer service is monotonous and does not sound as lively as natural speech. Machine customer service also cannot improvise, so the range of problems it can solve is limited. Human customer service therefore plays an important role in telephone service. However, human agents work in shifts of six or more consecutive hours, and during calls they must give a large number of answers and explanations tailored to each user's questions and situation, so they tire easily. Fatigue can cause an agent to mispronounce or misspeak, which lowers the quality of customer service and affects the user experience.
How to improve the service quality of a customer service system, and thereby the user experience, is therefore a technical problem in urgent need of a solution.
Summary of the invention
Embodiments of the present invention provide a service method and device for a customer service system, to address the prior-art problem of how to improve the service quality of a customer service system and thereby the user experience.
An embodiment of the present invention provides a service method for a customer service system, including:
receiving a speech synthesis instruction;
determining the text to be synthesized according to the received speech synthesis instruction;
synthesizing speech for the text to be synthesized with the timbre characteristics of the customer service agent, according to the determined text and a speech parameter model library built in advance from the timbre of the agent currently answering the call;
receiving an instruction from the agent and, according to the instruction, playing a sentence composed of the synthesized speech and/or the agent's live speech.
In one possible implementation of the above service method, determining the text to be synthesized according to the received speech synthesis instruction specifically includes:
determining whether the text to be synthesized that corresponds to the received speech synthesis instruction is a standard script sentence;
if so, taking the standard script sentence corresponding to the speech synthesis instruction as the text to be synthesized;
if not, taking a fill-in-the-blank script sentence, after the text carried by the speech synthesis instruction has been inserted into it, as the text to be synthesized.
In one possible implementation of the above service method, synthesizing speech for the text to be synthesized with the agent's timbre characteristics, according to the determined text and the speech parameter model library built in advance from the timbre of the agent currently answering the call, specifically includes:
segmenting the determined text to be synthesized with a text analyzer to obtain the word label file corresponding to the text;
determining the speech feature parameters corresponding to the text to be synthesized, according to the word label file and the speech parameter model library built in advance from the timbre of the agent currently answering the call;
synthesizing, according to the speech feature parameters, speech for the text to be synthesized with the agent's timbre characteristics.
In one possible implementation of the above service method, determining the speech feature parameters corresponding to the text to be synthesized, according to the word label file and the speech parameter model library built in advance from the timbre of the agent currently answering the call, specifically includes:
looking up, in the speech parameter model library built in advance from the timbre of the agent currently answering the call, the speech parameter model corresponding to each word in the word label file;
determining, from the speech parameter model of each word and by means of a parameter generation algorithm, the following parameters for the text to be synthesized: LF0, obtained by converting the fundamental frequency information to the log domain; BAP, the average of the aperiodic component spectrum information over different frequency bands; and LSP, the 18-dimensional line spectral pair parameters extracted per frame from the vocal tract spectrum information.
In one possible implementation of the above service method, synthesizing speech for the text to be synthesized with the agent's timbre characteristics according to the speech feature parameters specifically includes:
forming a mixed excitation source corresponding to the text to be synthesized from the determined LF0 and BAP;
feeding the mixed excitation source into a filter and controlling the filter with the determined LSP, so as to synthesize speech for the text to be synthesized with the agent's timbre characteristics.
In one possible implementation, the above service method further includes building the speech parameter model library with the agent's timbre as follows:
decomposing the original speech waveform files contained in the agent's speech database to obtain, for each syllable in those files, the fundamental frequency information, the aperiodic component spectrum information and the vocal tract spectrum information;
converting the fundamental frequency information of each syllable to the log domain to obtain LF0;
averaging the aperiodic component spectrum information of each syllable within each preset frequency band to obtain BAP;
extracting the 18-dimensional line spectral pair parameters LSP per frame from the vocal tract spectrum information of each syllable;
building speech parameter models as hidden Markov models from the LF0, BAP and LSP determined for each syllable, according to the word label files corresponding to the original speech waveform files;
performing model clustering and model training on the speech parameter models so built, to obtain the speech parameter model library with the agent's timbre.
An embodiment of the present invention further provides a service device for a customer service system, including:
a receiving unit, configured to receive a speech synthesis instruction;
a determining unit, configured to determine the text to be synthesized according to the received speech synthesis instruction;
a synthesis unit, configured to synthesize speech for the text to be synthesized with the agent's timbre characteristics, according to the determined text and a speech parameter model library built in advance from the timbre of the agent currently answering the call;
a playback unit, configured to receive an instruction from the agent and, according to the instruction, play a sentence composed of the synthesized speech and/or the agent's live speech.
In one possible implementation of the above service device, the determining unit is specifically configured to determine whether the text to be synthesized that corresponds to the received speech synthesis instruction is a standard script sentence; if so, to take the standard script sentence corresponding to the speech synthesis instruction as the text to be synthesized; and if not, to take a fill-in-the-blank script sentence, after the text carried by the speech synthesis instruction has been inserted into it, as the text to be synthesized.
In one possible implementation of the above service device, the synthesis unit includes:
a first synthesis subunit, configured to segment the determined text to be synthesized with a text analyzer to obtain the word label file corresponding to the text;
a second synthesis subunit, configured to determine the speech feature parameters corresponding to the text to be synthesized, according to the word label file and the speech parameter model library built in advance from the timbre of the agent currently answering the call;
a third synthesis subunit, configured to synthesize, according to the speech feature parameters, speech for the text to be synthesized with the agent's timbre characteristics.
In one possible implementation of the above service device, the second synthesis subunit is specifically configured to look up, in the speech parameter model library built in advance from the timbre of the agent currently answering the call, the speech parameter model corresponding to each word in the word label file; and to determine, from the speech parameter model of each word and by means of a parameter generation algorithm, the LF0 obtained by converting the fundamental frequency information of the text to be synthesized to the log domain, the average BAP of the aperiodic component spectrum information over different frequency bands, and the 18-dimensional line spectral pair parameters LSP extracted per frame from the vocal tract spectrum information.
In one possible implementation of the above service device, the third synthesis subunit is specifically configured to form a mixed excitation source corresponding to the text to be synthesized from the determined LF0 and BAP; and to feed the mixed excitation source into a filter and control the filter with the determined LSP, so as to synthesize speech for the text to be synthesized with the agent's timbre characteristics.
In one possible implementation, the above service device further includes a modeling unit, configured to: decompose the original speech waveform files contained in the agent's speech database to obtain, for each syllable in those files, the fundamental frequency information, the aperiodic component spectrum information and the vocal tract spectrum information; convert the fundamental frequency information of each syllable to the log domain to obtain LF0; average the aperiodic component spectrum information of each syllable within each preset frequency band to obtain BAP; extract the 18-dimensional line spectral pair parameters LSP per frame from the vocal tract spectrum information of each syllable; build speech parameter models as hidden Markov models from the LF0, BAP and LSP determined for each syllable, according to the word label files corresponding to the original speech waveform files; and perform model clustering and model training on the speech parameter models so built, to obtain the speech parameter model library with the agent's timbre.
The beneficial effects of the present invention are as follows:
The service method and device for a customer service system provided by the embodiments of the present invention include: receiving a speech synthesis instruction; determining the text to be synthesized according to the received instruction; synthesizing speech for that text with the agent's timbre characteristics, according to the determined text and a speech parameter model library built in advance from the timbre of the agent currently answering the call; and receiving an instruction from the agent and, according to the instruction, playing a sentence composed of the synthesized speech and/or the agent's live speech. Because speech with the agent's timbre characteristics is obtained from the model library built in advance for the agent currently answering the call, and because a sentence composed of synthesized speech and/or the agent's live speech can be played to the user on the agent's instruction, the amount of speaking required of the agent during manual service is reduced and the agent's fatigue is relieved, which improves the service quality of the customer service system and the user experience. Moreover, what the user hears is speech with the agent's timbre characteristics, which sounds lively, so the user does not perceive that a machine is taking part in the interaction and assumes that the agent is speaking throughout. This further improves the service quality of the customer service system and the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of the service method for a customer service system provided by an embodiment of the present invention;
Fig. 2 is a flowchart of synthesizing speech for the text to be synthesized with the agent's timbre characteristics in an embodiment of the present invention;
Fig. 3 is a flowchart of building the speech parameter model library with the agent's timbre in an embodiment of the present invention;
Fig. 4 is a structural diagram of the service device for a customer service system provided by an embodiment of the present invention;
Fig. 5 shows the framework of the parametric speech synthesis system based on hidden Markov models provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the service device of the customer service system assisting an agent in providing service, as provided by an embodiment of the present invention.
Detailed description
Specific embodiments of the service method and device for a customer service system provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
A service method for a customer service system provided by an embodiment of the present invention, as shown in Fig. 1, includes the following steps:
S101: receive a speech synthesis instruction;
S102: determine the text to be synthesized according to the received speech synthesis instruction;
S103: synthesize speech for the text to be synthesized with the agent's timbre characteristics, according to the determined text and a speech parameter model library built in advance from the timbre of the agent currently answering the call;
S104: receive an instruction from the agent and, according to the instruction, play a sentence composed of the synthesized speech and/or the agent's live speech.
Specifically, in the above service method, speech with the agent's timbre characteristics is obtained for the text to be synthesized from the speech parameter model library built in advance for the agent currently answering the call, and a sentence composed of synthesized speech and/or the agent's live speech can be played to the user on the agent's instruction. This reduces the amount of speaking required of the agent during manual service and relieves the agent's fatigue, which improves the service quality of the customer service system and the user experience. Moreover, what the user hears is speech with the agent's timbre characteristics, which sounds lively, so the user does not perceive that a machine is taking part in the interaction and assumes that the agent is speaking throughout. This further improves the service quality of the customer service system and the user experience.
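The composition of a played sentence from synthesized and live segments in step S104 can be sketched as follows. This is only an illustrative sketch: the `Segment` type, the `build_sentence` function and the two callbacks are hypothetical names that do not appear in the patent, and the real synthesis and audio capture are replaced by stand-in callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Segment:
    """One part of the sentence the agent asks the system to play (hypothetical type)."""
    kind: str        # "synth" for synthesized speech, "live" for the agent's own voice
    text: str = ""   # text to synthesize; unused for live segments


def build_sentence(segments: List[Segment],
                   synthesize: Callable[[str], Any],
                   capture_live: Callable[[], Any]) -> List[Any]:
    """Assemble the audio pieces of one sentence in the order the agent requested."""
    audio = []
    for seg in segments:
        if seg.kind == "synth":
            audio.append(synthesize(seg.text))   # speech in the agent's timbre (S103)
        elif seg.kind == "live":
            audio.append(capture_live())         # the agent speaks this part personally
        else:
            raise ValueError(f"unknown segment kind: {seg.kind!r}")
    return audio
```

A sentence made only of `synth` segments corresponds to fully synthesized playback, and a mix of both kinds corresponds to the "and/or" case in S104.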
In a specific implementation of the above service method, step S102, determining the text to be synthesized according to the received speech synthesis instruction, may be carried out as follows:
determine whether the text to be synthesized that corresponds to the received speech synthesis instruction is a standard script sentence;
if so, take the standard script sentence corresponding to the speech synthesis instruction as the text to be synthesized;
if not, take a fill-in-the-blank script sentence, after the text carried by the speech synthesis instruction has been inserted into it, as the text to be synthesized.
Specifically, in the embodiment of step S102, standard script sentences are the basic exchange sentences that an agent uses when serving users by telephone, such as "Glad to be of service to you" and "Please enter your ID card number". While a standard script sentence is being played to the user, playback can be stopped at any time if either the user or the agent starts speaking, so as to keep the interaction between the agent and the user smooth and improve the user experience.
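The interruption behaviour described above can be sketched as a chunked playback loop. This is a minimal sketch under assumptions: the patent does not say how speech from either party is detected, so `someone_is_speaking` is a hypothetical callback standing in for any voice activity detector, and `emit` stands in for the audio output.

```python
from typing import Callable, Iterable


def play_with_interrupt(chunks: Iterable[bytes],
                        someone_is_speaking: Callable[[], bool],
                        emit: Callable[[bytes], None]) -> bool:
    """Play audio chunk by chunk and stop as soon as the user or the agent speaks.

    Returns True if the whole sentence was played, False if it was interrupted.
    """
    for chunk in chunks:
        if someone_is_speaking():   # checked before every chunk
            return False
        emit(chunk)
    return True
```

Shorter chunks give a faster reaction to an interruption at the cost of more frequent checks.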
Specifically, in the embodiment of step S102, a fill-in-the-blank script sentence is a sentence that must be assembled according to the user's actual consumption or service status. For example, in "Your current account balance is XX yuan", XX is data from the billing system that is inserted into the fixed sentence pattern, after which the sentence is synthesized online and output by personalized speech synthesis. Fill-in-the-blank script sentences can of course be realized in other ways. Taking the same example, only the fixed part "Your current account balance is ... yuan" may be synthesized and output, while the balance "XX" is spoken by the agent personally; no limitation is imposed here.
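The decision in step S102 between a standard script sentence and a fill-in-the-blank script sentence can be sketched as below. The instruction layout (a `key` plus optional `slots`), the two tables and the function name are assumptions made for illustration; the patent does not define the instruction format.

```python
from typing import Dict

# Hypothetical script tables; the sentences are the examples given in the text.
STANDARD_SENTENCES: Dict[str, str] = {
    "greeting": "Glad to be of service to you.",
    "ask_id": "Please enter your ID card number.",
}
FILL_IN_TEMPLATES: Dict[str, str] = {
    "balance": "Your current account balance is {amount} yuan.",
}


def determine_text(instruction: dict) -> str:
    """Return the text to be synthesized for one speech synthesis instruction."""
    key = instruction["key"]
    if key in STANDARD_SENTENCES:
        # Standard script sentence: used as-is.
        return STANDARD_SENTENCES[key]
    # Otherwise insert the text carried by the instruction into the template.
    template = FILL_IN_TEMPLATES[key]
    return template.format(**instruction.get("slots", {}))
```

For example, `determine_text({"key": "balance", "slots": {"amount": "35.60"}})` fills the billing figure into the fixed sentence pattern before synthesis.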
In a specific implementation of the above service method, step S103, synthesizing speech for the text to be synthesized with the agent's timbre characteristics according to the determined text and the speech parameter model library built in advance from the timbre of the agent currently answering the call, may include the following steps, as shown in Fig. 2:
S201: segment the determined text to be synthesized with a text analyzer to obtain the word label file corresponding to the text;
S202: determine the speech feature parameters corresponding to the text to be synthesized, according to the word label file and the speech parameter model library built in advance from the timbre of the agent currently answering the call;
S203: synthesize, according to the speech feature parameters, speech for the text to be synthesized with the agent's timbre characteristics.
Specifically, taking "Glad to be of service to you" (很高兴为您服务) as the text to be synthesized, the text analyzer yields the characters "很", "高", "兴", "为", "您", "服" and "务" together with their corresponding label files. Then, using the label files, the speech feature parameters corresponding to each of these characters can be found in the speech parameter model library built in advance from the timbre of the agent currently answering the call. Finally, from the speech feature parameters so found, the utterance "Glad to be of service to you" can be synthesized with the agent's timbre characteristics.
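The segmentation and lookup in the example above can be sketched as follows. The label fields (`prev`, `next`, `index`) and the dictionary-based model library are hypothetical simplifications; the patent does not specify the label file format, and a real text analyzer would also attach information such as pronunciation and prosody.

```python
from typing import Any, Dict, List


def segment(text: str) -> List[str]:
    """Split the text into single characters, as in the example above."""
    return [ch for ch in text if not ch.isspace()]


def make_labels(units: List[str]) -> List[Dict[str, Any]]:
    """Attach simple context to each unit (a stand-in for the word label file)."""
    labels = []
    for i, unit in enumerate(units):
        labels.append({
            "unit": unit,
            "prev": units[i - 1] if i > 0 else None,
            "next": units[i + 1] if i + 1 < len(units) else None,
            "index": i,
        })
    return labels


def lookup_models(labels: List[Dict[str, Any]], library: Dict[str, Any]) -> List[Any]:
    """Fetch the speech parameter model of each labelled unit from the agent's library."""
    missing = [lab["unit"] for lab in labels if lab["unit"] not in library]
    if missing:
        raise KeyError(f"no speech parameter model for: {missing}")
    return [library[lab["unit"]] for lab in labels]
```

Each agent would have a separate `library`, so the same labels produce speech in a different timbre depending on who is answering the call.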
In a specific implementation of the above service method, step S202, determining the speech feature parameters corresponding to the text to be synthesized according to the word label file and the speech parameter model library built in advance from the timbre of the agent currently answering the call, may be carried out as follows:
look up, in the speech parameter model library built in advance from the timbre of the agent currently answering the call, the speech parameter model corresponding to each word in the word label file;
determine, from the speech parameter model of each word and by means of a parameter generation algorithm, the LF0 obtained by converting the fundamental frequency information of the text to be synthesized to the log domain, the average BAP of the aperiodic component spectrum information over different frequency bands, and the 18-dimensional line spectral pair parameters LSP extracted per frame from the vocal tract spectrum information.
Specifically, to improve the quality of the synthesized speech, the band average BAP of the aperiodic component spectrum information in the implementation of step S202 may be obtained by averaging the aperiodic component spectrum Ap over 5 frequency bands. The 5 bands may be 0~1000 Hz, 1000~2000 Hz, 2000~4000 Hz, 4000~6000 Hz and 6000~8000 Hz; no limitation is imposed here.
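The band averaging that produces BAP can be sketched as below for a single frame. The sketch assumes that the aperiodic component spectrum is given as values on frequency bins spaced evenly from 0 Hz to half the sample rate, and that the sample rate is 16 kHz so that the top band ends at 8000 Hz; neither assumption is stated in the patent, and the function name is hypothetical.

```python
from typing import List, Sequence

# Band edges in Hz, taken from the five example bands in the text.
BAND_EDGES_HZ = [0, 1000, 2000, 4000, 6000, 8000]


def band_aperiodicity(ap: Sequence[float], sample_rate: int = 16000) -> List[float]:
    """Average one frame of the aperiodic component spectrum over the five bands."""
    n = len(ap)
    nyquist = sample_rate / 2
    freqs = [k * nyquist / (n - 1) for k in range(n)]   # centre frequency of each bin
    bap = []
    for lo, hi in zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:]):
        is_last = hi == BAND_EDGES_HZ[-1]
        # Bands are half-open [lo, hi), except the last, which includes its upper edge.
        values = [a for f, a in zip(freqs, ap)
                  if lo <= f < hi or (is_last and f == hi)]
        if not values:
            raise ValueError(f"no spectrum bins fall in the band {lo}-{hi} Hz")
        bap.append(sum(values) / len(values))
    return bap
```

Reducing the full aperiodicity spectrum to five numbers per frame keeps the feature vector small enough to model alongside LF0 and LSP.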
In a specific implementation of the above service method, step S203, synthesizing speech for the text to be synthesized with the agent's timbre characteristics according to the speech feature parameters, may be carried out as follows:
form a mixed excitation source corresponding to the text to be synthesized from the determined LF0 and BAP;
feed the mixed excitation source into a filter and control the filter with the determined LSP, so as to synthesize speech for the text to be synthesized with the agent's timbre characteristics.
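One way to picture step S203 is the sketch below, which builds a simplified mixed excitation and passes it through an all-pole filter whose coefficients come from the LSP values. Several points are assumptions rather than the patent's method: BAP is treated as a linear weight in [0, 1] and collapsed to a single noise weight instead of being applied per band, unvoiced frames are marked with `None`, the LSP values are taken as line spectral frequencies in radians with an even count, and the filter is a plain linear-prediction synthesis filter. All function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def poly_mul(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsp: Sequence[float]) -> List[float]:
    """Convert ascending line spectral frequencies (radians) to A(z) = [1, a1, ..., ap]."""
    p_poly, q_poly = [1.0, 1.0], [1.0, -1.0]      # roots at z = -1 and z = +1
    for i, w in enumerate(lsp):
        factor = [1.0, -2.0 * math.cos(w), 1.0]   # conjugate root pair on the unit circle
        if i % 2 == 0:
            p_poly = poly_mul(p_poly, factor)     # 1st, 3rd, ... frequencies
        else:
            q_poly = poly_mul(q_poly, factor)     # 2nd, 4th, ... frequencies
    # A(z) = (P(z) + Q(z)) / 2; the highest-order terms cancel and are dropped.
    return [(p + q) / 2.0 for p, q in zip(p_poly, q_poly)][:-1]


def mixed_excitation(lf0: Optional[float], bap: Sequence[float], n_samples: int,
                     sample_rate: int = 16000, seed: int = 0) -> List[float]:
    """Mix a pulse train at the pitch period with noise (simplified, one weight per frame)."""
    rng = random.Random(seed)
    if lf0 is None:                               # unvoiced frame: noise only
        return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    noise_weight = min(1.0, max(0.0, sum(bap) / len(bap)))
    period = max(1, round(sample_rate / math.exp(lf0)))
    out = []
    for n in range(n_samples):
        pulse = 1.0 if n % period == 0 else 0.0
        out.append((1.0 - noise_weight) * pulse + noise_weight * rng.gauss(0.0, 1.0))
    return out


def all_pole_filter(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """Run the excitation through 1 / A(z), the vocal tract filter set by the LSP values."""
    y: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y
```

In this sketch, LF0 sets the pitch of the pulse train, BAP sets how much noise is mixed in, and the 18 LSP values of each frame shape the spectrum through the filter coefficients, which mirrors the division of roles described in the two steps above.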
In a specific implementation, the above service method may further include building the speech parameter model library with the agent's timbre as follows, as shown in Fig. 3:
S301: decompose the original speech waveform files contained in the agent's speech database to obtain, for each syllable in those files, the fundamental frequency information, the aperiodic component spectrum information and the vocal tract spectrum information;
S302: convert the fundamental frequency information of each syllable to the log domain to obtain LF0;
S303: average the aperiodic component spectrum information of each syllable within each preset frequency band to obtain BAP, where the preset bands may be 0~1000 Hz, 1000~2000 Hz, 2000~4000 Hz, 4000~6000 Hz and 6000~8000 Hz, without limitation;
S304: extract the 18-dimensional line spectral pair parameters LSP per frame from the vocal tract spectrum information of each syllable;
S305: build speech parameter models as hidden Markov models from the LF0, BAP and LSP determined for each syllable, according to the word label files corresponding to the original speech waveform files;
S306: perform model clustering and model training on the speech parameter models so built, to obtain the speech parameter model library with the agent's timbre.
It should be noted that steps S302 to S304 of the above service method may be performed in any order and are not limited to the order described above.
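Because S302 to S304 each work on a different output of the decomposition in S301, they do not depend on one another, which the sketch below makes explicit by assembling one frame's features from three independent inputs. The dictionary layout, the function names and the use of `None` for unvoiced frames (where no fundamental frequency exists) are assumptions for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def to_lf0(f0_hz: float) -> Optional[float]:
    """S302: convert fundamental frequency to the log domain (None marks an unvoiced frame)."""
    return math.log(f0_hz) if f0_hz > 0 else None


def frame_features(f0_hz: float, bap: Sequence[float], lsp: Sequence[float]) -> Dict[str, object]:
    """Collect LF0, 5-band BAP and 18-dimensional LSP for one frame."""
    if len(bap) != 5:
        raise ValueError("expected 5 band-averaged aperiodicity values")
    if len(lsp) != 18:
        raise ValueError("expected 18 line spectral pair parameters")
    return {"lf0": to_lf0(f0_hz), "bap": list(bap), "lsp": list(lsp)}
```

Feature frames of this kind, paired with the word label files, would be the training data for the hidden Markov models in S305.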
Based on the same inventive concept, an embodiment of the present invention further provides a service device for a customer service system. Because the principle by which the service device solves the problem is similar to that of the above service method, the implementation of the service device may refer to the implementation of the service method, and repeated description is omitted.
The service device for a customer service system provided by an embodiment of the present invention, as shown in Fig. 4, may include:
a receiving unit 401, configured to receive a speech synthesis instruction;
a determining unit 402, configured to determine the text to be synthesized according to the received speech synthesis instruction;
a synthesis unit 403, configured to synthesize speech for the text to be synthesized with the agent's timbre characteristics, according to the determined text and a speech parameter model library built in advance from the timbre of the agent currently answering the call;
a playback unit 404, configured to receive an instruction from the agent and, according to the instruction, play a sentence composed of the synthesized speech and/or the agent's live speech.
In a specific implementation of the above service device, the determining unit 402 may be specifically configured to determine whether the text to be synthesized that corresponds to the received speech synthesis instruction is a standard script sentence; if so, to take the standard script sentence corresponding to the speech synthesis instruction as the text to be synthesized; and if not, to take a fill-in-the-blank script sentence, after the text carried by the speech synthesis instruction has been inserted into it, as the text to be synthesized.
In a specific implementation of the above service device, the synthesis unit 403 may include:
a first synthesis subunit 4031, configured to segment the determined text to be synthesized with a text analyzer to obtain the word label file corresponding to the text;
a second synthesis subunit 4032, configured to determine the speech feature parameters corresponding to the text to be synthesized, according to the word label file and the speech parameter model library built in advance from the timbre of the agent currently answering the call;
a third synthesis subunit 4033, configured to synthesize, according to the speech feature parameters, speech for the text to be synthesized with the agent's timbre characteristics.
In a specific implementation, in the above service apparatus provided by the embodiment of the present invention, the second synthesis subunit 4032 may specifically be configured to look up, in the speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, the speech parameter model corresponding to each word in the word annotation file; and, from the speech parameter models of the words, to determine by a parameter generation algorithm the following parameters for the text to be synthesized: LF0, obtained by converting the fundamental frequency information to the log domain; BAP, the average of the aperiodic component spectrum over different frequency bands; and LSP, the 18-dimensional line spectral pair parameters extracted per frame from the vocal tract spectrum.
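The lookup step can be pictured with a deliberately simplified Python sketch. `WordModel`, its `frames` duration field and `generate_parameters` are hypothetical names, and the sketch only repeats each word's mean parameters for an assumed number of frames; parameter generation for hidden Markov models normally works from state distributions and dynamic features to produce smooth trajectories, which is not shown here.

```python
from typing import Dict, List, NamedTuple, Tuple


class WordModel(NamedTuple):
    frames: int        # assumed duration of the word, in frames
    lf0: float         # mean log-domain fundamental frequency
    bap: List[float]   # mean band aperiodicity, one value per band
    lsp: List[float]   # mean LSP vector (18 values in the patent)


Frame = Tuple[float, List[float], List[float]]


def generate_parameters(words: List[str], library: Dict[str, WordModel]) -> List[Frame]:
    """Look up each word's model and emit one (lf0, bap, lsp) tuple per frame."""
    track: List[Frame] = []
    for word in words:
        model = library[word]  # KeyError if the agent's library lacks the word
        track.extend([(model.lf0, model.bap, model.lsp)] * model.frames)
    return track
```

The returned per-frame track is the kind of input the third synthesis subunit would consume.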
In a specific implementation, in the above service apparatus provided by the embodiment of the present invention, the third synthesis subunit 4033 may specifically be configured to form, from the determined LF0 and BAP, a mixed excitation source corresponding to the text to be synthesized; and to feed the mixed excitation source into a filter, the filter being controlled by the determined LSP, so as to synthesize speech of the text to be synthesized that has the agent's timbre characteristics.
In a specific implementation, the above service apparatus provided by the embodiment of the present invention may further include a modeling unit 405, configured to: decompose the raw speech waveform files contained in the customer service agent's speech database, obtaining the fundamental frequency information, aperiodic component spectrum information and vocal tract spectrum information of each syllable in the raw speech waveform files; convert the fundamental frequency information of each syllable to the log domain to obtain LF0; average the aperiodic component spectrum information of each syllable within each preset frequency band to obtain BAP; extract 18-dimensional line spectral pair parameters LSP per frame from the vocal tract spectrum information of each syllable; establish speech parameter models from the LF0, BAP and LSP determined for each syllable, using hidden Markov models and the word annotation file corresponding to the raw speech waveform files; and, after model clustering and model training on the established speech parameter models, obtain the speech parameter model library with the agent's timbre.
For a better understanding of the technical solution of the present invention, a specific embodiment is provided of establishing the speech parameter model library with the customer service agent's timbre and of synthesizing speech of the text to be synthesized that has the agent's timbre characteristics in the above service method, namely a parametric speech synthesis framework based on hidden Markov models, as shown in Fig. 5:
Part A of Fig. 5 shows the specific embodiment of establishing the speech parameter model library with the customer service agent's timbre. The target agent's speech database contains raw speech waveform files in wav format and the corresponding annotation files (label). Each raw speech waveform file is decomposed, using adaptive weighted spectral interpolation (the STRAIGHT analysis technique), into source information and vocal tract information, where the source information comprises the fundamental frequency F0 and the aperiodic component spectrum AP, and the vocal tract information is the vocal tract spectrum SP. Further processing then follows: the fundamental frequency F0 is converted to the log domain to obtain LF0; the aperiodic component spectrum AP is averaged over 5 frequency bands to obtain BAP, the 5 bands being 0-1000 Hz, 1000-2000 Hz, 2000-4000 Hz, 4000-6000 Hz and 6000-8000 Hz; and 18-dimensional line spectral pair parameters LSP are extracted per frame from the vocal tract spectrum SP. Finally, in combination with the annotation file (label), hidden Markov models are built on the combined LF0, BAP and LSP parameters to establish the speech parameter models, and model clustering and model training are then carried out on the established models, iterating about 3 times, to obtain the speech parameter models of the target agent.
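The two post-processing steps that turn F0 and AP into LF0 and BAP can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patent's implementation: the STRAIGHT decomposition itself is not shown, the 16 kHz sample rate is inferred from the 8000 Hz upper band edge, mapping unvoiced frames to negative infinity is an illustrative convention, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

# Band edges in Hz, as listed in the description.
BANDS = [(0, 1000), (1000, 2000), (2000, 4000), (4000, 6000), (6000, 8000)]


def log_f0(f0_hz: float) -> float:
    """Convert F0 to the log domain; unvoiced frames (F0 <= 0) map to -inf (assumed)."""
    return math.log(f0_hz) if f0_hz > 0 else float("-inf")


def band_aperiodicity(ap: Sequence[float], sample_rate: int = 16000) -> List[float]:
    """Average one frame's aperiodicity spectrum over the five bands.

    `ap` holds values at evenly spaced frequencies from 0 to sample_rate / 2.
    Bands are half-open [low, high), so the bin exactly at 8000 Hz is not used.
    """
    hz_per_bin = (sample_rate / 2) / (len(ap) - 1)
    result = []
    for low, high in BANDS:
        values = [a for i, a in enumerate(ap) if low <= i * hz_per_bin < high]
        if not values:
            raise ValueError("spectrum too coarse: a band contains no bins")
        result.append(sum(values) / len(values))
    return result
```

Each frame thus contributes one LF0 value and five BAP values, which are combined with the 18 LSP values before model training.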
Part B of Fig. 5 shows the specific embodiment of synthesizing speech of the text to be synthesized that has the customer service agent's timbre characteristics. The text to be synthesized is passed through a text analyzer to obtain the annotation file (label format) needed for synthesis; then, using the target agent's speech parameter model library obtained in Part A of Fig. 5, the speech feature parameters corresponding to the text to be synthesized, namely LF0, BAP and LSP, are found. Finally, a mixed excitation source corresponding to the text to be synthesized is formed from LF0 and BAP; the mixed excitation source is fed into a filter, and the filter is controlled by the determined LSP, so as to synthesize speech of the text to be synthesized that has the agent's timbre characteristics.
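A rough Python sketch of this last stage follows. Several points are assumptions rather than details from the patent: BAP is treated as a linear noise ratio in [0, 1] and collapsed to one weight per frame instead of being applied per band, unvoiced frames are marked by LF0 equal to negative infinity, LSP values are taken to be frequencies in radians, and the filter is taken to be an all-pole filter 1/A(z) whose coefficients come from the standard LSP-to-LPC conversion. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

UNVOICED = float("-inf")  # assumed marker for frames without pitch


def mixed_excitation(lf0: Sequence[float], bap: Sequence[Sequence[float]],
                     frame_len: int = 80, sample_rate: int = 16000,
                     seed: int = 0) -> List[float]:
    """Pulse train mixed with noise for voiced frames, noise only for unvoiced ones."""
    rng = random.Random(seed)
    out: List[float] = []
    next_pulse = 0.0  # sample position of the next pitch pulse
    for frame_lf0, frame_bap in zip(lf0, bap):
        # Simplification: a single noise weight per frame, clamped to [0, 1].
        noise_w = min(1.0, max(0.0, sum(frame_bap) / len(frame_bap)))
        voiced = frame_lf0 != UNVOICED
        period = sample_rate / math.exp(frame_lf0) if voiced else 0.0
        for _ in range(frame_len):
            n = len(out)
            noise = rng.gauss(0.0, 1.0)
            if not voiced:
                out.append(noise)
                next_pulse = n + 1.0  # restart the pulse train when voicing resumes
                continue
            pulse = 0.0
            if n >= next_pulse:
                pulse = math.sqrt(period)  # keep pulse energy comparable to the noise
                next_pulse += period
            out.append((1.0 - noise_w) * pulse + noise_w * noise)
    return out


def lsp_to_lpc(lsp: Sequence[float]) -> List[float]:
    """Convert an even number of ascending LSP frequencies (radians) to A(z) coefficients."""
    def poly_mul(a: List[float], b: List[float]) -> List[float]:
        res = [0.0] * (len(a) + len(b) - 1)
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                res[i + j] += x * y
        return res

    p_poly, q_poly = [1.0, 1.0], [1.0, -1.0]  # factors (1 + z^-1) and (1 - z^-1)
    for i, w in enumerate(lsp):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:
            p_poly = poly_mul(p_poly, factor)
        else:
            q_poly = poly_mul(q_poly, factor)
    return [(p + q) / 2.0 for p, q in zip(p_poly, q_poly)][: len(lsp) + 1]


def all_pole_filter(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """Filter by 1 / A(z): y[n] = x[n] - sum over k >= 1 of a[k] * y[n - k]."""
    y: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y
```

In a frame-by-frame synthesizer the coefficients from `lsp_to_lpc` would be updated for every frame while the excitation is filtered; the sketch leaves that loop, and any gain term, to the reader.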
In addition, the present invention provides a specific embodiment in which a customer service agent delivers voice service by means of the above service method and service apparatus, as shown in Fig. 6:
After the customer service agent picks up the user's call, text to be synthesized, such as standard script sentences and fill-in-the-blank script sentences, can be synthesized by the above service apparatus into speech with the agent's timbre and played to the user. For example, the standard script sentence "Hello, I am glad to be of service" is synthesized by the above service apparatus into speech with the agent's timbre and played to the user. As another example, when the user needs to handle or change a service, the above service apparatus can generate speech with the agent's timbre for the standard script sentence "Please enter your ID card number" and play it to the user. To ensure a good user experience, speech playback can be stopped at any time. When the user asks about their credit balance, the billing system fills the credit balance data XX of the inquiring user into the fixed sentence pattern "Your current credit balance is ___ yuan", and the resulting sentence "Your current credit balance is XX yuan" is then synthesized by the above service apparatus and output. It can be seen that the agent only needs to speak in person where the reply has to be adjusted on the spot to the way the user communicates, for example basic conversational sentences such as "OK" or "Here is the situation"; in both of the above cases, speech with the agent's own timbre characteristics can be played to the user, so that what the user perceives is still the agent communicating with them, and the experience is good.
The service method and apparatus of the customer service system provided by the embodiments of the present invention comprise: receiving a speech synthesis instruction; determining, according to the received speech synthesis instruction, text to be synthesized; synthesizing, according to the determined text to be synthesized and a speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, speech of the text to be synthesized that has the agent's timbre characteristics; and receiving an instruction from the agent and, according to the instruction, playing a sentence composed of the synthesized speech and/or the agent's live speech. Because speech with the agent's timbre characteristics is obtained from the speech parameter model library established in advance from the timbre of the agent currently answering the call, and a sentence composed of the synthesized speech and/or the agent's live speech can be played to the user according to the agent's instruction, the amount of speaking the agent must do during manual service is reduced, which relieves the agent's fatigue and in turn improves the service quality of the customer service system and enhances the user experience. Moreover, what is played to the user is speech with the agent's timbre characteristics, which sounds lifelike, so that the user does not perceive the involvement of a machine during the interaction and assumes that the agent has been speaking with them throughout; this further improves the service quality of the customer service system and enhances the user experience.
In addition, personalized speech synthesis is a technology that synthesizes the voice of a target speaker by building a model of the target speaker's speech features. The technology first collects recordings with a certain degree of phoneme coverage, then extracts the speech features that characterize the speaker and builds a feature model of the target speaker; for any given sentence of text, the model can then generate the speech parameter features of that text, and finally a vocoder synthesizes the sound of the text with the target speaker's characteristics. Current speech synthesis technology mainly comprises waveform-concatenation speech synthesis and parametric speech synthesis.
However, in the customer service field speech synthesis is currently used only for voice announcements and is not widely used in other customer service applications. The service method and apparatus of the customer service system provided by the embodiments of the present invention open up a new application scenario for speech synthesis in the customer service field: personalized speech synthesis is used during inbound and outbound customer service calls, which greatly reduces the agents' workload, in turn improves the service quality and user experience of the customer service system, and has broad application prospects.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.
Claims (12)
1. A service method of a customer service system, characterized by comprising:
receiving a speech synthesis instruction;
determining, according to the received speech synthesis instruction, text to be synthesized;
synthesizing, according to the determined text to be synthesized and a speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, speech of the text to be synthesized that has the agent's timbre characteristics;
receiving an instruction from the customer service agent and, according to the instruction, playing a sentence composed of the synthesized speech and/or the agent's live speech.
2. The service method according to claim 1, characterized in that determining, according to the received speech synthesis instruction, the text to be synthesized specifically comprises:
determining whether the text to be synthesized that corresponds to the received speech synthesis instruction is a standard script sentence;
if so, taking the standard script sentence corresponding to the speech synthesis instruction as the text to be synthesized;
if not, taking the fill-in-the-blank script sentence, with the text carried by the speech synthesis instruction filled in, as the text to be synthesized.
3. The service method according to claim 1, characterized in that synthesizing, according to the determined text to be synthesized and the speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, the speech of the text to be synthesized that has the agent's timbre characteristics specifically comprises:
segmenting the determined text to be synthesized into words using a text analyzer, obtaining a word annotation file corresponding to the text to be synthesized;
determining, according to the word annotation file and the speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, speech feature parameters corresponding to the text to be synthesized;
synthesizing, according to the speech feature parameters, the speech of the text to be synthesized that has the agent's timbre characteristics.
4. The service method according to claim 3, characterized in that determining, according to the word annotation file and the speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, the speech feature parameters corresponding to the text to be synthesized specifically comprises:
looking up, in the speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, the speech parameter model corresponding to each word in the word annotation file;
determining, from the speech parameter models of the words and by a parameter generation algorithm, the following parameters corresponding to the text to be synthesized: LF0, obtained by converting the fundamental frequency information to the log domain; BAP, the average of the aperiodic component spectrum information over different frequency bands; and LSP, the 18-dimensional line spectral pair parameters extracted per frame from the vocal tract spectrum information.
5. The service method according to claim 4, characterized in that synthesizing, according to the speech feature parameters, the speech of the text to be synthesized that has the agent's timbre characteristics specifically comprises:
forming, from the determined LF0 and BAP, a mixed excitation source corresponding to the text to be synthesized;
feeding the mixed excitation source into a filter and controlling the filter by the determined LSP, so as to synthesize the speech of the text to be synthesized that has the agent's timbre characteristics.
6. The service method according to any one of claims 1-5, characterized by further comprising establishing the speech parameter model library with the customer service agent's timbre in the following manner:
decomposing the raw speech waveform files contained in the customer service agent's speech database, obtaining the fundamental frequency information, aperiodic component spectrum information and vocal tract spectrum information of each syllable in the raw speech waveform files;
converting the fundamental frequency information of each syllable to the log domain to obtain LF0;
averaging the aperiodic component spectrum information of each syllable within each preset frequency band to obtain BAP;
extracting 18-dimensional line spectral pair parameters LSP per frame from the vocal tract spectrum information of each syllable;
establishing speech parameter models from the LF0, BAP and LSP determined for each syllable, using hidden Markov models and the word annotation file corresponding to the raw speech waveform files;
after model clustering and model training on the established speech parameter models, obtaining the speech parameter model library with the customer service agent's timbre.
7. A service apparatus of a customer service system, characterized by comprising:
a receiving unit, configured to receive a speech synthesis instruction;
a determination unit, configured to determine, according to the received speech synthesis instruction, text to be synthesized;
a synthesis unit, configured to synthesize, according to the determined text to be synthesized and a speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, speech of the text to be synthesized that has the agent's timbre characteristics;
a playing unit, configured to receive an instruction from the customer service agent and, according to the instruction, play a sentence composed of the synthesized speech and/or the agent's live speech.
8. The service apparatus according to claim 7, characterized in that the determination unit is specifically configured to determine whether the text to be synthesized that corresponds to the received speech synthesis instruction is a standard script sentence; if so, to take the standard script sentence corresponding to the speech synthesis instruction as the text to be synthesized; if not, to take the fill-in-the-blank script sentence, with the text carried by the speech synthesis instruction filled in, as the text to be synthesized.
9. The service apparatus according to claim 7, characterized in that the synthesis unit comprises:
a first synthesis subunit, configured to segment the determined text to be synthesized into words using a text analyzer, obtaining a word annotation file corresponding to the text to be synthesized;
a second synthesis subunit, configured to determine, according to the word annotation file and the speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, speech feature parameters corresponding to the text to be synthesized;
a third synthesis subunit, configured to synthesize, according to the speech feature parameters, the speech of the text to be synthesized that has the agent's timbre characteristics.
10. The service apparatus according to claim 9, characterized in that the second synthesis subunit is specifically configured to look up, in the speech parameter model library established in advance from the timbre of the customer service agent currently answering the call, the speech parameter model corresponding to each word in the word annotation file; and to determine, from the speech parameter models of the words and by a parameter generation algorithm, the following parameters corresponding to the text to be synthesized: LF0, obtained by converting the fundamental frequency information to the log domain; BAP, the average of the aperiodic component spectrum information over different frequency bands; and LSP, the 18-dimensional line spectral pair parameters extracted per frame from the vocal tract spectrum information.
11. The service apparatus according to claim 10, characterized in that the third synthesis subunit is specifically configured to form, from the determined LF0 and BAP, a mixed excitation source corresponding to the text to be synthesized; and to feed the mixed excitation source into a filter and control the filter by the determined LSP, so as to synthesize the speech of the text to be synthesized that has the agent's timbre characteristics.
12. The service apparatus according to any one of claims 7-11, characterized by further comprising a modeling unit, configured to: decompose the raw speech waveform files contained in the customer service agent's speech database, obtaining the fundamental frequency information, aperiodic component spectrum information and vocal tract spectrum information of each syllable in the raw speech waveform files; convert the fundamental frequency information of each syllable to the log domain to obtain LF0; average the aperiodic component spectrum information of each syllable within each preset frequency band to obtain BAP; extract 18-dimensional line spectral pair parameters LSP per frame from the vocal tract spectrum information of each syllable; establish speech parameter models from the LF0, BAP and LSP determined for each syllable, using hidden Markov models and the word annotation file corresponding to the raw speech waveform files; and, after model clustering and model training on the established speech parameter models, obtain the speech parameter model library with the customer service agent's timbre.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611116110.XA CN108184032B (en) | 2016-12-07 | 2016-12-07 | Service method and device of customer service system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611116110.XA CN108184032B (en) | 2016-12-07 | 2016-12-07 | Service method and device of customer service system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108184032A true CN108184032A (en) | 2018-06-19 |
CN108184032B CN108184032B (en) | 2020-02-21 |
Family
ID=62544670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611116110.XA Active CN108184032B (en) | 2016-12-07 | 2016-12-07 | Service method and device of customer service system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108184032B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109785823A (en) * | 2019-01-22 | 2019-05-21 | 中财颐和科技发展(北京)有限公司 | Phoneme synthesizing method and system |
CN109933658A (en) * | 2019-03-21 | 2019-06-25 | 中国联合网络通信集团有限公司 | Customer service speaking analysis method and device |
CN110085209A (en) * | 2019-04-11 | 2019-08-02 | 广州多益网络股份有限公司 | A kind of tone color screening technique and device |
CN110610720A (en) * | 2019-09-19 | 2019-12-24 | 北京搜狗科技发展有限公司 | Data processing method and device and data processing device |
CN111883133A (en) * | 2020-07-20 | 2020-11-03 | 深圳乐信软件技术有限公司 | Customer service voice recognition method, customer service voice recognition device, customer service voice recognition server and storage medium |
CN112988998A (en) * | 2021-03-15 | 2021-06-18 | 中国联合网络通信集团有限公司 | Response method and device |
CN113808576A (en) * | 2020-06-16 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Voice conversion method, device and computer system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1336750A (en) * | 2000-07-27 | 2002-02-20 | 霈捷科技股份有限公司 | Multiplex telephone service system and mechanism |
US20100145702A1 (en) * | 2005-09-21 | 2010-06-10 | Amit Karmarkar | Association of context data with a voice-message component |
CN102231275A (en) * | 2011-06-01 | 2011-11-02 | 北京宇音天下科技有限公司 | Embedded speech synthesis method based on weighted mixed excitation |
CN103065619A (en) * | 2012-12-26 | 2013-04-24 | 安徽科大讯飞信息科技股份有限公司 | Speech synthesis method and speech synthesis system |
CN105261355A (en) * | 2015-09-02 | 2016-01-20 | 百度在线网络技术(北京)有限公司 | Voice synthesis method and apparatus |
CN105304080A (en) * | 2015-09-22 | 2016-02-03 | 科大讯飞股份有限公司 | Speech synthesis device and speech synthesis method |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109785823A (en) * | 2019-01-22 | 2019-05-21 | 中财颐和科技发展(北京)有限公司 | Phoneme synthesizing method and system |
CN109933658A (en) * | 2019-03-21 | 2019-06-25 | 中国联合网络通信集团有限公司 | Customer service speaking analysis method and device |
CN109933658B (en) * | 2019-03-21 | 2021-05-11 | 中国联合网络通信集团有限公司 | Customer service call analysis method and device |
CN110085209A (en) * | 2019-04-11 | 2019-08-02 | 广州多益网络股份有限公司 | A kind of tone color screening technique and device |
CN110085209B (en) * | 2019-04-11 | 2021-07-23 | 广州多益网络股份有限公司 | Tone screening method and device |
CN110610720A (en) * | 2019-09-19 | 2019-12-24 | 北京搜狗科技发展有限公司 | Data processing method and device and data processing device |
CN113808576A (en) * | 2020-06-16 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Voice conversion method, device and computer system |
CN111883133A (en) * | 2020-07-20 | 2020-11-03 | 深圳乐信软件技术有限公司 | Customer service voice recognition method, customer service voice recognition device, customer service voice recognition server and storage medium |
CN111883133B (en) * | 2020-07-20 | 2023-08-29 | 深圳乐信软件技术有限公司 | Customer service voice recognition method, customer service voice recognition device, server and storage medium |
CN112988998A (en) * | 2021-03-15 | 2021-06-18 | 中国联合网络通信集团有限公司 | Response method and device |
CN112988998B (en) * | 2021-03-15 | 2023-06-16 | 中国联合网络通信集团有限公司 | Response method and device |
Also Published As
Publication number | Publication date |
---|---|
CN108184032B (en) | 2020-02-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP01 | Change in the name or title of a patent holder | |
Address after: 100053 53a, xibianmennei street, Xuanwu District, Beijing Patentee after: CHINA MOBILE COMMUNICATION LTD., Research Institute Patentee after: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd. Address before: 100053 53a, xibianmennei street, Xuanwu District, Beijing Patentee before: CHINA MOBILE COMMUNICATION LTD., Research Institute Patentee before: CHINA MOBILE COMMUNICATIONS Corp. |