CN112185339A

CN112185339A - Voice synthesis processing method and system for power supply intelligent client

Info

Publication number: CN112185339A
Application number: CN202011064061.6A
Authority: CN
Inventors: 王婷婷; 陈琳; 林磊; 罗陆宁; 黄媚; 刘家学; 李艳; 练芯妤; 谢钰莹; 曹美群; 刘安琪; 罗建国; 黎怡均; 罗益会; 黄公跃; 付婷婷; 陈辉; 莫屾; 严玉婷; 林思远
Original assignee: Shenzhen Power Supply Bureau Co Ltd
Current assignee: Shenzhen Power Supply Bureau Co Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2021-01-05

Abstract

The invention provides a voice synthesis processing method for power supply intelligent clients, applied to an intelligent customer service scenario. The intelligent voice customer service performs voice recognition on voice data received from a user, performs natural voice processing, and provides the corresponding customer service according to the user's semantics. According to the voice data sent by the user, corresponding customer service voice data is matched from a preset template database and used to respond to the user. The template database holds voice templates, which are matched preferentially when answering; when template matching is unsuccessful, speech is synthesized and the template database is updated. The invention also provides a corresponding system. The invention can make voice synthesis better targeted and improve the quality of the synthesized voice.

Description

Voice synthesis processing method and system for power supply intelligent client

Technical Field

The invention relates to the technical field of power supply intelligent clients, in particular to a voice synthesis processing method and system for a power supply intelligent client.

Background

Intelligent voice is one of the trends in the future development of customer service work. Although many power supply enterprises are actively building intelligent customer service systems, most existing voice navigation systems have shortcomings, mainly a low degree of intelligence, limited voice recognition performance, complex service flows, poor integrity, and poor serviceability. In addition, their speech synthesis is poorly targeted and limited in effect.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a method and a system for processing speech synthesis for a power-supplying smart client, which can improve the pertinence of speech synthesis and improve the effect of synthesizing speech.

To solve the above technical problem, an aspect of the present invention provides a speech synthesis processing method for a power supply smart client, including the following steps:

step S10, answering the voice signal of the client through the intelligent power supply seat;

step S11, the intelligent voice customer service carries out voice recognition to the received voice data of the customer, carries out natural voice processing, and determines the service type according to the semantics of the customer;

step S12, according to the voice data sent by the client, matching the corresponding customer service voice template from the preset voice template library corresponding to the service type, and adopting the voice in the customer service voice template to respond to the client;

step S13, when the matching is unsuccessful, synthesizing the customer service voice template, and outputting the selected language, tone and background music according to the customer requirements;

and step S14, statistically recording the conversations for which no template was matched, and updating the voice template library for the service type of each such conversation according to the synthesized customer service voice template.
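Steps S12 to S14 can be illustrated with a minimal sketch. It assumes exact-text matching against the library and a stand-in synthesizer; the names `TemplateLibrary`, `respond` and `fake_tts` are hypothetical and do not come from the patent, which does not specify how matching or synthesis is implemented.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TemplateLibrary:
    """Hypothetical store: service type -> {recognized query text: reply audio}."""
    templates: dict = field(default_factory=dict)
    misses: Counter = field(default_factory=Counter)  # statistics kept for S14

    def respond(self, service_type, query, synthesize):
        library = self.templates.setdefault(service_type, {})
        if query in library:                      # S12: a template matches
            return library[query], "template"
        audio = synthesize(query)                 # S13: no match, synthesize
        self.misses[(service_type, query)] += 1   # S14: record the unmatched session
        library[query] = audio                    # S14: update the template library
        return audio, "synthesized"


def fake_tts(text):
    # Stand-in for a real speech synthesizer (assumption); returns bytes.
    return ("AUDIO:" + text).encode("utf-8")


lib = TemplateLibrary({"billing": {"check my balance": b"prerecorded-balance"}})
first = lib.respond("billing", "check my balance", fake_tts)
second = lib.respond("outage", "report an outage", fake_tts)
third = lib.respond("outage", "report an outage", fake_tts)
```

In this sketch the second request is synthesized and stored, so the third, identical request is answered from the updated library.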

Preferably, further comprising:

presetting a voice template library corresponding to each class of service type, wherein the voice template library comprises corresponding customer service voice templates;

presetting a voice style corresponding to each customer service voice template, wherein the voice styles include: a full, pure male voice; a soft, sweet female voice; and a standard, authentic English female voice;

presetting hotwords corresponding to each service type and a weight corresponding to each hotword, wherein the hotwords include: personal names, place names, and service-specific proper nouns.

Preferably, the step S11 further includes:

and determining the corresponding service type according to the hot words contained in the voice data of the client.
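One possible reading of hotword-based service type determination is a weighted vote, sketched below. The hotword table, the weights and the function name `classify` are illustrative assumptions; the patent only states that each service type has hotwords with weights.

```python
# Hypothetical hotword tables: service type -> {hotword: weight}.
HOTWORDS = {
    "billing": {"bill": 2.0, "balance": 1.5, "payment": 1.0},
    "outage": {"outage": 2.0, "blackout": 2.0, "repair": 1.0},
}


def classify(text, hotwords=HOTWORDS):
    """Return the service type whose hotwords score highest, or None if none occur."""
    words = text.lower().split()
    scores = {
        service: sum(weight for word, weight in table.items() if word in words)
        for service, table in hotwords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Here the recognized text is split on whitespace, which suits the English example; Chinese text would need word segmentation first.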

Preferably, the step S13 further includes:

providing a prerecorded synthesis template, wherein, in the text to be synthesized, text that matches a fixed component of the voice template uses speech prerecorded by a speaker, and the non-fixed components use synthesized speech, the non-fixed components being obtained by recognizing manually input text.
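The split between fixed and non-fixed components can be sketched as follows. The sketch assumes the template is written as a Python format string whose literal parts are the fixed components and whose fields are the non-fixed ones; `plan_segments` and the `recorded` set are hypothetical names.

```python
from string import Formatter


def plan_segments(template, values, recorded):
    """Decide, per segment, whether to play a recording or synthesize speech."""
    plan = []
    for literal, field, _spec, _conv in Formatter().parse(template):
        if literal:
            key = literal.strip()
            # Fixed component: use the speaker's recording when one exists.
            source = "prerecorded" if key in recorded else "synthesized"
            plan.append((source, key))
        if field is not None:
            # Non-fixed component: always synthesized from the supplied text.
            plan.append(("synthesized", str(values[field])))
    return plan


recorded = {"Your balance is", "yuan."}
plan = plan_segments("Your balance is {amount} yuan.", {"amount": 128.5}, recorded)
```

The resulting plan lists the segments in playback order, so recorded audio and synthesized audio can be concatenated into one reply.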

Preferably, the step S13 further includes:

recognizing the text to be synthesized using a high-precision text analysis technology, so that out-of-vocabulary words, polyphones, special symbols and prosodic phrases in the text are intelligently analyzed and processed;

and receiving a voice adjustment request, and dynamically adjusting synthesis parameters of volume, speed and pitch.
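As an illustration of the special-symbol part of the text analysis step, the sketch below rewrites digits and a few symbols into speakable words before synthesis. The symbol table, the digit-by-digit reading and the function names are assumptions made for the example; handling of polyphones and prosodic phrases is not shown.

```python
import re

# Hypothetical spoken forms for a few symbols and units.
SYMBOLS = {"%": " percent", "&": " and ", "kWh": " kilowatt hours"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def spell_number(match):
    """Read a number digit by digit, saying 'point' for the decimal point."""
    return " ".join(DIGITS[int(ch)] if ch.isdigit() else "point"
                    for ch in match.group())


def normalize(text):
    """Replace numbers and known symbols with words, then tidy whitespace."""
    text = re.sub(r"\d+(?:\.\d+)?", spell_number, text)
    for symbol, spoken in SYMBOLS.items():
        text = text.replace(symbol, spoken)
    return re.sub(r"\s+", " ", text).strip()
```

Digit-by-digit reading suits identifiers such as meter numbers; a production front end would choose between cardinal and digit readings from context.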

Correspondingly, the invention also provides a speech synthesis processing system for supplying power to intelligent clients, which comprises:

the voice input unit is used for answering the voice signal of the client through the power supply intelligent seat;

the service type determining unit is used for carrying out voice recognition on the received voice data of the client by the intelligent voice client service, carrying out natural voice processing and determining the service type according to the semantics of the client;

the matching response unit is used for matching a corresponding customer service voice template from a preset voice template library corresponding to the service type according to voice data sent by the customer, and responding to the customer by adopting voice in the customer service voice template;

the voice template synthesis unit is used for synthesizing the customer service voice template when the matching response unit is unsuccessful in matching and outputting the selected language, tone and background music according to the requirements of customers;

and the updating unit is used for statistically recording the sessions for which no template was matched, and updating the voice template library for the service type of each such session according to the customer service voice template synthesized by the voice template synthesis unit.

Preferably, further comprising:

the template library setting unit is used for presetting a voice template library corresponding to each class of service type, and the voice template library comprises corresponding customer service voice templates;

the voice style setting unit is used for presetting a voice style corresponding to each customer service voice template, wherein the voice styles include: a full, pure male voice; a soft, sweet female voice; and a standard, authentic English female voice;

a hotword setting unit, configured to preset hotwords corresponding to each service type and a weight corresponding to each hotword, where the hotwords include: personal names, place names, and service-specific proper nouns.

Preferably, the service type determining unit is further configured to determine a corresponding service type according to a hotword included in the voice data of the client.

Preferably, the speech template synthesis unit further comprises:

the synthesis processing unit is used for providing a prerecorded synthesis template, wherein, in the text to be synthesized, text that matches a fixed component of the voice template uses speech prerecorded by a speaker, and the non-fixed components use synthesized speech, the non-fixed components being obtained by recognizing manually input text;

the high-precision processing unit is used for recognizing the text to be synthesized using a high-precision text analysis technology, so that out-of-vocabulary words, polyphones, special symbols and prosodic phrases in the text are intelligently analyzed and processed;

and the dynamic adjusting unit receives the voice adjusting request and dynamically adjusts the synthesis parameters of the volume, the speed and the pitch.

The embodiment of the invention has the following beneficial effects:

The embodiment of the invention provides a voice synthesis processing method and system for power supply intelligent clients, applied to an intelligent customer service scenario. The intelligent voice customer service performs voice recognition on voice data received from a user, performs natural voice processing, and provides the corresponding customer service according to the user's semantics. According to the voice data sent by the user, corresponding customer service voice data is matched from a preset template database and used to respond to the user. The template database holds voice templates, which are matched preferentially when answering; when template matching is unsuccessful, speech is synthesized, and different languages, timbres, background music and the like can be output according to the user's requirements. Sessions for which no template was matched are statistically recorded, and the template library is updated accordingly. The invention can make voice synthesis better targeted and improve the quality of the synthesized voice.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a speech synthesis processing method for a power-supplying smart client according to the present invention;

FIG. 2 is a schematic diagram of an embodiment of a speech synthesis processing system for a power-supplying smart client according to the present invention;

FIG. 3 is a schematic structural diagram of the speech template synthesis unit in FIG. 2.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.

For those skilled in the art to more clearly understand the objects, technical solutions and advantages of the present invention, the following description will be further provided in conjunction with the accompanying drawings and examples.

Referring to fig. 1, a schematic flow chart of an embodiment of a speech synthesis processing method for a power supply smart client according to the present invention is shown; in this embodiment, the speech synthesis processing method for a power supply smart client includes the following steps:

in a specific example, the step S11 further includes:

determining the corresponding service type according to the hotwords contained in the voice data of the client, wherein the hotwords include: personal names, place names, and service-specific proper nouns.

in a specific example, the step S13 further includes:

This helps improve the synthesis quality in the customized domain, simplifies the customization process, speeds up customization, and makes prerecorded speech more natural and flexible to use, so that a wider range of application requirements can be met.

It is understood that, in the embodiment of the present invention, the technology for converting input text into voice uses advanced algorithms for text processing, prosody analysis, polyphone disambiguation and the like, together with a synthesis method based on parameters and on concatenation from a large template database, so that the synthesized voice approaches the natural effect of a real speaker. A multi-language speech synthesis engine is also integrated, so that speech synthesis services such as Chinese, mixed Chinese and English, and pure English can be provided.

In a specific example, the step S13 further includes:

recognizing the text to be synthesized using a high-precision text analysis technology, so that out-of-vocabulary words (such as place names), polyphones, special symbols (such as punctuation marks and numbers) and prosodic phrases in the text are intelligently analyzed and processed;

and receiving a voice adjustment request, and dynamically adjusting synthesis parameters of volume, speed and pitch (fundamental frequency).
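A voice adjustment request could be handled as sketched below. The parameter ranges, the use of relative multipliers and the names `SynthesisParams` and `adjust` are assumptions for illustration; the patent names only the three adjustable parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SynthesisParams:
    """Relative multipliers, where 1.0 means the engine default (assumption)."""
    volume: float = 1.0
    speed: float = 1.0
    pitch: float = 1.0  # scales the fundamental frequency


# Hypothetical safe ranges for each parameter.
LIMITS = {"volume": (0.0, 2.0), "speed": (0.5, 2.0), "pitch": (0.5, 2.0)}


def adjust(params, **request):
    """Apply an adjustment request, clamping each value to its allowed range."""
    changes = {}
    for name, value in request.items():
        if name not in LIMITS:
            raise ValueError(f"unknown synthesis parameter: {name}")
        low, high = LIMITS[name]
        changes[name] = min(max(float(value), low), high)
    return replace(params, **changes)
```

Clamping keeps an out-of-range request from producing unintelligible output, and the frozen dataclass means each request yields a new parameter set rather than mutating the one in use.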

It is understood that, in the above method, before step S10, it is further required to:

presetting hotwords corresponding to each service type and a weight corresponding to each hotword; by setting a different weight for each hotword, the probability of that hotword being recognized can be increased or decreased.
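One way hotword weights could raise or lower the chance of a word being recognized is to rescore candidate transcripts, as in the sketch below. The additive bonus on a log-score and the function name `rescore` are assumptions; the patent does not describe how the recognizer applies the weights.

```python
def rescore(hypotheses, hotword_weights):
    """Pick the transcript with the best score after adding hotword bonuses.

    hypotheses: list of (text, score) pairs, higher score meaning more likely.
    hotword_weights: {hotword: bonus}; a negative bonus makes a word less likely.
    """
    def boosted(item):
        text, score = item
        words = text.split()
        bonus = sum(w for word, w in hotword_weights.items() if word in words)
        return score + bonus

    return max(hypotheses, key=boosted)[0]


hyps = [("pay my bill", -4.2), ("pay my bell", -4.0)]
```

With no weights the acoustically likelier "pay my bell" wins; a positive weight on "bill" or a negative weight on "bell" flips the choice to "pay my bill".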

Meanwhile, in the embodiment of the invention, the automatic speech recognition (ASR) service is distributed using a load balancing technology, so that ASR service capacity can be scaled out linearly and horizontally. To ensure availability, the CPU load should be kept below 60%, and an N+1 deployment should additionally be used.
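The sizing rule can be worked through with a small calculation. It assumes that load scales linearly with the number of concurrent sessions and that each node's capacity at 100% CPU is known; `asr_nodes_needed` and its parameters are hypothetical names.

```python
import math


def asr_nodes_needed(peak_sessions, sessions_per_node, cpu_ceiling=0.60, spares=1):
    """Nodes required so that peak load stays under the CPU ceiling, plus spares.

    sessions_per_node is the capacity of one node at 100% CPU (assumption).
    spares=1 corresponds to an N+1 deployment.
    """
    usable = sessions_per_node * cpu_ceiling  # sessions one node may carry
    return math.ceil(peak_sessions / usable) + spares
```

For example, with 500 peak sessions and 100 sessions per node, each node may carry 60 sessions, so 9 nodes cover the load and the N+1 rule brings the deployment to 10.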

In summary, the method provided by the invention can be applied to an intelligent customer service scenario. The intelligent voice customer service performs voice recognition on voice data received from a user, performs natural voice processing, and provides the corresponding customer service according to the user's semantics. According to the voice data sent by the user, corresponding customer service voice data is matched from a preset template database and used to respond to the user. The template database holds voice templates, which are matched preferentially when answering; when template matching is unsuccessful, speech is synthesized, and different languages, timbres, background music and the like can be output according to the user's requirements. Sessions for which no template was matched are statistically recorded, and the template library is updated accordingly.

FIG. 2 is a schematic structural diagram of an embodiment of a speech synthesis processing system for a power supply smart client according to the present invention. Referring also to FIG. 3, in this embodiment the speech synthesis processing system 1 for a power supply smart client includes:

the voice input unit 10 is used for answering the voice signal of the customer through the power supply intelligent seat;

a service type determining unit 11, configured to perform voice recognition on received voice data of a client by an intelligent voice customer service, perform natural voice processing, and determine a service type according to semantics of the client; preferably, the service type determining unit is further configured to determine a corresponding service type according to a hotword included in the voice data of the client.

A matching response unit 12, configured to match a corresponding customer service voice template from a preset voice template library corresponding to the service type according to voice data sent by the customer, and respond to the customer by using a voice in the customer service voice template;

a voice template synthesis unit 13, configured to synthesize a customer service voice template when the matching response unit fails to perform matching, and output a selected language, tone and background music according to a customer requirement;

and the updating unit 14 is used for statistically recording the sessions for which no template was matched, and updating the voice template library for the service type of each such session according to the customer service voice template synthesized by the voice template synthesis unit.

In one specific example, the system further comprises:

the template library setting unit 15 is used for presetting a voice template library corresponding to each class of service type, wherein the voice template library comprises corresponding customer service voice templates;

the voice style setting unit 16 is configured to preset a voice style corresponding to each customer service voice template, where the voice styles include: a full, pure male voice; a soft, sweet female voice; and a standard, authentic English female voice;

a hotword setting unit 17, configured to preset hotwords corresponding to each service type and a weight corresponding to each hotword, where the hotwords include: personal names, place names, and service-specific proper nouns.

In a specific example, the speech template synthesizing unit 13 further includes:

a synthesis processing unit 130, configured to provide a prerecorded synthesis template, wherein, in the text to be synthesized, text that matches a fixed component of the voice template uses speech prerecorded by a speaker, and the non-fixed components use synthesized speech, the non-fixed components being obtained by recognizing manually input text;

the high-precision processing unit 131 is configured to recognize the text to be synthesized using a high-precision text analysis technology, so that out-of-vocabulary words, polyphones, special symbols, and prosodic phrases in the text are intelligently analyzed and processed;

the dynamic adjustment unit 132 receives the voice adjustment request, and dynamically adjusts the synthesis parameters of the volume, the speech rate, and the pitch.

For more details, reference may be made to the foregoing description of fig. 1, which is not repeated herein.

The embodiment of the invention has the following beneficial effects:

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A speech synthesis processing method for a power supply intelligent client is characterized by comprising the following steps:

2. The method of claim 1, further comprising:

3. The method of claim 2, wherein the step S11 further comprises:

4. The method of claim 2, wherein the step S13 further comprises:

5. The method of claim 3, wherein the step S13 further comprises:

6. A speech synthesis processing system for powering smart clients, comprising:

7. The system of claim 6, further comprising:

8. The system of claim 7, wherein the service type determining unit is further configured to determine the corresponding service type according to a hotword included in the voice data of the client.

9. The system of claim 8, wherein the speech template synthesis unit further comprises: