CN104464716A - Voice broadcasting system and method

Info

Publication number: CN104464716A
Application number: CN201410670671.9A
Authority: CN
Inventors: 王程程
Original assignee: Beijing Yunzhisheng Information Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date: 2014-11-20
Filing date: 2014-11-20
Publication date: 2015-03-25
Anticipated expiration: 2034-11-20
Also published as: CN104464716B

Abstract

The invention relates to a voice broadcasting system and method. Sample voice matched with a written message broadcaster role is recorded and sent to a voice storage module through a first network communication module and a second network communication module; stored voice data are acquired, voice feature parameters are extracted from the acquired voice data, model training is conducted on the voice feature parameters, and then a feature voice model is acquired; written messages needing to be broadcast by a user through voice are collected, and the collected written messages are sent to a feature voice synthesis module through the first network communication module and the second network communication module; the feature voice model and the written messages are acquired, feature voice with broadcaster voice features and written message content is synthesized, and feature voice data are stored in the voice storage module; the feature voice is broadcast. By the adoption of the system and method, voice with written message sender voice features can be broadcast, individualization is high, and the voice can be easily accepted by a hearer.

Description

Voice broadcasting system and method

Technical field

The present invention relates to the field of speech synthesis technology, and in particular to a voice broadcasting system and method.

Background art

In daily life, we often cannot read a text message because our hands are occupied, for example while driving or typing. In such cases we can only check the phone once the task at hand is finished, and for an important message, failing to read and respond in time may mean missing an opportunity and incurring a loss.

In the prior art, message text is synthesized into speech so that missed calls can be announced and unread text messages read aloud. Speech synthesis, also known as text-to-speech (TTS) technology, produces artificial speech by mechanical or electronic means: it converts text generated by a computer or input from outside into fluent, intelligible natural-language speech.

However, existing voice broadcasting methods all use a single voice model extracted in advance, so the synthesized voice is monotonous and cannot reproduce the intonation and rhythm of the sender of the text message. The broadcast voice is therefore stiff, conveys emotion poorly, lacks individuality, and is not readily accepted by the listener. A voice broadcasting system that reproduces the voice characteristics of the text message sender is thus urgently needed.

Summary of the invention

The technical problem to be solved by the present invention is to provide, in view of the deficiencies of the prior art, a voice broadcasting system and method that reproduce the voice characteristics of the text message sender.

The technical solution adopted by the present invention to solve the above technical problem is as follows: a voice broadcasting system comprising a client and a server system. The client comprises a characteristic voice recording module, a text message acquisition module, a first network communication module and a characteristic voice playing module; the server system comprises a voice storage module, a characteristic voice training module, a second network communication module and a characteristic voice synthesis module;

The characteristic voice recording module is configured to record sample voice matching the role of the text message broadcaster, and to send the sample voice to the voice storage module through the first network communication module and the second network communication module;

The first network communication module is configured to send and receive the data transmitted between the client and the server system;

The second network communication module is configured to send and receive the data transmitted between the client and the server system;

The voice storage module is configured to store the sample voice data collected by the characteristic voice recording module and the characteristic voice data synthesized by the characteristic voice synthesis module;

The characteristic voice training module is configured to extract voice feature parameters from the sample voice stored in the voice storage module, perform model training to obtain a characteristic voice model, and send the characteristic voice model to the characteristic voice synthesis module;

The text message acquisition module is configured to collect the text messages that the user needs broadcast by voice, and to send the collected text messages to the characteristic voice synthesis module through the first network communication module and the second network communication module;

The characteristic voice synthesis module is configured to synthesize, from the characteristic voice model and the text message, a characteristic voice that has the broadcaster's voice characteristics and carries the content of the text message, and to store the characteristic voice data in the voice storage module;

The characteristic voice playing module is configured to play the characteristic voice synthesized by the characteristic voice synthesis module.

The beneficial effects of the present invention are as follows: all kinds of text messages can be broadcast by voice on all kinds of mobile terminals, such as mobile phones and tablet computers. The text messages include news text, e-books, SMS messages, and text received by instant messaging software such as QQ, Fetion, WeChat and Momo. When the user uses the invention to read news text or e-books aloud, the user can choose any voice timbre in the voice storage module according to personal preference. When the user uses the invention to broadcast text exchanged with other people, the broadcast voice has the voice characteristics of the text message sender, which fully meets the user's demand for the timbre of a specific speaker; it is highly personalized and readily accepted by the listener, giving the user a better experience.

On the basis of the above technical solution, the present invention can also be improved as follows.

Further, the characteristic voice recording module comprises a voice collecting unit and an address book binding unit;

The voice collecting unit is configured to collect the raw voice of the broadcaster and to send the collected raw voice to the address book binding unit;

The address book binding unit is configured to bind the raw voice of the broadcaster to the broadcaster's role information, and to send the bound sample voice data to the voice storage module through the first network communication module and the second network communication module.

Further, the first network communication module comprises a voice sending unit, a text message sending unit and a characteristic voice sending unit; the second network communication module comprises a voice receiving unit, a text message receiving unit and a characteristic voice receiving unit;

The voice sending unit is configured to receive the sample voice data output by the characteristic voice recording module and to send the sample voice data to the voice receiving unit;

The voice receiving unit is configured to receive the voice data output by the voice sending unit and to send the voice data to the voice storage module;

The text message sending unit is configured to receive the text message output by the text message acquisition module and to send the text message to the text message receiving unit;

The text message receiving unit is configured to receive the text message output by the text message sending unit and to send the text message to the characteristic voice synthesis module;

The characteristic voice sending unit is configured to receive the characteristic voice data output by the voice storage module and to send the characteristic voice data to the characteristic voice receiving unit;

The characteristic voice receiving unit is configured to receive the characteristic voice data output by the characteristic voice sending unit and to send the characteristic voice data to the characteristic voice playing module.

Further, the voice storage module comprises a sample voice storage unit and a characteristic voice storage unit;

The sample voice storage unit is configured to receive and store the sample voice data collected by the characteristic voice recording module;

The characteristic voice storage unit is configured to receive and store the characteristic voice data synthesized by the characteristic voice synthesis module.

Further, the characteristic voice training module comprises a voice annotation unit, a parameter extraction unit, a model training unit and a model storage unit;

The voice annotation unit is configured to obtain the sample voice of the broadcaster and to annotate it;

The parameter extraction unit is configured to extract acoustic feature parameters from the annotated sample voice;

The model training unit is configured to perform model training on the acoustic feature parameters to obtain the characteristic voice model of the broadcaster, and to store the characteristic voice model in the model storage unit;

The model storage unit is configured to receive and store the characteristic voice model of the broadcaster, and to send the model to the characteristic voice synthesis module.

Further, the characteristic voice synthesis module comprises a text processing unit, a parameter prediction unit and a speech synthesis unit;

The text processing unit is configured to receive, through the first network communication module and the second network communication module, the text message output by the text message acquisition module, and to convert the text message into labels that the speech synthesis unit can recognize;

The parameter prediction unit is configured to derive the acoustic parameters corresponding to the current text from the labels of the text message and the characteristic voice model output by the characteristic voice training module;

The speech synthesis unit is configured to perform speech synthesis according to the acoustic parameters corresponding to the text, and to output a characteristic voice that matches the current text and has the broadcaster's pronunciation characteristics.

To solve the above technical problem, the present invention also provides a voice broadcast method comprising the following steps:

S101: record sample voice matching the role of the text message broadcaster, and send the sample voice to the voice storage module through the first network communication module and the second network communication module;

S102: obtain the stored voice data, extract voice feature parameters from the obtained voice data, and perform model training on the voice feature parameters to obtain a characteristic voice model;

S103: collect the text messages that the user needs broadcast by voice, and send the collected text messages to the characteristic voice synthesis module through the first network communication module and the second network communication module;

S104: obtain the characteristic voice model and the text message, synthesize a characteristic voice that has the broadcaster's voice characteristics and carries the content of the text message, and store the characteristic voice data in the voice storage module;

S105: play the characteristic voice.
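
The five steps above can be read as a simple pipeline. The following Python sketch is illustrative only: the class `VoiceStore`, the function names, and the placeholder training and synthesis logic are assumptions introduced here and are not part of the disclosed method, which does not prescribe a particular model or data format.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceStore:
    """Hypothetical stand-in for the server-side voice storage module."""
    samples: dict = field(default_factory=dict)      # role -> list of sample clips
    synthesized: list = field(default_factory=list)  # (role, text, audio) records


def record_sample(store, role, clip):
    """S101: store a sample clip under the broadcaster role it matches."""
    store.samples.setdefault(role, []).append(clip)


def train_model(store, role):
    """S102: placeholder training; a real system would fit an acoustic model."""
    clips = store.samples[role]
    return {"role": role, "num_clips": len(clips)}


def synthesize(store, model, text):
    """S104: placeholder synthesis that tags the text with the model's role
    and stores the result, as the method stores synthesized voice data."""
    audio = f"<audio voice={model['role']}>{text}</audio>"
    store.synthesized.append((model["role"], text, audio))
    return audio


def broadcast(store, role, text):
    """Run S102 to S105 for one collected text message (S103 supplies `text`)."""
    model = train_model(store, role)
    audio = synthesize(store, model, text)
    return audio  # S105: the client would play this audio
```

In this sketch the role name is the only link between a recorded sample and the synthesized output, which mirrors how the method ties the broadcast voice to the broadcaster role.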

The beneficial effects of the present invention are as follows: all kinds of text messages can be broadcast by voice on all kinds of mobile terminals, such as mobile phones, tablet computers, notebook computers and in-vehicle computers. The text messages include news text, e-books, SMS messages, and text received by instant messaging software such as QQ, Fetion, WeChat and Momo. When the user uses the invention to read news text or e-books aloud, the user can choose any voice timbre in the voice storage module according to personal preference. When the user uses the invention to broadcast text exchanged with other people, the broadcast voice has the voice characteristics of the text message sender, which fully meets the user's demand for the timbre of a specific speaker; it is highly personalized and readily accepted by the listener, giving the user a better experience.

Further, step S101 is specifically:

S101a: collect the raw voice of the broadcaster, and send the collected raw voice to the address book binding unit;

S101b: bind the raw voice of the broadcaster to the broadcaster's role information, and send the bound voice data to the voice storage module through the first network communication module and the second network communication module.

Further, step S102 is specifically:

S102a: obtain the sample voice of the broadcaster and annotate it;

S102b: obtain the annotated sample voice and extract acoustic feature parameters from it;

S102c: obtain the acoustic feature parameters and perform model training on them to obtain the characteristic voice model of the broadcaster;

S102d: obtain and store the characteristic voice model of the broadcaster;

Further, step S104 is specifically:

S104a: obtain the collected text message and convert it into labels that the speech synthesis unit can recognize;

S104b: obtain the labels of the text message and the characteristic voice model output by the characteristic voice training module, and derive the acoustic parameters corresponding to the current text;

S104c: obtain the acoustic parameters corresponding to the text, perform speech synthesis according to those acoustic parameters, and output a characteristic voice that matches the current text and has the broadcaster's pronunciation characteristics.

Brief description of the drawings

Fig. 1 is a structural diagram of the voice broadcasting system;

Fig. 2 is a structural diagram of the internal modules of the voice broadcasting system;

Fig. 3 is a flowchart of the voice broadcast method.

Detailed description of the embodiments

The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.

Fig. 1 is a structural diagram of the voice broadcasting system. As shown in Fig. 1, a voice broadcasting system comprises a client and a server system. The client can be installed on all kinds of mobile terminals, including mobile phones, tablet computers, notebook computers and in-vehicle computers. The client is started when the user needs voice broadcasting.

The client comprises a characteristic voice recording module, a text message acquisition module, a first network communication module and a characteristic voice playing module; the server system comprises a voice storage module, a characteristic voice training module, a second network communication module and a characteristic voice synthesis module;

The characteristic voice recording module is configured to record sample voice matching the role of the text message broadcaster, and to send the sample voice to the voice storage module through the first network communication module and the second network communication module;

A characteristic voice is a voice carrying the pronunciation characteristics of a particular person, from which the speaker's identity can be roughly recognized. In the present invention, voice samples of different people are recorded so that voice with a given speaker's voice characteristics can be synthesized from those samples. In addition, when a characteristic voice is recorded for a particular person, that person's role information can be bound to the recording, that is, the characteristic voice is labelled with the speaker's role. The role information can be the speaker's name, which can be stored in the phone's address book, or a nickname in the buddy list of instant messaging software, such as a QQ, Fetion, WeChat or Momo nickname.
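
The binding of a recording to role information described above can be pictured as a small record type. The sketch below is a minimal illustration under stated assumptions: the names `RoleInfo`, `BoundSample` and `bind_voice`, and the `source` labels, are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleInfo:
    """Hypothetical role record: who the voice belongs to and where the name comes from."""
    name: str    # address book name or instant messaging nickname
    source: str  # assumed labels such as "address_book", "qq", "wechat"


@dataclass
class BoundSample:
    """A raw recording labelled with the speaker's role information."""
    role: RoleInfo
    audio: bytes


def bind_voice(raw_audio: bytes, name: str, source: str = "address_book") -> BoundSample:
    """Bind a raw recording to role information before it is sent for storage."""
    if not raw_audio:
        raise ValueError("raw_audio must not be empty")
    return BoundSample(role=RoleInfo(name=name, source=source), audio=raw_audio)
```

Making `RoleInfo` immutable lets it serve as a dictionary key, so a store could later look up all samples or the trained model for a given contact.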

The first network communication module and the second network communication module are configured to send and receive the data transmitted between the client and the server system. The network used by the first network communication module and the second network communication module can be a wide area network or a local area network.

The voice storage module is configured to store the sample voice data collected by the characteristic voice recording module and the characteristic voice data synthesized by the characteristic voice synthesis module. The voice storage module is a voice database that holds the sample voices of particular people and the characteristic voices synthesized later. Sample voices are stored as multiple short recorded sentences. The user can record the voices of frequent contacts as needed and store them in this database as characteristic voice samples. The initial voice sample for each contact can be half an hour to one hour long. Because the quality of the synthesized characteristic voice improves as the database grows, the database can later be expanded by adding more sample voice in pursuit of a more lifelike result.
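
As a rough illustration of how sample voice stored as short sentences might be tracked per contact, the following Python sketch accumulates clip durations and checks them against the half-hour lower bound mentioned above. The class `SampleVoiceStore`, its method names, and the use of that bound as a hard threshold are assumptions made for this example.

```python
class SampleVoiceStore:
    """Hypothetical per-contact store of short recorded sentences."""

    MIN_SECONDS = 30 * 60  # half an hour, the lower bound given in the description

    def __init__(self):
        self._clips = {}  # contact -> list of (clip_id, duration_seconds)

    def add_clip(self, contact, clip_id, duration_seconds):
        """Add one short recorded sentence for a contact."""
        if duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")
        self._clips.setdefault(contact, []).append((clip_id, duration_seconds))

    def total_seconds(self, contact):
        """Total recorded duration for a contact, in seconds."""
        return sum(duration for _, duration in self._clips.get(contact, []))

    def has_enough_audio(self, contact):
        """True once the contact's clips reach the assumed minimum duration."""
        return self.total_seconds(contact) >= self.MIN_SECONDS
```

Because clips are only ever appended, expanding the database later to improve quality amounts to further `add_clip` calls for the same contact.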

The characteristic voice training module is configured to extract voice feature parameters from the sample voice stored in the voice storage module, perform model training to obtain a characteristic voice model, and send the characteristic voice model to the characteristic voice synthesis module;

The text message acquisition module is configured to collect the text messages that the user needs broadcast by voice, and to send the collected text messages to the characteristic voice synthesis module through the first network communication module and the second network communication module. The collected text can be news text or e-books, or text received by SMS or by instant messaging software such as QQ, Fetion, WeChat and Momo. The client collects the text message by means of a plug-in, and the collected text serves as the input to the subsequent speech synthesis system.

The characteristic voice synthesis module is configured to synthesize, from the characteristic voice model and the text message, a characteristic voice that has the broadcaster's voice characteristics and carries the content of the text message, and to store the characteristic voice data in the voice storage module;

The characteristic voice playing module is configured to play the characteristic voice synthesized by the characteristic voice synthesis module.

The present invention can broadcast all kinds of text messages by voice on all kinds of mobile terminals, such as mobile phones and tablet computers. The text messages include news text, e-books, SMS messages, and text received by instant messaging software such as QQ, Fetion, WeChat and Momo. When the user uses the invention to read news text or e-books aloud, the user can choose any voice timbre in the voice storage module according to personal preference. When the user uses the invention to broadcast text exchanged with other people, the broadcast voice has the voice characteristics of the text message sender, which fully meets the user's demand for the timbre of a specific speaker; it is highly personalized and readily accepted by the listener, giving the user a better experience.

Fig. 2 is a structural diagram of the internal modules of the voice broadcasting system. As shown in Fig. 2, the characteristic voice recording module comprises a voice collecting unit and an address book binding unit. The voice collecting unit is configured to collect the raw voice of the broadcaster and to send the collected raw voice to the address book binding unit. The address book binding unit is configured to bind the raw voice of the broadcaster to the broadcaster's role information, and to send the bound sample voice data to the voice storage module through the first network communication module and the second network communication module.

The first network communication module comprises a voice sending unit, a text message sending unit and a characteristic voice sending unit; the second network communication module comprises a voice receiving unit, a text message receiving unit and a characteristic voice receiving unit. The voice sending unit is configured to receive the sample voice data output by the characteristic voice recording module and to send the sample voice data to the voice receiving unit. The voice receiving unit is configured to receive the voice data output by the voice sending unit and to send the voice data to the voice storage module. The text message sending unit is configured to receive the text message output by the text message acquisition module and to send the text message to the text message receiving unit. The text message receiving unit is configured to receive the text message output by the text message sending unit and to send the text message to the characteristic voice synthesis module. The characteristic voice sending unit is configured to receive the characteristic voice data output by the voice storage module and to send the characteristic voice data to the characteristic voice receiving unit. The characteristic voice receiving unit is configured to receive the characteristic voice data output by the characteristic voice sending unit and to send the characteristic voice data to the characteristic voice playing module.
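
The three sending and receiving unit pairs described above each carry one kind of payload to one destination module. A minimal Python sketch of that routing is given below; the enum `Payload`, the table `ROUTES`, and the string identifiers for the destination modules are hypothetical names chosen for illustration, and the sketch says nothing about the transport actually used.

```python
from enum import Enum


class Payload(Enum):
    """The three kinds of data exchanged between client and server system."""
    SAMPLE_VOICE = "sample_voice"
    TEXT = "text"
    CHARACTERISTIC_VOICE = "characteristic_voice"


# Destination module for each payload kind, following the unit descriptions above.
ROUTES = {
    Payload.SAMPLE_VOICE: "voice_storage_module",
    Payload.TEXT: "characteristic_voice_synthesis_module",
    Payload.CHARACTERISTIC_VOICE: "characteristic_voice_playing_module",
}


def route(kind: Payload) -> str:
    """Return the module that a receiving unit forwards this payload kind to."""
    return ROUTES[kind]
```

Keeping the mapping in one table makes it easy to see that each receiving unit has exactly one downstream module.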

The voice storage module comprises a sample voice storage unit and a characteristic voice storage unit. The sample voice storage unit is configured to receive and store the sample voice data collected by the characteristic voice recording module. The characteristic voice storage unit is configured to receive and store the characteristic voice data synthesized by the characteristic voice synthesis module.

The characteristic voice training module comprises a voice annotation unit, a parameter extraction unit, a model training unit and a model storage unit. The voice annotation unit is configured to obtain the sample voice of the broadcaster and to annotate it. The annotation covers: syllable and phoneme segmentation and labelling of the voice data, stress and prosody labelling, character/word boundary and part-of-speech tagging, and labelling of background noise in the voice. The parameter extraction unit is configured to extract acoustic feature parameters from the annotated sample voice; the acoustic feature parameters include the fundamental frequency and spectral feature parameters. The model training unit is configured to perform model training on the acoustic feature parameters to obtain the characteristic voice model of the broadcaster, and to store the characteristic voice model in the model storage unit. The model storage unit is configured to receive and store the characteristic voice model of the broadcaster, and to send the model to the characteristic voice synthesis module.
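
The description names the fundamental frequency as one of the extracted acoustic feature parameters but does not say how it is computed. As one possible illustration, the Python sketch below estimates the fundamental frequency of a single frame by autocorrelation, a common textbook method that is assumed here rather than taken from the patent. The function name `estimate_f0` and the search range of 60 Hz to 400 Hz are likewise assumptions.

```python
import math


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Returns 0.0 when no positive autocorrelation peak is found, for example
    for a silent frame.
    """
    lag_min = int(sample_rate / f_max)                       # shortest period searched
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)  # longest period searched
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame with itself shifted by `lag`.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


# Usage: a 50 ms frame of a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
f0 = estimate_f0(tone, sample_rate)
```

Real speech would need voiced/unvoiced detection and smoothing across frames, which this sketch omits.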

The characteristic voice synthesis module comprises a text processing unit, a parameter prediction unit and a speech synthesis unit. The text processing unit is configured to receive, through the first network communication module and the second network communication module, the text message output by the text message acquisition module, and to convert the text message into labels that the speech synthesis unit can recognize. The parameter prediction unit is configured to derive the acoustic parameters corresponding to the current text from the labels of the text message and the characteristic voice model output by the characteristic voice training module. When synthesis is requested after a text message has been collected, if the model storage unit already holds the voice model of the text message sender (that is, the sender's voice has already been collected as characteristic voice samples), the sender's characteristic voice model is retrieved and used as one of the inputs to the parameter prediction unit. If the model storage unit does not hold the sender's voice model, voice samples can be collected in real time to produce a characteristic voice model, which is then used as one of the inputs to the parameter prediction unit.
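
The lookup-with-fallback behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function `select_model`, the callables `record_samples` and `train`, and the choice to cache a newly trained model in the store are all hypothetical and are not specified by the patent.

```python
def select_model(sender, model_store, record_samples, train):
    """Return the characteristic voice model to use for `sender`.

    model_store    -- dict mapping sender to a stored model (the model storage unit)
    record_samples -- callable that collects voice samples for a sender in real time
    train          -- callable that turns samples into a characteristic voice model
    """
    model = model_store.get(sender)
    if model is None:
        # No stored model: collect samples now and train one on the spot.
        samples = record_samples(sender)
        model = train(samples)
        model_store[sender] = model  # assumption: keep the new model for later use
    return model
```

Passing the recording and training steps in as callables keeps the selection logic independent of how samples are captured or which model is trained.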

The speech synthesis unit is configured to perform speech synthesis according to the acoustic parameters corresponding to the text, and to output a characteristic voice that matches the current text and has the broadcaster's pronunciation characteristics.

Fig. 3 is a flowchart of the voice broadcast method. As shown in Fig. 3, the voice broadcast method comprises the following steps:

S105: play the characteristic voice.

Step S101 is specifically:

Step S102 is specifically:

S102d: obtain and store the characteristic voice model of the broadcaster;

Step S104 is specifically:

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A voice broadcasting system, characterized in that it comprises a client and a server system, the client comprising a characteristic voice recording module, a text message acquisition module, a first network communication module and a characteristic voice playing module, and the server system comprising a voice storage module, a characteristic voice training module, a second network communication module and a characteristic voice synthesis module;

2. The voice broadcasting system according to claim 1, characterized in that the characteristic voice recording module comprises a voice collecting unit and an address book binding unit,

3. The voice broadcasting system according to claim 1, characterized in that the first network communication module comprises a voice sending unit, a text message sending unit and a characteristic voice sending unit; the second network communication module comprises a voice receiving unit, a text message receiving unit and a characteristic voice receiving unit;

4. The voice broadcasting system according to claim 1, characterized in that the voice storage module comprises a sample voice storage unit and a characteristic voice storage unit,

5. The voice broadcasting system according to claim 1, characterized in that the characteristic voice training module comprises a voice annotation unit, a parameter extraction unit, a model training unit and a model storage unit;

6. The voice broadcasting system according to claim 1, characterized in that the characteristic voice synthesis module comprises a text processing unit, a parameter prediction unit and a speech synthesis unit;

7. A voice broadcast method, characterized in that the voice broadcast method comprises the following steps,

S105: play the characteristic voice.

8. The voice broadcast method according to claim 7, characterized in that step S101 is specifically:

9. The voice broadcast method according to claim 7, characterized in that step S102 is specifically,

S102d: obtain and store the characteristic voice model of the broadcaster.

10. The voice broadcast method according to claim 7, characterized in that step S104 is specifically,