CN101222542B - Method for implementing Text-To-Speech function - Google Patents

Method for implementing Text-To-Speech function

Info

Publication number
CN101222542B
Authority
CN
China
Prior art keywords
text-to-speech
media resource
text
text string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2007101530700A
Other languages
Chinese (zh)
Other versions
CN101222542A (en)
Inventor
陈诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN2007101530700A priority Critical patent/CN101222542B/en
Publication of CN101222542A publication Critical patent/CN101222542A/en
Application granted granted Critical
Publication of CN101222542B publication Critical patent/CN101222542B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical



Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method for realizing a text-to-speech conversion function, in which a media resource control device controls a media resource processing device through the H.248 protocol to realize text-to-speech conversion. The method comprises the following steps: the media resource control device carries an extension package parameter in an H.248 message to instruct the media resource processing device to execute the text-to-speech conversion processing corresponding to the parameter; the media resource processing device calls a text-to-speech converter according to the parameter in the message to execute the conversion, and feeds the conversion result back to the media resource control device. With the method provided by the invention, users can be provided with service applications related to text-to-speech conversion in the media resource applications of mobile or fixed networks; at the same time, only the text needs to be modified when changes are made, without re-recording, and more personalized prompt tones can be played according to the user's requirements.

Description

Method for implementing a text-to-speech conversion function
Technical field
The present invention relates to a method for implementing a text-to-speech (TTS) conversion function, and in particular to a method that uses the H.248 protocol as the control protocol to implement the TTS conversion function.
Background art
Text-to-speech (TTS) conversion is a core speech technology. It converts text information into machine-synthesized speech and provides a convenient, friendly human-machine interface. Simply put, it converts a text string into speech: for the input text "hello", after TTS processing the spoken words "hello" are output.
In existing network systems, an application server usually has two methods for playing announcements to a user:
The first method is to play a pre-recorded announcement. For example, when a user calls another user and the call fails, the system may prompt the caller with "the subscriber you dialed is not available"; this prompt tone is recorded in advance and stored on the server device. A mature method for this already exists in the H.248 protocol family, for example H.248.9.
The second method is to use a TTS conversion function. When the call fails, the system converts the text "the subscriber you dialed is not available" into speech and outputs it to the user.
The benefits of using TTS conversion are:
(1) it is easy to modify: only the text needs to be changed, and no re-recording is needed;
(2) more personalized prompt tones can be played according to the user's requirements, for example using a male, female, or neutral voice.
The second method above is not defined in the H.248 protocol, yet media resource application environments need to use the TTS conversion function. In view of this, the present invention proposes a method for implementing the TTS conversion function through the H.248 protocol.
Summary of the invention
The present invention provides a method in which a media resource control device instructs, through the H.248 protocol, a media resource processing device to implement a text-to-speech (TTS) conversion function.
The method for implementing the TTS conversion function according to the present invention comprises the following steps:
Step 1: the media resource processing device receives an H.248 message carrying a TTS conversion indication sent by the media resource control device; the H.248 message carries an extension package parameter defined by an H.248 protocol extension package, instructing the media resource processing device to perform the TTS conversion processing corresponding to the parameter; and
Step 2: the media resource processing device invokes a TTS converter according to the parameter in the message to perform the TTS conversion processing, and feeds the TTS conversion result back to the media resource control device.
The extension package parameter carries information about a text string, and the media resource processing device invokes the TTS converter to perform the TTS conversion according to this text string information.
The text string information may be the text string itself, embedded in the H.248 message as a character string to be pronounced; after receiving it, the media resource processing device directly extracts the text string and invokes the TTS converter to perform the TTS conversion.
When the text string is stored in advance on the media resource processing device or on an external server, the text string information may comprise the identifier and storage location information of the text string file; after receiving them, the media resource processing device reads the text string from local storage or from the external server according to the storage location information, places it in a cache, and invokes the TTS converter to perform the TTS conversion.
The text string information may also comprise a text string and a text file of another text string, the text file comprising the identifier and storage location information of the other text string; the identifier of the text file and the text string are combined into one continuous text string, and a keyword is added before the file identifier to indicate that the combination is text to be pronounced. After receiving this combination, the media resource processing device first reads the text string from local storage or from the external server, concatenates it with the pronounceable text string carried in the H.248 message, places the result in a cache, and then invokes the TTS converter to perform the TTS conversion.
The text string information may also comprise a combination of a text string and a recorded audio file, with a keyword added before the text string to indicate that the combination is a voice file. After receiving this combination, the media resource processing device first invokes the TTS converter to convert the text string, and then combines the speech output of the conversion with the recorded audio file into one speech segment.
The text string information may also comprise a combination of a text file and a recorded audio file, the text file comprising the identifier and storage location information of another text string, with a keyword added before the identifier to indicate that the combination is a voice file. After receiving this combination, the media resource processing device first reads the text string from local storage or from the external server according to the storage location information and places it in a cache, then invokes the TTS converter to convert the retrieved text string, and combines the speech output of the conversion with the recorded audio file into one speech segment.
In the above method, the H.248 message further carries parameters related to the attributes of the speech output by the TTS conversion. These parameters comprise: pronunciation language, speaker gender, speaker age, and pauses, and may further include at least one of: speech rate, volume, pitch, pronunciation of special characters, stress, and whether to abort the TTS conversion when the user provides input. After receiving these parameters, the media resource processing device invokes the TTS converter and sets the corresponding attributes for the output speech.
While the media resource processing device invokes the TTS converter to perform the TTS conversion in step 2, the method further comprises:
Step 21: the media resource control device instructs the media resource processing device to detect abnormal events occurring during the speech recognition process.
When an abnormal event is detected, the media resource processing device feeds back an error code representing the abnormal event to the media resource control device.
Further, while the media resource processing device invokes the TTS converter to perform the TTS conversion in step 2, the method also comprises:
Step 22: the media resource control device controls the TTS conversion process.
In step 22, the control of the TTS conversion process by the media resource control device may comprise temporarily stopping the playing of the TTS-converted speech to the user, and resuming the playing state from that paused state.
In step 22, the control may comprise fast-forwarding or rewinding the playback; the fast-forward may skip a number of words, sentences, or paragraphs, or a number of seconds, and the rewind may likewise go back a number of words, sentences, or paragraphs, or a number of seconds.
In step 22, the control may comprise restarting the TTS conversion.
In step 22, the control comprises the user aborting the TTS conversion.
In step 22, the control comprises repeating the playback of the current sentence, the current paragraph, or the full text, and may further comprise cancelling such repeated playback.
With the method provided by the invention, service applications related to TTS conversion can be provided to users in the media resource applications of mobile or fixed networks, for example converting the content of a web page into speech and reading it to the user. At the same time, only the text needs to be modified when changes are made, without re-recording, and more personalized prompt tones can be played according to the user's requirements.
Description of drawings
Fig. 1 shows the network architecture for handling media resource services in a WCDMA IMS network.
Fig. 2 shows the network architecture for handling media resource services in a fixed softswitch network.
Fig. 3 is a flowchart of the method for implementing the TTS conversion function according to the present invention.
Embodiment
Fig. 1 shows the network architecture for handling media resource services in a WCDMA IMS network. The application server 1 handles various services, for example playing announcements to users, digit collection, conferencing, and recording. The serving call session control device 2 handles routing: it forwards messages sent by the application server to the media resource control device 3, and routes messages sent by the media resource control device 3 to the application server 1. The media resource control device 3 controls media resources: according to the requirements of the application server 1, it selects a corresponding media resource processing device 4 and controls the processing of the media resources. The media resource processing device 4 processes media resources and, under the control of the media resource control device 3, completes the media resource operations issued by the application server 1.
The interfaces between the application server 1, the serving call session control device 2, and the media resource control device 3 use the SIP protocol together with the XML protocol, or SIP together with an XML-like protocol (for example VXML). The interface between the media resource control device 3 and the media resource processing device 4 is Mp, which uses the H.248 protocol. The external interface of the media resource processing device 4 is Mb, which generally uses the RTP protocol to carry the user media stream.
Fig. 2 shows the network architecture for handling media resource services in a fixed softswitch network. The Media Resource Server (MRS) is equivalent in function to the media resource control device 3 and the media resource processing device 4 in the WCDMA IMS network, the application server is equivalent in function to the application server 1 and the serving call session control device 2 in the WCDMA IMS network, and the softswitch functions roughly the same as the application server 1.
The method provided by the present invention for implementing the TTS conversion function through the H.248 protocol can be applied to media resource handling in the WCDMA IMS network shown in Fig. 1 and in the fixed softswitch network shown in Fig. 2. It can equally be applied to other networks, such as CDMA networks and fixed IMS networks, whose media resource application architecture and service flow are basically identical to those of the WCDMA IMS network described above; likewise, the media resource application architecture and service flow of WCDMA and CDMA circuit softswitch networks are basically identical to those of the fixed softswitch network. In short, the present invention can be applied to any case in which a media resource device is controlled through the H.248 protocol to implement the TTS conversion function.
In the following, the method provided by the present invention for implementing the TTS conversion function through the H.248 protocol is described with reference to the accompanying drawings, taking its application to WCDMA IMS as an example.
Because the present invention only concerns the processing between the media resource control device 3 and the media resource processing device 4 shown in Fig. 1, and the other processes are identical to those in an existing WCDMA IMS network, for simplicity only the processing between the media resource control device 3 and the media resource processing device 4 is described.
Fig. 3 shows the flow of the control and processing of media resources performed by the media resource control device 3 and the media resource processing device 4.
Step 1: the media resource control device 3 sends an indication to the media resource processing device 4 to perform TTS conversion.
Specifically, the media resource control device 3 carries an extension package parameter in an H.248 message by defining an H.248 protocol extension package, thereby instructing the media resource processing device 4 to perform the TTS conversion. The H.248 protocol package is defined as follows:
Package Name: TTS Package
PackageID: ttsp (0x?)
Description: omitted here; see the explanation in the scheme that follows
Version: 1
Extends: none
1. Properties:
None
2. Events:
See the definitions in the subsequent "Events" section.
3. Signals:
See the definitions in the subsequent "Signals" section.
4. Statistics:
None
5. Procedures:
Described in the flow below.
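For illustration only, the following Python sketch assembles the text-encoded H.248 Signals descriptor for the "Play TTS File" signal of this package and wraps it in a MODIFY transaction. The package, signal, and parameter mnemonics (ttsp, ptf, tf, lt, ge) follow the definitions given later in this description; the message header address, transaction and context identifiers, termination name, and parameter values are invented for the example and are not part of the package definition.

# Illustrative sketch only: mnemonics follow the ttsp package defined in this
# description; the address, context, termination, and values are hypothetical.
def build_signals_descriptor(package, signal, params):
    rendered = ",".join(
        '{}="{}"'.format(k, v) if isinstance(v, str) else "{}={}".format(k, v)
        for k, v in params.items())
    return "Signals{%s/%s{%s}}" % (package, signal, rendered)

def build_modify(termination, context_id, descriptor):
    # Text encoding of an H.248 transaction carrying the Signals descriptor.
    return ("MEGACO/1 [192.0.2.1]:2944\n"
            "Transaction = 1 {\n"
            "  Context = %d {\n"
            "    Modify = %s { %s }\n"
            "  }\n"
            "}" % (context_id, termination, descriptor))

if __name__ == "__main__":
    descriptor = build_signals_descriptor(
        "ttsp", "ptf",
        {"tf": "http://huawei/welcome.txt",  # TTS file name and location
         "lt": "en-US",                      # pronunciation language (RFC 3066)
         "ge": "female"})                    # speaker gender
    print(build_modify("rtp/1", 1000, descriptor))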
In step 1, the information about the text string can be carried in the parameters of the H.248 message in several ways:
(1) Carrying the text string itself in the parameter of the H.248 message
The text string is simply embedded in the message as a character string; the functional entities that process the H.248 protocol do not parse its content. After receiving this parameter, the media resource processing device 4 can directly extract the text string and hand it to the TTS converter for processing.
(2) Carrying the text string file identifier and storage location information in the H.248 message parameter
The text string can be stored in advance on the media resource processing device 4 or on an external server; the H.248 message then carries the identifier and storage location information of the text string file.
The identifier of the text string file can be any character string that complies with file naming conventions.
The storage location information of the text string file can take three forms:
I. a locally accessible file, for example welcome.txt;
II. a file accessed via file://, for example file://huawei/welcome.txt;
III. a file accessed via http://, for example http://huawei/welcome.txt.
After receiving this parameter, the media resource processing device reads the text file from the remote server or from local storage according to the storage location of the text string file, places it in a cache, and then invokes the TTS converter for processing.
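The following Python sketch illustrates one way the media resource processing device could resolve the three storage forms above and cache the text before invoking the TTS converter; the function name, the cache policy, and the use of urllib are implementation assumptions rather than part of the H.248 package.

# Illustrative sketch: resolve a text-string file reference (local name,
# file:// URL, or http:// URL), read it, and keep it in a simple cache.
import time
import urllib.request

_cache = {}  # reference -> (text, fetch time)

def fetch_text(reference, cache_seconds=60):
    cached = _cache.get(reference)
    if cached and time.time() - cached[1] < cache_seconds:
        return cached[0]
    if reference.startswith(("http://", "https://", "file://")):
        with urllib.request.urlopen(reference) as resp:
            text = resp.read().decode("utf-8")
    else:
        # plain name: a locally accessible file such as welcome.txt
        with open(reference, encoding="utf-8") as f:
            text = f.read()
    _cache[reference] = (text, time.time())
    return text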
(3) Carrying a text string and a text file simultaneously in the H.248 message parameter, and executing the combination of the text string and the file
The text file identifier and the text string are combined into one continuous text string, and a special keyword is added in front of the file identifier to indicate that a pronunciation text file is being imported, rather than the file name itself being converted, for example:
<importtextfile http://huawei/welcome.txt>
Do you want to play a game?
After receiving the combined command of the pronounceable text string and the text string file, the media resource processing device 4 first performs pre-processing: it reads the text string file from the external server or locally, concatenates it with the pronounceable text string carried in the message into one string, places the result in a cache, and then invokes the TTS converter for processing.
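A minimal sketch of this pre-processing step: it expands <importtextfile ...> markers by fetching the referenced file and concatenating it with the in-message text. The regular expression and function names are illustrative assumptions; the fetch callable can be a helper such as the fetch_text() sketched earlier.

# Illustrative sketch: expand <importtextfile URL> markers by fetching the
# referenced file and concatenating it with the in-message text.
import re

IMPORT_TEXT = re.compile(r"<importtextfile\s+(\S+)>")

def expand_text_imports(combined, fetch):
    # fetch(reference) -> str, e.g. the fetch_text() helper sketched earlier.
    parts, last = [], 0
    for match in IMPORT_TEXT.finditer(combined):
        parts.append(combined[last:match.start()])
        parts.append(fetch(match.group(1)))   # read the file locally or remotely
        last = match.end()
    parts.append(combined[last:])
    return "".join(parts)                     # one continuous pronounceable string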
(4) Indicating that a text string or text file is to be TTS-converted and then combined with a recorded fragment into another voice segment
A special keyword is added in front of the voice file identifier to indicate that a voice file is being imported, rather than the file name itself being converted, for example:
<importaudiofile http://huawei/welcome.g711>
Do you want to play a game?
After receiving the combined command of TTS-converted speech and recorded file, the media resource processing device 4 first performs pre-processing: it reads the file from the remote server or locally and places it in a cache; it then invokes the TTS converter to process the text string, and combines the output speech of the TTS conversion with the voice file into one speech segment.
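The sketch below illustrates the final combination step, assuming that the TTS converter and the recorded file both yield raw audio in the same format (for example G.711 byte streams); the synthesize() and read_audio() callables are hypothetical interfaces, not part of the H.248 package.

# Illustrative sketch: synthesize the text part and splice it with the
# imported recording into one speech segment (same codec/format assumed).
import re

IMPORT_AUDIO = re.compile(r"<importaudiofile\s+(\S+)>")

def build_speech_segment(combined, synthesize, read_audio):
    # synthesize(text) -> bytes of speech; read_audio(reference) -> bytes of the
    # recorded file; both are supplied by the caller (hypothetical interfaces).
    pieces, last = [], 0
    for match in IMPORT_AUDIO.finditer(combined):
        text = combined[last:match.start()].strip()
        if text:
            pieces.append(synthesize(text))        # TTS output for the text part
        pieces.append(read_audio(match.group(1)))  # the imported recording
        last = match.end()
    tail = combined[last:].strip()
    if tail:
        pieces.append(synthesize(tail))
    return b"".join(pieces)                        # one combined speech segment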
In addition, in step 1 the H.248 message can further carry parameters for the attributes of the speech output by the TTS conversion (a sketch of applying these parameters follows the list). When instructing the media resource processing device to perform the TTS conversion, the pronunciation-related parameters that can be carried are:
(1) Pronunciation language
Different language categories can be used, following the definitions of RFC 3066.
(2) Speaker gender
A male, female, or neutral voice;
(3) Speaker age
A child, adult, or elderly voice;
(4) Speech rate
The speech rate can be faster or slower than normal, expressed as a percentage; -20% means 20% slower than normal.
(5) Volume
The volume can be higher or lower than normal, expressed as a percentage; -20% means 20% lower than normal.
(6) Pitch
The pitch can be higher or lower than normal, expressed as a percentage; -20% means 20% lower than normal.
(7) Pronunciation of special characters
Specifies how special words in the text string are pronounced; for example, "2005/10/01" is pronounced "October 1st, 2005".
(8) Whether to pause, the pause duration, and the pause position
The purpose of pausing is to match natural speaking habits; the pause duration is a time value greater than 0, and the pause position can take several values: pause after each sentence, or pause after each paragraph.
(9) Whether to apply stress, the stress level, and the stress position
The stress level can be high, medium, or low; the stress position can take several values: stress only at the beginning of the full text, at the beginning of every sentence, at the beginning of every paragraph, and so on.
(10) Whether to pre-fetch the text file
If pre-fetching is indicated, then after receiving the command the device reads the file from the remote server into a local cache; otherwise the file is read only when the command is executed;
(11) File cache duration
How long the cached copy remains valid after the file has been read into local storage.
(12) Whether to abort the TTS conversion when the user inputs DTMF or voice
When the TTS conversion runs at the same time as automatic speech/DTMF recognition, the TTS conversion can be aborted during the conversion process if the user inputs DTMF digits or voice.
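As a purely illustrative sketch of applying these parameters, the following maps the pronunciation-related mnemonics of the package defined below (lt, ge, ag, sp, vo, to, dbi, vbi) onto a generic TTS engine call; the engine object and its synthesize() signature are hypothetical.

# Illustrative sketch: translate the H.248 pronunciation parameters into a
# voice/prosody description for a (hypothetical) TTS engine.
def apply_pronunciation_params(engine, text, params):
    voice = {
        "language": params.get("lt", "en-US"),    # RFC 3066 language tag
        "gender":   params.get("ge", "neutral"),  # male / female / neutral
        "age":      params.get("ag", "adult"),    # child / adult / elderly
    }
    prosody = {
        # percentages relative to the normal value; -20 means 20% slower/lower
        "rate":   params.get("sp", 0),
        "volume": params.get("vo", 0),
        "pitch":  params.get("to", 0),
    }
    barge_in = {"dtmf": params.get("dbi", False),
                "voice": params.get("vbi", False)}
    return engine.synthesize(text, voice=voice, prosody=prosody, barge_in=barge_in)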
Step 2: after receiving the indication from the media resource control device, the media resource processing device acknowledges the indication, feeds the acknowledgement back to the media resource control device, invokes the TTS converter to perform the TTS conversion, and plays the converted speech to the user.
Specifically, the H.248 protocol package defines:
Signals, including: (1) a signal indicating playing of a TTS file; (2) a signal indicating playing of a TTS string; (3) a signal indicating playing of a TTS string, TTS file, and speech segment; (4) a signal indicating setting of stress; (5) a signal indicating setting of pauses; and (6) a signal indicating special words. These signals are described as follows:
(1) Play TTS File, used to indicate that the TTS function is to be performed.
Signal Name: Play TTS File
SignalID: ptf (0x?)
Description: perform the TTS function on a text string file
SignalType: BR
Duration: Not Applicable
Its Additional Parameters include:
I.
Parameter Name: TTS File
Parameter ID: tf (0x?)
Description: TTS file name and storage location
Type: String
Optional: No
Possible Values: a legal file identifier and storage format
Default: none
II.
Parameter Name: Language Type
Parameter ID: lt (0x?)
Description: language type
Type: String
Optional: No
Possible Values: compliant with RFC 3066
Default: none
III.
Parameter Name: Gender
Parameter ID: ge (0x?)
Description: speaker gender
Type: String
Optional: No
Possible Values: male, female, neutral
Default: none
IV.
Parameter Name: Age
Parameter ID: ag (0x?)
Description: speaker age
Type: String
Optional: No
Possible Values: child, adult, elderly
Default: none
V.
Parameter Name: Speed
Parameter ID: sp (0x?)
Description: speech rate
Type: Integer
Optional: Yes
Possible Values: from -100% to 100%
Default: none
VI.
Parameter Name: Volume
Parameter ID: vo (0x?)
Description: volume
Type: Integer
Optional: Yes
Possible Values: from -100% to 100%
Default: none
VII.
Parameter Name: Tone
Parameter ID: to (0x?)
Description: pitch
Type: Integer
Optional: Yes
Possible Values: from -100% to 100%
Default: none
VII.
Parameter Name: Prefetch
Parameter ID: pf (0x?)
Description: pre-fetch the text string file
Type: Enum
Optional: Yes
Possible Values: yes, no
Default: yes
VIII.
Parameter Name: Cache Time
Parameter ID: ct (0x?)
Description: file cache duration
Type: Integer
Optional: Yes
Possible Values: greater than 0 seconds
Default: none
IX.
Parameter Name: DTMF Barge-in
Parameter ID: dbi (0x?)
Description: abort the TTS conversion when the user inputs DTMF
Type: Enum
Optional: Yes
Possible Values: yes, no
Default: none
X.
Parameter Name: Voice Barge-in
Parameter ID: vbi (0x?)
Description: abort the TTS conversion when the user inputs voice
Type: Integer
Optional: Yes
Possible Values: greater than 0 seconds
Default: none
(2) Play TTS String, used to indicate that the TTS function is to be performed on a text string.
Signal Name: Play TTS String
SignalID: pts (0x?)
Description: indicates that the TTS function is performed on a text string
SignalType: BR
Duration: Not Applicable
Its Additional Parameters include:
I.
Parameter Name: TTS String
Parameter ID: ts (0x?)
Description: the text string to be pronounced
Type: String
Optional: No
Possible Values: a pronounceable text string
Default: none
II. The other parameters are identical to parameters II, III, IV, V, VI, IX, and X of the "Play TTS File" signal.
(3) Play TTS string, TTS file, and speech segment
Signal Name: Play Union
SignalID: pu (0x?)
Description: play a combination of a TTS string, a TTS file, and a speech segment file
SignalType: BR
Duration: Not Applicable
Its Additional Parameters include:
I.
Parameter Name: TTS and Speech Segment
Parameter ID: ta (0x?)
Description: the combination of TTS string, TTS file, and speech segment file to play
Type: String
Optional: No
Possible Values: a combination of a TTS string, a TTS file, and a speech segment file
Default: none
II. The other parameters are identical to parameters II, III, IV, V, VI, IX, and X of the "Play TTS File" signal; however, parameters II, III, IV, V, and VI apply only to the TTS conversion process.
(4) Set Accentuation, used to indicate the stress level and position for TTS.
Signal Name: Set Accentuation
SignalID: sa (0x?)
Description: indicates the stress level and position for TTS
SignalType: BR
Duration: Not Applicable
Its Additional Parameters include:
I.
Parameter Name: Accentuation Position
Parameter ID: ap (0x?)
Description: the stress position
Type: String
Optional: Yes
Possible Values: beginning of the full text, beginning of each sentence, beginning of each paragraph
Default: none
II.
Parameter Name: Accentuation Grade
Parameter ID: ag (0x?)
Description: the stress level
Type: String
Optional: Yes
Possible Values: high, medium, low
Default: none
(5) Set Break, used to indicate the pause position and duration for TTS.
Signal Name: Set Break
SignalID: sb (0x?)
Description: indicates the pause position and duration for TTS
SignalType: BR
Duration: Not Applicable
Its Additional Parameters include:
I.
Parameter Name: Break Position
Parameter ID: bp (0x?)
Description: the pause position
Type: String
Optional: No
Possible Values: the end of a sentence, the end of a paragraph
Default: none
II.
Parameter Name: Break Time
Parameter ID: bt (0x?)
Description: the pause duration
Type: Integer
Optional: Yes
Possible Values: greater than 0 milliseconds
Default: none
(6) Special Words, used to indicate how TTS pronounces special words.
Signal Name: Special Words
SignalID: sw (0x?)
Description: indicates how TTS pronounces special words
SignalType: BR
Duration: Not Applicable
Its Additional Parameters include:
I.
Parameter Name: Destination Words
Parameter ID: dw (0x?)
Description: the original words in the text string
Type: String
Optional: Yes
Possible Values: any
Default: none
II.
Parameter Name: Say As
Parameter ID: sa (0x?)
Description: the replacement pronunciation
Type: String
Optional: Yes
Possible Values: any
Default: none
Step 3: the media resource control device 3 instructs the media resource processing device to detect the TTS conversion result.
Step 4: after receiving this indication, the media resource processing device 4 acknowledges it and returns an acknowledgement message.
Step 5: the media resource control device 3 controls the TTS conversion process; this control includes the following operations (a dispatch sketch follows the list):
1. Pause: temporarily stop playing the converted speech to the user;
2. Resume: return from the paused state to the playing state;
3. Fast-forward, with several ways of indicating the position to fast-forward to:
(1) fast-forward a number of words;
(2) fast-forward to the beginning of a following sentence;
(3) fast-forward to the beginning of a following paragraph;
(4) fast-forward a number of seconds;
(5) fast-forward a number of voice units (the voice unit is implementation-defined, for example 10 s).
4. Rewind, with several ways of indicating the position to rewind to:
(1) rewind a number of words;
(2) rewind to the beginning of a preceding sentence;
(3) rewind to the beginning of a preceding paragraph;
(4) rewind a number of seconds;
(5) rewind a number of voice units (the voice unit is implementation-defined, for example 10 s).
5. Restart the TTS conversion;
6. End the TTS conversion: the user aborts it;
7. Repeat, with several ways of indicating the scope of repetition:
(1) repeat the current sentence;
(2) repeat the current paragraph;
(3) repeat the full text;
8. Cancel repetition: cancel the repeated playback described above;
9. Reset the TTS conversion parameters, including the pitch, volume, speech rate, speaker gender, speaker age, stress position, and pause position and duration described above.
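As a sketch of how the media resource processing device might dispatch a subset of these controls, the following maps some of the control-signal mnemonics defined below onto hypothetical player operations; the player object and its methods are assumptions for illustration.

# Illustrative sketch: dispatch a received TTS control signal (a subset of the
# package's control signals) onto a hypothetical playback controller.
def handle_tts_control(player, signal_id, params):
    if signal_id == "tp":                                # TTS Pause
        player.pause()
    elif signal_id == "tjw":                             # TTS Jump Words
        player.jump_words(int(params.get("js", 0)))      # negative = backward
    elif signal_id == "tjs":                             # TTS Jump Sentences
        player.jump_sentences(int(params.get("js", 0)))
    elif signal_id == "tjp":                             # TTS Jump Paragraphs
        player.jump_paragraphs(int(params.get("js", 0)))
    elif signal_id == "tre":                             # TTS Repeat
        player.repeat(params.get("pos", "current sentence"))
    elif signal_id == "te":                              # TTS End
        player.stop()
    else:
        raise ValueError("unsupported control signal: %s" % signal_id)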
Specifically, the H.248 protocol package defines:
Signals, including TTS Pause, TTS Resume, TTS Jump Words, TTS Jump Sentences, TTS Jump Paragraphs, TTS Jump Seconds, TTS Jump Voice Unit, TTS Restart, TTS End, and TTS Repeat:
(1) TTS Pause, used to indicate that TTS is to be paused.
Signal Name: TTS Pause
SignalID: tp (0x?)
Description: indicates pausing of TTS
SignalType: BR
Duration: Not Applicable
Additional Parameters: none
(2) TTS Resume, used to indicate that TTS is to be resumed from the pause.
Signal Name: TTS Resume
SignalID: tr (0x?)
Description: indicates resuming TTS from the pause
SignalType: BR
Duration: Not Applicable
Additional Parameters: none
(3) TTS Jump Words, used to indicate that several words are to be skipped before continuing.
Signal Name: TTS Jump Words
SignalID: tjw (0x?)
Description: indicates jumping to a certain position and continuing
SignalType: BR
Duration: Not Applicable
Additional Parameters:
I.
Parameter Name: Jump Size
Parameter ID: js (0x?)
Description: the number of words to skip; a positive value skips forward, a negative value skips backward
Type: Integer
Optional: No
Possible Values: any
Default: none
(4) TTS Jump Sentences, used to indicate that several sentences are to be skipped before continuing.
Signal Name: TTS Jump Sentences
SignalID: tjs (0x?)
Description: indicates that playback continues after skipping several sentences
SignalType: BR
Duration: Not Applicable
Additional Parameters include:
I.
Parameter Name: Jump Size
Parameter ID: js (0x?)
Description: the number of sentences to skip; a positive value skips forward, a negative value skips backward
Type: Integer
Optional: No
Possible Values: any
Default: none
(5) TTS Jump Paragraphs, used to indicate that several paragraphs are to be skipped before continuing.
Signal Name: TTS Jump Paragraphs
SignalID: tjp (0x?)
Description: indicates that playback continues after skipping several paragraphs
SignalType: BR
Duration: Not Applicable
Additional Parameters include:
I.
Parameter Name: Jump Size
Parameter ID: js (0x?)
Description: the number of paragraphs to skip; a positive value skips forward, a negative value skips backward
Type: Integer
Optional: No
Possible Values: any
Default: none
(6) TTS Jump Seconds, used to indicate that several seconds of speech are to be skipped before continuing.
Signal Name: TTS Jump Seconds
SignalID: tjs (0x?)
Description: indicates that playback continues after skipping several seconds of speech
SignalType: BR
Duration: Not Applicable
Additional Parameters include:
I.
Parameter Name: Jump Size
Parameter ID: js (0x?)
Description: the number of seconds to skip; a positive value skips forward, a negative value skips backward
Type: Integer
Optional: No
Possible Values: any
Default: none
(7) TTS Jump Voice Unit, used to indicate that several voice units are to be skipped before continuing.
Signal Name: TTS Jump Voice Unit
SignalID: tjvu (0x?)
Description: indicates that playback continues after skipping several voice units; the size of a voice unit is implementation-defined
SignalType: BR
Duration: Not Applicable
Additional Parameters include:
I.
Parameter Name: Jump Size
Parameter ID: js (0x?)
Description: the number of voice units to skip; a positive value skips forward, a negative value skips backward
Type: Integer
Optional: No
Possible Values: any
Default: none
(8) TTS Restart
Signal Name: TTS Restart
SignalID: tr (0x?)
Description: restarts TTS
SignalType: BR
Duration: Not Applicable
Additional Parameters: none
(9) TTS End
Signal Name: TTS End
SignalID: te (0x?)
Description: ends TTS
SignalType: BR
Duration: Not Applicable
Additional Parameters: none
(10) TTS Repeat, indicates repetition of a certain section of the TTS text.
Signal Name: TTS Repeat
SignalID: tre (0x?)
Description: repeats a certain section of the TTS text
SignalType: BR
Duration: Not Applicable
Additional Parameters include:
I.
Parameter Name: Repeat Position
Parameter ID: pos (0x?)
Description: the position to repeat
Type: String
Optional: No
Possible Values: the current sentence, the current paragraph, the entire content
Default: none
Optional: Yes
Possible Values: greater than 0 seconds
Step 6: after receiving this indication, the media resource processing device 4 acknowledges it and returns an acknowledgement message.
Step 7: the media resource processing device 4 feeds back to the media resource control device 3 the events detected during the TTS conversion process, such as normal completion and timeout.
The events detected during the TTS conversion process include: the error codes under abnormal conditions and the parameters describing the result when the conversion completes normally.
1. Error codes for the execution of the TTS conversion function
If an abnormality occurs while the media resource processing device performs the TTS conversion, it returns a specific error code to the media resource control device. The specific values of the error codes are assigned uniformly by a standards organization; they include:
(1) a word or phrase that cannot be recognized;
(2) an unpronounceable word;
(3) the text string file does not exist;
(4) an error reading the text string file;
(5) a parameter is not supported or is erroneous;
(6) a TTS conversion control is not supported or is erroneous;
(7) a hardware error of the media resource processing device;
(8) a software error of the media resource processing device;
(9) other errors.
2. Parameters describing the result returned after the TTS conversion completes normally
When the TTS conversion completes normally, the following information can be returned:
(1) the TTS conversion process completed normally;
(2) user input triggered abortion of the TTS conversion: the user pressed an abort key, the user input DTMF, or the user input voice;
(3) statistical information: the duration of the TTS-converted speech played to the user.
The specifics are as follows:
Events:
(1) TTS Failure
Event Name: TTS Failure
EventID: ttsfail (0x?)
Description: the TTS conversion failed; an error code is returned
EventDescriptor Parameters: none
ObservedEventDescriptor Parameters include:
I.
Parameter Name: Error Return Code
Parameter ID: erc (0x?)
Description: the error code parameter
Type: Integer
Optional: No
Possible Values: the error codes defined in the scheme above
Default: none
(2) TTS Success
Event Name: TTS Success
EventID: ttssuss (0x?)
Description: the TTS conversion completed successfully; the result is returned
EventDescriptor Parameters: none
ObservedEventDescriptor Parameters include:
I.
Parameter Name: End Cause
Parameter ID: ec (0x?)
Description: the reason that triggered the end of the TTS conversion
Type: Integer
Optional: Yes
Possible Values: conversion completed, user input DTMF, user input voice
Default: none
II.
Parameter Name: TTS Time
Parameter ID: tt (0x?)
Description: the duration of the TTS conversion
Type: Integer
Optional: Yes
Possible Values: greater than 0 seconds
Default: none
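For illustration, the sketch below formats a text-encoded H.248 NOTIFY that reports either the TTS Success or the TTS Failure event of this package; the event and parameter mnemonics (ttsp/ttssuss with ec and tt, ttsp/ttsfail with erc) follow the definitions above, while the termination name, request identifier, timestamp, address, and exact text encoding are illustrative assumptions.

# Illustrative sketch: build the ObservedEvents content and a NOTIFY
# transaction reporting the outcome of a TTS conversion.
def build_tts_event(success, end_cause="convert", seconds=0, error_code=0):
    if success:
        return 'ttsp/ttssuss{ec="%s",tt=%d}' % (end_cause, seconds)
    return "ttsp/ttsfail{erc=%d}" % error_code

def build_notify(termination, request_id, event, timestamp="20051021T12000000"):
    return ("MEGACO/1 [192.0.2.2]:2944\n"
            "Transaction = 2 {\n"
            "  Context = 1000 {\n"
            "    Notify = %s { ObservedEvents = %d { %s:%s } }\n"
            "  }\n"
            "}" % (termination, request_id, timestamp, event))

if __name__ == "__main__":
    print(build_notify("rtp/1", 5001,
                       build_tts_event(True, end_cause="user input DTMF", seconds=12)))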
Step 8: the media resource control device 3 feeds an acknowledgement message back to the media resource processing device 4, and the TTS conversion ends.
With the method provided by the invention, service applications related to TTS conversion can be provided to users in the media resource applications of mobile or fixed networks, for example converting the content of a web page into speech and reading it to the user. At the same time, only the text needs to be modified when changes are made, without re-recording, and more personalized prompt tones can be played according to the user's requirements.
It is to be understood that the present invention is not limited to the above embodiments; those skilled in the art can make corresponding changes or modifications on the basis of an understanding of the present invention. For example, the media resource control device 3 may send the indications of step 1 and step 3 to the media resource processing device 4 at the same time, and the media resource processing device 4 may perform the operations of step 2 and step 4 at the same time.

Claims (18)

1. A method for implementing a text-to-speech conversion function, characterized in that a media resource control device controls, through the H.248 protocol, a media resource processing device to implement text-to-speech (TTS) conversion, the method comprising the following steps:
Step 1: the media resource processing device receives an H.248 message carrying a TTS conversion indication sent by the media resource control device, the H.248 message carrying an extension package parameter defined by an H.248 protocol extension package that instructs the media resource processing device to perform the TTS conversion processing corresponding to the parameter; and
Step 2: the media resource processing device invokes a TTS converter according to the parameter in the message to perform the TTS conversion processing, and feeds the TTS conversion result back to the media resource control device;
wherein the extension package parameter carries information about a text string, and the media resource processing device invokes the TTS converter to perform the TTS conversion according to the text string information;
and wherein the text string information is the text string itself, embedded in the H.248 message as a character string to be pronounced; after receiving the text string, the media resource processing device directly extracts it and invokes the TTS converter to perform the TTS conversion.
2. The method of claim 1, characterized in that, when the text string is stored in advance on the media resource processing device or on an external server, the text string information comprises the identifier and storage location information of the text string file; after receiving them, the media resource processing device reads the text string from local storage or from the external server according to the storage location information, places it in a cache, and invokes the TTS converter to perform the TTS conversion.
3. The method of claim 1, characterized in that the text string information comprises a text string and a text file of another text string, the text file comprising the identifier and storage location information of the other text string; the identifier of the text file and the text string are combined into one continuous text string, and a keyword is added before the file identifier to indicate that the combination is text to be pronounced; after receiving the combination, the media resource processing device first reads the text string from local storage or from the external server, concatenates it with the pronounceable text string carried in the H.248 message, places the result in a cache, and then invokes the TTS converter to perform the TTS conversion.
4. The method of claim 1, characterized in that the text string information comprises a combination of a text string and a recorded audio file, with a keyword added before the text string to indicate that the combination is a voice file; after receiving the combination, the media resource processing device first invokes the TTS converter to perform the TTS conversion on the text string, and then combines the speech output of the conversion with the recorded audio file into one speech segment.
5. The method of claim 1, characterized in that the text string information comprises a combination of a text file and a recorded audio file, the text file comprising the identifier and storage location information of a text string, with a keyword added before the identifier to indicate that the combination is a voice file; after receiving the combination, the media resource processing device first reads the text string from local storage or from the external server according to the storage location information and places it in a cache, then invokes the TTS converter to perform the TTS conversion on the retrieved text string, and combines the speech output of the conversion with the recorded audio file into one speech segment.
6. The method of claim 1, characterized in that the H.248 message further carries parameters related to the attributes of the speech output by the TTS conversion, the parameters comprising: pronunciation language, speaker gender, speaker age, and pauses; after receiving these parameters, the media resource processing device invokes the TTS converter and sets the corresponding attributes for the output speech.
7. The method of claim 6, characterized in that the H.248 message further carries parameters related to the attributes of the speech output by the TTS conversion, the parameters comprising at least one of: speech rate, volume, pitch, pronunciation of special characters, stress, and whether to abort the TTS conversion when the user provides input; after receiving these parameters, the media resource processing device invokes the TTS converter and sets the corresponding attributes for the output speech.
8. The method of any one of claims 1 to 7, characterized in that, while the media resource processing device invokes the TTS converter to perform the TTS conversion in step 2, the method further comprises:
Step 21: the media resource control device instructs the media resource processing device to detect abnormal events occurring during the speech recognition process.
9. The method of claim 8, characterized in that, when an abnormal event is detected, the media resource processing device feeds back an error code representing the abnormal event to the media resource control device.
10. The method of claim 8, characterized in that, while the media resource processing device invokes the TTS converter to perform the TTS conversion in step 2, the method further comprises:
Step 22: the media resource control device controls the TTS conversion process.
11. The method of claim 10, characterized in that the control of the TTS conversion process by the media resource control device comprises temporarily stopping the playing of the TTS-converted speech to the user.
12. The method of claim 11, characterized in that the control of the TTS conversion process by the media resource control device further comprises resuming the playing state from the paused state.
13. The method of claim 10, characterized in that the control of the TTS conversion process by the media resource control device comprises fast-forwarding or rewinding the playback, wherein the fast-forward comprises fast-forwarding a number of words, sentences, or paragraphs, or fast-forwarding a number of seconds, and the rewind comprises rewinding a number of words, sentences, or paragraphs, or rewinding a number of seconds.
14. The method of claim 10, characterized in that the control of the TTS conversion process by the media resource control device comprises restarting the TTS conversion.
15. The method of claim 10, characterized in that the control of the TTS conversion process by the media resource control device comprises the user aborting the TTS conversion.
16. The method of claim 10, characterized in that the control of the TTS conversion process by the media resource control device comprises repeating the playback of the current sentence, the current paragraph, or the full text.
17. The method of claim 16, characterized in that the control of the TTS conversion process by the media resource control device further comprises cancelling the repeated playback of the current sentence, the current paragraph, or the full text.
18. The method of claim 10, characterized in that the control of the TTS conversion process by the media resource control device comprises stopping the playing of the TTS-converted speech to the user.
CN2007101530700A 2005-10-21 2005-10-21 Method for implementing Text-To-Speech function Expired - Fee Related CN101222542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007101530700A CN101222542B (en) 2005-10-21 2005-10-21 Method for implementing Text-To-Speech function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2007101530700A CN101222542B (en) 2005-10-21 2005-10-21 Method for implementing Text-To-Speech function

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CNB2005101142778A Division CN100487788C (en) 2005-10-21 2005-10-21 A method to realize the function of text-to-speech convert

Publications (2)

Publication Number Publication Date
CN101222542A CN101222542A (en) 2008-07-16
CN101222542B true CN101222542B (en) 2011-09-14

Family

ID=39632106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007101530700A Expired - Fee Related CN101222542B (en) 2005-10-21 2005-10-21 Method for implementing Text-To-Speech function

Country Status (1)

Country Link
CN (1) CN101222542B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013187610A1 (en) * 2012-06-15 2013-12-19 Samsung Electronics Co., Ltd. Terminal apparatus and control method thereof
US10429823B2 (en) * 2014-12-29 2019-10-01 Abb Schweiz Ag Method for identifying a sequence of events associated with a condition in a process plant
CN105827516B (en) * 2016-05-09 2019-06-21 腾讯科技(深圳)有限公司 Message treatment method and device
CN107770382A (en) * 2017-10-30 2018-03-06 江西博瑞彤芸科技有限公司 The method for playing text information

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1516846A (en) * 2002-04-11 2004-07-28 株式会社Ntt都科摩 Service providing system and service providing method
CN1575574A (en) * 2000-12-28 2005-02-02 英特尔公司 Enhanced media gateway control protocol

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1575574A (en) * 2000-12-28 2005-02-02 英特尔公司 Enhanced media gateway control protocol
CN1516846A (en) * 2002-04-11 2004-07-28 株式会社Ntt都科摩 Service providing system and service providing method

Also Published As

Publication number Publication date
CN101222542A (en) 2008-07-16

Similar Documents

Publication Publication Date Title
CN100487788C (en) A method to realize the function of text-to-speech convert
US7194071B2 (en) Enhanced media gateway control protocol
US7657563B2 (en) System, method and storage medium for providing a multimedia contents service based on user&#39;s preferences
JP3936718B2 (en) System and method for accessing Internet content
TWI249729B (en) Voice browser dialog enabler for a communication system
CN1145927C (en) Speech interface for simultaneous use of facility and application
JP2003520983A5 (en)
EP1311102A1 (en) Streaming audio under voice control
CN101322408B (en) Triggerless interactive television
CA2537741A1 (en) Dynamic video generation in interactive voice response systems
EP2273754A2 (en) A conversational portal for providing conversational browsing and multimedia broadcast on demand
US6724864B1 (en) Active prompts
US8005199B2 (en) Intelligent media stream recovery
CN1329739A (en) Voice control of a user interface to service applications
CN101222542B (en) Method for implementing Text-To-Speech function
CN109243450A (en) A kind of audio recognition method and system of interactive mode
CN100426377C (en) A method for speech recognition
CN111629110A (en) Voice interaction method and voice interaction system
CN1953447B (en) A method for processing media resource
US20230130386A1 (en) Audio Assistance During Trick Play Operations
CN113114860B (en) Web-based audio and video response system and use method thereof
KR20090075334A (en) The technology and method for interactive voice and video response platform and services on 3g cdma networks
CN101222541A (en) Method for implementing speech recognition function
JP4082249B2 (en) Content distribution system
AU2003274048A1 (en) Text-to-speech streaming via a network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110914

Termination date: 20121021