Background Art
A media server handles all media processing related to audio and video, including conversion between audio/video RTP data streams and audio/video files. It is also responsible for receiving the DTMF input that users enter at their terminals, playing service prompt tones, and displaying dynamic guide screens. Its support for the Session Initiation Protocol (SIP) and MSML/MOML enables it to carry out the interaction with the user throughout a session under the control of an application server (APP).
The media control unit (MSCU) is an important element of the media server. It mainly performs capability negotiation with other entities, manages and maintains its own resources, and controls other service resource units in order to carry out complex services.
The media storage and transmission audio unit (MSTU-audio) is a service resource unit in the media server. It stores large volumes of audio data and implements audio file playback. The media storage unit has an external network interface and can send and receive data directly through that interface.
The media storage and transmission video unit (MSTU-video) is a service resource unit in the media server. It stores large volumes of multimedia audio and video data and implements video file playback. The media storage unit has an external network interface and can send and receive data directly through that interface.
Media servers are now very widely used. Their functions can mainly be summarized as audio and video playback, digit collection, and conferencing.
The text-to-speech (TTS) function recognizes input text, converts it into speech, and sends the speech media to the user. At present, in the telecommunications field, TTS is generally deployed by configuring a dedicated TTS engine, which is instructed through signaling to send speech to the user terminal in order to complete a service.
Fig. 1 is a schematic structural diagram of a system for implementing TTS audio transcoding according to the related art. As shown in Fig. 1, the workflow of the system comprises the following steps:
Step 101: the terminal initiates a call and activates the APP service. The APP initiates a service flow toward the media server;
Step 102: the APP requests the TTS service from the media server through SIP signaling;
Step 103: the media server requests TTS resources from the TTS engine through SIP signaling, and controls the TTS engine through the MRCP protocol to carry out the service function;
Step 104: the TTS engine sends the media to the terminal.
The above is the current typical networking and service flow. The TTS engine is used as an external device of the media server. The APP simply sends its service request to the media server, and the media server determines the service type. When the service type is a TTS application, the media server in turn sends a request to the TTS engine, applies for resources, and controls the behavior of the TTS engine. After receiving the signaling, the TTS engine automatically sends the media to the remote terminal.
The above flow can complete a basic TTS service. However, as the service has been applied more widely, some problems have emerged. One example is service failure caused by a mismatch between the audio capability set of the TTS engine and the capability set of the media server. When the APP negotiates the SDP with the media server, the media server does not yet know whether the service type is TTS, so it negotiates audio parameters with the terminal according to its own capabilities. Only when the APP issues an INFO instruction to the media server can the media server identify the TTS service type, at which point it applies to the TTS engine for resources using the terminal's SDP information. If the audio capabilities of the TTS engine do not cover the result that the media server negotiated with the terminal, the service fails. For example, the media server and the terminal negotiate the G726 codec type, but the TTS engine supports only the G711 audio format.
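The failure mode described above can be illustrated with a minimal sketch. The `negotiate` helper and the three capability lists are hypothetical and chosen only to mirror the G726/G711 example; they are not taken from any real media server.

```python
from typing import Optional, Sequence


def negotiate(offered: Sequence[str], supported: Sequence[str]) -> Optional[str]:
    """Return the first offered codec that the answering side supports, else None."""
    supported_set = set(supported)
    for codec in offered:
        if codec in supported_set:
            return codec
    return None


# Hypothetical capability sets, for illustration only.
media_server_codecs = ["G726", "G711", "G729"]
terminal_codecs = ["G726"]
tts_engine_codecs = ["G711"]

# The media server and the terminal agree on a codec first.
terminal_codec = negotiate(terminal_codecs, media_server_codecs)

# Related art: the terminal's negotiated codec is then offered to the TTS engine.
tts_codec = negotiate([terminal_codec], tts_engine_codecs)

print(terminal_codec, tts_codec)
```

In this sketch the second negotiation finds no common codec, which corresponds to the service failure described in the text.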
No effective solution has yet been proposed for the problem in the prior art that the terminal fails to access media service data packets when the audio capability set of the TTS engine cannot meet the service requirements of the media server.
Summary of the Invention
A primary object of the present invention is to provide a method, device, and system for implementing text-to-speech (TTS) audio transcoding, in order to solve the problem in the prior art that the terminal fails to access media service data packets when the audio capability set of the TTS engine cannot meet the service requirements of the media server.
To achieve the above object, according to one aspect of the present invention, a method for implementing text-to-speech (TTS) audio transcoding is provided.
The method for implementing TTS audio transcoding according to the present invention includes: a media server receives an access request from an application server (APP) and determines the codec type set supported by the media server; the media server receives a TTS service request from the APP and, according to the TTS service type, applies to a TTS engine for media service data packets that match the service type; and the media server negotiates with the TTS engine according to the codec type set to obtain a negotiated audio codec type, transcodes the media service data packets according to the audio codec type, and sends them to the terminal.
Further, the step in which the media server negotiates with the TTS engine according to the codec type set to obtain the negotiated audio codec type, transcodes the media service data packets according to the audio codec type, and sends them to the terminal includes: the media control unit (MSCU) sends Session Initiation Protocol (SIP) signaling to the TTS engine to negotiate and specify an audio codec type on which the media server and the TTS engine match, the codec type set including that audio codec type; the voice center switching unit (MRU) receives the media service data packets returned by the TTS engine, transcodes them according to the negotiated audio codec type, and sends the transcoded media service data packets to the media storage and transmission audio unit (MSTU) for storage; and the MSCU controls the MSTU to send the transcoded media service data packets to the terminal.
Further, before the voice center switching unit (MRU) receives the media service data packets returned by the TTS engine, the method also includes: the MSCU establishes a communication connection with the TTS engine; and the TTS engine recognizes the text and converts the text into media service data packets.
Further, before the voice center switching unit (MRU) receives the media service data packets returned by the TTS engine, the method also includes: the MSCU issues a transcoding command to the MRU; and the type of the port through which the MRU connects to the TTS engine is set to the negotiated audio codec type.
Further, the step in which the MSCU controls the MSTU to send the transcoded media service data packets to the terminal includes: the MSCU issues a command to the MSTU to open a NAT channel; and the MSTU performs NAT on the transcoded media service data packets and then sends them to the terminal.
Further, before the media server receives the access request from the application server (APP), the method also includes: the terminal sends a multimedia service data packet request to the APP; and the APP, according to the multimedia service data packet request, sends access request signaling to the media server, with the external port address of the MSTU used as the address for interacting with the terminal.
To achieve the above object, according to another aspect of the present invention, a system for implementing text-to-speech (TTS) audio transcoding is provided.
The system for implementing TTS audio transcoding according to the present invention includes: a terminal; a TTS engine; and a media server, configured to receive an access request from an application server (APP) in order to determine the codec type set supported by the media server, to receive a TTS service request from the APP so as to apply to the TTS engine, according to the TTS service type, for media service data packets that match the service type, and then to negotiate with the TTS engine according to the codec type set to obtain a negotiated audio codec type, transcode the media service data packets according to the audio codec type, and send them to the terminal.
Further, the media server includes: a media control unit (MSCU), configured to send Session Initiation Protocol (SIP) signaling to the TTS engine to negotiate and specify an audio codec type on which the media server and the TTS engine match, the codec type set including that audio codec type; and a voice center switching unit (MRU), configured to receive the media service data packets returned by the TTS engine, transcode them according to the negotiated audio codec type, and send the transcoded media service data packets to the media storage and transmission audio unit (MSTU) for storage; wherein the MSCU controls the MSTU to send the transcoded media service data packets to the terminal.
Further, the terminal sends a multimedia service data packet request to the APP; and the APP, according to the multimedia service data packet request, sends access request signaling to the media server, with the external port address of the MSTU used as the address for interacting with the terminal.
To achieve the above object, according to another aspect of the present invention, a device for implementing text-to-speech (TTS) audio transcoding is provided.
The device for implementing TTS audio transcoding according to the present invention includes: a first processing module, configured to receive an access request from an application server (APP) and determine the codec type set supported by the media server; a second processing module, configured to receive a TTS service request from the APP and, according to the TTS service type, apply to the TTS engine for media service data packets that match the service type; and a third processing module, configured to negotiate with the TTS engine according to the codec type set to obtain a negotiated audio codec type, transcode the media service data packets according to the audio codec type, and send them to the terminal.
With the present invention, the media server receives an access request from the application server (APP) and determines the codec type set supported by the media server; the media server receives a TTS service request from the APP and, according to the TTS service type, applies to the TTS engine for media service data packets that match the service type; and the media server negotiates with the TTS engine according to the codec type set to obtain a negotiated audio codec type, transcodes the media service data packets according to the audio codec type, and sends them to the terminal. This solves the problem in the prior art that the terminal fails to access media service data packets when the audio capability set of the TTS engine cannot meet the service requirements of the media server, and thereby improves the success rate with which the terminal accesses media service data packets.
Detailed Description of the Embodiments
To make the technical problems to be solved, the technical solutions, and the advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present invention and not to limit it.
The present invention provides a system for implementing TTS audio transcoding. Fig. 2 is a schematic structural diagram of a system for implementing TTS audio transcoding according to an embodiment of the present invention. As shown in Fig. 2, the system includes: a terminal; a TTS engine; and a media server, configured to receive an access request from an application server (APP) in order to determine the codec type set supported by the media server, to receive a TTS service request from the APP so as to apply to the TTS engine, according to the TTS service type, for media service data packets that match the service type, and then to negotiate with the TTS engine according to the codec type set to obtain a negotiated audio codec type, transcode the media service data packets according to the audio codec type, and send them to the terminal.
In the above embodiment, when the media server applies to the TTS engine for resources, the Session Description Protocol (SDP) does not use the terminal's audio codec as the capability set for negotiation; instead, it uses all codec formats supported by the media server as the capability set. After the negotiation succeeds, the TTS engine sends the media into the media server, and the media server transcodes the media and sends it out in the format required by the terminal. This solves the problem that the terminal fails to access media service data packets when the audio capability set of the TTS engine cannot meet the service requirements of the media server, and thereby improves the success rate with which the terminal accesses media service data packets.
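As a rough illustration of offering the media server's full capability set rather than the terminal's single codec, the following sketch builds an SDP audio offer. The `build_audio_offer` function, the address `192.0.2.10`, port `40000`, and the chosen codec list are assumptions made for this example; the payload type numbers are the static RTP assignments for PCMU, PCMA, and G729.

```python
from typing import Dict, List

# Static RTP/AVP payload type numbers for a few audio codecs.
STATIC_PAYLOAD_TYPES: Dict[str, int] = {"PCMU": 0, "PCMA": 8, "G729": 18}


def build_audio_offer(ip: str, port: int, codecs: List[str]) -> str:
    """Build a minimal SDP offer that lists every codec in `codecs`."""
    payload_types = [STATIC_PAYLOAD_TYPES[c] for c in codecs]
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        "m=audio {} RTP/AVP {}".format(port, " ".join(str(pt) for pt in payload_types)),
    ]
    # One rtpmap attribute per offered codec (all sampled at 8000 Hz here).
    lines += [f"a=rtpmap:{pt} {c}/8000" for c, pt in zip(codecs, payload_types)]
    return "\r\n".join(lines) + "\r\n"


# Hypothetical: the receiving address is where the media server expects the TTS media.
offer = build_audio_offer("192.0.2.10", 40000, ["PCMU", "PCMA", "G729"])
print(offer)
```

Because the offer carries every codec the media server can decode, the TTS engine only needs to support one of them for the negotiation to succeed.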
Fig. 3 is a detailed schematic structural diagram of the media server in the embodiment shown in Fig. 2. As shown in Fig. 3, the media server in the above embodiment of the present application may include: a media control unit (MSCU), configured to send Session Initiation Protocol (SIP) signaling to the TTS engine to negotiate and specify an audio codec type on which the media server and the TTS engine match, the codec type set including that audio codec type; and a voice center switching unit (MRU), configured to receive the media service data packets returned by the TTS engine, transcode them according to the negotiated audio codec type, and send the transcoded media service data packets to the media storage and transmission audio unit (MSTU) for storage; wherein the MSCU controls the MSTU to send the transcoded media service data packets to the terminal.
Preferably, in the above embodiment, the terminal may send a multimedia service data packet request to the APP; the APP, according to the multimedia service data packet request, sends access request signaling to the media server, with the external port address of the MSTU used as the address for interacting with the terminal.
Specifically, as shown in Fig. 2, the detailed workflow of the system comprises the following steps:
Step S10: the terminal requests multimedia service data packets from the APP, and the APP sends INVITE signaling to the media server to perform media negotiation. The media server selects the codec type from its own capability set and uses the external port address of the MSTU as the address for interacting with the terminal.
Step S20: the APP server sends an INFO request to the media server, the content of the INFO being an application for the TTS service. After identifying the service type, the media server in turn applies to the TTS engine for media service data packets;
Step S30: the media server negotiates with the TTS engine and controls the TTS engine to convert the text into speech. As shown in Fig. 3, step S30 may specifically include the following steps:
Step S301: the media control unit (MSCU) initiates Session Initiation Protocol (SIP) signaling to the TTS engine to negotiate the codec type. The audio codec capability set offered in the INVITE signaling at this point is the one possessed by the media server itself, namely all codec types supported by the MRU. The TTS engine is required to send its media packets to the media server, where they are received by the MRU.
Step S302: the MSCU issues a command to the MSTU to open a NAT channel, instructing it to send the data received from the MRU to the terminal (user terminal).
Step S303: the MSCU issues a transcoding command to the MRU. The command specifies that the MRU is to receive the media packets sent from the TTS engine, sets the audio codec type of the MRU port connected to the TTS engine to the result negotiated in step S301, and sets the codec type of the media output by the MRU to the codec type that the terminal actually needs to receive.
Step S304: the MSCU establishes a TCP/IP link with the TTS engine and sends instructions to the TTS engine through the MRCP protocol, instructing the TTS engine to recognize the text and send the converted media service data packets to the MRU.
Step S305: the TTS engine sends the media service data packets to the receiving port of the MRU.
Step S306: the MRU transcodes the media received from the TTS engine and sends the transcoded audio media to the receiving port of the media storage and transmission audio unit (MSTU);
Step S307: after receiving the audio packets from the MRU, the MSTU directly performs NAT on the audio packets and sends them to the terminal.
Through the above steps, the terminal receives the audio stream converted from the text.
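Steps S301 to S307 can be sketched as a small simulation. The class names `TtsEngine`, `Mru`, and `Mstu`, the `run_tts` function, and the address labels are hypothetical; packets carry only a codec label, so the "transcoding" here is a relabeling stand-in and not real audio re-encoding.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    codec: str    # codec label only; the payload is not really re-encoded
    payload: str
    src: str


class TtsEngine:
    def __init__(self, codecs: List[str]) -> None:
        self.codecs = codecs

    def negotiate(self, offered: List[str]) -> Optional[str]:
        # S301: pick the first offered codec that the engine supports.
        return next((c for c in offered if c in self.codecs), None)

    def synthesize(self, text: str, codec: str) -> List[Packet]:
        # S304-S305: stand-in for speech synthesis, one packet per word.
        return [Packet(codec, word, "tts-engine") for word in text.split()]


class Mru:
    def __init__(self, codecs: List[str]) -> None:
        self.codecs = codecs
        self.in_codec: Optional[str] = None
        self.out_codec: Optional[str] = None

    def configure(self, in_codec: str, out_codec: str) -> None:
        # S303: input port uses the negotiated codec, output uses the terminal's.
        self.in_codec, self.out_codec = in_codec, out_codec

    def transcode(self, pkt: Packet) -> Packet:
        # S306: relabel the packet; a real MRU would decode and re-encode audio.
        if pkt.codec != self.in_codec:
            raise ValueError(f"unexpected codec {pkt.codec}")
        return replace(pkt, codec=self.out_codec, src="mru")


class Mstu:
    def __init__(self, external_addr: str) -> None:
        self.external_addr = external_addr

    def relay(self, pkt: Packet) -> Packet:
        # S302/S307: forward through the NAT channel using the external address.
        return replace(pkt, src=self.external_addr)


def run_tts(text: str, terminal_codec: str,
            tts: TtsEngine, mru: Mru, mstu: Mstu) -> List[Packet]:
    negotiated = tts.negotiate(mru.codecs)            # S301
    if negotiated is None:
        raise RuntimeError("no codec in common with the TTS engine")
    mru.configure(negotiated, terminal_codec)         # S303
    return [mstu.relay(mru.transcode(p))              # S306, S307
            for p in tts.synthesize(text, negotiated)]  # S304, S305


packets = run_tts("hello world", "G726",
                  TtsEngine(["G711"]), Mru(["G726", "G711"]), Mstu("mstu-external"))
print([(p.codec, p.src) for p in packets])
```

In this sketch the TTS engine supports only G711 while the terminal expects G726, yet every packet reaching the terminal carries the terminal's codec label and the MSTU's external address.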
Finally, the media server reports the INFO execution result to the APP, and the APP sends BYE signaling to the media server to release resources. The media server requests the TTS engine to release its resources and, once this succeeds, returns the result to the APP, at which point the call ends.
Fig. 4 is a flowchart of a method for implementing TTS audio transcoding according to an embodiment of the present invention. As shown in Fig. 4, the method for implementing TTS audio transcoding comprises the following steps:
Step S41: the media server receives an access request from the application server (APP) and determines the codec type set supported by the media server;
Step S43: the media server receives a TTS service request from the APP and, according to the TTS service type, applies to the TTS engine for media service data packets that match the service type;
Step S45: the media server negotiates with the TTS engine according to the codec type set to obtain a negotiated audio codec type, transcodes the media service data packets according to the audio codec type, and sends them to the terminal.
In the above embodiment, when the media server applies to the TTS engine for resources, the SDP does not use the terminal's audio codec as the capability set for negotiation; instead, it uses all codec formats supported by the media server as the capability set. After the negotiation succeeds, the TTS engine sends the media into the media server, and the media server transcodes the media and sends it out in the format required by the terminal. This solves the problem that the terminal fails to access media service data packets when the audio capability set of the TTS engine cannot meet the service requirements of the media server, and thereby improves the success rate with which the terminal accesses media service data packets.
In the above embodiment, step S45, in which the media server negotiates with the TTS engine according to the codec type set to obtain the negotiated audio codec type, transcodes the media service data packets according to the audio codec type, and sends them to the terminal, may include: the media control unit (MSCU) sends Session Initiation Protocol (SIP) signaling to the TTS engine to negotiate and specify an audio codec type on which the media server and the TTS engine match, the codec type set including that audio codec type; the voice center switching unit (MRU) receives the media service data packets returned by the TTS engine, transcodes them according to the negotiated audio codec type, and sends the transcoded media service data packets to the media storage and transmission audio unit (MSTU) for storage; and the MSCU controls the MSTU to send the transcoded media service data packets to the terminal.
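The text does not specify how the MRU transcodes. As one illustrative assumption, if the codec negotiated with the TTS engine were the mu-law variant of G.711, the first half of a transcode would be decoding each 8-bit sample to 16-bit linear PCM, which could then be re-encoded in the terminal's format. The sketch below shows only that decoding half, using the widely published G.711 mu-law expansion; the function names are hypothetical.

```python
from typing import List

BIAS = 0x84  # bias of 132 that the mu-law encoder adds before compression


def ulaw_to_linear(u: int) -> int:
    """Expand one 8-bit G.711 mu-law sample to a signed 16-bit linear value."""
    u = ~u & 0xFF                             # mu-law bytes are stored inverted
    magnitude = ((u & 0x0F) << 3) + BIAS      # 4-bit mantissa plus bias
    magnitude <<= (u & 0x70) >> 4             # scale by the 3-bit segment number
    return BIAS - magnitude if u & 0x80 else magnitude - BIAS


def decode_ulaw(payload: bytes) -> List[int]:
    """Decode a mu-law RTP payload into a list of linear PCM samples."""
    return [ulaw_to_linear(b) for b in payload]


print(decode_ulaw(b"\xff\x80\x00"))
```

A complete transcoder would follow this with an encoder for the terminal's codec (for example G.726), which is beyond the scope of this sketch.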
Preferably, in the above embodiment, before the voice center switching unit (MRU) receives the media service data packets returned by the TTS engine, the method also includes: the MSCU establishes a communication connection with the TTS engine; and the TTS engine recognizes the text and converts the text into media service data packets.
Preferably, in the above embodiment, before the voice center switching unit (MRU) receives the media service data packets returned by the TTS engine, the method also includes: the MSCU issues a transcoding command to the MRU; and the type of the port through which the MRU connects to the TTS engine is set to the negotiated audio codec type.
In each of the above embodiments of the present invention, the step in which the MSCU controls the MSTU to send the transcoded media service data packets to the terminal may include: the MSCU issues a command to the MSTU to open a NAT channel; and the MSTU performs NAT on the transcoded media service data packets and then sends them to the terminal.
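The NAT step can be pictured as rewriting packet addresses so that media arriving from an internal unit leaves through the MSTU's external port. The following is a minimal sketch under that reading; the `NatChannel` class, the `RtpPacket` fields, and all addresses are hypothetical and not drawn from the text.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Address = Tuple[str, int]  # (IP address, UDP port)


@dataclass(frozen=True)
class RtpPacket:
    src: Address
    dst: Address
    payload: bytes


class NatChannel:
    """Forwards packets from one internal source out through an external address."""

    def __init__(self, internal_src: Address, external_src: Address,
                 terminal: Address) -> None:
        self.internal_src = internal_src
        self.external_src = external_src
        self.terminal = terminal

    def forward(self, pkt: RtpPacket) -> RtpPacket:
        # Accept only packets from the internal unit the channel was opened for.
        if pkt.src != self.internal_src:
            raise ValueError("packet is not from the expected internal source")
        # Rewrite the addresses; the payload is passed through unchanged.
        return replace(pkt, src=self.external_src, dst=self.terminal)


# Hypothetical addresses: MRU output port, MSTU external port, and the terminal.
mru_out = ("10.0.0.5", 30000)
mstu_external = ("192.0.2.20", 40000)
terminal = ("198.51.100.7", 50000)

channel = NatChannel(mru_out, mstu_external, terminal)
sent = channel.forward(RtpPacket(mru_out, ("10.0.0.9", 30002), b"\x00\x01"))
print(sent.src, sent.dst)
```

From the terminal's point of view the media appears to come from the MSTU external address, which matches the address it was given during the initial negotiation.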
Preferably, before the media server receives the access request from the application server (APP), the method also includes: the terminal sends a multimedia service data packet request to the APP; and the APP, according to the multimedia service data packet request, sends access request signaling to the media server, with the external port address of the MSTU used as the address for interacting with the terminal.
The present invention also provides a device for implementing TTS audio transcoding. Fig. 5 is a schematic structural diagram of a device for implementing TTS audio transcoding according to an embodiment of the present invention. As shown in Fig. 5, the device for implementing TTS audio transcoding includes a first processing module 101, a second processing module 103, and a third processing module 105.
The first processing module 101 is configured to receive an access request from the application server (APP) and determine the codec type set supported by the media server. The second processing module 103 is configured to receive a TTS service request from the APP and, according to the TTS service type, apply to the TTS engine for media service data packets that match the service type. The third processing module 105 is configured to negotiate with the TTS engine according to the codec type set to obtain a negotiated audio codec type, transcode the media service data packets according to the audio codec type, and send them to the terminal.
The device may be a media server. When this media server applies to the TTS engine for resources, the SDP does not use the terminal's audio codec as the capability set for negotiation; instead, it uses all codec formats supported by the media server as the capability set. After the negotiation succeeds, the TTS engine sends the media into the media server, and the media server transcodes the media and sends it out in the format required by the terminal.
It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be executed in a computer system, for example one running a set of computer-executable instructions. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
From the above description of the embodiments, it can be seen that the present invention achieves the following technical effect: it improves the success rate with which the terminal accesses media service data packets.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed across a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. Alternatively, they may each be fabricated as separate integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above description shows and describes a preferred embodiment of the present invention. As stated above, however, it should be understood that the present invention is not limited to the form disclosed herein, and this form should not be regarded as excluding other embodiments. The invention may be used in various other combinations, modifications, and environments, and may be modified within the scope of the inventive concept described herein through the above teachings or through the skill or knowledge of the related art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the protection scope of the appended claims.