WO2015196823A1 - Method and device for implementing cyclic playback of a text-to-speech service, and server - Google Patents
- Publication number
- WO2015196823A1 (PCT/CN2015/073051)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tts
- server
- text information
- media
- service
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/40—Network security protocols
Definitions
- The present invention relates to the field of communications, and in particular to a method, an apparatus, and a server for implementing cyclic playback of a text-to-speech (TTS) service.
- MS Media Server
- SIP Session Initiation Protocol
- MSML Media Sessions Markup Language
- MOML Media Objects Markup Language
- The Media Control Unit is an important unit in the media server. It mainly performs capability negotiation with other entities and manages, maintains, and controls the other service resource units in order to complete complex services.
- The Media Storage Transmit Unit-audio is a service resource unit in the media server that stores large volumes of audio data and handles audio file playback.
- Data can be sent and received directly through the external network port on the media storage unit.
- The Media Processing Unit mainly performs media codec conversion, digit collection, and conference mixing.
- TTS Text To Speech
- In typical TTS applications, a dedicated TTS server is configured, and that TTS server is instructed to send audio to the client to complete a service.
- FIG. 1 is a schematic structural diagram of a system for implementing a TTS cyclic play service according to the related art. As shown in FIG. 1, the workflow of the system includes the following steps:
- Step S101 The terminal initiates a call to activate the service of the APP server.
- The APP server initiates a service process to the media server;
- Step S102 The APP server requests the media server to complete the TTS service N times by sending SIP signaling N times.
- Step S103 The media server requests TTS resources from the TTS server through SIP signaling, and controls the TTS server to complete the service function by using the MRCP protocol;
- Step S104 The TTS server sends the media packets to the terminal through the media server, and reports information such as the playback duration to the media server.
- The TTS server is used as a peripheral device of the media server.
- When the APP server requests the service, it sends the request only to the media server.
- The media server determines the service type.
- If the service type is a TTS application, the media server initiates a request to the TTS server, requests resources, and controls the behavior of the TTS server by using the MRCP protocol.
- The text is converted into audio, which is sent to the media server.
- An embodiment of the present invention provides a method, an apparatus, and a server for implementing cyclic playback of a text-to-speech service, so as to reduce the complexity of the media server's internal media resource handling when supporting TTS cyclic playback.
- An embodiment of the present invention provides a method for implementing cyclic playback of a text-to-speech (TTS) service, for use in a media server, including:
- the media channel is maintained and interacts with the TTS server, so that the TTS server can complete another TTS service for the text information by using the media channel.
- the method further includes:
- receiving a TTS service request message for the text information sent by the application server, where the TTS service request message carries the number of loop plays NUM;
- parsing the number of loop plays NUM from the TTS service request message.
- the method further includes:
- the media channel is opened upon receiving a TTS service request message for the text information sent by the application server.
- the method further includes:
- the media channel is closed, and the application server is notified that the NUM loop playback for the text information is completed.
- the codec type corresponding to the media channel is determined by the media server according to a set of codec types supported by the media server, and negotiated with the TTS server.
- A device for implementing cyclic playback of a text-to-speech (TTS) service, for use in a media server, comprising:
- a judging module configured to: when the TTS server completes one TTS service for the text information by using the media channel of the media server, determine whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, and obtain a judgment result;
- an interaction module configured to: when the judgment result is no, maintain the media channel and interact with the TTS server so that the TTS server can use the media channel to complete another TTS service for the text information.
- the device further includes:
- a receiving module configured to: before the judging module determines whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, receive a TTS service request message for the text information sent by the application server, where the TTS service request message carries the number of loop plays NUM;
- a parsing module configured to: before the judging module makes that determination, parse the number of loop plays NUM from the TTS service request message.
- the device further includes:
- an opening module configured to: when receiving the TTS service request message for the text information sent by the application server, open the media channel.
- the device further includes:
- the closing and notification module is configured to: when the determination result is yes, close the media channel, and notify the application server that the NUM loop playback for the text information is completed.
- The interaction module is further configured to: determine the codec type corresponding to the media channel by negotiating with the TTS server according to the set of codec types supported by the media server.
- An embodiment of the present invention further provides a server, including the apparatus for implementing cyclic playback of a text-to-speech (TTS) service as described above.
- An embodiment of the invention further provides a computer readable storage medium storing program instructions that implement the above method when executed.
- Another TTS service is completed by reusing the media channel already used for a TTS service, which avoids closing and reopening the media channel and so reduces the number of times the media server's internal resources are established and released, along with the corresponding signaling interactions. The resource and signaling load on the media server is therefore reduced, and its performance when performing the TTS service is improved.
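The saving described above can be illustrated with a small count. The sketch below is a simplified model, not taken from the patent: it assumes that without channel reuse the media channel is opened and closed once per play, and that with reuse it is opened and closed once for all NUM plays. The function name `channel_operations` is hypothetical.

```python
def channel_operations(num_plays: int, reuse_channel: bool) -> dict:
    """Count media-channel opens and closes for num_plays plays of one text.

    Assumption (not from the source): one open/close pair per play when the
    channel is not reused, and a single pair when it is reused.
    """
    if num_plays < 1:
        raise ValueError("num_plays must be at least 1")
    cycles = 1 if reuse_channel else num_plays
    return {"opens": cycles, "closes": cycles}


# Compare both strategies for NUM = 5 loop plays.
per_play = channel_operations(5, reuse_channel=False)
reused = channel_operations(5, reuse_channel=True)
saved_opens = per_play["opens"] - reused["opens"]
```

Under this model the number of avoided open/close pairs grows linearly with NUM, which is the effect the passage attributes to keeping the channel open.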
- FIG. 1 is a schematic structural diagram of a general process for implementing a TTS cyclic play service according to the related art
- FIG. 2 is a flow chart showing the steps of a method for implementing a cyclic playback of a text-to-speech TTS service according to an embodiment of the present invention
- FIG. 3 is a schematic diagram showing the interaction structure between a media server and each module according to a preferred embodiment of the present invention
- FIG. 4 is a schematic diagram of timings of exchanging signaling between a media server and each module according to a preferred embodiment of the present invention
- FIG. 5 is a structural block diagram of an apparatus for implementing a cyclic playback of a text-to-speech TTS service according to an embodiment of the present invention.
- FIG. 2 is a flowchart of the steps of a method for implementing cyclic playback of a text-to-speech (TTS) service according to an embodiment of the present invention.
- An embodiment of the present invention provides a method for implementing cyclic playback of a text-to-speech (TTS) service, including the following steps:
- Step 201 When the TTS server uses the media channel of the media server to complete one TTS service for the text information, determine whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, and obtain a judgment result;
- Step 202 When the judgment result is no, maintain the media channel and interact with the TTS server so that the TTS server can complete another TTS service for the text information by using the media channel.
- the method is for a media server.
- When the TTS server completes a TTS service by using the media channel, if the number of completed TTS services has not reached the number of loop plays, the media server does not close the media channel but uses it to complete another TTS service. This avoids closing and reopening the media channel, which reduces the number of times the media server's internal resources are established and released, along with the corresponding signaling interactions. As a result, the resource and signaling load on the media server is reduced, and its performance when performing TTS services is improved.
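Steps 201 and 202 can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not the patent's implementation: `MediaChannel`, `FakeTtsServer`, and `loop_play` are hypothetical names, and the TTS server is replaced by an in-memory stand-in that only records each play.

```python
class MediaChannel:
    """Hypothetical stand-in for the media server's media channel."""

    def __init__(self) -> None:
        self.is_open = False
        self.opens = 0  # how many times the channel was opened

    def open(self) -> None:
        self.is_open = True
        self.opens += 1

    def close(self) -> None:
        self.is_open = False


class FakeTtsServer:
    """Hypothetical TTS server that records each completed play."""

    def __init__(self) -> None:
        self.plays = []

    def play(self, text: str, channel: MediaChannel) -> None:
        if not channel.is_open:
            raise RuntimeError("media channel is closed")
        self.plays.append(text)


def loop_play(text: str, num: int, tts: FakeTtsServer, channel: MediaChannel) -> int:
    """Play `text` `num` times over a single open channel."""
    if num < 1:
        raise ValueError("num must be at least 1")
    channel.open()
    completed = 0
    while True:
        tts.play(text, channel)
        completed += 1
        # Step 201: has the completed count reached NUM?
        if completed >= num:
            break
        # Step 202: judgment is "no", so keep the channel and play again.
    channel.close()
    return completed
```

The point of the sketch is that `channel.open()` and `channel.close()` sit outside the loop, so the channel is established once regardless of NUM.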
- the method may further include:
- receiving a TTS service request message for the text information sent by the application server, where the TTS service request message carries the number of loop plays NUM;
- parsing the number of loop plays NUM from the TTS service request message.
- the media channel is opened upon receiving a TTS service request message for the text information sent by the application server.
- the media channel is closed, and the APP server is notified that the NUM loop playback for the text information is completed.
- the codec type corresponding to the media channel may be determined by the media server according to the codec type set supported by the media server, and negotiated with the TTS server.
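The codec negotiation mentioned above can be sketched as choosing a codec that both sides support. The selection rule used here (first match in the media server's preference order) is an assumption, since the source only says the type is negotiated from the media server's supported set. The function name `negotiate_codec` and the codec names in the usage lines are illustrative.

```python
from typing import Iterable, Optional, Sequence


def negotiate_codec(
    media_server_codecs: Sequence[str],
    tts_server_codecs: Iterable[str],
) -> Optional[str]:
    """Return the first codec in the media server's list that the TTS server
    also supports, or None if there is no common codec.

    Assumption: the media server's list is ordered by preference.
    """
    supported_by_tts = set(tts_server_codecs)
    for codec in media_server_codecs:
        if codec in supported_by_tts:
            return codec
    return None


# Illustrative codec names only.
chosen = negotiate_codec(["G729", "PCMA", "PCMU"], ["PCMU", "PCMA"])
no_match = negotiate_codec(["G729"], ["PCMU"])
```

Because the channel is kept open across loop plays, this negotiation would only need to run once per TTS service request in such a design.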
- The present invention provides a method, a device, and a system for implementing cyclic playback of a text-to-speech (TTS) service, in order to address the drawbacks of the related art, in which recognizing the same text multiple times requires a complex TTS service flow and the media server handles the TTS loop-play service with a high failure rate and low performance.
- a method for implementing a TTS loop play service including:
- the media server receives an access request from the application APP server and determines a set of codec types supported by the media server;
- the media server receives the TTS service request sent by the APP server, and applies to the TTS server for the service resource according to the TTS service type;
- the media server parses the INFO (TTS) field to obtain the number of loops N.
- the media server does not release the local resource, maintains the media link with the TTS server, and performs the next MRCP negotiation to recognize the text information. The number of repeated requests is determined by N, and the TTS server finally sends the audio converted from the text, played N times, to the terminal through the media server (only one resource application is required).
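Obtaining the loop count N from the INFO request might look like the following sketch. The patent does not specify the INFO body format, so the `key=value` line layout and the `loop` key are purely hypothetical, as is the function name `parse_loop_count`.

```python
def parse_loop_count(info_body: str, default: int = 1) -> int:
    """Extract the loop-play count from a hypothetical key=value INFO body.

    Assumption: the body has one `key=value` pair per line and the count is
    carried under the key `loop`. Returns `default` if the key is absent.
    """
    for line in info_body.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "loop":
            count = int(value.strip())
            if count < 1:
                raise ValueError("loop count must be at least 1")
            return count
    return default


# Hypothetical INFO body requesting three plays of the same text.
n = parse_loop_count("service=tts\ntext=hello\nloop=3")
```

The media server would save this value once and consult it after each completed play, rather than expecting a new request from the APP server for every repetition.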
- a system for implementing a TTS loop playback service from text to voice includes:
- the first processing module is configured to: receive an access request from the APP server, and determine a set of codec types supported by the media server;
- the second processing module is configured to: receive the TTS service request from the APP server, apply for the TTS service resource according to the TTS service type, and determine the number of loop plays;
- the third processing module is configured to: negotiate with the TTS server according to the codec type set to obtain the negotiated audio codec type, and send the media service data packet to the terminal server through the media server according to the audio codec type.
- FIG. 3 is a schematic diagram of the interaction structure between the modules inside the media server and the APP server, the TTS server, and the terminal server according to this embodiment.
- The media control unit MSCU is configured to send Session Initiation Protocol (SIP) and MRCP signaling to the TTS server: the SIP negotiation negotiates and specifies the audio codec type on which the media server and the TTS server match, and the MRCP signaling interaction controls the TTS server to recognize the text and play the content. The voice center switching unit MRU is configured to receive the TTS server's data packets and send the media service data packets to the media storage transport audio unit MSTU, and the MSCU controls the MSTU to send the media service data packets to the terminal.
- FIG. 4 is a schematic diagram of the timing of the signaling exchange between the media server and each module.
- the detailed workflow is as follows:
- Steps S410 to S420 The APP server sends INVITE signaling to the media server for media negotiation; the media server selects a codec type from its own capability set and uses an MSTU external port address as the address for interacting with the terminal. The APP server then sends an INFO request to the media server; the content of the INFO is a request for the TTS service, and the media server parses the loop-play field in the INFO as N and saves all the information;
- Step S430 the media server negotiates with the TTS server, and controls the TTS server to convert the text into voice.
- the step S430 may include the following steps:
- Step S4301 The media control unit MSCU initiates a session initial protocol SIP signaling to the TTS server to negotiate a codec type.
- The audio codec capability set offered in the INVITE signaling is that of the media server, that is, all codec types supported by the MRU;
- Step S4302 the TTS server returns a 200 OK response to the INVITE, and notifies the media server of the negotiated audio codec type;
- Step S4303 The MSCU applies for the resources required for the TTS service on the media server side: the MSTU external port resource, the MRU1 transcoding resource, and the MRU2 transcoding resource. The MSCU sends a NAT channel command to the MSTU and an open-transcoding command to the MRU, instructing the MRU to receive on its internal port the recognized audio packets sent by the TTS server; the media channel on the media server side is now open;
- Step S4304 the MSCU sends a TCP/IP link request to the TTS server.
- Step S4305 the MSCU receives the TCP/IP link request reply message sent by the TTS server;
- Step S4306 the MSCU sends an MRCP request message to the TTS server, indicating that the TTS server needs to recognize the text information;
- Step S4307 the TTS server replies to the MRCP request message, notifies the media server that the text recognition is being performed, and sends an audio packet to the external port of the MSTU, and the MRU sends the data packet forwarded by the NAT from the external port of the MSTU to the terminal;
- Step S4308 the TTS server notifies the MSCU that the text recognition is completed, the MSCU notifies the TTS server to close the current TCP link, and the MSCU module parses the saved INFO (TTS) loop playback number N, and determines whether it is necessary to initiate the MRCP request to the TTS server again. If it is necessary to continue to identify the play, repeat steps S4304 to S4308;
- Step S4309 after the TTS service has been completed N times, the MSCU sends a BYE request message to the TTS server to notify the TTS server to release the SIP data area corresponding to the TTS service.
- Step S4310 the media server receives the BYE reply message from the TTS server, releases the SIP data area on the media server side, and the service is completed.
- Step S440 the media server sends an INFO message to the APP server, and reports information such as the playing duration;
- Step S450 the APP server sends the BYE signaling to the media server to release the resource.
- By reusing the result negotiated with the TTS server, the media server completes the N MRCP requests and the text processing, which significantly reduces the resource and signaling load on the media server and greatly improves its performance when performing the TTS service.
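The ordering of steps S4301 to S4310 can be made concrete with a trace generator. The sketch below only models the sequence described above (one SIP negotiation and channel open, then N repetitions of steps S4304 to S4308, then one BYE exchange); the event labels and the function name `tts_signaling_trace` are illustrative, and no real SIP or MRCP library is used.

```python
from typing import List


def tts_signaling_trace(n: int) -> List[str]:
    """Return the ordered media-server/TTS-server events for N loop plays."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # S4301-S4303: negotiate the codec once and open the media channel once.
    trace = ["SIP INVITE", "SIP 200 OK", "open media channel"]
    for _ in range(n):
        # S4304-S4308: one TCP link and one MRCP request per play.
        trace += [
            "TCP link request",
            "TCP link reply",
            "MRCP request",
            "MRCP in progress + audio",
            "MRCP complete",
            "TCP link close",
        ]
    # S4309-S4310: release the SIP session after the last play.
    trace += ["SIP BYE", "SIP BYE reply"]
    return trace


trace = tts_signaling_trace(3)
```

For N = 3 the trace contains a single `SIP INVITE` and three `MRCP request` events, which reflects the claim that only the MRCP exchange is repeated.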
- FIG. 5 is a structural block diagram of an apparatus for implementing a cyclic playback of a text-to-speech TTS service according to an embodiment of the present invention.
- An embodiment of the present invention provides a device for implementing cyclic playback of a text-to-speech (TTS) service, including:
- the determining module 501 is configured to: when the TTS server uses the media channel of the media server to complete one TTS service for the text information, determine whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, and obtain the judgment result;
- the interaction module 502 is configured to: when the judgment result is no, maintain the media channel and interact with the TTS server so that the TTS server can use the media channel to complete another TTS service for the text information.
- The device is for a media server.
- Another TTS service is completed by reusing the media channel already used to complete a TTS service, which avoids closing and reopening the media channel and so reduces the number of times the media server's internal resources are established and released, along with the corresponding signaling interactions. This reduces the resource and signaling load on the media server and improves its performance when performing TTS services.
- The device may further include:
- a receiving module configured to: before it is determined whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, receive a TTS service request message for the text information sent by the application server, where the TTS service request message carries the number of loop plays NUM;
- a parsing module configured to: before that determination is made, parse the number of loop plays NUM from the TTS service request message.
- The device may further include:
- an opening module configured to open the media channel when receiving a TTS service request message for the text information sent by the application server.
- The device may further include:
- a closing and notification module configured to close the media channel when the judgment result is yes, and notify the APP server that the NUM loop plays for the text information are completed.
- the codec type corresponding to the media channel may be determined by the interaction module according to the codec type set supported by the media server, and negotiated with the TTS server.
- An embodiment of the present invention further provides a server, where the server includes the foregoing apparatus for implementing cyclic playback of a text-to-speech (TTS) service.
- the server is, for example, a media server.
- All or part of the steps of the above embodiments may also be implemented by using integrated circuits. These steps may be fabricated separately as individual integrated circuit modules, or multiple modules or steps may be fabricated as a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.
- the devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
- When each device/function module/functional unit in the above embodiments is implemented in the form of a software function module and sold or used as a stand-alone product, it can be stored in a computer readable storage medium.
- the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
- The embodiment of the present invention completes another TTS service by reusing the media channel already used to complete a TTS service, which avoids closing and reopening the media channel and so reduces the number of times the media server's internal resources are established and released, along with the corresponding signaling interactions. This reduces the resource and signaling load on the media server and improves its performance when performing TTS services.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention relates to a method and a device for implementing cyclic playback of a text-to-speech service, and to a server. The method comprises: when a TTS server completes a TTS service for text information by using a media channel of a media server, judging whether the number of times the TTS server has completed the TTS service for the text information has reached the number (NUM) of cyclic plays of the text information, and obtaining a judgment result; and when the judgment result is negative, interacting with the TTS server so that the TTS server can complete another TTS service for the text information by using the media channel.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410305490.6 | 2014-06-27 | ||
CN201410305490.6A CN105306420B (zh) | 2014-06-27 | 2014-06-27 | Method, device and server for implementing cyclic playback of a text-to-speech service |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015196823A1 (fr) | 2015-12-30 |
Family
ID=54936711
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/073051 WO2015196823A1 (fr) | Method and device for implementing cyclic playback of a text-to-speech service, and server | 2014-06-27 | 2015-02-13 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105306420B (fr) |
WO (1) | WO2015196823A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107357727A (zh) * | 2017-07-04 | 2017-11-17 | 广州君海网络科技有限公司 | App running test method, apparatus, readable storage medium and computer device |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111369970A (zh) * | 2020-06-01 | 2020-07-03 | 浙江百应科技有限公司 | Method for highly available intelligent routing of TTS channels |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040054536A1 (en) * | 2002-09-13 | 2004-03-18 | Chih-Chung Kuo | Method for generating text script of high efficiency |
CN101088117A (zh) * | 2004-12-22 | 2007-12-12 | 摩托罗拉公司 | Method and apparatus for improving text-to-speech performance |
CN102314874A (zh) * | 2010-06-29 | 2012-01-11 | 鸿富锦精密工业(深圳)有限公司 | Text-to-speech conversion system and method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
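CN100486282C (zh) * | 2006-03-27 | 2009-05-06 | 华为技术有限公司 | Method for implementing voice interaction |
CN101378391B (zh) * | 2007-08-31 | 2011-12-21 | 华为技术有限公司 | Media service implementation method, communication system and related device |
CN201199724Y (zh) * | 2008-02-05 | 2009-02-25 | 珠海市太川电子企业有限公司 | Indoor unit of a video intercom system |
JP2009294640A (ja) * | 2008-05-07 | 2009-12-17 | Seiko Epson Corp | Voice data creation system, program, semiconductor integrated circuit device, and method for manufacturing semiconductor integrated circuit device |
CN102231734B (zh) * | 2011-06-22 | 2017-10-03 | 南京中兴新软件有限责任公司 | Audio transcoding method, device and system for implementing text-to-speech (TTS) |
CN102394991B (zh) * | 2011-09-28 | 2017-04-19 | 中兴通讯股份有限公司 | Method and system for implementing conference-site audio playback in a multimedia conference service |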
- 2014-06-27: CN application CN201410305490.6A, patent CN105306420B (zh), status: Active
- 2015-02-13: WO application PCT/CN2015/073051, publication WO2015196823A1 (fr), status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN105306420B (zh) | 2019-08-30 |
CN105306420A (zh) | 2016-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11637876B2 (en) | System and method for integrating session initiation protocol communication in a telecommunications platform | |
US8880581B2 (en) | System and method for classification of media in VoIP sessions with RTP source profiling/tagging | |
US8139741B1 (en) | Call control presence | |
US8199886B2 (en) | Call control recording | |
US7688954B2 (en) | System and method for identifying caller | |
EP2067348B1 (fr) | Variable conversation recording method | |
US8837697B2 (en) | Call control presence and recording | |
US9838564B2 (en) | System and method for distributed processing in an internet protocol network | |
EP1883198B1 (fr) | Method and system for interacting with media servers based on the SIP protocol | |
WO2015196823A1 (fr) | Method and device for implementing cyclic playback of a text-to-speech service, and server | |
WO2013189430A2 (fr) | Method, system and media server for implementing an automatic speech recognition service | |
US9148306B2 (en) | System and method for classification of media in VoIP sessions with RTP source profiling/tagging | |
WO2012174908A1 (fr) | Method, device and system for implementing text-to-speech audio transcoding | |
WO2010130193A1 (fr) | Device and method for controlling the sending of audio media packets, and audio media server | |
CN101453446B (zh) | Method, device and system for establishing MRCP control and bearer channels | |
Cisco | Interactive Voice Response Version 2.0 on VoIP | |
US8625577B1 (en) | Method and apparatus for providing audio recording | |
KR20120058764A (ko) | Method and apparatus for providing VoIP call quality | |
KR20090066062A (ko) | SIP-based Internet telephony service system and method | |
US8737575B1 (en) | Method and apparatus for transparently recording media communications between endpoint devices | |
WO2016169319A1 (fr) | Service triggering method, device and system, and media server | |
US20240107083A1 (en) | Methods and systems for efficient streaming of audio from contact center cloud platform to third-party servers | |
EP4037349B1 (fr) | Method for providing voice assistance functionality to the end user by means of a voice connection established over an IP-based telecommunications system | |
EP2391084B1 (fr) | System and method for improving latency in an IP network | |
US9258420B2 (en) | Software-based operator switchboard |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15812632; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 15812632; Country of ref document: EP; Kind code of ref document: A1 |