WO2015196823A1 - Method and device for implementing cyclic playback of a text-to-speech service, and server

Info

Publication number
WO2015196823A1
Authority
WO
WIPO (PCT)
Prior art keywords
tts
server
text information
media
service
Prior art date
Application number
PCT/CN2015/073051
Other languages
English (en)
Chinese (zh)
Inventor
张伟
丁向军
张武雄
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2015196823A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Definitions

  • The present invention relates to the field of communications, and in particular to a method, an apparatus, and a server for implementing cyclic playback of text-to-speech (TTS) services.
  • MS Media Server
  • SIP Session Initiation Protocol
  • MSML Media Sessions Markup Language
  • MOML Media Objects Markup Language
  • The Media Control Unit is a key unit in the media server. It mainly performs capability negotiation with other entities and manages, maintains, and controls the other service resource units in order to complete complex services.
  • The Media Storage Transmit Unit-audio is a service resource unit in the media server that stores large volumes of audio data and handles audio file playback.
  • Data on the media storage unit can be sent and received directly through the external network port on the unit.
  • The Media Processing Unit mainly performs media codec conversion, digit collection, and conference mixing.
  • TTS Text To Speech
  • In practice, TTS is typically deployed by configuring a dedicated TTS server that sends the synthesized audio to the client to complete a service.
  • FIG. 1 is a schematic structural diagram of a system for implementing a TTS cyclic play service according to the related art. As shown in FIG. 1, the workflow of the system includes the following steps:
  • Step S101: The terminal initiates a call to activate the service of the APP server, and the APP server initiates a service flow to the media server.
  • Step S102: The APP server requests the media server to complete the TTS service N times by sending SIP signaling N times.
  • Step S103: The media server requests TTS resources from the TTS server through SIP signaling, and controls the TTS server to complete the service function by using the MRCP protocol.
  • Step S104: The TTS server sends the media packets to the terminal through the media server, and reports information such as the playback duration to the media server.
  • In this arrangement, the TTS server acts as a peripheral device of the media server. When the APP server requests a service, it sends the request only to the media server, and the media server determines the service type. If the service type is a TTS application, the media server initiates a request to the TTS server, requests resources, and controls the behavior of the TTS server by using the MRCP protocol. The TTS server converts the text into audio and sends the audio to the media server.
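  • The related-art workflow of steps S102 to S104 can be sketched as an event trace. This is only an illustration: the function name `related_art_trace` and the event strings are hypothetical, and the sketch assumes that each of the N requests sets up and releases its own media resources, which is the overhead the description attributes to this approach.

```python
def related_art_trace(n: int) -> list[str]:
    """Return an illustrative event trace for N separate TTS requests.

    Assumption (not stated in detail by the source): every APP-server
    request leads to its own resource setup and release on the media server.
    """
    events = []
    for i in range(1, n + 1):
        events.append(f"APP->MS: SIP request #{i} (S102)")
        events.append(f"MS->TTS: SIP resource request + MRCP control #{i} (S103)")
        events.append(f"TTS->MS->terminal: media packets #{i} (S104)")
        events.append(f"MS: release resources #{i}")
    return events


trace = related_art_trace(3)
# Count how many SIP requests the APP server had to send for 3 plays.
sip_requests = sum("APP->MS" in event for event in trace)
print(sip_requests)
```

Under these assumptions the number of APP-server requests and resource cycles grows linearly with N, which is the cost the embodiments below aim to avoid.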
  • The embodiments of the present invention provide a method, an apparatus, and a server for implementing cyclic playback of text-to-speech services, so as to reduce the complexity of the internal media resource processing that the media server performs to support cyclic playback of the TTS service.
  • An embodiment of the present invention provides a method for implementing cyclic playback of a text-to-speech (TTS) service, applied to a media server, including:
  • when the TTS server completes one TTS service for text information by using a media channel of the media server, determining whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, and obtaining a determination result;
  • when the determination result is no, maintaining the media channel and interacting with the TTS server, so that the TTS server can complete another TTS service for the text information by using the media channel.
  • Optionally, before the determining, the method further includes:
  • receiving a TTS service request message sent by the application server for the text information, where the TTS service request message carries the number of loop plays NUM;
  • parsing the number of loop plays NUM from the TTS service request message.
  • Optionally, the method further includes:
  • opening the media channel upon receiving the TTS service request message for the text information sent by the application server.
  • Optionally, the method further includes:
  • when the determination result is yes, closing the media channel and notifying the application server that the NUM loop plays of the text information are complete.
  • Optionally, the codec type corresponding to the media channel is determined by the media server through negotiation with the TTS server, according to the set of codec types supported by the media server.
  • An embodiment of the present invention further provides a device for implementing cyclic playback of a text-to-speech (TTS) service, applied to a media server, comprising:
  • a judging module configured to: when the TTS server completes one TTS service for text information by using a media channel of the media server, determine whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, and obtain a determination result;
  • an interaction module configured to: when the determination result is no, maintain the media channel and interact with the TTS server, so that the TTS server can complete another TTS service for the text information by using the media channel.
  • Optionally, the device further includes:
  • a receiving module configured to: before the judging module determines whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM, receive a TTS service request message sent by the application server for the text information, where the TTS service request message carries the number of loop plays NUM;
  • a parsing module configured to: before the judging module makes that determination, parse the number of loop plays NUM from the TTS service request message.
  • Optionally, the device further includes:
  • an opening module configured to: open the media channel when the TTS service request message sent by the application server for the text information is received.
  • Optionally, the device further includes:
  • a closing and notification module configured to: when the determination result is yes, close the media channel and notify the application server that the NUM loop plays of the text information are complete.
  • Optionally, the interaction module is further configured to: determine the codec type corresponding to the media channel through negotiation with the TTS server, according to the set of codec types supported by the media server.
  • An embodiment of the present invention further provides a server that includes the device for implementing cyclic playback of a text-to-speech (TTS) service described above.
  • An embodiment of the present invention further provides a computer-readable storage medium storing program instructions; the method described above is implemented when the program instructions are executed.
  • In the embodiments of the present invention, another TTS service is completed by reusing the media channel that was used for the previous TTS service, which avoids closing and reopening the media channel. This reduces the number of times internal resources of the media server are established and released, along with the corresponding signaling interactions. The pressure on the media server to process resources and signaling is therefore reduced, and the performance of the media server in performing the TTS service is improved.
  • FIG. 1 is a schematic structural diagram of a general process for implementing a TTS cyclic play service according to the related art
  • FIG. 2 is a flow chart showing the steps of a method for implementing a cyclic playback of a text-to-speech TTS service according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram showing the interaction structure between a media server and each module according to a preferred embodiment of the present invention
  • FIG. 4 is a schematic diagram of timings of exchanging signaling between a media server and each module according to a preferred embodiment of the present invention
  • FIG. 5 is a structural block diagram of an apparatus for implementing a cyclic playback of a text-to-speech TTS service according to an embodiment of the present invention.
  • FIG. 2 is a flow chart showing the steps of a method for implementing cyclic playback of a text-to-speech (TTS) service according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:
  • Step 201: When the TTS server completes one TTS service for the text information by using the media channel of the media server, determine whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, and obtain a determination result.
  • Step 202: When the determination result is no, maintain the media channel and interact with the TTS server, so that the TTS server can complete another TTS service for the text information by using the media channel.
  • The method is applied to a media server.
  • It can be seen that when the TTS server completes a TTS service by using the media channel, if the number of completed TTS services has not reached the number of loop plays, the media server does not close the media channel but uses it to complete another TTS service. Closing and reopening the media channel is thereby avoided, which reduces the number of times internal resources of the media server are established and released, along with the corresponding signaling interactions. This reduces the pressure on the media server to process resources and signaling and improves the performance of the media server when performing TTS services.
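  • Steps 201 and 202 amount to a counter check after every completed play. The following is a minimal sketch under stated assumptions: the function name `play_in_loop` and the statistics dictionary are hypothetical, and the TTS interaction itself is reduced to a counter increment.

```python
def play_in_loop(num: int) -> dict:
    """Simulate NUM plays of one text over a single media channel."""
    if num < 1:
        raise ValueError("num must be at least 1")
    stats = {"channel_opens": 0, "channel_closes": 0, "plays": 0}
    stats["channel_opens"] += 1          # channel is opened once for the request
    completed = 0
    while True:
        stats["plays"] += 1              # TTS server completes one TTS service
        completed += 1
        reached = completed >= num       # Step 201: compare against NUM
        if not reached:
            continue                     # Step 202: keep the channel, play again
        stats["channel_closes"] += 1     # result is yes: close the channel
        return stats


print(play_in_loop(3))
```

In this sketch the channel is opened and closed exactly once regardless of NUM, which mirrors the resource saving described above.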
  • The method may further include:
  • receiving a TTS service request message sent by the application server for the text information, where the TTS service request message carries the number of loop plays NUM;
  • parsing the number of loop plays NUM from the TTS service request message.
  • The media channel may be opened upon receiving the TTS service request message for the text information sent by the application server.
  • When the determination result is yes, the media channel is closed, and the APP server is notified that the NUM loop plays of the text information are complete.
  • The codec type corresponding to the media channel may be determined by the media server through negotiation with the TTS server, according to the set of codec types supported by the media server.
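  • Parsing NUM from the request message could look like the sketch below. The source does not specify the message format, so the `key=value;...` body, the field name `num`, and the function `parse_loop_count` are all assumptions made for illustration only.

```python
def parse_loop_count(body: str, default: int = 1) -> int:
    """Extract the loop count NUM from a hypothetical 'key=value;...' body.

    The body format and the 'num' key are illustrative assumptions; the
    patent only says that the request message carries NUM.
    """
    fields = {}
    for part in body.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip().lower()] = value.strip()
    raw = fields.get("num")
    if raw is None:
        return default                 # no loop count given: play once
    num = int(raw)                     # raises ValueError on non-numeric input
    if num < 1:
        raise ValueError("NUM must be a positive integer")
    return num


print(parse_loop_count("service=tts; text=Welcome; num=3"))
```

Falling back to a single play when the field is absent is a design choice of this sketch, not something the source states; a text value containing `;` or `=` would also need a more careful format.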
  • To address the shortcomings of the related art, in which the TTS service flow for converting the same text multiple times is complex and the media server handles the TTS loop play service with a high failure rate and low performance, the present invention provides a method, a device, and a system for implementing cyclic playback of a text-to-speech (TTS) service.
  • A method for implementing a TTS loop play service includes:
  • the media server receives an access request from the application (APP) server and determines the set of codec types supported by the media server;
  • the media server receives the TTS service request submitted by the APP server, and applies to the TTS server for service resources according to the TTS service type;
  • the media server parses the INFO (TTS) field to obtain the number of loops N;
  • after each round completes, the media server does not release its local resources and maintains the media link with the TTS server, then performs the next MRCP negotiation to convert the text information. The number of repeated requests is determined by N, and the TTS server finally sends the audio converted from the text to the terminal N times through the media server, with the APP server making only one request.
  • A system for implementing a text-to-speech (TTS) loop play service includes:
  • a first processing module configured to: receive an access request from the APP server, and determine the set of codec types supported by the media server;
  • a second processing module configured to: receive the TTS service request from the APP server, apply for TTS service resources according to the TTS service type, and determine the number of loop plays;
  • a third processing module configured to: negotiate with the TTS server according to the codec type set to obtain the negotiated audio codec type, and send the media service data packets to the terminal server through the media server according to that audio codec type.
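  • The codec negotiation performed by the third processing module can be pictured as picking the first codec in the media server's set that the TTS server also supports. This is a hedged sketch: the function `negotiate_codec`, the preference-order rule, and the example codec names are assumptions, since the source only says that a codec type is negotiated from the media server's supported set.

```python
from typing import Optional, Sequence


def negotiate_codec(offered: Sequence[str],
                    supported_by_peer: Sequence[str]) -> Optional[str]:
    """Return the first offered codec the peer supports, or None.

    'offered' stands for the media server's codec type set; the
    first-match rule is an illustrative assumption.
    """
    peer = {codec.upper() for codec in supported_by_peer}
    for codec in offered:
        if codec.upper() in peer:
            return codec
    return None


media_server_codecs = ["PCMA", "PCMU", "G729"]   # illustrative values
tts_server_codecs = ["PCMU", "G729"]             # illustrative values
print(negotiate_codec(media_server_codecs, tts_server_codecs))
```

Returning `None` when there is no common codec leaves the failure handling to the caller, which the source does not describe.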
  • FIG. 3 is a schematic diagram of the interaction structure between the modules inside the media server and the APP server, the TTS server, and the terminal server according to this embodiment.
  • The media control unit MSCU is configured to send Session Initiation Protocol (SIP) and MRCP signaling to the TTS server: the SIP negotiation agrees on and specifies the audio codec type that matches between the media server and the TTS server, and the MRCP signaling interaction controls the TTS server to convert the text and play the content. The voice center switching unit MRU is configured to receive data packets from the TTS server and send the media service data packets to the media storage transport audio unit MSTU. The MSCU controls the MSTU to send the media service data packets to the terminal.
  • FIG. 4 is a schematic diagram of the signaling exchange sequence between the media server and each module.
  • The detailed workflow is as follows:
  • Steps S410 to S420: The APP server sends INVITE signaling to the media server for media negotiation; the media server selects a codec type from its own capability set and uses an MSTU external port address as the address for interacting with the terminal. The APP server then sends an INFO request to the media server; the content of the INFO is a request for the TTS service. The media server parses the loop play count field in the INFO, whose value is N, and saves all of the information.
  • Step S430: The media server negotiates with the TTS server and controls the TTS server to convert the text into voice.
  • Step S430 may include the following steps:
  • Step S4301: The media control unit MSCU initiates SIP signaling to the TTS server to negotiate a codec type. The audio codec capability set offered in the INVITE signaling is the one owned by the media server, that is, all codec types supported by the MRU.
  • Step S4302: The TTS server returns a 200 OK response to the INVITE and notifies the media server of the negotiated audio codec type.
  • Step S4303: The MSCU applies for the media-server-side resources required for the TTS service: the MSTU external port resource, the MRU1 transcoding resource, and the MRU2 transcoding resource. The MSCU sends a NAT channel command to the MSTU and an open-transcoding command to the MRU, indicating that the audio packets sent by the TTS server are to be received on the MRU internal port. The media channel on the media server side is now open.
  • Step S4304: The MSCU sends a TCP/IP link request to the TTS server.
  • Step S4305: The MSCU receives the TCP/IP link request reply message sent by the TTS server.
  • Step S4306: The MSCU sends an MRCP request message to the TTS server, indicating the text information that the TTS server needs to convert.
  • Step S4307: The TTS server replies to the MRCP request message, notifies the media server that the text conversion is in progress, and sends audio packets to the external port of the MSTU; the MRU sends the data packets forwarded by NAT from the external port of the MSTU to the terminal.
  • Step S4308: The TTS server notifies the MSCU that the text conversion is complete, and the MSCU notifies the TTS server to close the current TCP link. The MSCU checks the saved INFO (TTS) loop play count N and determines whether it needs to initiate an MRCP request to the TTS server again. If playback needs to continue, steps S4304 to S4308 are repeated.
  • Step S4309: After the TTS service has been completed N times, the MSCU sends a BYE request message to the TTS server to notify it to release the SIP data area corresponding to the TTS service.
  • Step S4310: The media server receives the BYE reply message from the TTS server and releases the SIP data area on the media server side; the service is complete.
  • Step S440: The media server sends an INFO message to the APP server and reports information such as the playing duration.
  • Step S450: The APP server sends BYE signaling to the media server to release the resources.
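  • The sub-steps of S430 can be laid out as a message trace to show which exchanges happen once and which repeat N times. The sketch below is illustrative only: `tts_loop_signaling` and the event strings are hypothetical labels for the steps described above, not real protocol messages.

```python
def tts_loop_signaling(n: int) -> list[str]:
    """Build an illustrative trace of steps S4301 to S4310 for N plays."""
    # SIP negotiation and media channel setup happen once.
    events = [
        "S4301 MSCU->TTS: SIP INVITE (codec capability set)",
        "S4302 TTS->MSCU: 200 OK (negotiated codec)",
        "S4303 MSCU: open media channel (MSTU port, MRU transcoding)",
    ]
    # The TCP link and MRCP request are repeated for each play.
    for i in range(1, n + 1):
        events += [
            f"S4304 MSCU->TTS: TCP link request (round {i})",
            f"S4305 TTS->MSCU: TCP link reply (round {i})",
            f"S4306 MSCU->TTS: MRCP request (round {i})",
            f"S4307 TTS->MSTU: audio packets (round {i})",
            f"S4308 TTS->MSCU: complete; close TCP link (round {i})",
        ]
    # The SIP session is released once, after all N plays.
    events += [
        "S4309 MSCU->TTS: BYE",
        "S4310 TTS->MSCU: BYE reply; release SIP data area",
    ]
    return events


events = tts_loop_signaling(3)
invites = sum(e.startswith("S4301") for e in events)
mrcp_requests = sum(e.startswith("S4306") for e in events)
print(invites, mrcp_requests)
```

The trace makes the saving visible: one SIP negotiation and one channel setup serve all N MRCP rounds.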
  • In this embodiment, the media server completes the N MRCP requests and text conversions by reusing the result of a single negotiation with the TTS server, which significantly reduces the pressure on the media server to process resources and signaling and greatly improves the performance of the media server in performing the TTS service.
  • FIG. 5 is a structural block diagram of a device for implementing cyclic playback of a text-to-speech (TTS) service according to an embodiment of the present invention. As shown in FIG. 5, the device includes:
  • a determining module 501 configured to: when the TTS server completes one TTS service for the text information by using the media channel of the media server, determine whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, and obtain a determination result;
  • an interaction module 502 configured to: when the determination result is no, maintain the media channel and interact with the TTS server, so that the TTS server can complete another TTS service for the text information by using the media channel.
  • The device is applied to a media server.
  • It can be seen that another TTS service is completed by reusing the media channel that was used for the previous TTS service, which avoids closing and reopening the media channel. This reduces the number of times internal resources of the media server are established and released, along with the corresponding signaling interactions, and thus reduces the pressure on the media server to process resources and signaling and improves the performance of the media server when performing TTS services.
  • The device may further include:
  • a receiving module configured to: before it is determined whether the number of times the TTS server has completed the TTS service for the text information has reached the number of loop plays NUM of the text information, receive a TTS service request message sent by the application server for the text information, where the TTS service request message carries the number of loop plays NUM;
  • a parsing module configured to: before that determination is made, parse the number of loop plays NUM from the TTS service request message.
  • The device may further include:
  • an opening module configured to: open the media channel when the TTS service request message sent by the application server for the text information is received.
  • The device may further include:
  • a closing and notification module configured to: when the determination result is yes, close the media channel and notify the APP server that the NUM loop plays of the text information are complete.
  • The codec type corresponding to the media channel may be determined by the interaction module through negotiation with the TTS server, according to the set of codec types supported by the media server.
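  • The division of work among the modules can be sketched as a small event-driven class. Everything here is a hypothetical illustration: the class `LoopPlaybackDevice`, its method names, and the log strings are assumptions, and each method merely marks which module of the description it stands in for.

```python
class LoopPlaybackDevice:
    """Illustrative stand-in for the device modules on the media server."""

    def __init__(self) -> None:
        self.channel_open = False
        self.num = 0
        self.completed = 0
        self.log: list[str] = []

    def on_request(self, num: int) -> None:
        # Receiving and parsing modules: store NUM from the request.
        self.num = num
        self.completed = 0
        # Opening module: open the media channel once.
        self.channel_open = True
        self.log.append("open channel")
        self.log.append("request TTS play")

    def on_tts_complete(self) -> bool:
        """Handle one completed TTS service; return True when all plays are done."""
        self.completed += 1
        done = self.completed >= self.num        # determining module 501
        if not done:
            # Interaction module 502: keep the channel, ask for another play.
            self.log.append("request TTS play")
            return False
        # Closing and notification module.
        self.channel_open = False
        self.log.append("close channel")
        self.log.append("notify APP server")
        return True


device = LoopPlaybackDevice()
device.on_request(num=2)
while not device.on_tts_complete():
    pass
print(device.log)
```

Keeping the decision in `on_tts_complete` reflects that the check is triggered by each completed play rather than by a timer, which is how the description frames the determining module.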
  • An embodiment of the present invention further provides a server that includes the foregoing device for implementing cyclic playback of a text-to-speech (TTS) service.
  • The server is, for example, a media server.
  • All or part of the steps of the above embodiments may also be implemented by using integrated circuits. These steps may be fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.
  • The devices/function modules/functional units in the above embodiments may be implemented by general-purpose computing devices, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
  • When each device/function module/functional unit in the above embodiments is implemented in the form of a software function module and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium.
  • The above-mentioned computer-readable storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
  • The embodiments of the present invention complete another TTS service by reusing the media channel that was used for the previous TTS service, which avoids closing and reopening the media channel. This reduces the number of times internal resources of the media server are established and released, along with the corresponding signaling interactions, and thus reduces the pressure on the media server to process resources and signaling and improves the performance of the media server when performing TTS services.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a method and a device for implementing cyclic playback of a text-to-speech service, and a server. The method comprises: when a TTS server completes a TTS service on text information by using a media channel of a media server, judging whether the number of times the TTS server has completed the TTS service on the text information reaches the number (NUM) of cyclic plays of the text information, and obtaining a judgment result; and when the judgment result is negative, interacting with the TTS server so that the TTS server can complete another TTS service on the text information by using the media channel.
PCT/CN2015/073051 2014-06-27 2015-02-13 Method and device for implementing cyclic playback of a text-to-speech service, and server WO2015196823A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410305490.6 2014-06-27
CN201410305490.6A CN105306420B (zh) 2014-06-27 2014-06-27 Method, device, and server for implementing cyclic playback of a text-to-speech service

Publications (1)

Publication Number Publication Date
WO2015196823A1 (fr) 2015-12-30

Family

ID=54936711

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/073051 WO2015196823A1 (fr) 2014-06-27 2015-02-13 Method and device for implementing cyclic playback of a text-to-speech service, and server

Country Status (2)

Country Link
CN (1) CN105306420B (fr)
WO (1) WO2015196823A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107357727A (zh) * 2017-07-04 2017-11-17 广州君海网络科技有限公司 App running test method and apparatus, readable storage medium, and computer device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369970A (zh) * 2020-06-01 2020-07-03 浙江百应科技有限公司 Highly available method for intelligent routing of TTS channels

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040054536A1 (en) * 2002-09-13 2004-03-18 Chih-Chung Kuo Method for generating text script of high efficiency
CN101088117A (zh) * 2004-12-22 2007-12-12 摩托罗拉公司 Method and apparatus for improving text-to-speech performance
CN102314874A (zh) * 2010-06-29 2012-01-11 鸿富锦精密工业(深圳)有限公司 Text-to-speech conversion system and method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
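CN100486282C (zh) * 2006-03-27 2009-05-06 华为技术有限公司 Method for implementing voice interaction
CN101378391B (zh) * 2007-08-31 2011-12-21 华为技术有限公司 Media service implementation method, communication system, and related device
CN201199724Y (zh) * 2008-02-05 2009-02-25 珠海市太川电子企业有限公司 Indoor unit of a video intercom system
JP2009294640A (ja) * 2008-05-07 2009-12-17 Seiko Epson Corp Voice data creation system, program, semiconductor integrated circuit device, and method for manufacturing semiconductor integrated circuit device
CN102231734B (zh) * 2011-06-22 2017-10-03 南京中兴新软件有限责任公司 Audio transcoding method, device, and system for implementing text-to-speech (TTS)
CN102394991B (zh) * 2011-09-28 2017-04-19 中兴通讯股份有限公司 Method and system for implementing venue audio playback in a multimedia conference service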

Also Published As

Publication number Publication date
CN105306420B (zh) 2019-08-30
CN105306420A (zh) 2016-02-03

Similar Documents

Publication Publication Date Title
US11637876B2 (en) System and method for integrating session initiation protocol communication in a telecommunications platform
US8880581B2 (en) System and method for classification of media in VoIP sessions with RTP source profiling/tagging
US8139741B1 (en) Call control presence
US8199886B2 (en) Call control recording
US7688954B2 (en) System and method for identifying caller
EP2067348B1 (fr) Procédé d'enregistrement de conversation variable
US8837697B2 (en) Call control presence and recording
US9838564B2 (en) System and method for distributed processing in an internet protocol network
EP1883198B1 (fr) Procédé et système d'interaction avec des serveurs multimédia sur la base du protocole sip
WO2015196823A1 (fr) Method and device for implementing cyclic playback of a text-to-speech service, and server
WO2013189430A2 (fr) Procédé, système et serveur multimédia pour mettre en oeuvre un service de reconnaissance automatique de la parole
US9148306B2 (en) System and method for classification of media in VoIP sessions with RTP source profiling/tagging
WO2012174908A1 (fr) Procédé, dispositif et système pour réaliser un transcodage audio de textes en paroles
WO2010130193A1 (fr) Dispositif, procédé servant à commander l'envoi de paquets multimédias audio et serveur multimédia audio
CN101453446B (zh) Method, apparatus, and system for establishing MRCP control and bearer channels
Cisco Interactive Voice Response Version 2.0 on VoIP
US8625577B1 (en) Method and apparatus for providing audio recording
KR20120058764A (ko) Method and apparatus for providing VoIP call quality
KR20090066062A (ko) SIP-based Internet telephony service system and method
US8737575B1 (en) Method and apparatus for transparently recording media communications between endpoint devices
WO2016169319A1 (fr) Procédé, dispositif et système de déclenchement de service, et serveur multimédia
US20240107083A1 (en) Methods and systems for efficient streaming of audio from contact center cloud platform to third-party servers
EP4037349B1 (fr) Procédé permettant de fournir une fonctionnalité d'aide vocale à l'utilisateur final au moyen d'une connexion vocale établie sur un système de télécommunications basé sur ip
EP2391084B1 (fr) Système et procédé pour améliorer la latence dans un réseau IP
US9258420B2 (en) Software-based operator switchboard

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15812632

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15812632

Country of ref document: EP

Kind code of ref document: A1