WO2016119226A1 - Method and apparatus for converting voice into text in a multi-party call
- Publication number
- WO2016119226A1 (PCT/CN2015/071966)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- terminal
- text
- identifier
- session
- database
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
- H04L67/146—Markers for unambiguous identification of a particular session, e.g. session cookie or URL-encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/64—Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/60—Medium conversion
Definitions
- an allocating unit configured to allocate a session to the at least two terminals, so that terminals whose text-conversion requests carry the same first identifier or the same second identifier are allocated the same session;
- a second receiving unit configured to receive, by using a packet domain, a voice stream that is sent by at least one terminal in a multi-party call and whose sampling rate is greater than 8 kHz; wherein the multi-party call corresponds to one session;
- a sending unit configured to send the text to the terminal in the multiparty call.
- the text includes an identifier of the at least one terminal; or the text includes an identifier and a user name of the at least one terminal.
- the processor is further configured to: if the allocated session is a session in the database, add to the database, in correspondence with the allocated session, those identifiers of the at least two terminals that are not yet included in the database.
- FIG. 7 is a schematic structural diagram of another apparatus for voice-to-text in a multi-party call according to an embodiment of the present invention.
- a first embodiment of the present invention provides a method for voice-to-text in a multi-party call. As shown in FIG. 2, the method may include:
- terminal 1 (telephone number 123) and terminal 2 (telephone number 456) are in a call, with terminal 1 as the calling party and terminal 2 as the called party. When terminal 1 enables the text-conversion function, the server receives the text-conversion request <123, 456> from terminal 1; when terminal 2 enables the text-conversion function, the server receives the text-conversion request <123, 456> from terminal 2.
- the server receives text-conversion request 1 <123, 456> from terminal A and text-conversion request 2 <123, 456> from terminal B. Because the first identifier 123 in terminal A's request is the same as the first identifier 123 in terminal B's request, and the second identifier 456 in terminal A's request is the same as the second identifier 456 in terminal B's request, the server allocates the same session to terminal A and terminal B, so that terminals whose text-conversion requests carry the same first identifier or the same second identifier share a session.
- terminal A sends a voice stream with a sampling rate greater than 8 kHz to the server through the packet domain: "Where is our meeting held?"
- the second terminal and the third terminal are allocated the same new session.
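The allocation rule in the excerpts above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `SessionAllocator` class, the integer session ids, and the choice to record both identifiers as soon as the first request arrives are all assumptions made for the example.

```python
import itertools


class SessionAllocator:
    """Hypothetical in-memory stand-in for the server's session database."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.session_of = {}  # identifier (e.g. a phone number) -> session id

    def allocate(self, first_id, second_id):
        """Return the session for a text-conversion request <first_id, second_id>."""
        for ident in (first_id, second_id):
            if ident in self.session_of:
                # One of the identifiers is already known: reuse its session.
                session = self.session_of[ident]
                break
        else:
            # Neither identifier is known: open a new session.
            session = next(self._next_id)
        # Assumption: record both identifiers so that later requests naming
        # either one are matched to the same session.
        self.session_of.setdefault(first_id, session)
        self.session_of.setdefault(second_id, session)
        return session


allocator = SessionAllocator()
session_a = allocator.allocate("123", "456")  # request from terminal A
session_b = allocator.allocate("123", "456")  # request from terminal B
session_c = allocator.allocate("789", "456")  # another party sharing identifier 456
session_d = allocator.allocate("111", "222")  # unrelated call
```

Here terminals A and B, which both send <123, 456>, end up in the same session, as does a request that shares only the second identifier, while the unrelated call gets a session of its own.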
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computer Networks & Wireless Communication (AREA)
- Telephonic Communication Services (AREA)
Abstract
Description
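Session | Terminal identifiers |
---|---|
Session 1 | Terminal 1, Terminal 2 |
Session 2 | Terminal 3, Terminal 4, Terminal 5 |
Session 3 | Terminal 8, Terminal 9, Terminal 10, Terminal 11 |
… | … |

Session | Terminal identifiers |
---|---|
Session 1 | Terminal 1, Terminal 2 |
Session 2 | Terminal 3, Terminal 4, Terminal 5 |
Session 3 | Terminal 8, Terminal 9, Terminal 10, Terminal 11 |
Session 4 | Terminal 12, Terminal 13 |
… | … |

Session | Terminal identifiers |
---|---|
Session 1 | Terminal 1, Terminal 2 |
Session 2 | Terminal 3, Terminal 4, Terminal 5 |
Session 3 | Terminal 8, Terminal 9, Terminal 10, Terminal 11 |
Session 4 | Terminal 12, Terminal 13, Terminal 14 |
… | … |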
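The three tables above differ only in session 4: it is first added with terminals 12 and 13, and terminal 14 is then appended to it. A rough sketch of that update step is shown below, assuming the database can be modelled as a dictionary from session name to a list of terminal identifiers. The function name `record_allocation` and the English session and terminal labels are illustrative assumptions, as is the guess that terminal 14 arrives in a request alongside an already-known terminal.

```python
def record_allocation(database, session, terminal_ids):
    """Record the outcome of a session allocation in the database.

    database: dict mapping a session name to a list of terminal identifiers.
    A session that is not yet present is created; for an existing session,
    only identifiers that are not already recorded are appended.
    """
    known = database.setdefault(session, [])
    for terminal in terminal_ids:
        if terminal not in known:
            known.append(terminal)
    return database


database = {
    "Session 1": ["Terminal 1", "Terminal 2"],
    "Session 2": ["Terminal 3", "Terminal 4", "Terminal 5"],
    "Session 3": ["Terminal 8", "Terminal 9", "Terminal 10", "Terminal 11"],
}

# New session: the session and its terminals are added together.
record_allocation(database, "Session 4", ["Terminal 12", "Terminal 13"])

# Existing session: only the identifier that is still missing is added.
record_allocation(database, "Session 4", ["Terminal 12", "Terminal 14"])
```

After both calls, the entry for session 4 lists terminals 12, 13 and 14, matching the last table.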
Claims (10)
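- A method for converting voice into text in a multi-party call, applied to a server, wherein the method comprises: receiving text-conversion requests sent by at least two terminals, wherein each text-conversion request includes a first identifier and a second identifier; allocating a session to the at least two terminals, so that those of the at least two terminals whose text-conversion requests carry the same first identifier or the same second identifier have the same session; receiving, through a packet domain, a voice stream that is sent by at least one terminal in a multi-party call and whose sampling rate is greater than 8 kHz, wherein the multi-party call corresponds to one session; converting the voice stream into text; and sending the text to the terminals in the multi-party call.
- The method according to claim 1, wherein allocating a session to the at least two terminals comprises: if a database includes the first identifier or the second identifier in the text-conversion request sent by a first terminal, allocating to the first terminal the session in the database that corresponds to the first identifier or the second identifier in the text-conversion request sent by the first terminal, wherein the first terminal is any one of the at least two terminals, and the database includes at least one session and the terminal identifiers corresponding to the at least one session; and if the database includes neither the first identifier nor the second identifier in the text-conversion request sent by a second terminal, the first identifier in the text-conversion request sent by the second terminal is the same as the first identifier in the text-conversion request sent by a third terminal, and the second identifier in the text-conversion request sent by the second terminal is the same as the second identifier in the text-conversion request sent by the third terminal, allocating the same new session to the second terminal and the third terminal.
- The method according to claim 1 or 2, wherein the text includes an identifier of the at least one terminal; or the text includes an identifier and a user name of the at least one terminal.
- The method according to any one of claims 1 to 3, wherein after allocating a session to the at least two terminals, the method further comprises: if the allocated session is a new session, establishing a correspondence between the allocated session and the identifiers of the at least two terminals, and adding the correspondence to the database; and if the allocated session is a session in the database, adding to the database, in correspondence with the allocated session, those identifiers of the at least two terminals that are not yet included in the database.
- The method according to any one of claims 1 to 4, wherein after receiving, through the packet domain, the voice stream that is sent by at least one terminal in a multi-party call and whose sampling rate is greater than 8 kHz, the method further comprises: receiving an end message sent by a fifth terminal; removing the identifier of the fifth terminal from the database; and if, in the database, no terminal identifiers remain for a session, removing that session from the database.
- An apparatus for converting voice into text in a multi-party call, wherein the apparatus comprises: a first receiving unit, configured to receive text-conversion requests sent by at least two terminals, wherein each text-conversion request includes a first identifier and a second identifier; an allocating unit, configured to allocate a session to the at least two terminals, so that those of the at least two terminals whose text-conversion requests carry the same first identifier or the same second identifier have the same session; a second receiving unit, configured to receive, through a packet domain, a voice stream that is sent by at least one terminal in a multi-party call and whose sampling rate is greater than 8 kHz, wherein the multi-party call corresponds to one session; a conversion unit, configured to convert the voice stream into text; and a sending unit, configured to send the text to the terminals in the multi-party call.
- The apparatus according to claim 6, wherein the allocating unit is configured to: if a database includes the first identifier or the second identifier in the text-conversion request sent by a first terminal, allocate to the first terminal the session in the database that corresponds to the first identifier or the second identifier in the text-conversion request sent by the first terminal, wherein the first terminal is any one of the at least two terminals, and the database includes at least one session and the terminal identifiers corresponding to the at least one session; and if the database includes neither the first identifier nor the second identifier in the text-conversion request sent by a second terminal, the first identifier in the text-conversion request sent by the second terminal is the same as the first identifier in the text-conversion request sent by a third terminal, and the second identifier in the text-conversion request sent by the second terminal is the same as the second identifier in the text-conversion request sent by the third terminal, allocate the same new session to the second terminal and the third terminal.
- The apparatus according to claim 6 or 7, wherein the text includes an identifier of the at least one terminal; or the text includes an identifier and a user name of the at least one terminal.
- The apparatus according to any one of claims 6 to 8, wherein the apparatus further comprises: an adding unit, configured to, if the allocated session is a new session, establish a correspondence between the allocated session and the identifiers of the at least two terminals, and add the correspondence to the database; wherein the adding unit is further configured to, if the allocated session is a session in the database, add to the database, in correspondence with the allocated session, those identifiers of the at least two terminals that are not yet included in the database.
- The apparatus according to any one of claims 6 to 9, wherein the apparatus further comprises: a third receiving unit, configured to receive an end message sent by a fifth terminal; and a removing unit, configured to remove the identifier of the fifth terminal from the database; wherein the removing unit is further configured to, if, in the database, no terminal identifiers remain for a session, remove that session from the database.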
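The clean-up behaviour recited for the end message (method claim 5 and apparatus claim 10) might look roughly like the sketch below. It assumes the database is a dictionary from session name to a list of terminal identifiers; the function name `handle_end_message` and the sample data are hypothetical.

```python
def handle_end_message(database, terminal_id):
    """Remove a terminal that sent an end message; drop sessions left empty.

    database: dict mapping a session name to a list of terminal identifiers.
    """
    # Iterate over a copy of the keys because sessions may be deleted.
    for session in list(database):
        terminals = database[session]
        if terminal_id in terminals:
            terminals.remove(terminal_id)
        if not terminals:
            # No terminal identifiers remain for this session: remove it.
            del database[session]
    return database


database = {
    "Session 1": ["Terminal 1", "Terminal 2"],
    "Session 4": ["Terminal 12"],
}

handle_end_message(database, "Terminal 12")  # Session 4 becomes empty and is removed
handle_end_message(database, "Terminal 1")   # Session 1 keeps Terminal 2
```

Deleting the session only once its identifier list is empty keeps the remaining participants of a multi-party call attached to their session after one party leaves.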
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2017129888A RU2677878C1 (ru) | 2015-01-30 | 2015-01-30 | Method and device for converting voice into text in a conference call |
KR1020177024009A KR101987123B1 (ko) | 2015-01-30 | 2015-01-30 | Method and apparatus for converting voice into text in a multi-party call |
US15/547,465 US10825459B2 (en) | 2015-01-30 | 2015-01-30 | Method and apparatus for converting voice into text in multiparty call |
EP15879434.7A EP3244600B1 (en) | 2015-01-30 | 2015-01-30 | Method and apparatus for converting voice into text in multi-party call |
PCT/CN2015/071966 WO2016119226A1 (zh) | 2015-01-30 | 2015-01-30 | Method and apparatus for converting voice into text in a multi-party call |
JP2017540583A JP6573676B2 (ja) | 2015-01-30 | 2015-01-30 | Method and apparatus for converting voice into text in a multi-party call |
CN201580003322.4A CN106105175B (zh) | 2015-01-30 | 2015-01-30 | Method and apparatus for converting voice into text in a multi-party call |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2015/071966 WO2016119226A1 (zh) | 2015-01-30 | 2015-01-30 | Method and apparatus for converting voice into text in a multi-party call |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016119226A1 (zh) | 2016-08-04 |
Family
ID=56542220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/071966 WO2016119226A1 (zh) | Method and apparatus for converting voice into text in a multi-party call | 2015-01-30 | 2015-01-30 |
Country Status (7)
Country | Link |
---|---|
US (1) | US10825459B2 (zh) |
EP (1) | EP3244600B1 (zh) |
JP (1) | JP6573676B2 (zh) |
KR (1) | KR101987123B1 (zh) |
CN (1) | CN106105175B (zh) |
RU (1) | RU2677878C1 (zh) |
WO (1) | WO2016119226A1 (zh) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020133342A1 (en) * | 2001-03-16 | 2002-09-19 | Mckenna Jennifer | Speech to text method and system |
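CN1636384A (zh) * | 2002-02-20 | 2005-07-06 | Cisco Technology, Inc. | Method and system for conducting conference calls with optional voice-to-text conversion |
US20050201540A1 (en) * | 2004-03-09 | 2005-09-15 | Rampey Fred D. | Speech to text conversion system |
CN1859331A (zh) * | 2006-03-16 | 2006-11-08 | Huawei Technologies Co., Ltd. | Method and system for implementing multi-party communication |
CN101068271A (zh) * | 2007-06-26 | 2007-11-07 | Huawei Technologies Co., Ltd. | Telephone minutes generation system, communication terminal, media server and method |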
WO2008066836A1 (en) * | 2006-11-28 | 2008-06-05 | Treyex Llc | Method and apparatus for translating speech during a call |
EP2627063A1 (en) * | 2012-02-13 | 2013-08-14 | Alcatel Lucent | A telephony system with a background recapitulation feature |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7158764B2 (en) | 2001-12-13 | 2007-01-02 | Electronic Data Systems Corporation | System and method for sending high fidelity sound between wireless units |
JP2005012484A (ja) * | 2003-06-19 | 2005-01-13 | Nec Engineering Ltd | Audio conference system |
US8027276B2 (en) * | 2004-04-14 | 2011-09-27 | Siemens Enterprise Communications, Inc. | Mixed mode conferencing |
US20060282265A1 (en) * | 2005-06-10 | 2006-12-14 | Steve Grobman | Methods and apparatus to perform enhanced speech to text processing |
US8817668B2 (en) | 2006-09-15 | 2014-08-26 | Microsoft Corporation | Distributable, scalable, pluggable conferencing architecture |
US9025751B2 (en) * | 2008-10-01 | 2015-05-05 | Avaya Inc. | System and method of managing conference calls through the use of filtered lists of participants |
US8542807B2 (en) * | 2009-02-09 | 2013-09-24 | Applied Minds, Llc | Method and apparatus for establishing a data link based on a pots connection |
US20110195739A1 (en) | 2010-02-10 | 2011-08-11 | Harris Corporation | Communication device with a speech-to-text conversion function |
US8559606B2 (en) * | 2010-12-07 | 2013-10-15 | Microsoft Corporation | Multimodal telephone calls |
US8510398B2 (en) * | 2010-12-10 | 2013-08-13 | At&T Intellectual Property I, Lp | Apparatus and method for managing message communication |
US9420431B2 (en) * | 2011-03-08 | 2016-08-16 | General Motors Llc | Vehicle telematics communication for providing hands-free wireless communication |
US8918197B2 (en) * | 2012-06-13 | 2014-12-23 | Avraham Suhami | Audio communication networks |
US9110891B2 (en) * | 2011-12-12 | 2015-08-18 | Google Inc. | Auto-translation for multi user audio and video |
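JP6201279B2 (ja) * | 2012-03-22 | 2017-09-27 | NEC Corporation | Server, server control method and control program, information processing system, information processing method, mobile terminal, and mobile terminal control method and control program |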
US8675854B2 (en) * | 2012-05-01 | 2014-03-18 | Mitel Networks Corporation | Multi-modal communications with conferencing and clients |
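JP6303324B2 (ja) * | 2013-08-09 | 2018-04-04 | Ricoh Co., Ltd. | Communication system, management device, communication method, and program |
CN104700836B (zh) * | 2013-12-10 | 2019-01-29 | Alibaba Group Holding Limited | Speech recognition method and system |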
-
2015
- 2015-01-30 EP EP15879434.7A patent/EP3244600B1/en active Active
- 2015-01-30 KR KR1020177024009A patent/KR101987123B1/ko active IP Right Grant
- 2015-01-30 US US15/547,465 patent/US10825459B2/en active Active
- 2015-01-30 WO PCT/CN2015/071966 patent/WO2016119226A1/zh active Application Filing
- 2015-01-30 RU RU2017129888A patent/RU2677878C1/ru active
- 2015-01-30 CN CN201580003322.4A patent/CN106105175B/zh active Active
- 2015-01-30 JP JP2017540583A patent/JP6573676B2/ja active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3244600A4 * |
Also Published As
Publication number | Publication date |
---|---|
JP6573676B2 (ja) | 2019-09-11 |
EP3244600B1 (en) | 2022-06-22 |
KR101987123B1 (ko) | 2019-06-10 |
US10825459B2 (en) | 2020-11-03 |
CN106105175A (zh) | 2016-11-09 |
RU2677878C1 (ru) | 2019-01-22 |
US20170372701A1 (en) | 2017-12-28 |
EP3244600A1 (en) | 2017-11-15 |
JP2018509056A (ja) | 2018-03-29 |
EP3244600A4 (en) | 2018-01-17 |
KR20170108121A (ko) | 2017-09-26 |
CN106105175B (zh) | 2019-05-21 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15879434; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 15547465; Country of ref document: US |
ENP | Entry into the national phase | Ref document number: 2017540583; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
REEP | Request for entry into the European phase | Ref document number: 2015879434; Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 20177024009; Country of ref document: KR; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2017129888; Country of ref document: RU; Kind code of ref document: A |