CN101079836A - An instant communication method and system based on asymmetric media

Info

Publication number: CN101079836A
Authority: CN (China)
Prior art keywords: information, media, speech, client, text
Application number: CN 200610157827
Other languages: Chinese (zh)
Inventor: 王新亮
Original Assignee: 腾讯科技(深圳)有限公司
Application filed by: 腾讯科技(深圳)有限公司
Priority to: CN 200610157827

Abstract

The invention discloses an instant messaging method and system based on asymmetric media, in the field of instant messaging. The method comprises the following steps: A. after establishing a communication connection, at least two clients each select their own instant messaging media form; a first client inputs media information consistent with its selected media form and sends the media information to at least a second client; B. after receiving the media information, the second client determines whether text/speech conversion of the media information is required and acts on the result of that determination; C. the media information is output and played or displayed on the second client. The media form includes a text form and a voice form; the media information includes text information and voice information. Because the media information exchanged between users of asymmetric media is converted automatically before it is output, the invention makes communication more flexible and more user-friendly.

Description

Instant messaging method and system based on asymmetric media

Technical Field

The present invention relates to the field of instant messaging and, more particularly, to an instant messaging method and system based on asymmetric media.

Background

At present, the most typical form of instant messaging is to run instant messaging software on a personal computer (PC) and to carry out multimedia communication such as text, voice, and video over the Internet. Instant messaging can also be used on mobile phone terminals and fixed-line telephone terminals, where it is implemented in much the same way as on a PC.

However, all of these communication modes are based on symmetric media: the clients of the users taking part in a conversation must use the same media form (for example, all text or all voice), and conversion between text and voice is not supported. If different users choose different media forms, then in the actual communication each user's client ends up handling both text and voice, and each party's output media form must match the other party's input media form. For example, if user A chooses text and user B chooses voice, the two can exchange information only if user A listens and types while user B reads and speaks. In effect, the input media form of user A's client is text and its output media form is voice, while the input media form of user B's client is voice and its output media form is text.

As can be seen from the above, to communicate across asymmetric media with the prior art, a client cannot choose a single media form fully independently, and its output media form is constrained by the media form chosen by the other party, so communication is inflexible. Correspondingly, the hardware must support local output of the input media form used by the other party, so all of the hardware devices (that is, the text and voice input and output devices, including keyboard, mouse, microphone, and speaker) must work properly; if a device on either side fails, communication across asymmetric media becomes impossible. In addition, the prior-art communication modes are not sufficiently user-friendly, and certain groups of users face barriers to communication: for example, a deaf user and a blind user cannot communicate by instant messaging using the prior art.

In summary, the prior art imposes many constraints on instant messaging across asymmetric media, which makes communication inflexible. An instant messaging method that can be applied flexibly to asymmetric media is therefore needed.

Summary of the Invention

An object of the present invention is to provide an instant messaging method based on asymmetric media, intended to solve the problem that the prior art is inflexible when applied to communication across asymmetric media.

A further object of the present invention is to provide an instant messaging system based on asymmetric media, so as to better solve the above problem of the prior art.

To achieve these objects, the instant messaging method based on asymmetric media comprises the following steps:
A. After establishing a communication connection, at least two clients each select their own instant messaging media form; a first client inputs media information consistent with its selected media form and sends the media information to at least a second client.
B. After receiving the media information, the second client determines whether text/speech conversion of the media information is required and acts on the result of that determination.
C. The media information is output and played or displayed on the second client.
The media form includes a text form and a voice form, and the media information includes text information and voice information.

Step A further comprises: A1. after the first client inputs the media information, the media information is encoded and processed, encapsulated into a data packet, and sent to at least the second client.

The information processing in step A1 includes echo cancellation, noise suppression, and gain control applied to the voice.

Step B further comprises the following steps: B1. after receiving the data packet, the second client parses out the media information it contains according to the network protocol, then decodes and processes the media information; B2. based on the media form selected by the second client, it is determined whether text/speech conversion of the media information is required; if so, step B3 is performed, and if not, the method proceeds to step C; B3. text/speech conversion is performed on the media information using speech recognition or speech synthesis.

The information processing in step B1 includes voice post-processing, which refers to speech enhancement and noise removal.

The outputting of the media information in step C further comprises: C1. the media information obtained by the conversion in step B3 is resized and then output. The resizing comprises: splicing voice information shorter than one frame with the voice information that follows it to make up a whole frame; and dividing text information converted from voice information into data packets of fixed size.

To better achieve these objects, the instant messaging system based on asymmetric media comprises a network server and at least two clients connected to the network server. Each client comprises an input/output module, an information processing module, and a transceiver module, and further comprises a text/speech conversion module. The text/speech conversion module is connected to the input/output module and the information processing module. Based on the instant messaging media form selected by each client, it determines whether the media information exchanged between the clients and passed on by the information processing module requires text/speech conversion, performs the conversion if so, and forwards the media information to the input/output module.

The input/output module is connected to the client's text/speech conversion module and information processing module, and also to the input devices and output devices, and handles the input and output of media information.

The input/output module further resizes the media information produced by the text/speech conversion module before outputting it, which comprises: splicing voice information shorter than one frame with the voice information that follows it to make up a whole frame; and dividing text information converted from voice information into data packets of fixed size.

The information processing module encodes, decodes, and processes the media information; the processing includes echo cancellation, noise suppression, gain control, speech enhancement, and noise removal applied to the voice.

By automatically converting, at the receiving end, the media information exchanged between users of asymmetric media before outputting it, the present invention makes communication more flexible and also more user-friendly.

Brief Description of the Drawings

FIG. 1 is a structural diagram of the instant messaging system based on asymmetric media according to the present invention; FIG. 2 is a flowchart of the instant messaging method based on asymmetric media according to the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

In the present invention, if the clients of the users engaged in instant messaging have selected different media forms (voice or text), then when one client receives media information (voice information or text information) sent by another client, it first converts the media information between text and speech using speech recognition or speech synthesis, and then displays or plays it on that client. This makes interworking across asymmetric media straightforward: each user can freely choose either voice or text as the media form for communication, unconstrained by the media form chosen by the other party, which makes instant messaging between users more flexible.

FIG. 1 shows the structure of the instant messaging system based on asymmetric media according to the present invention. The system uses a client/server (C/S) model and comprises a network server 300 and a plurality of clients connected to it, such as the first client 100 and the second client 200 shown in FIG. 1. Note that the number of clients in the system is not limited to these two; the structure in FIG. 1 is only a minimal example, and the system can be extended from it.

The network server 300 provides client registration, maintains client information, and manages all clients (including the first client 100 and the second client 200). The network involved is typically the Internet, but may also be a mobile telephone network or a fixed telephone network.

The first client 100 comprises an input/output module 101, a text/speech conversion module 102, an information processing module 103, and a transceiver module 104. The input/output module 101 is connected to the text/speech conversion module 102 and the information processing module 103, and also to the input and output devices. Its functions include: (1) when the first client 100 acts as the sending client, the input/output module 101 receives the media information (voice or text) entered through the local input device and passes it to the information processing module 103; (2) when the first client 100 acts as the receiving client, the input/output module 101 passes the media information (voice or text) sent by the second client 200 and arriving via the text/speech conversion module 102 to the local output device for playback or display; (3) when the first client 100 acts as the receiving client, the input/output module 101 also resizes the media information produced by the text/speech conversion module 102 before outputting it, which includes: splicing voice information shorter than one frame, that is, taking voice information from what follows to make up a whole frame; and dividing text information converted from voice information into data packets of fixed size.
The input/output module 101 is connected to a variety of input and output devices. Typical input devices for the present invention include a keyboard, a mouse, and a microphone; output devices include a display and a speaker. Note that there is no sharp boundary between the input/output module 101 and these input and output devices: the module may incorporate them as a single integrated functional module, or may exist independently of them.

The text/speech conversion module 102 is connected to the input/output module 101 and the information processing module 103 and operates mainly at the receiving end of the media information. After the information processing module 103 has decoded and processed the received media information, it forwards it to the text/speech conversion module 102. The text/speech conversion module 102 first determines, based on the media form selected at the first client 100, whether the media information requires text/speech conversion; if so, it performs the conversion, and if not, it forwards the media information directly to the input/output module 101. The techniques the text/speech conversion module 102 uses for the conversion are speech recognition and speech synthesis. Speech recognition is the process of analyzing and recognizing an input digital speech signal to obtain the corresponding text, so that speech goes in and text comes out. It is one of the frontier technologies in speech signal processing and involves speech signal analysis and processing, intelligent algorithms, and pattern recognition. Speech synthesis, also known as text-to-speech (TTS), turns input text into an output speech signal; it mainly performs lexical, grammatical, and syntactic analysis of the input text and then, drawing on a speech database, generates the required speech signal.
For details of speech recognition and speech synthesis, see Fundamentals and Applications of Modern Speech Technology (《现代语音技术基础与应用》) by Cai Lianhong, Huang Dezhi, and Cai Rui, published by Tsinghua University Press on 1 November 2003.

The information processing module 103 is connected to the input/output module 101, the text/speech conversion module 102, and the transceiver module 104. Its functions include: (1) if the first client 100 is the sending client, the information processing module 103 encodes and processes the media information entered through the input/output module 101; this processing includes echo cancellation, noise suppression, and gain control applied to the voice; (2) if the first client 100 is the receiving client, the information processing module 103 takes the data packet received by the transceiver module 104, first parses out the media information it contains according to the network protocol, and then decodes and processes it; this processing includes voice post-processing such as speech enhancement and noise removal.
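
Of the sender-side processing steps listed above, gain control is the simplest to illustrate. The following is a minimal sketch, assuming 16-bit signed PCM samples and a plain peak-normalization rule. The patent does not specify any algorithm, and the function name `apply_gain_control` and the `target_peak` parameter are hypothetical.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain_control(samples: Sequence[int], target_peak: int = 16000) -> List[int]:
    """Scale one block of 16-bit PCM samples so its loudest sample reaches target_peak.

    Assumption: simple peak normalization stands in for the gain control
    mentioned in the text; the patent does not describe the algorithm.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # Pure silence (or an empty block): nothing to scale.
        return list(samples)
    gain = target_peak / peak
    # Clamp to the int16 range so the result is always valid PCM.
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]
```

A production client would more likely use an adaptive gain that changes smoothly over time; the fixed per-block rule here only shows where such a step sits in the processing chain.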

The transceiver module 104 is connected to the information processing module 103. It sends the media information processed by the information processing module 103 to the second client 200, and receives media information sent by the second client 200 and forwards it to the information processing module 103.

The second client 200 comprises an input/output module 201, a text/speech conversion module 202, an information processing module 203, and a transceiver module 204, which have the same structure and function as the input/output module 101, text/speech conversion module 102, information processing module 103, and transceiver module 104 of the first client 100, respectively, and are not described again here.

FIG. 2 shows the flow of the instant messaging method based on asymmetric media according to the present invention, which comprises the following steps. In step S201, the clients (that is, at least the first client 100 and the second client 200) establish a communication connection and each select a media form; the media forms referred to in the present invention are mainly the text form and the voice form.

In step S202, the first client 100 inputs, through its input/output module 101, media information consistent with its selected media form; the media information referred to in the present invention includes text information and voice information.

In step S203, after the media information has been input, the first client 100 uses the information processing module 103 to encode and process it. The information processing here includes echo cancellation, noise suppression, and gain control applied to the voice.

In step S204, the first client 100 encapsulates the media information into a data packet and uses the transceiver module 104 to send the packet to the second client 200.

In step S205, the second client 200 uses its transceiver module 204 to receive the data packet carrying the media information and parses out the media information it contains according to the network protocol.
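
Steps S204 and S205 can be sketched as a pack/parse pair. The patent does not define a wire format, so the header layout below (a one-byte media type followed by a four-byte big-endian payload length) and the names `pack_media`, `parse_media`, `MEDIA_TEXT`, and `MEDIA_VOICE` are assumptions made purely for illustration.

```python
import struct
from typing import Tuple

MEDIA_TEXT = 0   # hypothetical type code for text information
MEDIA_VOICE = 1  # hypothetical type code for voice information

# Hypothetical header: 1-byte media type + 4-byte big-endian payload length.
_HEADER = struct.Struct("!BI")


def pack_media(media_type: int, payload: bytes) -> bytes:
    """S204: encapsulate encoded media information into one data packet."""
    return _HEADER.pack(media_type, len(payload)) + payload


def parse_media(packet: bytes) -> Tuple[int, bytes]:
    """S205: parse the media type and payload back out of a received packet."""
    if len(packet) < _HEADER.size:
        raise ValueError("packet shorter than header")
    media_type, length = _HEADER.unpack_from(packet)
    payload = packet[_HEADER.size:_HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return media_type, payload
```

Carrying the media type in the header is one way for the receiver to know which form arrived before it decides whether conversion is needed; the patent itself leaves this detail open.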

In step S206, the second client 200 uses its information processing module 203 to decode and process the media information and sends the processed media information to the text/speech conversion module 202. The information processing in this step includes voice post-processing such as speech enhancement and noise removal.

In step S207, after receiving the media information, the text/speech conversion module 202 of the second client 200 first determines whether text/speech conversion is required; if so, step S208 is performed, and if not, the flow goes to step S209. This step corresponds to step S201 above: if the media forms originally selected by the two clients differ, text/speech conversion is required here; if they are the same, it is not.
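
The decision in step S207 reduces to comparing two media forms. A minimal sketch, assuming each client's choice is represented by a hypothetical `MediaForm` enum (the name and its values are not from the patent):

```python
from enum import Enum


class MediaForm(Enum):
    TEXT = "text"
    VOICE = "voice"


def needs_conversion(received: MediaForm, selected: MediaForm) -> bool:
    """S207: conversion is needed only when the form of the received media
    differs from the form the receiving client selected in S201."""
    return received is not selected
```

With this rule, a text message arriving at a client that selected voice returns `True` and goes on to S208, while matching forms return `False` and go straight to S209.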

In step S208, the text/speech conversion module 202 uses speech recognition and speech synthesis to perform the text/speech conversion of the media information: (1) speech recognition is the process of analyzing and recognizing an input digital speech signal to obtain the corresponding text, so that speech goes in and text comes out; it is a frontier technology in speech signal processing and involves speech signal analysis and processing, intelligent algorithms, and pattern recognition; (2) speech synthesis, also known as text-to-speech (TTS), mainly performs lexical, grammatical, and syntactic analysis of the input text and then, drawing on a speech database, generates the required speech signal. For details of speech recognition and speech synthesis, see Fundamentals and Applications of Modern Speech Technology (《现代语音技术基础与应用》) by Cai Lianhong, Huang Dezhi, and Cai Rui, published by Tsinghua University Press on 1 November 2003.
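
Step S208 picks the conversion direction from the type of the received media: voice goes through speech recognition, and text goes through speech synthesis. The sketch below assumes the two engines are supplied as plain callables (`recognize` and `synthesize`, both hypothetical) and that text is carried as `str` and voice as `bytes`; it does not model any real speech library.

```python
from typing import Callable, Union

Recognizer = Callable[[bytes], str]   # voice samples -> text (speech recognition)
Synthesizer = Callable[[str], bytes]  # text -> voice samples (speech synthesis, TTS)


def convert_media(media: Union[str, bytes],
                  recognize: Recognizer,
                  synthesize: Synthesizer) -> Union[str, bytes]:
    """S208: convert received media into the opposite form.

    Assumption: text is carried as `str` and voice as `bytes`.
    """
    if isinstance(media, bytes):
        return recognize(media)   # voice in, text out
    if isinstance(media, str):
        return synthesize(media)  # text in, voice out
    raise TypeError("media must be str (text) or bytes (voice)")
```

Passing the engines in as arguments keeps the dispatch logic independent of whichever recognition or synthesis backend a client actually uses, and makes it easy to substitute stub functions when testing.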

In step S209, the media information is output through the input/output module 201 and played or displayed on the second client 200. Outputting the media information further includes resizing the media information before output. After encoding, text and voice information are sent and received in data packets, and the length of one packet of text (or voice) information is fixed. A packet of text information is the unit of both network transmission and display; for voice information, the unit of network transmission is the packet but the unit of playback and capture is the frame, and one voice packet contains a whole number of frames.
(1) If the two clients have selected the same media form: if both use voice, one decoded voice frame at a time is placed in the playback buffer and played by the output device; if both use text, one decoded packet of text information at a time is passed to the output device for display.
(2) If the two clients have selected different media forms: for speech synthesized from text, the speech must be spliced, because the speech synthesized from one packet of text may not be a whole number of frames long, so a remainder shorter than one frame can occur. Data shorter than one frame is first padded with silence; when the next batch of speech data arrives, speech data of the corresponding length is taken from it to replace the silent part, which keeps the speech continuous. Each time synthesized speech data is fetched for playback, the client first checks whether the playback buffer holds a speech frame that was padded with silence and has not yet been played; if so, data of the same length is taken to replace the silent part, and if not, a complete frame of data is taken and placed in the playback buffer. For text obtained from speech, the text converted from one voice packet may not fill a whole number of packets, so it must be packetized: the text information is divided into data packets of fixed size and displayed one packet at a time; in a packet that is not full, the remainder is treated as empty and only the valid text is displayed. This completes one full instant messaging flow.
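
The two resizing cases in (2) can be sketched as follows. This is an illustrative reading of the description, not the patent's implementation: the class `PlaybackBuffer`, the functions `packetize_text` and `visible_text`, the zero byte used for silence, and the NUL character used as empty padding are all assumptions.

```python
from collections import deque
from typing import Deque, List, Optional


class PlaybackBuffer:
    """Queue of fixed-size speech frames with silence padding and splicing."""

    def __init__(self, frame_size: int) -> None:
        self.frame_size = frame_size
        self.frames: Deque[bytearray] = deque()
        self.pad = 0  # silence bytes at the end of the newest unplayed frame

    def push(self, speech: bytes) -> None:
        """Add synthesized speech, first replacing any pending silence."""
        if self.pad and self.frames:
            take = min(self.pad, len(speech))
            start = self.frame_size - self.pad
            # Overwrite the silent tail with the head of the new speech data.
            self.frames[-1][start:start + take] = speech[:take]
            self.pad -= take
            speech = speech[take:]
        for i in range(0, len(speech), self.frame_size):
            frame = bytearray(speech[i:i + self.frame_size])
            self.pad = self.frame_size - len(frame)
            frame.extend(bytes(self.pad))  # pad a short frame with silence (zeros)
            self.frames.append(frame)

    def pop_frame(self) -> Optional[bytes]:
        """Return the next whole frame for playback, or None if empty."""
        if not self.frames:
            return None
        frame = self.frames.popleft()
        if not self.frames:
            self.pad = 0  # the padded frame has been played; nothing to splice
        return bytes(frame)


def packetize_text(text: str, packet_chars: int) -> List[str]:
    """Split recognized text into fixed-size packets, padding the last with NULs."""
    return [text[i:i + packet_chars].ljust(packet_chars, "\0")
            for i in range(0, len(text), packet_chars)]


def visible_text(packet: str) -> str:
    """Strip the empty padding so only the valid text is displayed."""
    return packet.rstrip("\0")
```

For example, with a frame size of 4 bytes, pushing 6 bytes queues one full frame and one frame with 2 bytes of silence; pushing 3 more bytes fills that silence and starts a new padded frame. The text side counts characters rather than bytes, which avoids splitting a multi-byte character across packets.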

Note that the present invention is typically, but not exclusively, applied to the Internet; it can also be applied to fixed and mobile telephone networks, so a client may be a PC terminal, a fixed-line telephone terminal, a mobile phone terminal, or the like.

In addition, while the present invention solves the problem of instant messaging across asymmetric media, it remains applicable to communication between symmetric media, and users can flexibly choose the media form of their own client.

The above is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1.一种基于非对称媒体的即时通信方法,其特征在于,所述方法包括以下步骤:A.至少两个客户端在建立通信连接后选择各自的即时通信媒体形式,由第一客户端输入与其所选媒体形式一致的媒体信息,并将所述媒体信息发送给至少第二客户端;B.所述第二客户端接收到所述媒体信息后,判断是否需对所述媒体信息进行文字语音转换,并执行判断结果;C.将所述媒体信息输出,并在所述第二客户端中播放或者显示;所述媒体形式包括文字形式和语音形式,所述媒体信息包括文字信息和语音信息。 CLAIMS 1. A method of instant messaging based on asymmetric medium, characterized in that the method comprises the following steps:. A least two clients after establishing a communication connection selection medium in the form of a respective instant messaging, input by the first client selected media form consistent with its media information, and sends the media information to at least a second client;. B the second client after receiving the media information, determines whether or not the media information required for text voice conversion, and performs the judgment result; C to output the media information, and played or displayed on the second client; the forms of media including text form and in the form of speech, the media information includes text messages and voice information.
2.根据权利要求1所述的基于非对称媒体的即时通信方法,其特征在于,所述步骤A进一步包括:A1.所述第一客户端输入媒体信息后,对所述媒体信息进行编码和信息处理,并封装成数据包发送给至少第二客户端。 The instant messaging method based on asymmetric medium according to claim 1, wherein said step A further comprises:. A1 after the first input media client information, the media information may be encoded and information processing, and encapsulated into packets transmitted to at least a second client.
3.根据权利要求2所述的基于非对称媒体的即时通信方法,其特征在于,所述步骤A1中的信息处理包括对语音进行回声抵消、噪声抑制、增益控制。 3. The instant messaging method based on asymmetric medium, wherein according to claim 2, said information processing comprising the step A1 speech echo cancellation, noise reduction, gain control.
4.根据权利要求2所述的基于非对称媒体的即时通信方法,其特征在于,所述步骤B进一步包括以下步骤:B1.所述第二客户端接收到所述数据包后,根据网络协议解析出其中的媒体信息,并对所述媒体信息进行解码和信息处理;B2.根据所述第二客户端所选择的媒体形式,判断是否需对所述媒体信息进行文字语音转换,若需要则执行步骤B3,若不需要则转所述步骤C;B3.根据语音识别技术或者语音合成技术,对所述媒体信息进行文字语音转换。 The instant messaging method based on asymmetric medium according to claim 2, wherein said step B further comprises the following steps:. B1 the second client after receiving the data packet, according to the network protocol wherein parsing the media information and the media information is decoded and information processing;. B2 according to the second medium in the form selected by the client determines whether the required media information text to speech conversion, if desired the step B3, if the need to turn the step C;. B3 voice recognition or speech synthesis technology, the media information for text to speech conversion.
5.根据权利要求4所述的基于非对称媒体的即时通信方法,其特征在于,所述步骤B1中的信息处理包括语音的后处理,所述语音的后处理是指语音增强和去噪声处理。 The instant messaging method based on asymmetric medium according to claim 4, wherein said information processing step B1 comprises post-processing the speech, the speech post-processing means and a speech enhancement denoising process .
6.根据权利要求4所述的基于非对称媒体的即时通信方法,其特征在于,所述步骤C中将所述媒体信息输出的步骤进一步包括:C1.对所述步骤B3转换所得的媒体信息进行尺寸调整,然后输出;所述尺寸调整包括:将不足一帧长度的语音信息与其后的语音信息拼接为整帧长度;将由语音信息转换而来的文字信息分成固定大小的数据包。 The instant messaging method based on asymmetric medium according to claim 4, wherein, in the step C the media information output step further comprises:. C1 B3 converting the information media obtained in step resized, and then outputs; the size adjustment comprising: a voice information less than a frame length splice voice information and the subsequent frame length is an integer; divided into fixed-size data packets converted from voice information by the character information.
7. An instant messaging system based on asymmetric media, comprising a network server and at least two clients connected to the network server, each client comprising an input/output module, an information processing module, and a transceiver module, characterized in that the client further comprises a text-speech conversion module; the text-speech conversion module is connected to the input/output module and the information processing module, and is configured to determine, according to the instant messaging media form selected by each client, whether the media information exchanged between the clients and sent from the information processing module requires text-speech conversion, to perform the text-speech conversion if required, and to forward the media information to the input/output module.
8. The instant messaging system based on asymmetric media according to claim 7, wherein the input/output module is connected to the client's text-speech conversion module and information processing module, and also to an input device and an output device, and is configured to input and output media information.
9. The instant messaging system based on asymmetric media according to claim 7, wherein the input/output module is further configured to resize the media information produced by the text-speech conversion module before outputting it, the resizing comprising: splicing voice information shorter than one frame with the voice information that follows it to form whole frames; and dividing text information converted from voice information into fixed-size data packets.
10. The instant messaging system based on asymmetric media according to claim 7, wherein the information processing module is configured to encode, decode, and process the media information, the information processing comprising: echo cancellation, noise suppression, gain control, speech enhancement, and denoising of the voice.
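
The conversion decision of step B2 and the resizing described in claims 6 and 9 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names (`needs_conversion`, `splice_frames`, `split_text`), the string constants for the media forms, the use of byte counts for frame length, and the choice to let the final text packet be shorter than the others are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical labels for the two media forms named in the claims.
TEXT = "text"
VOICE = "voice"


def needs_conversion(received_form: str, selected_form: str) -> bool:
    """Step B2 (sketch): convert only when the received media form
    differs from the form the receiving client selected."""
    return received_form != selected_form


def splice_frames(chunks: List[bytes], frame_len: int) -> Tuple[List[bytes], bytes]:
    """Join voice chunks so every emitted frame is exactly frame_len bytes.

    Returns the whole frames plus any leftover bytes, which a caller
    would prepend to the voice data that arrives next.
    """
    buffer = b"".join(chunks)
    whole = len(buffer) - len(buffer) % frame_len
    frames = [buffer[i:i + frame_len] for i in range(0, whole, frame_len)]
    return frames, buffer[whole:]


def split_text(text: str, packet_size: int) -> List[str]:
    """Divide recognized text into packets of packet_size characters.

    Assumption: the last packet may be shorter than packet_size.
    """
    return [text[i:i + packet_size] for i in range(0, len(text), packet_size)]
```

For example, a client that selected the text form and receives voice would call `needs_conversion(VOICE, TEXT)`, run speech recognition, and pass the result through `split_text` before display; a client that selected voice and receives text would synthesize speech and pass the audio through `splice_frames` before playback.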

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610157827 CN101079836A (en) 2006-12-21 2006-12-21 An instant communication method and system based on asymmetric media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200610157827 CN101079836A (en) 2006-12-21 2006-12-21 An instant communication method and system based on asymmetric media

Publications (1)

Publication Number Publication Date
CN101079836A true CN101079836A (en) 2007-11-28

Family

ID=38907072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610157827 CN101079836A (en) 2006-12-21 2006-12-21 An instant communication method and system based on asymmetric media

Country Status (1)

Country Link
CN (1) CN101079836A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102347913A (en) * 2011-07-08 2012-02-08 个信互动(北京)网络科技有限公司 Method for realizing voice and text content mixed message
CN102710539A (en) * 2012-05-02 2012-10-03 中兴通讯股份有限公司 Method and device for transferring voice messages
WO2013127367A1 (en) * 2012-03-02 2013-09-06 腾讯科技(深圳)有限公司 Instant communication voice recognition method and terminal
CN103327181A (en) * 2013-06-08 2013-09-25 广东欧珀移动通信有限公司 Voice chatting method capable of improving efficiency of voice information learning for users
CN103632670A (en) * 2013-11-30 2014-03-12 青岛英特沃克网络科技有限公司 Voice and text message automatic conversion system and method
CN103873687A (en) * 2014-03-10 2014-06-18 联想(北京)有限公司 Information processing method and electronic equipment
CN103973544A (en) * 2014-04-02 2014-08-06 小米科技有限责任公司 Voice communication method, voice playing method and devices
CN104700836A (en) * 2013-12-10 2015-06-10 阿里巴巴集团控股有限公司 Voice recognition method and voice recognition system
CN104780091A (en) * 2014-01-13 2015-07-15 北京发现角科技有限公司 Instant messaging method and instant messaging system with speech and audio processing function
CN105099865A (en) * 2014-05-23 2015-11-25 北京奇虎科技有限公司 Method and apparatus for handling session messages
CN105376134A (en) * 2014-08-26 2016-03-02 腾讯科技(北京)有限公司 Method and device for displaying communication message
CN106209878A (en) * 2016-07-20 2016-12-07 北京邮电大学 WebRTC-based multimedia data transmission method and device
CN106412032A (en) * 2016-09-14 2017-02-15 安徽声讯信息技术有限公司 Remote audio character transmission method and system
CN106789602A (en) * 2017-03-15 2017-05-31 广东欧珀移动通信有限公司 Voice broadcasting control method, terminal and mobile terminal
CN106850559A (en) * 2016-12-26 2017-06-13 中国科学院计算技术研究所 Extensible network protocol analysis system and method

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102347913B (en) * 2011-07-08 2015-04-08 个信互动(北京)网络科技有限公司 Method for realizing voice and text content mixed message
CN102347913A (en) * 2011-07-08 2012-02-08 个信互动(北京)网络科技有限公司 Method for realizing voice and text content mixed message
WO2013127367A1 (en) * 2012-03-02 2013-09-06 腾讯科技(深圳)有限公司 Instant communication voice recognition method and terminal
CN103295576A (en) * 2012-03-02 2013-09-11 腾讯科技(深圳)有限公司 Voice identification method and terminal of instant communication
US9263029B2 (en) 2012-03-02 2016-02-16 Tencent Technology (Shenzhen) Company Limited Instant communication voice recognition method and terminal
CN102710539A (en) * 2012-05-02 2012-10-03 中兴通讯股份有限公司 Method and device for transferring voice messages
CN103327181B (en) * 2013-06-08 2014-12-10 广东欧珀移动通信有限公司 Voice chatting method capable of improving efficiency of voice information learning for users
CN103327181A (en) * 2013-06-08 2013-09-25 广东欧珀移动通信有限公司 Voice chatting method capable of improving efficiency of voice information learning for users
CN103632670A (en) * 2013-11-30 2014-03-12 青岛英特沃克网络科技有限公司 Voice and text message automatic conversion system and method
CN104700836B (en) * 2013-12-10 2019-01-29 阿里巴巴集团控股有限公司 Speech recognition method and system
US10249301B2 (en) 2013-12-10 2019-04-02 Alibaba Group Holding Limited Method and system for speech recognition processing
CN104700836A (en) * 2013-12-10 2015-06-10 阿里巴巴集团控股有限公司 Voice recognition method and voice recognition system
US10140989B2 (en) 2013-12-10 2018-11-27 Alibaba Group Holding Limited Method and system for speech recognition processing
CN104780091A (en) * 2014-01-13 2015-07-15 北京发现角科技有限公司 Instant messaging method and instant messaging system with speech and audio processing function
CN104780091B (en) * 2014-01-13 2019-06-25 北京发现角科技有限公司 Instant messaging method and system with speech audio processing function
CN103873687A (en) * 2014-03-10 2014-06-18 联想(北京)有限公司 Information processing method and electronic equipment
CN103973544B (en) * 2014-04-02 2017-10-24 小米科技有限责任公司 Audio communication method, speech playing method and device
US10057424B2 (en) 2014-04-02 2018-08-21 Xiaomi Inc. Method for voice calling, method for voice playing and devices thereof
CN103973544A (en) * 2014-04-02 2014-08-06 小米科技有限责任公司 Voice communication method, voice playing method and devices
CN105099865A (en) * 2014-05-23 2015-11-25 北京奇虎科技有限公司 Method and apparatus for handling session messages
CN105376134A (en) * 2014-08-26 2016-03-02 腾讯科技(北京)有限公司 Method and device for displaying communication message
CN106209878A (en) * 2016-07-20 2016-12-07 北京邮电大学 WebRTC-based multimedia data transmission method and device
CN106412032A (en) * 2016-09-14 2017-02-15 安徽声讯信息技术有限公司 Remote audio character transmission method and system
CN106850559A (en) * 2016-12-26 2017-06-13 中国科学院计算技术研究所 Extensible network protocol analysis system and method
CN106789602A (en) * 2017-03-15 2017-05-31 广东欧珀移动通信有限公司 Voice broadcasting control method, terminal and mobile terminal
CN106789602B (en) * 2017-03-15 2019-09-06 Oppo广东移动通信有限公司 Voice control method for playing back, terminal and mobile terminal

Similar Documents

Publication Publication Date Title
US7277855B1 (en) Personalized text-to-speech services
CN1158645C (en) Voice control of user interface to service application program
US7974392B2 (en) System and method for personalized text-to-voice synthesis
KR100561228B1 (en) Method for VoiceXML to XHTML+Voice Conversion and Multimodal Service System using the same
US8027276B2 (en) Mixed mode conferencing
US6507817B1 (en) Voice IP approval system using voice-enabled web based application server
US8174559B2 (en) Videoconferencing systems with recognition ability
US8326624B2 (en) Detecting and communicating biometrics of recorded voice during transcription process
US6975988B1 (en) Electronic mail method and system using associated audio and visual techniques
EP1143679B1 (en) A conversational portal for providing conversational browsing and multimedia broadcast on demand
US7003464B2 (en) Dialog recognition and control in a voice browser
JP4171585B2 (en) System and method for providing network coordinated conversational services
EP0671721A2 (en) Communication system
US8654940B2 (en) Dialect translator for a speech application environment extended for interactive text exchanges
EP2274870B1 (en) Open architecture based domain dependent real time multi-lingual communication service
US6618704B2 (en) System and method of teleconferencing with the deaf or hearing-impaired
US8670987B2 (en) Automatic speech recognition with dynamic grammar rules
TWI516080B (en) Using n-channel selector voip instant communication method and system of language processing
US20080295040A1 (en) Closed captions for real time communication
JP2009540384A (en) Method and system for a sign language graphical interpreter
US8150698B2 (en) Invoking tapered prompts in a multimodal application
EP1331797A1 (en) Communication system for hearing-impaired persons comprising speech to text conversion terminal
JP4271224B2 (en) Speech translation apparatus, speech translation method, speech translation program and system
CN102117614B (en) Personalized text-to-speech synthesis and personalized speech feature extraction
US20040064322A1 (en) Automatic consolidation of voice enabled multi-user meeting minutes

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C12 Rejection of an application for a patent