TW201733376A

TW201733376A - Instant communication method and instant communication system based on voice recognition

Info

Publication number: TW201733376A
Application number: TW106102454A
Authority: TW
Inventors: zhi-jie Yan
Original assignee: Alibaba Group Services Ltd
Priority date: 2016-01-26
Filing date: 2017-01-23
Publication date: 2017-09-16
Also published as: CN106997764A; WO2017128991A1; CN106997764B; TWI774654B

Abstract

An instant communication method and an instant communication system based on voice recognition. The instant communication method comprises: receiving voice information sent by a sending terminal; performing voice recognition on the voice information to generate text information; sending the voice information to a receiving terminal; and sending the text information to the receiving terminal. The instant communication method and system overcome the obstacle that, on certain occasions, a receiving terminal cannot listen to voice information after receiving it, thereby avoiding the problem of leaking a user's private information.

Description

Instant messaging method based on speech recognition and instant communication system

The present invention relates to the field of instant messaging technologies, and more particularly to an instant messaging method and an instant messaging system based on voice recognition.

Voice chat in social apps on mobile phones or tablets is a convenient feature found in many applications; for example, Tencent's WeChat and Alibaba's DingTalk, Alipay, and Taobao all provide it. At present, such features are mainly implemented by having the sending terminal record the user's message as voice; the recipient taps the received message and listens to it through the earpiece or the loudspeaker.

While such a feature is convenient for the sending terminal, it creates certain obstacles for the receiving terminal. The main drawback is that the receiving terminal cannot see the content at a glance as it can with a text message. The user must tap the message and then either hold the phone or tablet to the ear to listen through the earpiece or play it through the device's loudspeaker. On many occasions (for example, in a meeting or with other people nearby) this is very inconvenient and may also leak private information.

In view of the above problems, embodiments of the present invention are proposed to provide an instant messaging method and an instant messaging system based on voice recognition that overcome the above problems or at least partially solve them.

To solve the above problems, the present invention discloses an instant messaging method based on voice recognition, including: receiving voice information sent by a sending terminal; performing voice recognition on the voice information to generate text information; sending the voice information to a receiving terminal; and sending the text information to the receiving terminal.

Another embodiment of the present invention provides an instant messaging method based on voice recognition, including: recording voice information and sending it to a server; receiving text information generated by recognizing the voice information, and displaying the text information; entering an interface for editing the text information after receiving a correction operation instruction; and displaying the edited text information and sending the edited text information to the server.

A further embodiment of the present invention provides an instant messaging method based on voice recognition, including: receiving voice information sent by a server; receiving text information that the server generated by recognizing the voice information; and displaying and marking the text information.

An embodiment of the present invention provides an instant messaging system based on voice recognition, including: a voice information receiving module, configured to receive voice information sent by a sending terminal; a text information generating module, configured to perform voice recognition on the voice information to generate text information; a first sending module, configured to send the voice information to a receiving terminal; and a second sending module, configured to send the text information to the receiving terminal.

Another embodiment of the present invention provides an instant messaging system based on voice recognition, including: a voice information recording and sending module, configured to record voice information and send it to a server; a text information receiving and display module, configured to receive text information generated by recognizing the voice information and to display the text information; an editing module, configured to enter an interface for editing the text information after a correction operation instruction is received; and a display and sending module, configured to display the edited text information and send it to the server.

A further embodiment of the present invention provides an instant messaging system based on voice recognition, including: a voice information acquiring module, configured to receive voice information sent by a server; a text information acquiring module, configured to receive text information that the server generated by recognizing the voice information; and a text information display and marking module, configured to display and mark the text information.

Embodiments of the present invention have at least the following advantage: in the instant messaging method and instant messaging system based on voice recognition proposed by the embodiments, both the voice information and the text information obtained by voice recognition are sent to the receiving terminal, which removes the obstacle to the receiving terminal obtaining the information, makes the feature more convenient for users, and avoids the problem of privacy leaks.

S101, S102, S103, S104‧‧‧method steps

S201, S202, S203, S204, S205, S206, S207, S208, S209‧‧‧method steps

S301, S302, S302a, S302b, S303, S304‧‧‧method steps

S401, S402, S402a, S403, S404, S405‧‧‧method steps

500‧‧‧instant messaging system

501‧‧‧voice information receiving module

502‧‧‧text information generating module

503‧‧‧first sending module

504‧‧‧second sending module

600‧‧‧instant messaging system

601‧‧‧voice information receiving module

602‧‧‧text information generating module

603‧‧‧first sending module

604‧‧‧second sending module

605‧‧‧third sending module

606‧‧‧information transceiver module

607‧‧‧first storage module

608‧‧‧fourth sending module

609‧‧‧information transceiver module

610‧‧‧text information association module

700‧‧‧instant messaging system

701‧‧‧voice information recording and sending module

702‧‧‧text information receiving and display module

703‧‧‧editing module

704‧‧‧display and sending module

705‧‧‧auxiliary modification information receiving module

706‧‧‧voice information playback module

800‧‧‧instant messaging system

801‧‧‧voice information acquiring module

802‧‧‧text information acquiring module

803‧‧‧text information display and marking module

804‧‧‧marking information acquiring module

805‧‧‧voice information playback module

806‧‧‧receiving and display module

FIG. 1 is a flowchart of the instant messaging method based on voice recognition according to the first embodiment of the present invention.

FIG. 2 is a flowchart of the instant messaging method based on voice recognition according to the second embodiment of the present invention.

FIG. 3 is a flowchart of the instant messaging method based on voice recognition according to the third embodiment of the present invention.

FIG. 4 is a flowchart of the instant messaging method based on voice recognition according to the fourth embodiment of the present invention.

FIG. 5 is a block diagram of an instant messaging system corresponding to the instant messaging method based on voice recognition of the first embodiment of the present invention.

FIG. 6 is a block diagram of an instant messaging system corresponding to the instant messaging method based on voice recognition of the second embodiment of the present invention.

FIG. 7 is a block diagram of an instant messaging system corresponding to the instant messaging method based on voice recognition of the third embodiment of the present invention.

FIG. 8 is a block diagram of an instant messaging system corresponding to the instant messaging method based on voice recognition of the fourth embodiment of the present invention.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention fall within the scope of protection of the present invention.

One of the core ideas of the present invention is to provide an instant messaging method and an instant messaging system that use voice recognition to recognize voice information and, through a server, display the resulting text information directly on the screens of the sending terminal and the receiving terminal. This makes it easier for the receiving terminal to take in the message, overcomes the obstacle that on some occasions the receiving terminal cannot listen to voice information after receiving it, and avoids the problem of user privacy leaks.

First embodiment

The first embodiment of the present invention provides an instant messaging method based on voice recognition; FIG. 1 is a flowchart of the method. The instant messaging method of the first embodiment is applied to a server and includes the following steps:

S101: Receive voice information sent by a sending terminal. In this step, the sending terminal may record the voice information in an instant messaging interface (for example, a chat interface), for example by pressing and holding a designated icon or button; recording ends when the icon or button is released. The sending terminal then sends the voice information to the server over the network.

S102: Recognize the voice information as text information. In this step, after receiving the voice information sent by the sending terminal, the server converts it into text information using voice recognition technology. Voice recognition is a commonly used technology in the art and is not described in detail here.

S103: Send the voice information to the receiving terminal. In this step, the server sends the voice information received in step S101 to the receiving terminal.

It should be noted that step S103 may be performed simultaneously with step S102 or one after the other; when they are performed one after the other, the order of steps S102 and S103 is not particularly limited.

S104: Send the text information generated by the recognition to the receiving terminal. In this step, the server sends the text information produced by the voice recognition processing to the receiving terminal. Preferably, the server sends a designated mark together with the text information, to distinguish text information converted from voice information from text information that the sender typed directly.

It should be noted that when step S103 is performed after step S102, step S104 may be performed simultaneously with, before, or after step S103; the present invention does not particularly limit this.

In one embodiment, step S103 may be performed first, sending the voice information received in step S101 to the receiving terminal; step S102 is then performed to generate text information from the voice information by voice recognition; and step S104 is then performed to send the generated text information to the receiving terminal. In another embodiment, step S102 may be performed first to generate text information from the voice information received in step S101, and steps S103 and S104 are then performed simultaneously or one after the other to send the voice information and the generated text information to the receiving terminal.
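As an illustration only, the ordering options for steps S101 to S104 can be sketched in Python as follows. The class RelayServer, the recognize callable, the outbox list, and the message kinds "voice" and "asr_text" are hypothetical names that do not appear in the disclosure; recognize stands in for whatever voice recognition engine the server uses, and outbox stands in for the network.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RelayServer:
    """Hypothetical server-side relay for steps S101 to S104."""

    # Stand-in for any voice recognition engine (an assumption, not a real API).
    recognize: Callable[[bytes], str]
    # Stand-in for the network: (receiver, kind, payload) in sending order.
    outbox: List[Tuple[str, str, object]] = field(default_factory=list)

    def handle_voice(self, receiver: str, voice: bytes,
                     forward_first: bool = True) -> str:
        """Process voice information already received from the sender (S101)."""
        if forward_first:
            self._send(receiver, "voice", voice)   # S103 before S102
            text = self.recognize(voice)           # S102
        else:
            text = self.recognize(voice)           # S102 before S103
            self._send(receiver, "voice", voice)   # S103
        # S104: the kind "asr_text" plays the role of the designated mark that
        # distinguishes recognized text from text the sender typed directly.
        self._send(receiver, "asr_text", text)
        return text

    def _send(self, receiver: str, kind: str, payload: object) -> None:
        self.outbox.append((receiver, kind, payload))


# Usage with a fake recognizer that simply decodes the bytes.
server = RelayServer(recognize=lambda voice: voice.decode("utf-8"))
server.handle_voice("receiver-1", b"hello")
print(server.outbox)
```

With forward_first left at its default, the receiving terminal gets the voice information without waiting for recognition; setting it to False corresponds to the variant in which recognition is performed before anything is sent.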

In summary, the first embodiment of the present invention provides an instant messaging method based on voice recognition in which text information is generated from voice information by recognition, and the server sends both the voice information and the text information to the receiving terminal. The method makes it easier for the receiving terminal to take in the message, overcomes the obstacle that on some occasions the receiving terminal cannot listen to voice information after receiving it, and avoids the problem of user privacy leaks.

Second embodiment

The second embodiment of the present invention provides an instant messaging method based on voice recognition; FIG. 2 is a flowchart of the method. The instant messaging method of the second embodiment is applied to a server and includes the following steps:

S201: Receive voice information sent by a sending terminal.

S202: Recognize the voice information as text information.

S203: Send the voice information to the receiving terminal.

S204: Send the text information generated by the recognition to the receiving terminal.

Steps S201 to S204 are the same as or similar to steps S101 to S104 of the first embodiment and are not described again here.

In a preferred embodiment, after step S202, the method may further include:

S205: Send the text information generated by the recognition to the sending terminal. In this step, the server sends the text information generated in step S202 to the sending terminal.

The execution order of steps S203, S204, and S205 is not limited; the three may be performed simultaneously or one after another in any order, and the present invention does not particularly limit this.

In addition, after step S202, the method may further include:

S206: Store the text information generated by the recognition in a database. In this step, the server sends the text information generated by the recognition to a database connected to the server for later use. Step S206 may be performed simultaneously with any of steps S203 to S205, or before or after them in any order; the present invention does not particularly limit this.

After step S202, the method may further include:

S207: Send auxiliary error correction information to the sending terminal. This step may be performed simultaneously with any of steps S203 to S205, or before or after them in any order; the present invention does not particularly limit this. Preferably, step S207 is performed simultaneously with step S205; that is, the auxiliary error correction information is sent to the sending terminal together with the recognized text information, so that the sending terminal can use it to modify the recognized text information.

During voice recognition, a word graph and multiple candidates for each recognized word are produced. In step S207, an algorithm may use the information in the word graph to recommend alternative correction words for the user to select. Returning this information to the sending terminal helps the user correct the recognized text more efficiently. For example, when the user of the sending terminal chooses error correction and taps a misrecognized word, other candidates for that word can be obtained from the auxiliary correction information and displayed on the virtual keyboard, and the user can correct the error efficiently by tapping the right candidate. Specifically, suppose the user says “I want to buy the yellow one” and voice recognition misrecognizes it as “I want to buy the red one”. When the user taps the word “red”, the algorithm can use the word graph information to offer “yellow” as the second candidate. The user taps “yellow” and the replacement is complete, which is simple and fast.
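The candidate lookup in this example can be sketched as follows. This is a minimal sketch under a strong simplifying assumption: the word graph is reduced to one slot per word position, each holding scored candidates sorted from best to worst. A real word graph is a lattice with timing and path scores. The names Slot, best_words, alternatives, and replace_word are hypothetical, and the scores are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical simplification of a word graph: one slot per word position,
# each slot holding (word, score) candidates sorted from best to worst.
Slot = List[Tuple[str, float]]


def best_words(slots: List[Slot]) -> List[str]:
    """Return the top candidate of every slot (the displayed recognition)."""
    return [slot[0][0] for slot in slots]


def alternatives(slots: List[Slot], position: int, limit: int = 3) -> List[str]:
    """Return up to `limit` lower-ranked candidates for the tapped word."""
    return [word for word, _ in slots[position][1:limit + 1]]


def replace_word(words: List[str], position: int, chosen: str) -> List[str]:
    """Return a copy of `words` with the tapped word replaced."""
    corrected = list(words)
    corrected[position] = chosen
    return corrected


# Illustrative scores only; they are not taken from any real recognizer.
slots = [
    [("I", 0.99)], [("want", 0.98)], [("to", 0.99)], [("buy", 0.97)],
    [("the", 0.95)], [("red", 0.55), ("yellow", 0.40)], [("one", 0.96)],
]
words = best_words(slots)              # what the sending terminal displays
tapped = words.index("red")            # the user taps the misrecognized word
options = alternatives(slots, tapped)  # candidates shown on the keyboard
corrected = replace_word(words, tapped, options[0])
print(" ".join(corrected))
```

Only the lower-ranked candidates for the tapped position need to be sent to the sending terminal, so the auxiliary correction information can stay small even when the full word graph is large.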

Thereafter, the method may further include:

S208: Receive the edited text information sent by the sending terminal and send it to the receiving terminal. In this step, once the user of the sending terminal has finished the correction, the sending terminal sends the edited text information to the server, and the server receives it and sends it to the receiving terminal.

Preferably, after step S208, the method may further include:

S209: Send the edited text information to the database.

In this step, a corrected automatic voice recognition result is highly valuable, because it indicates that 1) the server did not recognize the voice information entirely correctly, and 2) the user has supplied the correct text for the voice information through the correction. For such edited text information, the training algorithm of the voice recognition system can be used to record the misrecognized text content, the corresponding voice content, and the correct voice content, so that similar errors are avoided in the future. For the self-improvement of a voice recognition system, this kind of error correction data offers a benefit unmatched by other data.
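One way to picture how the database of steps S206 and S209 could yield such correction data is sketched below. This is an assumption-laden illustration, not the disclosed implementation: TextStore is a hypothetical in-memory stand-in for the database, voice_id is a hypothetical key linking a text entry to its voice information, and the sketch only collects the pairs; how a training algorithm consumes them is outside its scope.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RecognitionEntry:
    recognized_text: str
    edited_text: Optional[str] = None


class TextStore:
    """Hypothetical in-memory stand-in for the database of S206 and S209."""

    def __init__(self) -> None:
        self._entries: Dict[str, RecognitionEntry] = {}

    def save_recognized(self, voice_id: str, text: str) -> None:
        # S206: store the text generated by recognition.
        self._entries[voice_id] = RecognitionEntry(recognized_text=text)

    def save_edited(self, voice_id: str, text: str) -> None:
        # S209: store the text as corrected by the sending user.
        self._entries[voice_id].edited_text = text

    def correction_pairs(self) -> List[Tuple[str, str, str]]:
        """Return (voice_id, recognized, edited) where the user changed the text."""
        return [
            (voice_id, entry.recognized_text, entry.edited_text)
            for voice_id, entry in self._entries.items()
            if entry.edited_text is not None
            and entry.edited_text != entry.recognized_text
        ]


store = TextStore()
store.save_recognized("v1", "I want to buy the red one")
store.save_edited("v1", "I want to buy the yellow one")
store.save_recognized("v2", "see you at noon")  # never corrected
print(store.correction_pairs())
```

Entries that were never edited are left out, since only a changed text signals a recognition error.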

In summary, the second embodiment of the present invention provides an instant messaging method based on voice recognition in which text information is generated from voice information by recognition, the server sends both the voice information and the text information to the receiving terminal, the text information is also sent to the sending terminal, and auxiliary modification information is then provided so that the user of the sending terminal can correct the text efficiently. The method makes it easier for the receiving terminal to take in the message, overcomes the obstacle that on some occasions the receiving terminal cannot listen to voice information after receiving it, avoids the problem of user privacy leaks, and further ensures the accuracy of the information received by the receiving terminal.

Third embodiment

The third embodiment of the present invention provides an instant messaging method based on voice recognition; FIG. 3 is a flowchart of the method. The instant messaging method of the third embodiment is applied to the sending terminal of the message and includes the following steps:

S301: Record voice information and send it to a server. In this step, the sending terminal may record voice information in an instant messaging interface (for example, a chat interface): for example, pressing and holding a designated icon or button in the input area starts recording, and releasing it ends the recording. After recording is complete, the instant messaging interface may be preset to send directly, or the user of the sending terminal may tap another icon or button to send the voice information to the server over the network.

S302: Receive the text information generated by the server's recognition of the voice information, and display it. In this step, the server performs voice recognition on the voice information sent by the sending terminal, generates text information, and returns it to the sending terminal, which receives and displays the recognized text. For example, in a chat interface, the sending terminal sends the recorded voice information to the server in step S301; in step S302, the sending terminal receives, in the same chat interface, the text information that the server returns after recognizing the voice information, and displays it in that chat interface.

S303: After receiving a correction operation instruction, open an error correction interface and enter the interface for editing the text information. In this step, if the user of the sending terminal considers that the content of the text information generated by voice recognition does not match the voice information, the user can open the error correction interface by issuing a correction operation instruction. For example, the correction operation instruction may be a long press on the text information; on receiving it, the sending terminal opens the error correction interface and enters the text editing state. The correction interface may display an input interface such as a virtual keyboard or a handwriting keyboard, with which the user corrects the error, for example by adding or deleting text.

Thereafter, the method may further include:

S304: Display the edited text information and send it to the server.

In this step, the text information as edited by the user of the sending terminal is displayed on the sending terminal and is at the same time uploaded by the sending terminal to the server, which sends it to the receiving party for synchronized display; this is not described again here.
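The sending-terminal flow of steps S301 to S304 can be pictured as a small state holder. The sketch below is hypothetical throughout: SenderSession, its method names, and the uploads list (a stand-in for network sends to the server) are assumptions made for illustration, and the long press is just the example trigger mentioned for step S303.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SenderSession:
    """Hypothetical sending-terminal state for steps S301 to S304."""

    # Stand-in for what is sent to the server: (kind, payload) pairs.
    uploads: List[Tuple[str, object]] = field(default_factory=list)
    displayed_text: Optional[str] = None
    editing: bool = False

    def send_voice(self, voice: bytes) -> None:
        # S301: send the recorded voice information to the server.
        self.uploads.append(("voice", voice))

    def on_recognized_text(self, text: str) -> None:
        # S302: display the text returned by the server.
        self.displayed_text = text

    def on_long_press(self) -> None:
        # S303: a correction operation instruction opens the editing state.
        if self.displayed_text is not None:
            self.editing = True

    def submit_edit(self, edited: str) -> None:
        # S304: display the edited text and send it to the server.
        if not self.editing:
            raise RuntimeError("no correction in progress")
        self.displayed_text = edited
        self.editing = False
        self.uploads.append(("edited_text", edited))


session = SenderSession()
session.send_voice(b"audio")
session.on_recognized_text("I want to buy the red one")
session.on_long_press()
session.submit_edit("I want to buy the yellow one")
print(session.displayed_text, session.uploads)
```

Rejecting submit_edit outside the editing state is a design choice of this sketch, made so that an edit can only follow a correction operation instruction.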

In a preferred embodiment, the method may further include, after step S302:

S302a: Receive auxiliary modification information sent by the server. In this step, the word graph and the multiple candidates for each recognized word produced during voice recognition are sent to the sending terminal, which helps the user of the sending terminal correct the recognized text more efficiently.

In step S303, the error correction interface may, in addition to putting the text information into the editing state and displaying an input interface such as a virtual keyboard or a handwriting keyboard, display the auxiliary modification information sent by the server in step S302a. For example, when the server considers that a sentence or word in the recognized text information is not grammatical, a dotted underline may be added beneath that sentence or word, and the candidate words contained in the auxiliary modification information sent by the server may be displayed elsewhere in the sending terminal's display interface (for example, in the input interface) for the user to tap the correct one. Alternatively, when the sender chooses error correction and taps a misrecognized word, other candidates for that word can be obtained from the auxiliary correction information and displayed on the virtual keyboard, and the user can correct the error efficiently by tapping the right candidate.

In a preferred embodiment, the method further includes, after step S302:

S302b: After receiving an instruction to play the voice information, play the voice information. In this step, if the user of the sending terminal issues such an instruction, for example by tapping the displayed text information, the sending terminal may play the voice information recorded in step S301 through the earpiece or the loudspeaker.

In summary, the third embodiment of the present invention provides an instant messaging method based on voice recognition in which text information is generated from voice information by recognition and an error correction function is provided, so that the user of the sending terminal can modify the recognized text information. The method makes it easier for the receiving terminal to take in the message, overcomes the obstacle that on some occasions the receiving terminal cannot listen to voice information after receiving it, avoids the problem of user privacy leaks, and ensures the accuracy of the information received by the receiving terminal.

Preferably, in the third embodiment the sending terminal can also receive auxiliary modification information sent by the server, which lets the user modify the text information efficiently and further improves the accuracy and timeliness of the information.

Fourth embodiment

本發明第四實施例提出一種基於語音識別的即時通信方法，如圖4所示為本發明第四實施例的基於語音識別的即時通信方法的流程圖。本發明第四實施例中的即時通信方法應用於資訊的接收終端，包括如下步驟： A fourth embodiment of the present invention provides a voice communication based instant messaging method. FIG. 4 is a flowchart of a voice recognition based instant messaging method according to a fourth embodiment of the present invention. The instant messaging method in the fourth embodiment of the present invention is applied to a receiving terminal of information, and includes the following steps:

S401，接收伺服器發送的語音資訊；在這一步驟中，發送終端錄製語音資訊並發送至伺服器，在由伺服器將該語音資訊發送至接收終端； S401: Receive voice information sent by the server; in this step, the sending terminal records the voice information and sends the voice information to the server, where the voice information is sent by the server to the receiving terminal;

S402: Receive the text information that the server generated by recognizing the voice information. In this step, the server performs voice recognition on the voice information to generate text information and then sends it to the receiving terminal, and the receiving terminal receives this recognized text information.

It should be noted that steps S401 and S402 may be performed simultaneously or one after the other; that is, the receiving terminal may receive the voice information and the generated text information at the same time or in sequence, and the present invention places no particular limit on this. Preferably, after converting the voice information into text information, the server sends the voice information and the text information to the receiving terminal together, and the receiving terminal receives both at the same time.

S403: Display and mark the text information. In this step, the receiving terminal can display the text information on the instant messaging interface. Because the text information is generated by recognizing voice information, it can be marked to distinguish it from text that the sender typed directly, for example by setting a special background color or font, or by attaching special characters (such as "voice recognition" or "ASR"), so that ordinary text information and voice-recognized text information can be told apart.

There are two possible ways to mark the text information. In the first, when the receiving terminal receives voice information together with the text information corresponding to it, the receiving terminal itself marks the text information, distinguishing it from text information that the sending terminal entered directly as text and that the server forwarded. In the second, the server sends a mark along with the text information, and the mark is displayed together with the text information on the display interface of the receiving terminal. In the latter case, the following step is also performed after step S402:

S402a: Receive the tag information sent by the server.

In this step, the tag information may, for example, specify a special background color, a font, or special characters (such as "voice recognition" or "ASR").

Preferably, after step S403, the method may further include:

S404: When an instruction from the user to play the voice information is received, play the voice information. In this embodiment, the instruction may be the user clicking the text information: when the user clicks the displayed text information, the receiving terminal plays the voice information received in step S401 through the earpiece or the speaker.

Preferably, after step S403, the method may further include:

S405: Receive the edited text information sent by the server and display it. In this step, after the sending terminal corrects errors in the text information, it sends the corrected text information to the server, which forwards it to the receiving terminal; the receiving terminal receives the edited text information and displays it. Preferably, the receiving terminal overwrites the text information from before the modification with the edited text information.
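The receiving-terminal steps S401 to S405 can be sketched as a small state object. This is only an illustrative sketch, not the patent's implementation: the class name `ReceivedMessage`, its method names, and the default mark string `"ASR"` are assumptions chosen for the example, and actual audio playback is replaced by returning the stored bytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceivedMessage:
    """Hypothetical state kept by the receiving terminal for one message."""
    audio: Optional[bytes] = None  # S401: voice information from the server
    text: Optional[str] = None     # S402: recognized (later, edited) text
    mark: Optional[str] = None     # S402a: tag information, if the server sent any

    def on_voice(self, audio: bytes) -> None:
        # S401: store the voice information forwarded by the server.
        self.audio = audio

    def on_text(self, text: str, mark: Optional[str] = None) -> None:
        # S402 / S402a: store the recognized text. If the server sent no tag,
        # the terminal applies its own mark (assumed here to be "ASR").
        self.text = text
        self.mark = mark if mark is not None else "ASR"

    def render(self) -> str:
        # S403: display the text together with its mark.
        return f"[{self.mark}] {self.text}"

    def on_click(self) -> Optional[bytes]:
        # S404: clicking the text requests playback; a real client would hand
        # these bytes to the earpiece or speaker.
        return self.audio

    def on_edited_text(self, edited: str) -> None:
        # S405: the edited text overwrites the text shown before.
        self.text = edited
```

For example, after `on_voice(...)` and `on_text("meet at for pm")`, `render()` yields the marked text, and a later `on_edited_text("meet at 4 pm")` replaces it in place while the original audio remains available through `on_click()`.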

In summary, the fourth embodiment of the present invention provides a voice recognition based instant messaging method. The method generates text information by recognizing the voice information and provides an error correction function, so that the user of the receiving terminal directly receives the voice-recognized text information and can tell whether a given piece of text information was sent by the sending terminal directly as text or was generated by voice recognition. The method makes it easier for the receiving terminal to receive information, overcomes the obstacle that on some occasions the receiving terminal cannot listen to voice information after receiving it, and avoids leaking the user's privacy.

FIG. 5 shows an instant messaging system corresponding to the voice recognition based instant messaging method of the first embodiment of the present invention. As shown in FIG. 5, the instant messaging system 500 in this embodiment includes the following modules: a voice information receiving module 501, configured to receive the voice information sent by the sending terminal; a text information generating module 502, configured to perform voice recognition on the voice information to generate text information; a first sending module 503, configured to send the voice information to the receiving terminal; and a second sending module 504, configured to send the text information to the receiving terminal.
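The four modules of system 500 can be pictured as four methods on one server object. The following is a minimal sketch under stated assumptions: the class name `InstantMessagingServer`, the `recognize` callable standing in for a real speech recognizer, and the `outbox` list standing in for the network are all hypothetical and do not come from the patent.

```python
from typing import Callable, List, Tuple

# (destination, kind, payload); a stand-in for real network delivery.
Outbox = List[Tuple[str, str, object]]


class InstantMessagingServer:
    """Hypothetical server mirroring modules 501 to 504."""

    def __init__(self, recognize: Callable[[bytes], str]) -> None:
        self.recognize = recognize  # assumed speech recognizer
        self.outbox: Outbox = []

    def receive_voice(self, audio: bytes, receiver: str) -> None:
        # Module 501: receive the voice information from the sending terminal.
        text = self.generate_text(audio)
        self.send_voice(receiver, audio)
        self.send_text(receiver, text)

    def generate_text(self, audio: bytes) -> str:
        # Module 502: voice recognition produces the text information.
        return self.recognize(audio)

    def send_voice(self, receiver: str, audio: bytes) -> None:
        # Module 503 (first sending module): forward the voice information.
        self.outbox.append((receiver, "voice", audio))

    def send_text(self, receiver: str, text: str) -> None:
        # Module 504 (second sending module): forward the text information.
        self.outbox.append((receiver, "text", text))
```

The sketch recognizes first and then sends voice and text back to back, which matches the preferred ordering described for the receiving terminal; sending the voice before recognition finishes would be an equally valid variant.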

FIG. 6 shows an instant messaging system 600 corresponding to the voice recognition based instant messaging method of the second embodiment of the present invention. As shown in FIG. 6, in a preferred embodiment, in addition to the voice information receiving module 601, text information generating module 602, first sending module 603, and second sending module 604 described above, the system 600 further includes: a third sending module 605, configured to send the text information to the sending terminal.

In addition, the system 600 further includes: an information transceiver module 606, configured to receive the edited text information sent by the sending terminal and send it to the receiving terminal.

In a preferred embodiment, the system further includes: a first storage module 607, configured to store the text information in a database.

In a preferred embodiment, the system further includes: a fourth sending module 608, configured to send auxiliary error correction information to the sending terminal; and an information transceiver module 609, configured to receive the edited text information sent by the sending terminal and send it to the receiving terminal.

In a preferred embodiment, the system further includes: a text information association module 610, configured to send the edited text information to the database and associate it with the text information before correction.
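One way to picture the first storage module 607 and the text information association module 610 is a single table that holds the recognized text and, once available, the edited text for the same message. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table name, column names, and function names are assumptions made for illustration, since the patent does not specify a schema.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one row per recognized message."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recognized_text ("
        " message_id TEXT PRIMARY KEY,"
        " recognized TEXT NOT NULL,"
        " edited TEXT)"
    )
    return conn


def store_recognized(conn: sqlite3.Connection, message_id: str, recognized: str) -> None:
    # Role of module 607: store the recognized text information.
    conn.execute(
        "INSERT INTO recognized_text (message_id, recognized) VALUES (?, ?)",
        (message_id, recognized),
    )


def associate_edited(conn: sqlite3.Connection, message_id: str, edited: str) -> None:
    # Role of module 610: attach the edited text to the pre-correction text.
    conn.execute(
        "UPDATE recognized_text SET edited = ? WHERE message_id = ?",
        (edited, message_id),
    )


def correction_pairs(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    # Pairs of (recognized, edited) text where the user actually changed something.
    rows = conn.execute(
        "SELECT recognized, edited FROM recognized_text"
        " WHERE edited IS NOT NULL AND edited != recognized"
        " ORDER BY message_id"
    )
    return rows.fetchall()
```

Keeping both versions in one row makes the pairs of recognized and corrected text easy to extract later, which is the kind of real recognition error data the description mentions as useful for improving the voice recognition system.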

In a preferred embodiment, the auxiliary error correction information includes a word map and candidate words for a specified character, word, or sentence of the text information.

In a preferred embodiment, the word map and candidate words of the specified character, word, or sentence are obtained from the database.
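The patent does not define the structure of the word map. As one plausible reading, the sketch below assumes a simplified form in which each recognized word position carries a set of scored alternatives, so that candidate words for a specified position can be offered to the user. The type alias `WordMap`, the function names, and the scores are all illustrative assumptions.

```python
from typing import Dict, List

# Assumed simplification: one slot per word position, each slot mapping a
# candidate word to an illustrative recognition score.
WordMap = List[Dict[str, float]]


def best_text(word_map: WordMap) -> str:
    """Join the top-scoring word of every slot into the recognized text."""
    return " ".join(max(slot, key=slot.get) for slot in word_map)


def candidates_for(word_map: WordMap, position: int, limit: int = 3) -> List[str]:
    """Return alternative words for one position, best first.

    The top-scoring word is skipped because it is already the displayed text.
    """
    slot = word_map[position]
    ranked = sorted(slot, key=slot.get, reverse=True)
    return ranked[1:limit + 1]
```

With a slot such as `{"for": 0.5, "four": 0.3, "4": 0.2}`, the displayed word is "for" and the candidates offered for correction are "four" and "4", so the user can fix the text with one selection instead of retyping it.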

In a preferred embodiment, the first sending module and the second sending module execute simultaneously and send the voice information and the text information to the receiving terminal at the same time.

FIG. 7 shows an instant messaging system corresponding to the voice recognition based instant messaging method of the third embodiment of the present invention. As shown in FIG. 7, the instant messaging system 700 in this embodiment includes the following modules: a voice information recording and sending module 701, configured to record voice information and send it to the server; a text information receiving and display module 702, configured to receive the text information generated by recognizing the voice information and display it; an editing module 703, configured to enter an interface for editing the text information after a correction operation instruction is received; and a display and sending module 704, configured to display the edited text information and send it to the server.

In a preferred embodiment, the system further includes: an auxiliary modification information receiving module 705, configured to receive auxiliary modification information sent by the server.

In a preferred embodiment, the auxiliary error correction information includes a word map and candidate words for a specified character, word, or sentence of the text information, and the candidate words are displayed in the interface for editing the text information.

In a preferred embodiment, the interface for editing the text information includes an input interface.

In a preferred embodiment, the system further includes: a voice information playing module 706, configured to play the voice information after an instruction to play the voice information is received.

In a preferred embodiment, the instruction to play the voice information is generated by the user clicking the text information.
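The sending-terminal modules of system 700 can be sketched as one client object. This is an illustrative sketch only: the class name `SendingTerminal`, its method names, and the `sent` list that stands in for the connection to the server are assumptions, and real recording, playback, and user interface code are omitted.

```python
from typing import List, Optional, Tuple


class SendingTerminal:
    """Hypothetical client mirroring modules 701 to 704 and 706."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, object]] = []  # messages passed to the server
        self.audio: Optional[bytes] = None
        self.displayed_text: Optional[str] = None
        self.editing = False

    def record_and_send(self, audio: bytes) -> None:
        # Module 701: record the voice information and send it to the server.
        self.audio = audio
        self.sent.append(("voice", audio))

    def on_recognized_text(self, text: str) -> None:
        # Module 702: receive and display the recognized text information.
        self.displayed_text = text

    def on_correction_instruction(self) -> None:
        # Module 703: a correction instruction opens the editing interface.
        self.editing = True

    def submit_edit(self, edited: str) -> None:
        # Module 704: display the edited text and send it to the server.
        if not self.editing:
            raise RuntimeError("no editing interface is open")
        self.displayed_text = edited
        self.sent.append(("edited_text", edited))
        self.editing = False

    def on_text_clicked(self) -> Optional[bytes]:
        # Module 706: clicking the text requests playback of the recording.
        return self.audio
```

Requiring `on_correction_instruction()` before `submit_edit()` mirrors the order in the description, where the editing interface is entered only after a correction operation instruction is received.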

FIG. 8 shows an instant messaging system corresponding to the voice recognition based instant messaging method of the fourth embodiment of the present invention. As shown in FIG. 8, the instant messaging system 800 in this embodiment includes the following modules: a voice information acquisition module 801, configured to receive the voice information sent by the server; a text information acquisition module 802, configured to receive the text information that the server generated by recognizing the voice information; and a text information display and marking module 803, configured to display and mark the text information.

In a preferred embodiment, the system further includes: a tag information acquisition module 804, configured to receive the tag information sent by the server.

In a preferred embodiment, the text information acquisition module and the tag information acquisition module execute simultaneously and acquire the text information and the tag information at the same time.

In a preferred embodiment, the text information display and marking module is configured to display the text information and to mark the text information using the tag information.

In a preferred embodiment, the system further includes: a voice information playing module 805, configured to play the voice information when an instruction from the user to play the voice information is received.

In a preferred embodiment, the instruction to play the voice information is generated by the user clicking the text information.

In a preferred embodiment, the system further includes: a receiving and display module 806, configured to receive the edited text information sent by the server and display it.

In a preferred embodiment, the edited text information is displayed by overwriting the text information from before the edit.

Because the device embodiments are basically similar to the method embodiments, they are described only briefly; for the relevant details, refer to the corresponding parts of the method embodiments.

In summary, the voice recognition based instant messaging method and instant messaging system proposed by the embodiments of the present invention have at least the following advantages:

(1) In the voice recognition based instant messaging method and instant messaging system proposed by the embodiments of the present invention, the voice recognition function overcomes the obstacle the receiving terminal faces in obtaining information, makes the system easier to use, and avoids privacy leaks;

(2) In the voice recognition based instant messaging method and instant messaging system proposed by the embodiments of the present invention, the error modification function gives the sending terminal the opportunity to correct errors made by the voice recognition system;

(3) In the voice recognition based instant messaging method and instant messaging system proposed by the embodiments of the present invention, the data collection function gathers real recognition error data that can be used to improve the performance of the voice recognition system;

(4) In the voice recognition based instant messaging method and instant messaging system proposed by the embodiments of the present invention, the error correction step makes it convenient for the sending terminal to correct errors;

(5) In the voice recognition based instant messaging method and instant messaging system proposed by the embodiments of the present invention, the information marking step makes it easy for the receiving terminal to tell whether the received information was entered with a virtual keyboard or came from voice information;

(6) In the voice recognition based instant messaging method and instant messaging system proposed by the embodiments of the present invention, if the information is voice information, the user of the receiving terminal can click the text information generated by recognizing the voice information to replay the original voice information.

The embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments can be referred to one another.

Those skilled in the art will appreciate that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Accordingly, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

In a typical configuration, the computer device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. The memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store signals by any method or technology. The signals may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store signals accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, and the instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.

Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "comprise", "include", and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.

The voice recognition based instant messaging method and instant messaging system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

An instant messaging method based on voice recognition, comprising: receiving voice information sent by a sending terminal; performing voice recognition on the voice information to generate text information; transmitting the voice information to a receiving terminal; and transmitting the text information to the receiving terminal.

The instant messaging method of claim 1, wherein after the voice information is voice-recognized to generate text information, the method further comprises: transmitting the text information to the transmitting terminal.

The instant messaging method of claim 2, wherein after the text information is sent to the transmitting terminal, the method further comprises: receiving the edited text information sent by the transmitting terminal, and transmitting the information to the receiving terminal.

The instant messaging method according to claim 3, wherein after the voice information is voice-recognized to generate text information, and after receiving the edited text information sent by the transmitting terminal and transmitting it to the receiving terminal, the method further includes transmitting auxiliary error correction information to the transmitting terminal, the auxiliary error correction information including a word map and candidate words for a specified character, word, or sentence of the text information.

The instant messaging method as described in claim 2, wherein after the voice information is voice-recognized and the text information is generated, the method further includes: storing the text information in a database; after performing the voice recognition on the voice information to generate the text information, the method further includes: sending auxiliary error correction information to the sending terminal, and receiving the edited text information sent by the sending terminal and sending it to the receiving terminal; and after the edited text information sent by the sending terminal is sent to the receiving terminal, the method further includes: sending the edited text information to the database and associating it with the text information before correction.

The instant messaging method of claim 5, wherein the auxiliary error correction information comprises a word map and candidate words for a specified character, word, or sentence of the text information, and the word map and candidate words of the specified character, word, or sentence are obtained from the database.

An instant messaging method based on voice recognition, comprising: recording voice information and transmitting it to a server; receiving text information generated by recognizing the voice information, and displaying the text information; after receiving a correction operation instruction, entering an interface for editing the text information; and displaying the edited text information and sending the edited text information to the server.

The instant messaging method of claim 7, wherein after receiving the text information generated by recognizing the voice information and displaying the text information, the method further comprises: receiving auxiliary modification information sent by the server, wherein the auxiliary error correction information includes a word map and candidate words for a specified character, word, or sentence of the text information, and the candidate words are displayed in the interface for editing the text information.

The instant messaging method of claim 7, wherein after receiving the text information generated by recognizing the voice information and displaying the text information, the method further comprises: after receiving an instruction to play the voice information, playing the voice information.

The instant messaging method of claim 9, wherein the instruction to play the voice information is generated by the user clicking the text information.

An instant messaging method based on voice recognition, comprising: receiving voice information sent by a server; receiving text information generated by the server after identifying the voice information; and displaying and marking the text information.

The instant messaging method of claim 11, wherein the method further comprises: receiving the tag information sent by the server.

The instant messaging method as described in claim 12, wherein the step of displaying and marking the text information includes: displaying the text information, and marking the text information by using the tag information.

The instant messaging method of claim 11, wherein after the step of displaying and marking the text information, the method further comprises: when receiving the user's instruction to play the voice information, playing the voice information, wherein the instruction to play the voice information is generated by the user clicking the text information.

The instant messaging method of claim 11, wherein after the step of displaying and marking the text information, the method further comprises: receiving the edited text information sent by the server, and displaying the edited text information.

The instant messaging method of claim 15, wherein the edited text information is displayed in a manner covering the pre-edit text information.

An instant messaging system based on voice recognition, comprising: a voice information receiving module, configured to receive voice information sent by a sending terminal; a text information generating module, configured to perform voice recognition on the voice information to generate text information; a first sending module, configured to send the voice information to the receiving terminal; and a second sending module, configured to send the text information to the receiving terminal.

The instant messaging system of claim 17, wherein the system further comprises: a third sending module, configured to send the text information to the transmitting terminal.

The instant messaging system of claim 18, wherein the system further comprises: an information transceiving module, configured to receive the edited text information sent by the transmitting terminal, and send the information to the receiving terminal.

The instant messaging system of claim 19, wherein the system further comprises: a fourth sending module, configured to send auxiliary error correction information to the sending terminal, the auxiliary error correction information including a word map and candidate words for a specified character, word, or sentence of the text information.

The instant messaging system of claim 18, wherein the system further comprises: a first storage module, configured to store the text information in a database; a fourth sending module, configured to send auxiliary error correction information to the sending terminal; an information transceiver module, configured to receive the edited text information sent by the sending terminal and send it to the receiving terminal; and a text information association module, configured to send the edited text information to the database and associate it with the text information before correction.

The instant messaging system of claim 21, wherein the auxiliary error correction information includes a word map and candidate words for a specified character, word, or sentence of the text information, and the word map and candidate words of the specified character, word, or sentence are obtained from the database.

An instant messaging system based on voice recognition, comprising: a voice information recording and sending module, configured to record voice information and send it to a server; a text information receiving and display module, configured to receive the text information generated by recognizing the voice information, and display the text information; an editing module, configured to enter an interface for editing the text information after receiving a correction operation instruction; and a display and sending module, configured to display the edited text information and send the edited text information to the server.

The instant messaging system of claim 23, wherein the system further comprises: an auxiliary modification information receiving module, configured to receive auxiliary modification information sent by the server, the auxiliary error correction information including a word map and candidate words for a specified character, word, or sentence of the text information, and the candidate words being displayed in the interface for editing the text information.

The instant messaging system of claim 23, wherein the system further comprises: a voice information playing module, configured to play the voice information after receiving the instruction to play the voice information.

The instant messaging system of claim 25, wherein the instruction to play the voice information is generated by the user clicking the text information.

An instant messaging system based on voice recognition, comprising: a voice information acquiring module, configured to receive voice information sent by a server; a text information acquiring module, configured to receive the text information that the server generated by recognizing the voice information; and a text information display and marking module, configured to display and mark the text information.

The instant messaging system of claim 27, wherein the system further comprises: a tag information acquisition module, configured to receive tag information sent by the server.

The instant messaging system of claim 28, wherein the text information display tag module is configured to display the text information, and use the tag information to mark the text information.

The instant messaging system of claim 27, wherein the system further comprises: a voice information playing module, configured to play the voice information when receiving an instruction of the user to play the voice information, wherein the instruction to play the voice information is generated by the user clicking the text information.

The instant messaging system of claim 27, wherein the system further comprises: a receiving and display module, configured to receive the edited text information sent by the server, and display the edited text information.

The instant messaging system of claim 31, wherein the edited text information is displayed in a manner that covers pre-edit text information.