WO2017128991A1

WO2017128991A1 - Instant communication method and instant communication system based on voice recognition

Info

Publication number: WO2017128991A1
Application number: PCT/CN2017/071382
Authority: WO
Inventors: 鄢志杰
Original assignee: 阿里巴巴集团控股有限公司; 鄢志杰
Priority date: 2016-01-26
Filing date: 2017-01-17
Publication date: 2017-08-03
Also published as: TW201733376A; TWI774654B; CN106997764A; CN106997764B

Abstract

An instant communication method and an instant communication system based on voice recognition. The instant communication method comprises: receiving voice information sent by a sending terminal (S101); performing voice recognition on the voice information to generate text information (S102); sending the voice information to a receiving terminal (S103); and sending the text information to the receiving terminal (S104). The instant communication method and system overcome the obstacle that a receiving terminal cannot listen to voice information after receiving same on certain occasions, thereby avoiding the problem of privacy leaks of a user.

Description

Instant messaging method and instant messaging system based on voice recognition

Technical field

The present application relates to the field of instant messaging technologies, and in particular, to a voice recognition based instant messaging method and an instant messaging system.

Background art

Push-to-talk voice chat in social apps on mobile phones and tablets is a convenient function found in many applications, such as Tencent's WeChat and Alibaba's DingTalk, Alipay, and Taobao. At present, such functions are mainly implemented as follows: the user of the transmitting terminal records a message by voice, and the receiving party taps the received message and listens to it through the earpiece or the loudspeaker.

While convenient for the transmitting terminal, such a function creates an obstacle for the receiving terminal. The main disadvantage is that the recipient cannot see the content at a glance, as is possible with a text message. The recipient must tap the message and hold the phone or tablet to the ear, or play it aloud through the speaker of the phone or tablet. On many occasions (for example, in a meeting or when other people are nearby) this is very inconvenient, and it may also leak private information.

Summary of the invention

In view of the above problems, embodiments of the present application have been made in order to provide a voice recognition based instant messaging method and instant messaging system that overcomes the above problems or at least partially solves the above problems.

To solve the above problem, the present application discloses a voice recognition based instant messaging method, including:

Receiving voice information sent by the sending terminal;

Performing voice recognition on the voice information to generate text information;

Sending the voice information to the receiving terminal;

Sending the text information to the receiving terminal.

Another embodiment of the present application provides a voice recognition based instant messaging method, including:

Recording voice information and sending it to the server;

Receiving text information generated by recognizing the voice information, and displaying the text information;

After receiving a correction operation instruction, entering an interface for editing the text information;

Displaying the edited text information and sending the edited text information to the server.

Another embodiment of the present application provides a voice recognition based instant messaging method, including:

Receiving voice information sent by the server;

Receiving text information generated by the server by recognizing the voice information;

Displaying and marking the text information.

An embodiment of the present application provides an instant messaging system based on voice recognition, which includes:

a voice information receiving module, configured to receive voice information sent by the sending terminal;

a text information generating module, configured to perform voice recognition on the voice information to generate text information;

a first sending module, configured to send the voice information to the receiving terminal;

a second sending module, configured to send the text information to the receiving terminal.

Another embodiment of the present application provides a voice recognition based instant messaging system, including:

a voice information recording and sending module, configured to record voice information and send it to a server;

a text information receiving display module, configured to receive text information generated by recognizing the voice information, and display the text information;

an editing module, configured to enter an interface for editing the text information after receiving a correction operation instruction;

a display sending module, configured to display the edited text information and send the edited text information to the server.

Another embodiment of the present application provides an instant messaging system based on voice recognition, including:

a voice information acquiring module, configured to receive voice information sent by the server;

a text information obtaining module, configured to receive text information generated by the server by recognizing the voice information;

a text information display marking module, configured to display and mark the text information.

The embodiments of the present application have at least the following advantages:

In the voice recognition based instant messaging method and instant messaging system proposed by the embodiments of the present application, text information is generated from the voice information through voice recognition, and both the voice information and the text information are sent to the receiving terminal. This overcomes the obstacle the receiving terminal faces in obtaining the information, makes the function more convenient for the user, and avoids the problem of privacy leaks.

Brief description of the drawings

FIG. 1 is a flow chart of a voice recognition based instant messaging method according to a first embodiment of the present application.

FIG. 2 is a flow chart of a voice recognition based instant messaging method according to a second embodiment of the present application.

FIG. 3 is a flow chart of a voice recognition based instant messaging method according to a third embodiment of the present application.

FIG. 4 is a flow chart of a voice recognition based instant messaging method according to a fourth embodiment of the present application.

FIG. 5 is a block diagram of an instant messaging system corresponding to the voice recognition based instant messaging method of the first embodiment of the present application.

FIG. 6 is a block diagram of an instant messaging system corresponding to the voice recognition based instant messaging method of the second embodiment of the present application.

FIG. 7 is a block diagram of an instant messaging system corresponding to the voice recognition based instant messaging method of the third embodiment of the present application.

FIG. 8 is a block diagram of an instant messaging system corresponding to the voice recognition based instant messaging method of the fourth embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are clearly and completely described in the following with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application are within the scope of the present disclosure.

One of the core ideas of the present application is to propose an instant messaging method and an instant messaging system that use voice recognition to convert voice information into text information, and that display the text information directly on the screens of the transmitting terminal and the receiving terminal by way of the server. This makes it easier for the receiving terminal to obtain the information, overcomes the obstacle that the receiving terminal cannot listen to voice information it has received on some occasions, and avoids the problem of user privacy leakage.

First embodiment

The first embodiment of the present application provides a voice recognition-based instant messaging method. FIG. 1 is a flowchart of a voice recognition-based instant messaging method according to a first embodiment of the present application. The instant messaging method in the first embodiment of the present application is applied to a server, and includes the following steps:

S101. Receive voice information sent by the sending terminal.

In this step, the user of the transmitting terminal can record the voice information in an instant messaging interface (for example, a chat interface), for example by pressing and holding a specified mark or button to start recording and releasing the mark or button to finish recording. The transmitting terminal then sends the voice information to the server through the network.

S102. Recognize the voice information as text information.

In this step, after receiving the voice information sent by the sending terminal, the server converts the voice information into text information through voice recognition technology. Speech recognition is a technique commonly used in the field and is not described further here.

S103. Send the voice information to the receiving terminal.

In this step, the server transmits the voice information received in step S101 to the receiving terminal.

It should be noted that step S103 may be performed simultaneously with step S102 or sequentially, and when executed sequentially, the sequence of steps of step S102 and step S103 is not particularly limited.

S104. Send the text information generated by the recognition to the receiving terminal.

In this step, the server transmits the text information generated by the speech recognition processing to the receiving terminal. Preferably, in this step, the server transmits a specified mark together with the text information, so that text information converted from voice information can be distinguished from text information typed directly by the sender.

It should be noted that, when step S103 is performed after step S102, step S104 may be performed simultaneously with step S103, or step S104 may be performed before or after step S103, which is not particularly limited.

In one embodiment, step S103 is performed first to send the voice information received in step S101 to the receiving terminal, then step S102 is performed to generate text information by voice recognition, and then step S104 is performed to send the generated text information to the receiving terminal. In another embodiment, step S102 is performed first to recognize the voice information received in step S101 and generate text information, and then steps S103 and S104 are performed simultaneously or sequentially to send the voice information and the recognized text information to the receiving terminal.
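The two orderings described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `Terminal` class, the `handle_voice_message` function, the `"ASR"` mark value, and the placeholder recognizer are hypothetical names introduced here and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Terminal:
    """Hypothetical stand-in for a receiving terminal; it records what it is sent."""
    inbox: List[Tuple[str, object]] = field(default_factory=list)

    def deliver(self, kind: str, payload: object) -> None:
        self.inbox.append((kind, payload))


def handle_voice_message(
    voice: bytes,
    receiver: Terminal,
    recognize: Callable[[bytes], str],
    voice_first: bool = True,
) -> str:
    """Server-side handling of one voice message; calling this models S101."""
    if voice_first:
        receiver.deliver("voice", voice)  # S103 before recognition
        text = recognize(voice)           # S102
    else:
        text = recognize(voice)           # S102
        receiver.deliver("voice", voice)  # S103
    # S104: the text carries a mark so it can be told apart from typed text.
    receiver.deliver("text", {"body": text, "mark": "ASR"})
    return text


def fake_recognizer(voice: bytes) -> str:
    # Placeholder only: a real system would call a speech recognition engine.
    return voice.decode("utf-8")


receiver = Terminal()
handle_voice_message(b"hello", receiver, fake_recognizer, voice_first=True)
```

With `voice_first=True` the receiving terminal can start playback before recognition finishes; with `voice_first=False` the voice and text can be delivered back to back.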

In summary, the first embodiment of the present application provides a voice recognition-based instant messaging method, which generates text information by recognizing the voice information and sends both the voice information and the text information to the receiving terminal through the server. The instant messaging method provided by this embodiment makes it easier for the receiving terminal to obtain the information, overcomes the obstacle that the receiving terminal cannot listen to voice information it has received on some occasions, and avoids the problem of user privacy leakage.

Second embodiment

The second embodiment of the present application provides a voice recognition based instant messaging method. FIG. 2 is a flowchart of a voice recognition based instant messaging method according to a second embodiment of the present application. The instant messaging method in the second embodiment of the present application is applied to a server, and includes the following steps:

S201. Receive voice information sent by the sending terminal.

S202. Recognize the voice information as text information.

S203. Send the voice information to the receiving terminal.

S204. Send the text information generated by the recognition to the receiving terminal.

The above steps S201 to S204 are the same as or similar to steps S101 to S104 in the first embodiment and are not described again here.

In a preferred embodiment, after step S202, the method may further include:

S205. Send the text information generated by the recognition to the sending terminal.

In this step, the server transmits the text information generated in step S202 to the transmitting terminal.

The execution order of the step S205, the step S204, and the step S203 is not limited, and the three may be executed at the same time or sequentially in any order, and the present application is not particularly limited.

In addition, after the step S202, the method may further include:

S206. Store the text information generated after the identification in a database.

In this step, the server sends the recognized text information to the database connected to the server for use. This step S206 can be performed simultaneously with any of the steps S203 to S205 or in any order, and the present application is not particularly limited.
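Because steps S203 to S206 may run at the same time or in any order, a server could dispatch them concurrently. The following is a minimal sketch using Python's `asyncio`. The `send`, `store`, and `dispatch` functions, the in-memory log, and the dictionary standing in for the database are assumptions made for illustration only.

```python
import asyncio
from typing import Dict, List


async def send(log: List[str], target: str, kind: str, payload: object) -> None:
    """Hypothetical network send; here it only records what was sent to whom."""
    await asyncio.sleep(0)  # yield control, as a real send would while waiting on I/O
    log.append(f"{kind}->{target}")


async def store(db: Dict[str, str], message_id: str, text: str) -> None:
    """Hypothetical database write; a dict stands in for the database."""
    await asyncio.sleep(0)
    db[message_id] = text


async def dispatch(
    message_id: str, voice: bytes, text: str, log: List[str], db: Dict[str, str]
) -> None:
    # The four steps do not depend on each other, so they are awaited together.
    await asyncio.gather(
        send(log, "receiver", "voice", voice),  # S203
        send(log, "receiver", "text", text),    # S204
        send(log, "sender", "text", text),      # S205
        store(db, message_id, text),            # S206
    )


log: List[str] = []
db: Dict[str, str] = {}
asyncio.run(dispatch("m1", b"\x00\x01", "hello", log, db))
```

All four operations depend only on the output of S202, which is why the sketch awaits them together rather than fixing an order.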

After the step S202, the method may further include:

S207. Send auxiliary error correction information to the sending terminal.

This step may be performed simultaneously with any of steps S203 to S205 or in any order relative to them, which is not particularly limited in this application. Preferably, step S207 is performed simultaneously with step S205: while the recognized text information is sent to the transmitting terminal, the auxiliary error correction information is sent to it as well, so that the user of the transmitting terminal can use it to modify the recognized text information.

During speech recognition, a word graph and multiple candidate words are generated. In step S207, an algorithm may use the information in the word graph to recommend alternative words to the user for error correction. Returned to the transmitting terminal, this information helps the user correct the recognized text more accurately. For example, when the user of the sending terminal chooses to correct an error and clicks a misrecognized word, the other candidates for that word can be obtained from the auxiliary error correction information and displayed on the virtual keyboard, and the user can correct the error efficiently by clicking the correct candidate. Specifically, suppose the user says "I want to buy a yellow one" and the speech is misrecognized as "I want to buy red". When the user clicks the word "red", the algorithm can, according to the word graph information, offer the second candidate "yellow" for the user to click. When the user clicks "yellow", the replacement is complete, which is simple and fast.
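The candidate lookup in this example can be sketched in Python. As a simplifying assumption, the word graph is reduced here to a list of slots, one per word position, each holding scored candidates. The application does not specify the data structure or the algorithm, and the scores, the extra candidate "bread", and the function names are invented for illustration.

```python
from typing import List, Tuple

# One slot per word position; each slot holds (candidate word, score) pairs.
Slot = List[Tuple[str, float]]


def best_words(slots: List[Slot]) -> List[str]:
    """Pick the highest-scoring candidate in every slot (the recognized text)."""
    return [max(slot, key=lambda c: c[1])[0] for slot in slots]


def alternatives(slots: List[Slot], index: int, limit: int = 3) -> List[str]:
    """Return the next-best candidates for the word the user clicked."""
    ranked = sorted(slots[index], key=lambda c: c[1], reverse=True)
    return [word for word, _ in ranked[1 : limit + 1]]


def apply_correction(words: List[str], index: int, replacement: str) -> List[str]:
    """Replace the clicked word with the candidate the user selected."""
    corrected = list(words)
    corrected[index] = replacement
    return corrected


slots: List[Slot] = [
    [("I", 0.99)],
    [("want", 0.95)],
    [("to", 0.97)],
    [("buy", 0.90)],
    [("red", 0.55), ("yellow", 0.40), ("bread", 0.05)],
]
words = best_words(slots)              # recognized text, ending in "red"
options = alternatives(slots, 4)       # candidates shown when "red" is clicked
fixed = apply_correction(words, 4, options[0])
```

Sending only the per-word candidates, rather than the full word graph, would be one way to keep the auxiliary error correction information small; whether the application intends this is not stated.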

Thereafter, the method may further include:

S208. Receive the edited text information sent by the sending terminal, and send the edited text information to the receiving terminal.

In this step, after the user of the transmitting terminal completes the correction, the transmitting terminal sends the edited text information to the server, and the server receives the edited text information and sends it to the receiving terminal.

Preferably, after step S208, the method may further include:

S209. Send the edited text information to the database.

The corrected automatic speech recognition result handled in this step is of high value and is particularly important. It indicates that: 1) the server failed to recognize the voice information correctly; and 2) the correct text information for the voice information has been supplied by the user through correction. For such edited text information, the training algorithm of the speech recognition system can record the erroneous text content, the corresponding speech content, and the corrected text content, so as to avoid similar mistakes thereafter. The ability of such error correction data to drive the self-improvement of the speech recognition system is unmatched by other data.
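One way to keep the recognized text (S206) and the edited text (S209) associated with the same voice information is a single database row per message. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table layout, column names, and function names are assumptions, since the application does not describe a schema.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one row per voice message."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recognition ("
        " message_id TEXT PRIMARY KEY,"
        " voice BLOB NOT NULL,"
        " recognized_text TEXT NOT NULL,"
        " edited_text TEXT)"
    )
    return conn


def store_recognized(conn: sqlite3.Connection, message_id: str, voice: bytes, text: str) -> None:
    """S206: store the text produced by recognition."""
    conn.execute(
        "INSERT INTO recognition (message_id, voice, recognized_text) VALUES (?, ?, ?)",
        (message_id, voice, text),
    )


def store_edited(conn: sqlite3.Connection, message_id: str, edited: str) -> None:
    """S209: attach the user's correction to the original recognition result."""
    conn.execute(
        "UPDATE recognition SET edited_text = ? WHERE message_id = ?",
        (edited, message_id),
    )


def correction_pairs(conn: sqlite3.Connection) -> List[Tuple[bytes, str, str]]:
    """Return (voice, wrong text, corrected text) triples usable as training data."""
    rows = conn.execute(
        "SELECT voice, recognized_text, edited_text FROM recognition"
        " WHERE edited_text IS NOT NULL AND edited_text <> recognized_text"
        " ORDER BY message_id"
    )
    return rows.fetchall()


conn = open_store()
store_recognized(conn, "m1", b"voice-1", "I want to buy red")
store_recognized(conn, "m2", b"voice-2", "see you tomorrow")
store_edited(conn, "m1", "I want to buy yellow")
pairs = correction_pairs(conn)
```

Only messages whose text was actually changed are returned, which matches the description of corrected results as the valuable subset.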

In summary, the second embodiment of the present application provides a voice recognition-based instant messaging method, which generates text information by recognizing voice information, sends the voice information and the text information to the receiving terminal through the server, and also sends the text information to the transmitting terminal. Auxiliary modification information is also provided to the transmitting terminal, with which the user of the transmitting terminal can modify the text efficiently. The instant messaging method provided by this embodiment makes it easier for the receiving terminal to obtain the information, overcomes the obstacle that the receiving terminal cannot listen to voice information it has received on some occasions, avoids the problem of user privacy leakage, and further ensures the accuracy of the information received by the receiving terminal.

Third embodiment

A third embodiment of the present application provides a voice recognition-based instant messaging method. FIG. 3 is a flowchart of a voice recognition-based instant messaging method according to a third embodiment of the present application. The instant messaging method in the third embodiment of the present application is applied to a transmitting terminal of information, and includes the following steps:

S301. Record voice information and send it to the server.

In this step, the user of the transmitting terminal can record voice information in an instant messaging interface (such as a chat interface), for example by pressing and holding a specified mark or button of the input box to start recording and releasing the mark or button to finish recording. After the recording is completed, the instant messaging interface may send the voice information directly by default, or the user may click another mark or button to send it to the server over the network.

S302. Receive the text information generated after the voice information is recognized by the server, and display the text information.

In this step, the server performs voice recognition on the voice information sent by the transmitting terminal to generate text information and sends the text information to the transmitting terminal, which receives the recognized text information and displays it. For example, the sending terminal sends the recorded voice information to the server from the chat interface in step S301. In step S302, the sending terminal can receive the text information generated by the server from the voice information and display it in the same chat interface.

S303. After receiving a correction operation instruction, open an error correction interface and enter an interface for editing the text information.

In this step, when the user of the transmitting terminal considers that the text information generated by voice recognition is inconsistent with the voice information, the user can open the error correction interface by issuing a correction operation instruction. For example, the correction operation instruction may be the user pressing the text information. On receiving the instruction, the sending terminal opens the error correction interface and enters the text editing state, and the error correction interface may display an input interface, such as a virtual keyboard or a handwriting keyboard, for the user to correct the error. The user can add or delete text through the virtual keyboard or the like.

Thereafter, the method can further include:

S304: Display the edited text information, and send the edited text information to the server.

In this step, the text information edited by the user of the sending terminal is displayed on the sending terminal and is simultaneously uploaded by the sending terminal to the server, which sends it to the receiving party, where it is displayed synchronously. This is not described again here.
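Steps S301 to S304 can be read as a small state machine on the sending terminal. The following Python sketch shows that reading only. The `State` and `OutgoingVoiceMessage` names, the payload format returned by `submit_edit`, and the decision to reject out-of-order calls are assumptions, not details from the application.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional


class State(Enum):
    SENT_VOICE = auto()  # S301 done, waiting for the recognized text
    TEXT_SHOWN = auto()  # S302 done, text is displayed
    EDITING = auto()     # S303, error correction interface is open


@dataclass
class OutgoingVoiceMessage:
    message_id: str
    state: State = State.SENT_VOICE
    displayed_text: Optional[str] = None

    def on_text_received(self, text: str) -> None:
        """S302: display the text the server produced by recognition."""
        if self.state is not State.SENT_VOICE:
            raise ValueError("recognized text was already received")
        self.displayed_text = text
        self.state = State.TEXT_SHOWN

    def on_correction_instruction(self) -> None:
        """S303: for example, the user presses the text to start editing."""
        if self.state is not State.TEXT_SHOWN:
            raise ValueError("there is no displayed text to edit")
        self.state = State.EDITING

    def submit_edit(self, edited: str) -> Dict[str, str]:
        """S304: display the edited text and build the payload for the server."""
        if self.state is not State.EDITING:
            raise ValueError("the editing interface is not open")
        self.displayed_text = edited
        self.state = State.TEXT_SHOWN
        return {"message_id": self.message_id, "edited_text": edited}


message = OutgoingVoiceMessage("m1")
message.on_text_received("I want to buy red")
message.on_correction_instruction()
payload = message.submit_edit("I want to buy yellow")
```

Returning to `TEXT_SHOWN` after an edit lets the user correct the same message again, which the application neither requires nor rules out.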

In a preferred embodiment, after step S302, the method may further include:

S302a. Receive auxiliary modification information sent by the server.

In this step, the word graph and the multiple candidate words generated during speech recognition are transmitted to the transmitting terminal, which helps the user of the transmitting terminal correct the recognized text more efficiently.

In step S303, the error correction interface can display not only the text information in the editing state and the virtual keyboard or handwriting keyboard, but also the auxiliary modification information sent by the server in step S302a. For example, if the server considers that a sentence or a word in the text information generated by voice recognition is not grammatical, a dotted underline can be added to that sentence or word, and the candidate words included in the auxiliary modification information sent by the server can be displayed elsewhere in the display interface of the sending terminal (such as in the input interface) for the user to select the correct candidate. Alternatively, when the sender chooses to correct an error and clicks a misrecognized word, the other candidates for that word can be obtained from the auxiliary modification information and displayed on the virtual keyboard, and the user can correct the error efficiently by clicking the correct candidate.

In a preferred embodiment, after step S302, the method further includes:

S302b. After receiving an instruction to play the voice information, play the voice information.

In this step, if the user of the transmitting terminal issues an instruction to play the voice information, for example by clicking the displayed text information, the transmitting terminal can play the voice information recorded in step S301 through the earpiece or the speaker.

In summary, the third embodiment of the present application provides a voice recognition-based instant messaging method, which generates text information by recognizing voice information and provides an error correction function, so that the user of the transmitting terminal can modify the recognized text information. The instant messaging method provided by this embodiment makes it easier for the receiving terminal to obtain the information, overcomes the obstacle that the receiving terminal cannot listen to voice information it has received on some occasions, avoids the problem of user privacy leakage, and ensures the accuracy of the information received by the receiving terminal.

Preferably, the third embodiment of the present application can also receive the auxiliary modification information sent by the server, so that the user can modify the text information efficiently, thereby further improving the accuracy and timeliness of the information.

Fourth embodiment

A fourth embodiment of the present application provides a voice recognition based instant messaging method. FIG. 4 is a flowchart of a voice recognition based instant messaging method according to a fourth embodiment of the present application. The instant messaging method in the fourth embodiment of the present application is applied to a receiving terminal of information, and includes the following steps:

S401. Receive voice information sent by a server.

In this step, the transmitting terminal records the voice information and sends it to the server, and the server sends the voice information to the receiving terminal.

S402. Receive text information generated by the server by recognizing the voice information.

In this step, the server generates the text information through voice recognition, and then sends the text information to the receiving terminal, and the receiving terminal receives the text information generated by the recognition.

It should be noted that the step S401 and the step S402 can be performed simultaneously or sequentially, that is, the receiving terminal can receive the voice information and the generated text information simultaneously or sequentially, and the application is not particularly limited. Preferably, after the server converts the voice information into the text information, the voice information and the text information are simultaneously sent to the receiving terminal, and the receiving terminal simultaneously receives the voice information and the text information.

S403. Display and mark the text information.

In this step, the receiving terminal can display the text information in the instant messaging interface. Since the text information is generated by recognizing the voice information, it can be marked to distinguish it from text information typed directly by the sender, for example by setting a special background color, a font, or a special character (for example, "voice recognition" or "ASR"), so that plain text information and text information produced by speech recognition can be told apart.

The text information can be marked in either of two ways. In one possible way, when the receiving terminal receives the voice information and the corresponding text information from the server, the receiving terminal itself marks the text information to distinguish it from text information that the sending terminal entered directly as text. In another possible way, the server sends a mark together with the text information, and the mark is displayed on the display interface of the receiving terminal along with the text information. In this case, after step S402, the method further includes:

S402a: Receive tag information sent by the server.

In this step, the tag information may be, for example, a special background color, a font, a special character (for example, "speech recognition" or "ASR"), or the like.

Preferably, after step S403, the method may further include:

S404: On receiving an instruction from the user to play the voice information, play the voice information.

In this embodiment, the instruction to play the voice information may be the user clicking the text information: when the user clicks the displayed text information, the receiving terminal plays the voice information received in step S401 through the earpiece or the speaker.

Preferably, after step S403, the method may further include:

S405. Receive the edited text information sent by the server, and display the edited text information.

In this step, after the transmitting terminal performs error correction on the text information, the transmitting terminal sends the corrected text information to the server, and the server sends the corrected text information to the receiving terminal, and the receiving terminal receives the edited text information and displays it. Preferably, the receiving terminal may overwrite the text information before the modification with the edited text information.
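The receiving-terminal behavior in steps S401 to S405 can be sketched in Python as follows. The `ChatEntry` and `ReceiverView` classes, the bracketed rendering of the mark, and the default mark value `"ASR"` are assumptions chosen for illustration. The application leaves the concrete form of the mark open (background color, font, or special character).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ChatEntry:
    text: str = ""
    voice: Optional[bytes] = None
    mark: Optional[str] = None  # None means the sender typed the text directly

    def render(self) -> str:
        """Prefix recognized text with its mark so it differs from typed text."""
        return f"[{self.mark}] {self.text}" if self.mark else self.text


class ReceiverView:
    def __init__(self) -> None:
        self.entries: Dict[str, ChatEntry] = {}

    def _entry(self, message_id: str) -> ChatEntry:
        # Voice and text may arrive in either order, so create the entry on demand.
        return self.entries.setdefault(message_id, ChatEntry())

    def on_voice(self, message_id: str, voice: bytes) -> None:
        """S401: keep the voice information for later playback."""
        self._entry(message_id).voice = voice

    def on_recognized_text(self, message_id: str, text: str, mark: str = "ASR") -> None:
        """S402, S402a, S403: store the recognized text together with its mark."""
        entry = self._entry(message_id)
        entry.text, entry.mark = text, mark

    def on_edited_text(self, message_id: str, edited: str) -> None:
        """S405: overwrite the earlier text with the edited text."""
        self._entry(message_id).text = edited

    def on_tap(self, message_id: str) -> Optional[bytes]:
        """S404: return the voice information to play, if any was received."""
        entry = self.entries.get(message_id)
        return entry.voice if entry else None


view = ReceiverView()
view.on_recognized_text("m1", "I want to buy red")
view.on_voice("m1", b"voice-1")
before = view.entries["m1"].render()
view.on_edited_text("m1", "I want to buy yellow")
after = view.entries["m1"].render()
```

Keeping the mark after an edit is a design choice made in this sketch: the message still originated as voice, so it stays distinguishable from typed text.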

In summary, the fourth embodiment of the present application provides a voice recognition-based instant messaging method, which generates text information by recognizing voice information and provides an error correction function, so that the user of the receiving terminal can directly receive the recognized text information and can tell whether a given piece of text information was entered by the sending terminal directly as text or was generated by speech recognition. The instant messaging method provided by this embodiment makes it easier for the receiving terminal to obtain the information, overcomes the obstacle that the receiving terminal cannot listen to voice information it has received on some occasions, and avoids the problem of user privacy leakage.

FIG. 5 shows an instant messaging system corresponding to the voice recognition based instant messaging method according to the first embodiment of the present invention. As shown in FIG. 5, the instant messaging system 500 in this embodiment includes the following modules:

The voice information receiving module 501 is configured to receive voice information sent by the sending terminal.

The text information generating module 502 is configured to perform voice recognition on the voice information to generate text information.

The first sending module 503 is configured to send the voice information to the receiving terminal.

The second sending module 504 is configured to send the text information to the receiving terminal.

FIG. 6 shows an instant messaging system corresponding to the voice recognition based instant messaging method according to the second embodiment of the present invention. As shown in FIG. 6, in a preferred embodiment, in addition to the voice information receiving module 601, the text information generating module 602, the first sending module 603, and the second sending module 604 described above, the system 600 further includes:

The third sending module 605 is configured to send the text information to the sending terminal.

In addition, the system 600 further includes:

The information transceiver module 606 is configured to receive the edited text information sent by the sending terminal, and send the information to the receiving terminal.

In a preferred embodiment, the system further includes:

The first storage module 607 is configured to store the text information in a database.

In a preferred embodiment, the system further includes:

a fourth sending module 608, configured to send the auxiliary error correction information to the sending terminal;

The information transceiver module 609 is configured to receive the edited text information sent by the sending terminal, and send the information to the receiving terminal.

In a preferred embodiment, the system further includes:

The text information association module 610 is configured to send the edited text information to the database and associate with the text information before the correction.

In a preferred embodiment, the auxiliary error correction information includes a word graph and candidate words for a specified character, word, or sentence of the text information.

In a preferred embodiment, the word graph and the candidate words for the specified character, word, or sentence are obtained from the database.

In a preferred embodiment, the first sending module and the second sending module are simultaneously executed, and the voice information and the text information are simultaneously sent to the receiving terminal.

FIG. 7 shows an instant messaging system corresponding to the voice recognition based instant messaging method according to the third embodiment of the present invention. As shown in FIG. 7, the instant messaging system 700 in this embodiment includes the following modules:

The voice information recording and sending module 701 is configured to record voice information and send it to the server;

The text information receiving and displaying module 702 is configured to receive the text information generated by identifying the voice information, and display the text information;

The editing module 703 is configured to enter an interface for editing the text information after receiving a correction operation instruction;

The display sending module 704 is configured to display the edited text information, and send the edited text information to the server.

In a preferred embodiment, the system further includes:

The auxiliary modification information receiving module 705 is configured to receive auxiliary error correction information sent by the server.

In a preferred embodiment, the auxiliary error correction information includes a word map and a candidate word for a specified character, word or sentence of the text information, the candidate word being displayed in the interface for editing the text information.

In a preferred embodiment, the interface for editing text information includes an input interface.

In a preferred embodiment, the system further includes:

The voice information playing module 706 is configured to play the voice information after receiving the instruction to play the voice information.

In a preferred embodiment, the instruction to play the voice information is generated by the user clicking the text information.
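The behaviour of the sending terminal described above can be sketched as follows. This is only an illustration under stated assumptions: the class `SendingTerminal`, its method names, the representation of the auxiliary error correction information as a dictionary from a recognized word to its candidate words, and the `sent_to_server` list that stands in for network transmission are hypothetical and are not taken from the present application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SendingTerminal:
    """Hypothetical sending terminal covering modules 702 to 705 of system 700."""

    displayed_text: Optional[str] = None
    candidates: Dict[str, List[str]] = field(default_factory=dict)
    editing: bool = False
    sent_to_server: List[str] = field(default_factory=list)

    def on_text(self, text: str) -> None:
        # text information receiving and displaying module 702
        self.displayed_text = text

    def on_auxiliary_info(self, candidates: Dict[str, List[str]]) -> None:
        # auxiliary modification information receiving module 705
        self.candidates = dict(candidates)

    def on_correction_instruction(self) -> Dict[str, List[str]]:
        # editing module 703: enter the editing interface and return the
        # candidate words that apply to the text currently displayed
        self.editing = True
        text = self.displayed_text or ""
        return {word: opts for word, opts in self.candidates.items() if word in text}

    def submit_edit(self, target: str, replacement: str) -> str:
        # display sending module 704: show the edited text and send it on
        if not self.editing or self.displayed_text is None:
            raise RuntimeError("no text is being edited")
        edited = self.displayed_text.replace(target, replacement, 1)
        self.displayed_text = edited
        self.sent_to_server.append(edited)
        self.editing = False
        return edited
```

Filtering the candidate words against the displayed text keeps the editing interface limited to suggestions that the user can actually apply. A real interface would also accept free text through the input interface.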

FIG. 8 shows an instant messaging system corresponding to the voice recognition based instant messaging method according to the fourth embodiment of the present invention. As shown in FIG. 8, the instant messaging system 800 in this embodiment includes the following modules:

The voice information obtaining module 801 is configured to receive voice information sent by the server.

The text information obtaining module 802 is configured to receive text information generated by the server after recognizing the voice information;

The text information display marking module 803 is configured to display and mark the text information.

In a preferred embodiment, the system further includes:

The tag information obtaining module 804 is configured to receive tag information sent by the server.

In a preferred embodiment, the text information obtaining module 802 and the tag information obtaining module 804 are executed simultaneously, so that the text information and the tag information are obtained at the same time.

In a preferred embodiment, the text information display marking module 803 is configured to display the text information and mark the text information by using the tag information.

In a preferred embodiment, the system further includes:

The voice information playing module 805 is configured to: when receiving an instruction of the user to play the voice information, play the voice information.

In a preferred embodiment, the instruction to play the voice information is generated by the user clicking the text information.

In a preferred embodiment, the system further includes:

The receiving display module 806 is configured to receive the edited text information sent by the server, and display the edited text information.

In a preferred embodiment, the edited text information is displayed in a manner that covers pre-edit text information.
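The receiving terminal behaviour described above can be sketched in the same way. The names `ReceivedMessage` and `ReceivingTerminal`, the bracketed prefix used as the visible mark, and the `played` list that stands in for audio playback are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReceivedMessage:
    voice: bytes
    text: str
    tag: Optional[str] = None  # tag information, e.g. "voice" for recognized speech


@dataclass
class ReceivingTerminal:
    """Hypothetical receiving terminal covering modules 801 to 806 of system 800."""

    messages: Dict[str, ReceivedMessage] = field(default_factory=dict)
    played: List[bytes] = field(default_factory=list)

    def on_message(self, msg_id: str, voice: bytes, text: str,
                   tag: Optional[str] = None) -> None:
        # modules 801, 802 and 804: obtain voice, text and tag information
        self.messages[msg_id] = ReceivedMessage(voice, text, tag)

    def render(self, msg_id: str) -> str:
        # text information display marking module 803: show the text and,
        # when tag information is present, mark it with a visible prefix
        msg = self.messages[msg_id]
        return f"[{msg.tag}] {msg.text}" if msg.tag else msg.text

    def on_text_clicked(self, msg_id: str) -> None:
        # voice information playing module 805: clicking the text plays the voice
        self.played.append(self.messages[msg_id].voice)

    def on_edited_text(self, msg_id: str, edited: str) -> None:
        # receiving display module 806: the edited text covers the pre-edit text
        self.messages[msg_id].text = edited
```

Overwriting the stored text in `on_edited_text` corresponds to displaying the edited text information in a manner that covers the pre-edit text information. The original voice information is kept unchanged so that it can still be played back.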

Because the device embodiments are substantially similar to the method embodiments, they are described only briefly; for the relevant details, refer to the description of the method embodiments.

In summary, the voice recognition based instant messaging method and the instant messaging system proposed by the embodiments of the present application have at least the following advantages:

(1) In the voice recognition-based instant messaging method and the instant messaging system proposed by the embodiments of the present application, the voice recognition function overcomes the obstacle that the receiving terminal faces in obtaining information, which is convenient for the user and avoids the problem of privacy leakage;

(2) In the voice recognition-based instant messaging method and the instant messaging system proposed by the embodiments of the present application, the error modifying function gives the sending terminal an opportunity to correct errors made by the voice recognition system;

(3) In the voice recognition-based instant messaging method and the instant messaging system proposed by the embodiments of the present application, real recognition error data is obtained through the data collection function and can be used to improve the performance of the voice recognition system;

(4) In the voice recognition-based instant messaging method and the instant messaging system proposed by the embodiments of the present application, the error correction step makes it easier for the sending terminal to perform error correction;

(5) In the voice recognition-based instant messaging method and the instant messaging system proposed by the embodiments of the present application, the information marking step makes it easy for the receiving terminal to tell whether the received information was entered through a virtual keyboard or is voice information;

(6) In the voice recognition-based instant messaging method and the instant messaging system proposed by the embodiments of the present application, if the received information is voice information, the receiving terminal can select the text information generated from the recognized voice information and play back the original voice information.
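Advantage (3) relies on pairs of recognized text and corrected text collected in the database. As a hedged illustration of how such pairs might be turned into a measure of recognition errors, the following sketch computes a character-level edit distance and an aggregate error rate. The present application does not specify any metric; the Levenshtein distance and the function names `edit_distance` and `character_error_rate` are assumptions chosen for this example.

```python
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total edit distance divided by total length of the corrected texts.

    Each pair is (recognized_text, corrected_text), i.e. the text information
    before correction and the edited text information associated with it.
    """
    pairs = list(pairs)
    errors = sum(edit_distance(rec, cor) for rec, cor in pairs)
    total = sum(len(cor) for _, cor in pairs)
    return errors / total if total else 0.0
```

For example, `edit_distance("tree", "three")` is 1, because a single inserted character turns the recognized word into the corrected one.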

The various embodiments in the present specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments can be referred to one another.

Those skilled in the art will appreciate that the embodiments of the present application can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-permanent memory in a computer readable medium, random access memory (RAM) and/or non-volatile memory, such as read only memory (ROM) or flash memory. Memory is an example of a computer readable medium. Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement signal storage by any method or technology. The signals can be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store signals accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.

Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device. The instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing. The instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

While preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they are aware of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.

Finally, it should also be noted that, in this context, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprises", "comprising" or any other variation thereof are intended to encompass a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises the element.

The above is a detailed description of the voice recognition-based instant messaging method and the instant messaging system provided by the present application. Specific examples are used herein to describe the principles and implementation manners of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core ideas. At the same time, those of ordinary skill in the art may, in accordance with the idea of the present application, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. An instant messaging method based on speech recognition, characterized in that it comprises:

receiving voice information sent by a sending terminal;

performing voice recognition on the voice information to generate text information;

sending the voice information to a receiving terminal; and

sending the text information to the receiving terminal.

2. The instant messaging method according to claim 1, wherein after performing voice recognition on the voice information to generate the text information, the method further comprises:

sending the text information to the sending terminal.

3. The instant messaging method according to claim 2, wherein after sending the text information to the sending terminal, the method further comprises:

receiving edited text information sent by the sending terminal, and sending the edited text information to the receiving terminal.

4. The instant messaging method according to claim 3, wherein after performing voice recognition on the voice information to generate the text information, and after receiving the edited text information sent by the sending terminal and sending it to the receiving terminal, the method further comprises:

sending auxiliary error correction information to the sending terminal, the auxiliary error correction information including a word map and a candidate word for a specified character, word or sentence of the text information.

5. The instant messaging method according to claim 2, wherein after performing voice recognition on the voice information to generate the text information, the method further comprises:

storing the text information in a database;

after performing voice recognition on the voice information to generate the text information, the method further comprises:

sending auxiliary error correction information to the sending terminal; and

receiving edited text information sent by the sending terminal, and sending the edited text information to the receiving terminal;

and after receiving the edited text information sent by the sending terminal and sending it to the receiving terminal, the method further comprises:

sending the edited text information to the database and associating it with the text information before correction.

6. The instant messaging method according to claim 5, wherein the auxiliary error correction information comprises a word map and a candidate word for a specified character, word or sentence of the text information, and the word map and candidate words of the specified character, word or sentence are obtained from the database.

7. An instant messaging method based on speech recognition, characterized in that it comprises:

recording voice information and sending it to a server;

receiving text information generated by recognizing the voice information, and displaying the text information;

after receiving a correction operation instruction, entering an interface for editing the text information; and

displaying the edited text information, and sending the edited text information to the server.

8. The instant messaging method according to claim 7, wherein after receiving the text information generated by recognizing the voice information and displaying the text information, the method further comprises:

receiving auxiliary error correction information sent by the server, the auxiliary error correction information including a word map and a candidate word for a specified character, word or sentence of the text information, the candidate word being displayed in the interface for editing the text information.

9. The instant messaging method according to claim 7, wherein after receiving the text information generated by recognizing the voice information and displaying the text information, the method further comprises:

playing the voice information after receiving an instruction to play the voice information.

10. The instant messaging method according to claim 9, wherein the instruction to play the voice information is generated by a user clicking the text information.

11. An instant messaging method based on speech recognition, characterized in that it comprises:

receiving voice information sent by a server;

receiving text information generated by the server after recognizing the voice information; and

displaying and marking the text information.

12. The instant messaging method according to claim 11, wherein the method further comprises:

receiving tag information sent by the server.

13. The instant messaging method according to claim 12, wherein the step of displaying and marking the text information comprises:

displaying the text information, and marking the text information by using the tag information.

14. The instant messaging method according to claim 11, wherein after the step of displaying and marking the text information, the method further comprises:

playing the voice information when an instruction of the user to play the voice information is received, the instruction to play the voice information being generated by the user clicking the text information.

15. The instant messaging method according to claim 11, wherein after the step of displaying and marking the text information, the method further comprises:

receiving edited text information sent by the server, and displaying the edited text information.

16. The instant messaging method according to claim 15, wherein the edited text information is displayed in a manner that covers the pre-edit text information.

17. An instant messaging system based on voice recognition, characterized in that it comprises:

a voice information receiving module, configured to receive voice information sent by a sending terminal;

a text information generating module, configured to perform voice recognition on the voice information to generate text information;

a first sending module, configured to send the voice information to a receiving terminal; and

a second sending module, configured to send the text information to the receiving terminal.

18. The instant messaging system according to claim 17, wherein the system further comprises:

a third sending module, configured to send the text information to the sending terminal.

19. The instant messaging system according to claim 18, wherein the system further comprises:

an information transceiver module, configured to receive edited text information sent by the sending terminal, and send the edited text information to the receiving terminal.

20. The instant messaging system according to claim 19, wherein the system further comprises:

a fourth sending module, configured to send auxiliary error correction information to the sending terminal, the auxiliary error correction information including a word map and a candidate word for a specified character, word or sentence of the text information.

21. The instant messaging system according to claim 18, wherein the system further comprises:

a first storage module, configured to store the text information in a database;

a fourth sending module, configured to send auxiliary error correction information to the sending terminal;

an information transceiver module, configured to receive edited text information sent by the sending terminal, and send the edited text information to the receiving terminal; and

a text information association module, configured to send the edited text information to the database and associate it with the text information before correction.

22. The instant messaging system according to claim 21, wherein the auxiliary error correction information comprises a word map and a candidate word for a specified character, word or sentence of the text information, and the word map and candidate words of the specified character, word or sentence are obtained from the database.

23. An instant messaging system based on voice recognition, characterized in that it comprises:

a voice information recording and sending module, configured to record voice information and send it to a server;

a text information receiving and displaying module, configured to receive text information generated by recognizing the voice information, and display the text information;

an editing module, configured to enter an interface for editing the text information after receiving a correction operation instruction; and

a display sending module, configured to display the edited text information and send the edited text information to the server.

24. The instant messaging system according to claim 23, wherein the system further comprises:

an auxiliary modification information receiving module, configured to receive auxiliary error correction information sent by the server, the auxiliary error correction information including a word map and a candidate word for a specified character, word or sentence of the text information, the candidate word being displayed in the interface for editing the text information.

25. The instant messaging system according to claim 23, wherein the system further comprises:

a voice information playing module, configured to play the voice information after receiving an instruction to play the voice information.

26. The instant messaging system according to claim 25, wherein the instruction to play the voice information is generated by a user clicking the text information.

27. An instant messaging system based on voice recognition, characterized in that it comprises:

a voice information obtaining module, configured to receive voice information sent by a server;

a text information obtaining module, configured to receive text information generated by the server after recognizing the voice information; and

a text information display marking module, configured to display and mark the text information.

28. The instant messaging system according to claim 27, wherein the system further comprises:

a tag information obtaining module, configured to receive tag information sent by the server.

29. The instant messaging system according to claim 28, wherein the text information display marking module is configured to display the text information and mark the text information by using the tag information.

30. The instant messaging system according to claim 27, wherein the system further comprises:

a voice information playing module, configured to play the voice information when an instruction of the user to play the voice information is received, the instruction to play the voice information being generated by the user clicking the text information.

31. The instant messaging system according to claim 27, wherein the system further comprises:

a receiving display module, configured to receive edited text information sent by the server, and display the edited text information.

32. The instant messaging system according to claim 31, wherein the edited text information is displayed in a manner that covers the pre-edit text information.