WO2020067597A1

WO2020067597A1 - Device, method and computer-readable recording medium for providing asynchronous instant messaging service

Info

Publication number: WO2020067597A1
Application number: PCT/KR2018/011769
Authority: WO
Inventors: 장준수; 윤용기; 장재웅; 김세미; 신희욱; 김영상; 임중신; 정정화
Original assignee: 주식회사 닫닫닫
Priority date: 2018-09-28
Filing date: 2018-10-05
Publication date: 2020-04-02
Also published as: KR20200036414A

Abstract

Disclosed is a transmitting terminal for executing an instant message service application and generating and sending an instant message. The transmitting terminal comprises a voice input module, a voice data generation module, a text data generation module, a data packet generation module, and a communication module. The voice input module may receive a voice message from a user. The voice data generation module may generate voice data from the voice message. The text data generation module may generate text data corresponding to the voice message. The data packet generation module may combine the voice data and the text data to generate a data packet. The communication module may transmit the generated data packet to a server or at least one receiving terminal.

Description

Apparatus, method and computer readable storage medium for providing asynchronous instant message service

This disclosure relates to apparatus, methods, and computer readable storage media for providing asynchronous instant message services.

Unless otherwise stated herein, the content described in this section is not prior art to the claims in this application and should not be construed as prior art for the reasons set forth in this section.

Users of instant messaging services can exchange messages relatively quickly and easily between two or more users. In recent years, as mobile devices such as smartphones have come into wide use, the use of instant message services has grown rapidly. Instant message services now allow relatively short voice messages to be transmitted in addition to conventional text messages. A voice message is simpler to input than a text message, and can convey various features that the user who input the voice message wants to deliver. However, the data size of a voice message is generally larger than that of a text message, and the user needs to perform an operation (e.g., click, touch, etc.) on each voice message and listen to the voice message as it is played. For these reasons, a voice message may be subject to a temporal constraint or a spatial constraint, such as memory space or physical space, compared to a text message that can be quickly checked by eye.

Republic of Korea Registered Patent Publication No. 10-1863776 (hereinafter, prior art document 1) discloses a text expression method that, when a user inputs a voice message, performs voice recognition on the voice message to generate a text message, extracts the user's emotion from the voice message, and changes the font of the text message generated from the voice message before outputting it.

As described above, prior art document 1 extracts information on various emotions from a voice message and generates a text message using the information, but only a part of information that a user intends to transmit through a voice message can be obtained. The rest of the information can be lost.

The present disclosure is intended to solve the above problems, and provides an apparatus, method, and computer readable storage medium that are convenient for reproducing a voice message in an instant message service and are efficient in data management. In addition, the present disclosure proposes an apparatus, method, and computer-readable storage medium capable of providing an improved instant message service using a character in an instant message service.

In some embodiments of the present disclosure, a method performed on a transmitting terminal that executes an instant message service application to generate and transmit a message is described. An exemplary method may include receiving a voice message from a user of a transmitting terminal, generating voice data from the voice message, generating text data, combining the voice data and the text data to generate a data packet, and transmitting the generated data packet. In some examples, generating the text data may include generating text data corresponding to the voice message based on the voice data. In some other examples, generating the text data may include receiving text corresponding to the voice message from the user and generating the text data from that text. In some examples, transmitting the data packet may include transmitting the data packet to an integrated server. In some other examples, the method may further include generating a notification message regarding the data packet. In this example, the method may include transmitting the notification message to a relay server and transmitting the data packet to at least one receiving terminal.

In some embodiments, a transmitting terminal for running an instant message service application and generating and transmitting an instant message is described. An exemplary transmission terminal may include a voice input module, a voice data generation module, a text data generation module, a data packet generation module, and a communication module. The voice input module may be configured to receive a voice message from a user of the transmitting terminal. The voice data generation module may be configured to generate corresponding voice data from voice messages received by the voice input module. The text data generation module may be configured to generate text data corresponding to a voice message. The data packet generation module may be configured to combine voice data and text data to generate a data packet. The text data generation module may include a voice recognition module configured to perform voice recognition on voice data from the voice data generation module and a text input module configured to receive text corresponding to a voice message from a user. The text data generation module may be configured to generate text data using at least one of a speech recognition module and a text input module.

In some examples, the transmitting terminal may further include a character module. The character module may be configured to acquire information about a character displayable on a transmitting terminal and at least one receiving terminal and generate character data from the information on the character.

In some embodiments, a computer readable storage medium storing a computer program for executing an instant message service application to generate and send messages is described. One exemplary computer-readable storage medium may include one or more computer-executable instructions that, when the computer program is executed, cause a computing device to perform operations including: receiving a voice message from a user of the computing device; generating corresponding voice data from the input voice message; generating text data corresponding to the voice message based on the voice data; combining the voice data with the text data to generate a data packet; and transmitting the data packet to an integrated server providing an instant message service.

In some embodiments, a method performed on a receiving terminal that executes an instant message service application to receive a message is described. In one exemplary method, the receiving terminal may be capable of data communication, through a network, with an integrated server providing an instant message service, and the method may include: receiving, from the integrated server, a data packet transmitted by the transmitting terminal; obtaining voice data from the data packet; generating text data corresponding to a voice message based on the voice data; reproducing the voice message corresponding to the voice data based on the voice data; and, based on the text data, displaying a text message corresponding to the text data in response to reproduction of the voice message. In another example, a method may include: receiving a data packet from an integrated server; obtaining text data from the data packet; determining that voice data corresponding to the text data cannot be obtained; generating voice data corresponding to the text data based on the text data; reproducing a voice message corresponding to the voice data based on the voice data; and, based on the text data, displaying a text message corresponding to the text data in response to reproduction of the voice message. In another exemplary method, the receiving terminal may be capable of data communication, through a network, with a relay server providing an instant message service, and may also be connected so as to communicate directly with a transmitting terminal executing an instant message service application. In this example, the method may include receiving, from the relay server, a notification message for the data packet transmitted by the transmitting terminal, and receiving the data packet from the transmitting terminal in response to the notification message.

In some embodiments, a receiving terminal for executing an instant message service application and receiving an instant message is described. The receiving terminal may include a communication module, a data acquisition module, a data supplement module, and an output module. The communication module may be configured to receive a data packet transmitted by the transmitting terminal from the server or the transmitting terminal. The data acquisition module may be configured to acquire at least one of voice data or text data from a data packet. The data supplement module may include a speech recognition module and a speech generation module. If the data acquisition module acquires voice data from the data packet but is unable to acquire text data corresponding to the voice data, the data supplement module may cause the speech recognition module to generate text data corresponding to the voice data. If the data acquisition module acquires text data from the data packet but is unable to acquire voice data corresponding to the text data, the data supplement module may cause the speech generation module to generate voice data corresponding to the text data. The output module may include a playback module configured to play the voice message corresponding to the voice data, and a module configured to display, based on the text data, a text message corresponding to the text data in response to the reproduction of the voice message.

In some embodiments, a computer readable storage medium storing a computer program for executing an instant message service application to receive a message is described. One exemplary computer-readable storage medium may include one or more computer-executable instructions that, when the computer program is executed, cause a computing device to perform operations including: receiving, from a server or a transmitting terminal, a data packet transmitted by the transmitting terminal; obtaining text data from the data packet; determining that voice data corresponding to the text data cannot be obtained from the data packet; generating voice data corresponding to the text data based on the text data; reproducing a voice message corresponding to the voice data based on the voice data; and, based on the text data, displaying a text message corresponding to the text data in response to the reproduction of the voice message.

The above brief summary and description of effects are merely illustrative and are not intended to limit the technical details intended in the present disclosure. By referring to the following detailed description and accompanying drawings, in addition to the above-described exemplary embodiments and technical features, additional embodiments and technical features may be understood.

The features of the present disclosure described above and other additional features will be described in detail below with reference to the accompanying drawings. These drawings show only a few embodiments according to the present disclosure and should not be considered as limiting the scope of the technical spirit of the present disclosure. The technical spirit of the present disclosure will be described in more detail using the accompanying drawings.

FIG. 1 is an exemplary environment diagram illustrating an environment in which an instant message service is provided according to at least some embodiments of the present disclosure;

FIG. 2 is a block diagram schematically illustrating a transmitting terminal according to at least some embodiments of the present disclosure;

FIG. 3 is a flow diagram illustrating an exemplary process performed at a transmitting terminal, according to at least some embodiments of the present disclosure;

FIG. 4 is a block diagram schematically illustrating a receiving terminal according to at least some embodiments of the present disclosure;

FIGS. 5 to 7 are flow diagrams illustrating exemplary processes performed at a receiving terminal, according to at least some embodiments of the present disclosure;

FIG. 8 shows an example of using an instant message service according to the present disclosure;

FIG. 9 shows an example in which a message is displayed and played on a user's terminal when using the instant message service according to FIG. 8;

FIG. 10 illustrates an exemplary computer program product that can be used to provide instant message service in accordance with at least some embodiments of the present disclosure; and

FIG. 11 is a block diagram schematically illustrating an instant message service providing server according to at least some embodiments of the present disclosure.

Hereinafter, embodiments and examples of the present application will be described in detail with reference to the accompanying drawings so that those skilled in the art to which the present disclosure pertains can easily implement them. However, the present application can be implemented in many different forms and is not limited to the embodiments and examples described herein.

This disclosure relates generally to apparatus, methods and computer readable storage media for providing instant messaging services.

Hereinafter, “instant message service” may refer to a service in which, when a sender sends a message such as a text message, a voice message, an image, or the like to one or more recipients, the message received by a recipient is displayed and / or played. The term "character" refers to an object represented by computer graphics and having a face, and may be expressed in various forms, such as a person, an animal, a virtual animal, a robot, etc. According to the present disclosure, a character is an object displayed in the instant message service, and is operated under the control of a user device, such as a transmitting terminal or a receiving terminal as described below, so that an animation of the character can be displayed on the user device.

Hereinafter, the term "module" may refer to a device, a server, a program unit, or a suitable combination thereof. For example, the term "character module", as will be described below, may refer not only to hardware such as a camera for obtaining information about a character to be displayed on a user's device, but also to devices, servers, program units, or any suitable combination thereof for processing data acquired by such a camera.

FIG. 1 is an exemplary environment diagram illustrating an environment 100 in which an instant message service is provided according to at least some embodiments of the present disclosure. The exemplary environment 100 may include a network environment 110, one or more transmitting terminals (120-1, 120-2, 120-3, 120-4, ...; hereinafter collectively referred to as 120) and one or more receiving terminals (130-1, 130-2, 130-3, ...; hereinafter collectively referred to as 130). For convenience of description, 120 is referred to as a transmitting terminal and 130 as a receiving terminal, but the transmitting terminal and the receiving terminal may also perform reception and transmission, respectively. The network environment 110 represents various environments for connecting the transmitting terminal 120 and the receiving terminal 130 by wired or wireless communication. The network environment 110 may include a server 115 for providing instant message service.

In some embodiments, the transmitting terminal 120 may transmit an instant message to the receiving terminal 130, or receive an instant message from the receiving terminal 130, through the server 115 providing an instant message service. In this embodiment, the server 115 may be an integrated server that provides an instant message service, receives the instant message from the transmitting terminal 120, stores it, and transmits it to the receiving terminal 130. In some other examples, the transmitting terminal 120 may transmit a notification message for an instant message to the receiving terminal 130 through the server 115, and the receiving terminal 130 may receive the notification message from the server 115 and, in response to the received notification message, receive the instant message directly from the transmitting terminal 120. In this embodiment, the server 115 provides an instant message service and may act as a relay server. After the receiving terminal 130 receives the notification message from the server 115, the receiving terminal 130 may connect to the transmitting terminal 120 directly, for example, using a peer-to-peer technique, and receive the message.

In various embodiments, the network environment 110 may further include a communication environment such as a wired environment, a wireless environment, a base station, and the like between the transmitting terminal 120 and the receiving terminal 130. In some examples, the server 115 may be configured to store the instant message transmitted by the transmitting terminal 120 and then, when the receiving terminal 130 connects to the server 115, transmit the instant message received from the transmitting terminal 120 to the receiving terminal 130. In the example of sending an instant message directly using a peer-to-peer technique, the server 115 may support establishing a peer-to-peer connection between user devices 120 and 130 in the network environment 110.

In FIG. 1, the transmitting terminal 120 and the receiving terminal 130 are devices that can communicate with each other, and may be, for example, a smart phone, a tablet computer, a desktop computer, a laptop computer, a mobile phone, a personal digital assistant (PDA), a specific purpose device, or a small form factor portable (mobile) electronic device such as a fusion device including any of the above functions. As shown in FIG. 1, the transmitting terminal 120 and the receiving terminal 130 may perform one-to-many or many-to-many instant message communication as well as one-to-one instant message communication, and the server 115 may provide such instant message services.

In some examples, a user (first user) of the transmitting terminal 120 may use the transmitting terminal 120 to input an instant message to be transmitted to a user (the second user) of the receiving terminal 130. The instant message can be a voice message or a text message. When the instant message is a text message, as well known in the art, the transmitting terminal 120 may receive a text message from a first user, generate a data packet, and transmit it to the receiving terminal 130. Hereinafter, the case where the instant message is a voice message will be described.

In some embodiments, the transmitting terminal 120 may receive a voice message from a first user. In some examples, the transmitting terminal 120 may receive a voice message from a first user for a predetermined time. The transmitting terminal 120 may generate corresponding voice data from the input voice message.

Thereafter, the transmitting terminal 120 may acquire text data corresponding to the voice message. In some embodiments, the transmitting terminal 120 may generate text data corresponding to a voice message based on the voice data. A speech recognition technique well known in the technical field of the present disclosure may be used to generate the text data. In some other embodiments, the transmitting terminal 120 may receive text corresponding to a voice message from the first user, and generate text data based on the input text. In some examples, at the request of the first user, the transmitting terminal 120 may receive a plurality of voice messages and generate a plurality of voice data from the plurality of voice messages. The transmitting terminal 120 may generate text data corresponding to the plurality of voice messages based on the plurality of voice data, or may receive the corresponding text as input from the user.

In some embodiments, the transmitting terminal 120 may generate a data packet by combining voice data and text data corresponding to the voice message. In some examples, the transmitting terminal 120 may generate integrated message data by matching voice data with corresponding text data, and may generate a data packet by encoding the integrated message data. In some examples of receiving a plurality of voice messages at the request of the first user, the transmitting terminal 120 may associate each voice data with its text data and combine the resulting pairs of voice data and text data to generate integrated message data.
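The matching and encoding steps described above can be sketched as follows. This is only a minimal illustration, not the encoding used by the disclosure: the `MessagePair` structure, the JSON layout, the Base64 wrapping of the voice bytes, and the function names `build_packet` and `parse_packet` are all assumptions introduced here.

```python
from __future__ import annotations

import base64
import json
from dataclasses import dataclass


@dataclass
class MessagePair:
    """One voice message matched with its text (hypothetical structure)."""
    voice: bytes  # encoded audio as produced by the recorder
    text: str     # recognized or user-typed text for the same message


def build_packet(pairs: list[MessagePair]) -> bytes:
    """Combine matched voice/text pairs into integrated message data, then encode it."""
    integrated = [
        {"voice": base64.b64encode(p.voice).decode("ascii"), "text": p.text}
        for p in pairs
    ]
    # JSON over UTF-8 is an assumed wire format chosen for readability.
    return json.dumps({"messages": integrated}).encode("utf-8")


def parse_packet(packet: bytes) -> list[MessagePair]:
    """Inverse of build_packet, as a receiving terminal might apply it."""
    body = json.loads(packet.decode("utf-8"))
    return [
        MessagePair(base64.b64decode(m["voice"]), m["text"])
        for m in body["messages"]
    ]
```

Keeping each voice item next to its text item in one list entry preserves the pairing when several voice messages are sent in a single packet.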

In some embodiments, the server 115 may be an integrated server, and the transmitting terminal 120 may transmit the generated data packet to the server 115. In some other embodiments, the server 115 may be a relay server, and the transmitting terminal 120 may generate a notification message regarding the generated data packet and transmit the generated notification message to the relay server 115. In this embodiment, the transmitting terminal 120 may then transmit the data packet directly by connecting to at least one receiving terminal.
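The two delivery paths can be contrasted in a short sketch. The classes `Server` and `Receiver`, the `send` function, the `mode` strings, and the fields of the notification are hypothetical stand-ins; real networking, connection setup, and the actual content of a notification message are outside what the text specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Stand-in for server 115; it only records what it is given."""
    received: list = field(default_factory=list)

    def accept(self, item):
        self.received.append(item)


@dataclass
class Receiver:
    """Stand-in for a receiving terminal 130 reachable over a direct connection."""
    inbox: list = field(default_factory=list)


def send(packet: bytes, server: Server, receivers: list, mode: str,
         sender_id: str = "120-1") -> None:
    if mode == "integrated":
        # Integrated server: the packet itself goes to the server.
        server.accept(("packet", packet))
    elif mode == "relay":
        # Relay server: only a notification goes to the server (fields assumed),
        # and the packet is delivered directly to each receiving terminal.
        server.accept(("notification", {"from": sender_id, "size": len(packet)}))
        for receiver in receivers:
            receiver.inbox.append(packet)
    else:
        raise ValueError(f"unknown delivery mode: {mode}")
```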

In some additional embodiments, while the instant message service is being provided, a first character for the first user and a second character for the second user can be displayed on the transmitting terminal 120 and / or the receiving terminal 130. While transmitting the instant message to the receiving terminal 130, the transmitting terminal 120 may also transmit character data for the first character. The character data may include information on at least one of the type of the first character, the expression of the first character, or the motion of the first character. Also, the character data may include information linking it to a voice message or text message. In this embodiment, the transmitting terminal 120 may further combine character data in addition to voice data and text data in generating a data packet.
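As a rough illustration of the character data described above, the sketch below models it as a small record attached to the integrated message data. The field names, the use of a message index as the link to a voice or text message, and the `attach_character` helper are assumptions made for this example only.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CharacterData:
    """Hypothetical layout: at least one of the first three fields should be set."""
    character_type: Optional[str] = None
    expression: Optional[str] = None
    motion: Optional[str] = None
    linked_message: Optional[int] = None  # assumed link: index of the voice/text pair

    def has_content(self) -> bool:
        fields = (self.character_type, self.expression, self.motion)
        return any(value is not None for value in fields)


def attach_character(integrated: dict, character: CharacterData) -> dict:
    """Return a copy of the integrated message data with character data added."""
    if not character.has_content():
        raise ValueError("character data needs a type, an expression, or a motion")
    return {**integrated, "character": asdict(character)}
```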

In some examples in which a data packet is transmitted from the transmitting terminal 120 to the receiving terminal 130 through the server 115 such as an integrated server, the server 115 may store data (for example, voice data, text data, character data, etc.) from the data packet received from the transmitting terminal 120. The server 115 may store the voice message in association with the voice-recognized text message.

In some examples in which data packets are transmitted directly from the transmitting terminal 120 to the receiving terminal 130, the server 115 receives the notification message from the transmitting terminal 120 and transmits the notification message to the receiving terminal 130. The receiving terminal 130 may receive a notification message for the instant message received from the transmitting terminal 120 from the server 115.

The receiving terminal 130 can access the server 115. In some embodiments, the receiving terminal 130 may receive a data packet transmitted by the transmitting terminal 120 from the server 115. In some other embodiments, the receiving terminal 130 may receive a notification message for the data packet transmitted by the transmitting terminal 120 from the server 115. The receiving terminal 130 may connect to the transmitting terminal 120 in response to the received notification message, and receive a data packet from the transmitting terminal 120. In this embodiment, the notification message received from the server 115 may include an indication of one or more voice messages for the receiving terminal 130, including the voice message received from the transmitting terminal 120.

The receiving terminal 130 may acquire at least one of voice data or text data from the received data packet. In some examples, the transmitting terminal 120 may receive a voice message, as described above, and transmit to the receiving terminal 130 a data packet including voice data from the voice message and text data corresponding to the voice message. The receiving terminal 130 may receive the data packet and obtain the voice data and the text data corresponding to the voice message from the received data packet.

In some other embodiments, the receiving terminal 130 may acquire voice data from the received data packet. In some examples, the transmitting terminal 120 may receive a voice message, and may transmit a data packet including only the voice message toward the receiving terminal 130. The receiving terminal 130 may receive such a data packet and acquire voice data from it, but may not be able to acquire text data. The receiving terminal 130 may generate text data corresponding to the voice message based on the obtained voice data. As described above with respect to the transmitting terminal 120, a well-known speech recognition technique can be used to generate the text data. In one example, the receiving terminal 130 may determine that text data corresponding to the voice message cannot be obtained from the received data packet and, in response to this determination, may generate text data corresponding to the voice message based on the voice data.

In some other embodiments, the receiving terminal 130 may acquire text data from the received data packet. In some examples, the transmitting terminal 120 may receive text from the first user, and may transmit a data packet including only the text toward the receiving terminal 130. In some other examples, the transmitting terminal 120 may receive a voice message from the first user, generate voice data from the voice message, and transmit toward the receiving terminal 130 a data packet that combines the voice data with text data corresponding to the voice message, the text data being generated by a voice recognition technique or from user input. The voice data may then be lost for various reasons, such as the transmission environment or an abnormal operation of the transmitting or receiving terminal. In that case, the receiving terminal 130 may acquire text data from the received data packet, and generate voice data corresponding to the text data based on the obtained text data. In one example, the receiving terminal 130 may determine that the voice data corresponding to the text data cannot be obtained from the data packet and, in response to the determination, may generate the voice data based on the text data.
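The receiving-side handling of a packet that lacks either text data or voice data can be summarized in a small sketch. The function name `supplement` and its parameters are hypothetical; `recognize` and `synthesize` are placeholders for whatever speech recognition and speech generation engines an implementation would plug in, and the error raised when both parts are missing is an assumption, since the text does not address that case.

```python
from typing import Callable, Optional, Tuple


def supplement(
    voice: Optional[bytes],
    text: Optional[str],
    recognize: Callable[[bytes], str],
    synthesize: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Fill in whichever of voice data or text data the packet did not provide."""
    if voice is None and text is None:
        # Assumed behavior: nothing to reconstruct from.
        raise ValueError("packet carried neither voice data nor text data")
    if text is None:
        # Voice data only: generate text data by speech recognition.
        text = recognize(voice)
    if voice is None:
        # Text data only: generate voice data by speech generation.
        voice = synthesize(text)
    return voice, text
```

When both parts are present, neither engine is called and the data passes through unchanged.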

In some embodiments, the receiving terminal 130 may store the generated or acquired voice data and text data inside the receiving terminal 130. In some examples, the receiving terminal 130 may store the received voice data and text data in association.

In some embodiments, the receiving terminal 130 may play a voice message corresponding to the voice data based on the voice data. Also, the receiving terminal 130 may display a text message corresponding to the text data, along with the reproduction of the voice message, based on the text data. In some examples, the receiving terminal 130 may receive a voice message play request for playing one or more voice messages from a user (second user). The receiving terminal 130 may sequentially play one or more voice messages based on one or more voice data in response to the voice message play request. As such, the voice message can be reproduced asynchronously with the reception of the voice message. Additionally, at the user's request, the playback of the voice message may be paused or stopped, or controlled to play the previous voice message or the next voice message. The receiving terminal 130 may display a corresponding text message in response to the reproduction of the voice message.
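The asynchronous, user-driven playback described above can be modeled as a simple playlist. This is a sketch under assumptions: the `Playlist` class and its method names are invented for illustration, actual audio output is omitted, and each call returns the text that would be displayed alongside the voice message being played.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Playlist:
    messages: list = field(default_factory=list)  # (voice, text) in arrival order
    position: int = 0
    playing: bool = False

    def receive(self, voice: bytes, text: str) -> None:
        """Store a message; reception alone never starts playback."""
        self.messages.append((voice, text))

    def play(self) -> Optional[str]:
        """Start or resume playback; return the text to display, or None at the end."""
        if self.position >= len(self.messages):
            self.playing = False
            return None
        self.playing = True
        return self.messages[self.position][1]

    def pause(self) -> None:
        self.playing = False

    def next(self) -> Optional[str]:
        self.position = min(self.position + 1, len(self.messages))
        return self.play()

    def previous(self) -> Optional[str]:
        self.position = max(self.position - 1, 0)
        return self.play()
```

Because `receive` only appends to the list, playback begins when the user requests it rather than when a message arrives.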

In some additional embodiments, before the receiving terminal 130 plays one or more voice messages, the receiving terminal 130 may filter the text message corresponding to the text data based on a predetermined censorship condition. For example, the predetermined censorship condition may cover profanity, vulgar language, and the like, but is not limited thereto. The receiving terminal 130 may play the corresponding voice message, to the extent it is not filtered out, based on the filtering result. For example, when vulgar language is included in the voice-recognized text message, the receiving terminal 130 may mute at least a part of the voice message corresponding to the text message. The filtering process is not limited to being performed by the receiving terminal 130; in some embodiments, the filtering process may be performed before the server 115 or the transmitting terminal 120 transmits a data packet.
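One way to picture the mute step is sketched below. It assumes that the speech recognizer supplies, for each word, the range of audio samples it occupies, which the text does not state; the function name `mute_flagged`, the sample-list representation of audio, and the masking of the displayed word with asterisks are likewise illustrative assumptions.

```python
from __future__ import annotations


def mute_flagged(
    samples: list[int],
    words: list[tuple[str, int, int]],
    blocked: set[str],
) -> tuple[list[int], str]:
    """Silence the samples of blocked words and mask them in the displayed text.

    Each entry of `words` is (word, start_sample, end_sample); the timing
    information is assumed to come from the speech recognizer.
    """
    out = list(samples)
    shown = []
    for word, start, end in words:
        if word.lower() in blocked:
            # Clamp to the audio length so a bad timestamp cannot grow the list.
            for i in range(max(start, 0), min(end, len(out))):
                out[i] = 0
            shown.append("*" * len(word))
        else:
            shown.append(word)
    return out, " ".join(shown)
```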

In some embodiments, the receiving terminal 130 may delete one or more voice data based on a predetermined condition, while leaving the corresponding one or more text data. In some examples, when playback of a voice message ends, the receiving terminal 130 may delete the corresponding voice data. In some other examples, the receiving terminal 130 may delete old voice data based on a predetermined storage capacity condition.
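Both deletion conditions can be sketched with a small store that keeps text data indefinitely and treats voice data as disposable. The `MessageStore` class, its method names, and the byte-count capacity limit are assumptions for illustration; the text leaves the exact conditions open.

```python
from collections import OrderedDict


class MessageStore:
    """Keeps text for every message; voice data may be dropped to save space."""

    def __init__(self, max_voice_bytes: int):
        self.max_voice_bytes = max_voice_bytes  # assumed capacity condition
        self.voice = OrderedDict()  # message id -> voice bytes, oldest first
        self.text = {}              # message id -> text, never deleted here

    def add(self, msg_id: str, voice: bytes, text: str) -> None:
        self.text[msg_id] = text
        self.voice[msg_id] = voice
        self._enforce_capacity()

    def mark_played(self, msg_id: str) -> None:
        """Delete voice data once playback has ended; the text stays."""
        self.voice.pop(msg_id, None)

    def _enforce_capacity(self) -> None:
        # Drop the oldest voice data until the total fits the capacity.
        while self.voice and sum(len(v) for v in self.voice.values()) > self.max_voice_bytes:
            self.voice.popitem(last=False)
```

After either kind of deletion the message can still be shown as text, which is what makes the combined voice and text storage space-efficient.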

In some additional embodiments, the receiving terminal 130 may obtain character data from the data packet. In some examples, the receiving terminal 130 may obtain character data for the first character for the first user. The receiving terminal 130 may display the character together with the display of the text message and the reproduction of the voice message, based on the character data.

FIG. 2 is a block diagram schematically illustrating a transmitting terminal 200 according to at least some embodiments of the present disclosure. As shown in FIG. 2, the transmitting terminal 200 may include a voice input module 210, a voice data generation module 220, a text data generation module 230, a data packet generation module 250 and a communication module 260. Additionally, the transmitting terminal 200 may further include a character module 240 and a notification generating module 270. The voice input module 210, the text input module 232, the camera 242, and the motion input module 244 are examples of a user interface (UI) for receiving input from a user. The components included in the transmitting terminal 200 may be individually implemented, or two or more of the components may be combined to form a single component. The connections between the components in FIG. 2 (e.g., the connection between the voice data generation module 220 and the voice input module 210, the voice recognition module 234 or the data packet generation module 250, etc.) are shown for convenience of explanation only, and the connection of each component is not limited to these connections. As described in more detail below, the transmitting terminal 200 is configured to execute an instant message service application, and can generate and transmit an instant message. The transmitting terminal 200 may be any of various computing devices, for example, a smart phone, a tablet computer, a desktop computer, a laptop computer, a mobile phone, a personal digital assistant (PDA), a specific purpose device, or a small form factor portable (mobile) electronic device such as a fusion device including any of the above functions.

The voice input module 210 may be configured to receive a voice message from a user of the transmitting terminal 200. The voice input module 210 may include an element capable of receiving a user's voice message, for example, a microphone. In some examples, the voice input module 210 may receive a voice message from a user for a predetermined time. In some examples, the voice input module 210 may sequentially receive one or more voice messages according to a user's request. The voice data generation module 220 may be configured to generate corresponding voice data from voice messages input by a user. Hereinafter, a description of well-known components in the technical field of the present disclosure will be omitted, and the configuration according to the present disclosure will be described in detail.

The text data generation module 230 may be configured to generate text data. The text data generation module 230 may include a text input module 232 and a speech recognition module 234. In some examples, the speech recognition module 234 may receive voice data from the voice data generation module 220 and perform speech recognition using the voice data, and the text data generation module 230 may generate text data corresponding to the voice message based on the result of the speech recognition. In some other examples, the transmitting terminal 200 may request the user to input text corresponding to a voice message, the text input module 232 may receive the text corresponding to the voice message from the user, and the text data generation module 230 may generate text data from the input text.

The data packet generation module 250 may be configured to combine the voice data generated by the voice data generation module 220 and the text data generated by the text data generation module 230 to generate a data packet. In some examples, the data packet generation module 250 may combine the voice data and the corresponding text data to generate integrated message data, and encode the integrated message data to generate a data packet for transmission. When the voice input module 210 receives a plurality of voice messages sequentially in response to a user's request, the text data generation module 230 may generate text data corresponding to each of the plurality of voice messages, and the data packet generation module 250 may match each item of voice data with its text data and combine the matched pairs of voice data and text data to generate integrated message data.
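The pairing and encoding described above can be sketched as follows. This is only an illustrative sketch: the disclosure does not specify a packet format, so the JSON-with-Base64 layout and the names `IntegratedMessage`, `build_packet`, and `parse_packet` are assumptions made for this example.

```python
import base64
import json
from dataclasses import dataclass
from typing import List


@dataclass
class IntegratedMessage:
    voice: bytes  # encoded audio for one voice message
    text: str     # text corresponding to that voice message


def build_packet(voices: List[bytes], texts: List[str]) -> bytes:
    """Pair each voice item with its text and encode the pairs for transmission."""
    if len(voices) != len(texts):
        raise ValueError("each voice message needs exactly one text")
    messages = [IntegratedMessage(v, t) for v, t in zip(voices, texts)]
    # Hypothetical wire format: a JSON list with Base64-encoded audio.
    payload = [
        {"voice": base64.b64encode(m.voice).decode("ascii"), "text": m.text}
        for m in messages
    ]
    return json.dumps(payload).encode("utf-8")


def parse_packet(packet: bytes) -> List[IntegratedMessage]:
    """Inverse of build_packet, as a receiving terminal might apply it."""
    return [
        IntegratedMessage(base64.b64decode(item["voice"]), item["text"])
        for item in json.loads(packet.decode("utf-8"))
    ]
```

Keeping each voice item and its text in the same record preserves the one-to-one correspondence when several voice messages are input in sequence.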

In some embodiments, the communication module 260 may be capable of data communication via an integrated server providing an instant message service through a network. In this embodiment, the communication module 260 may transmit the data packet generated by the data packet generation module 250 to the integrated server.

In some other embodiments, the communication module 260 may be communicatively connected to a relay server providing an instant message service through a network. In this embodiment, the notification generation module 270 may be configured to generate a notification message regarding the data packet generated by the data packet generation module 250. The notification message can include an indication of the voice message and / or text message. The communication module 260 may be configured to transmit the notification message generated by the notification generation module 270 to the relay server. Thereafter, the communication module 260 may be connected to enable direct communication with at least one receiving terminal executing the instant message service application, and may directly transmit data packets.

In some additional embodiments, the sender's or recipient's character may be displayed on the transmitting terminal 200 while the instant message service is provided on the transmitting terminal 200, and the transmitting terminal 200 may further include a character module 240. The character module 240 may be configured to acquire information about a character and generate character data from the acquired information. In this embodiment, the data packet generation module 250 may be configured to combine voice data from the voice data generation module 220, text data from the text data generation module 230, and character data from the character module 240 to generate a data packet. The character data may include information on at least one of the type of the user's character, the facial expression of the character, or the action of the character.

The character module 240 may include a camera 242, a motion input module 244, and a text recognition module 246. The camera 242 may acquire face information of the user, and the character module 240 may determine the facial expression of the character based on the face information obtained by the camera 242. In one example, while the voice input module 210 receives a voice message from the user, the camera 242 acquires the user's face information, and the character module 240 determines the facial expression of the character based on that face information. The motion input module 244 may receive input of a character's action from the user. For example, the user may select at least one action from a list of actions of the character presented to the user. The text recognition module 246 may recognize text associated with an action of the character in the text data generated by the text data generation module 230. In some examples, the character module 240 may determine the action of the character based on the recognized text. In some other examples, the text recognition module 246 may associate the recognized text with an action of the character, the motion input module 244 may receive from the user a selection of the recognized text, and the character module 240 may determine the action of the character according to that selection.
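As a rough illustration of the text-based path, the sketch below looks for a keyword in the text data and proposes the associated action. The keyword table, the action names, and the function `suggest_action` are hypothetical; the disclosure does not state how text is mapped to actions.

```python
from typing import Dict, Optional

# Hypothetical keyword-to-action table; a real service would define its own.
ACTION_KEYWORDS: Dict[str, str] = {"cold": "shiver", "hot": "fan_self"}


def suggest_action(
    text: str, keywords: Dict[str, str] = ACTION_KEYWORDS
) -> Optional[str]:
    """Return the action tied to the first keyword found in the text, if any."""
    lowered = text.lower()
    for word, action in keywords.items():
        if word in lowered:
            return action
    return None
```

The suggestion could either be applied directly or offered to the user for confirmation, matching the two alternatives described above.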

Additionally or alternatively, the user may decide whether to transmit the instant message after confirming the text message, the facial expression, the action, and the like displayed on the transmitting terminal 200.

FIG. 3 is a flow diagram illustrating an exemplary process 300 performed in a transmitting terminal, in accordance with at least some embodiments of the present disclosure. For example, the process 300 may be performed under the control of a computing device, such as the transmitting terminal 120 of FIG. 1 and the transmitting terminal 200 of FIG. 2. The process 300 shown in FIG. 3 can include one or more operations, functions, or actions as illustrated by blocks 310, 320, 330, 340, 350, 360, and/or 370. The various blocks are not intended to be limited to the described embodiment. For example, those skilled in the art will appreciate that, for the processes disclosed herein, the functions performed in the processes and methods may be implemented in different orders. In addition, the schematic operations illustrated in FIG. 3 are provided by way of example only, and some of the operations may be optional, may be combined into fewer operations, or may be extended to additional operations without departing from the essence of the disclosed embodiment. The process 300 can begin at block 310, which receives a voice message.

In block 310, the computing device may receive a voice message from a user. In some examples, the computing device may receive a voice message from a user for a predetermined time. In some examples, the computing device may sequentially receive one or more voice messages at the user's request. The process 300 may continue from block 310 to block 320, in which the computing device may generate voice data corresponding to the voice message input by the user. The process 300 can continue from block 320 to block 330, which determines whether there is text input corresponding to the voice message.

In block 330, the computing device may determine whether there is text input corresponding to the voice message. For example, the user may input a voice message and may also enter a request to input text corresponding to the voice message; if there is no such request, the computing device may determine that text corresponding to the voice message is not input by the user. If the text corresponding to the voice message is not input by the user, the process 300 may continue to block 340, which generates text data based on the voice data. In block 340, the computing device may perform speech recognition using the voice data. The computing device may generate text data corresponding to the voice message based on the result of the speech recognition. The process 300 can continue from block 340 to block 360, which generates a data packet.

When it is determined in block 330 that text corresponding to the voice message is input by the user, the process 300 may continue from block 330 to block 350, which receives the text and generates text data. In block 350, the computing device may receive text corresponding to the voice message from the user. The user can input text using various input devices such as a touch pad, a keyboard, a mouse, and the like. The computing device may generate text data from the text input by the user. The process 300 can continue from block 350 to block 360, which generates a data packet.
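The branch among blocks 330, 340, and 350 can be summarized in a few lines. This is a minimal sketch under stated assumptions: `generate_text_data` is a hypothetical name, and `recognize` stands in for whatever speech recognition engine an implementation uses, which the disclosure does not name.

```python
from typing import Callable, Optional


def generate_text_data(
    voice_data: bytes,
    typed_text: Optional[str],
    recognize: Callable[[bytes], str],
) -> str:
    """Return the text data for one voice message (blocks 330 to 350)."""
    if typed_text is not None:    # block 330: the user entered text
        return typed_text         # block 350: use the text as entered
    return recognize(voice_data)  # block 340: fall back to speech recognition
```

Passing the recognizer in as a parameter keeps the sketch independent of any particular speech recognition library.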

In block 360, the computing device may generate a data packet by combining the voice data generated in block 320 and the text data generated in block 340 or 350. In some examples, the computing device may combine the voice data and the corresponding text data to generate integrated message data, and encode the integrated message data to generate a data packet for transmission. If a plurality of voice messages are sequentially input in block 310 in response to a user's request, the computing device may, in block 360, match the text data generated for each of the plurality of voice messages with the corresponding voice data, and combine each matched pair of voice data and text data to generate integrated message data.

In some additional embodiments, the computing device may obtain information about the user's character before performing block 360 and generate character data from the obtained information, where the character is displayable on the computing device. In some examples, the computing device may receive information about the character, such as the character's facial expression, action, or the like, from the user. In some other examples, the computing device may obtain information about the character using a device such as a camera. In this embodiment, the computing device may, at block 360, further combine the character data with the voice data and the text data to generate a data packet.

The process 300 may continue from block 360 to block 370, which transmits the data packet. In some embodiments, the computing device may receive the user's confirmation of the text message, the character's facial expression, the action, and the like to be transmitted before performing block 370.

In block 370, the computing device may transmit the generated data packet to at least one receiving terminal. In some embodiments, the computing device may be capable of data communication via a network with an integrated server that provides the instant message service. The integrated server may receive at least voice data and text data from the computing device, store them, and transmit them to the receiving terminal, as described in more detail with respect to FIG. 11. In some other embodiments, the computing device may be communicatively connected through a network to a relay server providing the instant message service, and may also be communicatively connected to at least one receiving terminal. In this embodiment, the computing device can generate a notification message about the data packet and send the notification message to the relay server. Thereafter, when at least one receiving terminal connects to the computing device, the computing device may directly transmit the data packet to the connected receiving terminal.

FIG. 4 is a block diagram schematically illustrating a receiving terminal 400 according to at least some embodiments of the present disclosure. As shown in FIG. 4, the receiving terminal 400 may include a communication module 410, a data acquisition module 420, a data supplement module 430, an output module 440, and a memory 450. The components included in the receiving terminal 400 may be implemented individually, or two or more of the components may be combined into a single component. The connections between the components shown in FIG. 4 (e.g., the connections between the data acquisition module 420 and the data supplement module 430 or the output module 440) are shown for convenience of description only, and the connections between the components are not limited to those shown. For example, although the memory 450 is shown without any connection in FIG. 4, depending on the implementation it may be operatively connected to at least one of the communication module 410, the data acquisition module 420, the data supplement module 430, and the output module 440. The receiving terminal 400 may be any of various computing devices, for example, a smartphone, a tablet computer, a desktop computer, a laptop computer, a mobile phone, a personal digital assistant (PDA), a special-purpose device, or a small-form-factor portable (mobile) electronic device such as a hybrid device including any of the above functions. In addition, the receiving terminal 400 may be implemented integrally with the transmitting terminal 200 described with reference to FIG. 2, and some components may be implemented as a single entity. For example, depending on the implementation, the communication module 260 illustrated in FIG. 2 and the communication module 410 illustrated in FIG. 4 may be implemented integrally. As described in more detail below, the receiving terminal 400 is configured to execute the instant message service application and can receive and output an instant message.

The communication module 410 may be configured to receive data packets transmitted by the transmitting terminal, either from the server or from the transmitting terminal. In some embodiments, the communication module 410 may connect to the integrated server and receive, from the integrated server, data packets transmitted by the transmitting terminal. In some other embodiments, the communication module 410 may connect to the relay server and receive, from the relay server, a notification message for the data packet transmitted by the transmitting terminal. In this embodiment, the communication module 410 may connect directly to the transmitting terminal in response to the received notification message, for example, using a peer-to-peer connection technique, and receive the data packet from the connected transmitting terminal.

The data acquisition module 420 may be configured to acquire at least one of voice data or text data from a data packet received by the communication module 410. In some examples, the communication module 410 may receive data packets transmitted by the transmitting terminal, and the data packets may include voice data for voice messages and text data corresponding to voice messages. The data acquisition module 420 may acquire both voice data and text data corresponding to the voice message from the data packet. In this example, the data acquisition module 420 may transmit the acquired voice data and text data to the output module 440.

In some other examples, the communication module 410 may receive data packets transmitted by the transmitting terminal, and these data packets may include only voice data, or may include voice data and corrupted text data corresponding to the voice messages. In this example, the data acquisition module 420 may acquire voice data from the data packet and transmit the voice data to the data supplement module 430. In one example, the data acquisition module 420 may determine that the text data corresponding to the voice message cannot be obtained, and in response to the determination, transmit the voice data to the data supplement module 430. The data supplement module 430 may include a speech recognition module 432 and a speech generation module 434. When the data supplement module 430 receives the voice data from the data acquisition module 420, the speech recognition module 432 may generate text data corresponding to the voice message from the voice data using a well-known speech recognition technique. In this example, the output module 440 may receive the voice data from the data acquisition module 420 and the text data from the data supplement module 430.

In some other examples, the communication module 410 may receive a data packet transmitted by the transmitting terminal, and the data packet may include only text data, or may include text data and corrupted voice data corresponding to the text data. In this example, the data acquisition module 420 may acquire text data from the data packet and transmit the text data to the data supplement module 430. In one example, the data acquisition module 420 may determine that voice data corresponding to the text data cannot be acquired, and in response to the determination, transmit the text data to the data supplement module 430. When the data supplement module 430 receives the text data from the data acquisition module 420, the speech generation module 434 may generate voice data from the text data using a well-known speech synthesis technique.
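The supplementing behavior described in the preceding paragraphs amounts to filling in whichever half of the message is missing. The sketch below is illustrative only: missing or corrupted data is represented as `None`, and `supplement`, `recognize`, and `synthesize` are hypothetical names standing in for components the disclosure describes only as well-known techniques.

```python
from typing import Callable, Optional, Tuple


def supplement(
    voice: Optional[bytes],
    text: Optional[str],
    recognize: Callable[[bytes], str],
    synthesize: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return a complete (voice, text) pair, generating the missing part."""
    if voice is None and text is None:
        raise ValueError("packet contains neither voice data nor text data")
    if text is None:
        text = recognize(voice)   # speech recognition fills in the text
    elif voice is None:
        voice = synthesize(text)  # speech synthesis fills in the voice
    return voice, text
```

When both parts are present the pair is returned unchanged, which corresponds to the case where the data acquisition module passes both directly to the output module.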

The output module 440 may include a playback module 442 and a display module 444. The playback module 442 may be configured to play a voice message corresponding to the voice data, based on the voice data. The receiving terminal 400 may receive, from the user through an appropriate user interface (not shown), a voice message playback request to play a voice message. In some embodiments, when the data acquisition module 420 has acquired more than one item of voice data, the playback module 442 may be configured to sequentially play, in response to the voice message playback request, the one or more voice messages corresponding to the voice data. The sequential playback may be performed under the user's control, such as play, pause, stop, playback of the previous voice message, or playback of the next voice message. The display module 444 may be configured to display a text message corresponding to the text data in correspondence with the playback of the voice message by the playback module 442.
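One way to picture the sequential playback with previous/next control is a cursor over the received voice/text pairs. The class below is a hypothetical sketch, not part of the disclosure; it tracks only which message is current and leaves the actual audio output and text display to the caller.

```python
from typing import List, Optional, Tuple

Message = Tuple[bytes, str]  # (voice data, corresponding text)


class PlaybackQueue:
    """Steps through received messages in order, one voice/text pair at a time."""

    def __init__(self, messages: List[Message]) -> None:
        self._messages = list(messages)
        self._index = 0

    def current(self) -> Optional[Message]:
        """Return the pair to play and display now, or None if the queue is empty."""
        if not self._messages:
            return None
        return self._messages[self._index]

    def next(self) -> Optional[Message]:
        """Advance to the next message, staying on the last one at the end."""
        if self._index < len(self._messages) - 1:
            self._index += 1
        return self.current()

    def previous(self) -> Optional[Message]:
        """Go back to the previous message, staying on the first one at the start."""
        if self._index > 0:
            self._index -= 1
        return self.current()
```

Because each entry carries both the voice data and its text, the text shown always corresponds to the voice message being played.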

The memory 450 may store voice data and text data from the data acquisition module 420 and / or the data supplement module 430. In some examples, voice data and text data corresponding to the voice message may be stored in association. In some embodiments, the memory 450 may delete one or more voice data based on predetermined conditions. In some examples, the memory 450 may delete stored voice data when playback of the voice message ends. In some other examples, the memory 450 may delete old voice data based on predetermined storage capacity conditions. For example, if the total capacity of voice data stored in the memory 450 exceeds a predetermined value, the oldest voice message can be deleted.
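The capacity-based deletion can be sketched as a store that evicts the oldest voice data first while keeping the associated text. The class name `VoiceStore`, the byte-count threshold, and the use of message identifiers are assumptions for illustration; the disclosure only states that the oldest voice data may be deleted when a predetermined capacity is exceeded.

```python
from collections import OrderedDict
from typing import Dict


class VoiceStore:
    """Keeps text for every message but evicts the oldest voice data when full."""

    def __init__(self, max_voice_bytes: int) -> None:
        self.max_voice_bytes = max_voice_bytes
        self._voices: "OrderedDict[str, bytes]" = OrderedDict()  # insertion order = age
        self.texts: Dict[str, str] = {}

    def add(self, message_id: str, voice: bytes, text: str) -> None:
        """Store a voice/text pair, then delete the oldest voice data while over capacity."""
        self._voices[message_id] = voice
        self.texts[message_id] = text
        while self._voices and self.total_voice_bytes() > self.max_voice_bytes:
            self._voices.popitem(last=False)  # drop oldest voice; its text is kept

    def total_voice_bytes(self) -> int:
        return sum(len(v) for v in self._voices.values())

    def has_voice(self, message_id: str) -> bool:
        return message_id in self._voices
```

Retaining the text after the voice data is gone keeps the conversation searchable even when storage is limited.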

In some additional embodiments, the output module 440 may filter the text message from the text data based on predetermined censorship conditions. The filtering by the output module 440 can use a well-known method. For example, the predetermined text to be filtered may include abusive language, profanity, and the like, but is not limited thereto. Based on the result of the filtering by the output module 440, the playback module 442 can play the voice message, and the display module 444 can display the text message. For example, when profanity is included in a text message, the output module 440 may change at least a part of the text message to a predetermined character and mute at least a part of the voice message.
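A simple form of this filtering is shown below as a sketch. The function name `censor`, the word-list approach, and the choice of an asterisk as the replacement character are assumptions for this example; the returned flag merely signals that the matching part of the voice message should be muted, since locating that part in the audio is outside the scope of the sketch.

```python
import re
from typing import Set, Tuple


def censor(text: str, banned: Set[str]) -> Tuple[str, bool]:
    """Mask banned words with asterisks; the flag tells the player to mute."""
    if not banned:
        return text, False
    # Try longer words first so a short banned word cannot pre-empt a longer one.
    alternatives = sorted(banned, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in alternatives), re.IGNORECASE)
    masked, count = pattern.subn(lambda m: "*" * len(m.group()), text)
    return masked, count > 0
```

For instance, with the placeholder word list `{"darn"}`, the text "You darn fool" becomes "You **** fool" and the mute flag is set.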

In some additional embodiments, the communication module 410 may receive a data packet transmitted by the transmitting terminal, and the data packet may include character data including information about a character displayable by the display module 444. The data acquisition module 420 may further acquire the character data, in addition to the voice data and/or text data, from the received data packet. The display module 444 may be configured to display the character based on the character data, together with the playback of the voice message by the playback module 442 and the display of the text message by the display module 444.

FIGS. 5-7 are flow diagrams illustrating exemplary processes 500, 600, and 700 performed at a receiving terminal, according to at least some embodiments of the present disclosure. For example, the processes 500, 600, and 700 may be performed under the control of a computing device, such as the receiving terminal 130 of FIG. 1 and the receiving terminal 400 of FIG. 4. The process 500 of FIG. 5 can include one or more operations, functions, or actions as illustrated by blocks 510, 520, 530, and/or 540. The process 600 of FIG. 6 can include one or more operations, functions, or actions as illustrated by blocks 610, 620, 630, 640, 650, and/or 660. In addition, the process 700 shown in FIG. 7 can include one or more operations, functions, or actions as illustrated by blocks 710, 720, 730, 740, 750, and/or 760. The various blocks are not intended to be limited to the described embodiment. For example, those skilled in the art will appreciate that, for the processes disclosed herein, the functions performed in the processes and methods may be implemented in different orders. For example, the block 530 for playing a voice message and the block 540 for displaying a text message may be performed sequentially or simultaneously, depending on the implementation. In addition, the schematic operations illustrated in FIGS. 5-7 are provided as examples only, and some of the operations may be optional, may be combined into fewer operations, or may be extended to additional operations without departing from the essence of the disclosed embodiment.

The process 500 shown in FIG. 5 begins at block 510, which receives a data packet. In block 510, the computing device may be configured to receive data packets transmitted by the transmitting terminal, either from the server or from the transmitting terminal. In some embodiments, the computing device can connect to the integrated server and receive, from the integrated server, data packets sent by the transmitting terminal. In some other embodiments, the computing device may connect to the relay server, receive from the relay server a notification message for the data packet transmitted by the transmitting terminal, and then connect directly to the transmitting terminal and receive the data packet from the transmitting terminal. The process 500 may continue from block 510 to block 520, which acquires voice data and text data.

In block 520, the computing device may obtain voice data and text data from the data packet received in block 510. In some examples, the received data packet may include voice data for a voice message and text data corresponding to the voice message. The computing device may acquire both the voice data and the text data from the received data packet. The process 500 may continue from block 520 to block 530, which plays the voice message, and block 540, which displays the text message. In some additional examples, the received data packet can include character data that includes information about a character displayable on the computing device. In this example, the computing device may obtain the character data in addition to the voice data and text data at block 520.

In block 530, the computing device may play a voice message corresponding to the voice data based on the obtained voice data. In some examples, the computing device may receive a voice message play request to play a voice message from the user. In some examples, the computing device may sequentially play one or more voice messages if there is more than one voice message to play. Additionally, the playback of the voice message may be paused, stopped, or controlled to play the previous voice message or the next voice message at the user's request.

In block 540, the computing device may display a text message corresponding to text data in response to the reproduction of the voice message according to block 530. In some examples in which the computing device further obtains character data from a data packet, the computing device may display the character along with the reproduction of the voice message and the display of a text message based on the acquired character data.

Additionally, the computing device may filter the text message from the text data based on a predetermined censorship condition prior to performing block 530 and block 540, and may perform block 530 and block 540 based on the results of the filtering. In one example, the text filtered according to the censorship condition may include abusive language, profanity, and the like. When the text message includes profanity, the computing device may change at least a part of the text to a predetermined character, for example, an asterisk (*), and mute at least a part of the voice message.

The process 600 shown in FIG. 6 begins at block 610, which receives a data packet. The description of block 610 is omitted because it overlaps with the description of block 510 of FIG. 5. The process 600 can continue from block 610 to block 620, which acquires voice data. In block 620, the computing device may obtain voice data corresponding to the voice message from the data packet received in block 610. In some examples, the received data packet may include only voice data, or may include voice data and corrupted text data corresponding to the voice message. In this example, the computing device can obtain voice data from the data packet. The process 600 may continue from block 620 to block 630, where it is determined that text data cannot be obtained.

In block 630, the computing device may determine that voice data can be obtained from the data packet but that text data corresponding to the voice message cannot. The process 600 may continue from block 630 to block 640, which generates text data, and block 660, which plays the voice message.

In block 640, the computing device may generate text data based on the acquired voice data. The computing device may generate text data corresponding to a voice message from the voice data using a well-known speech recognition technique. The process 600 may continue to block 650, which displays a text message, after performing block 640. Descriptions of the block 660 for playing the voice message and the block 650 for displaying the text message overlap with those of blocks 530 and 540 described with reference to FIG. 5, respectively, and are therefore omitted.

The process 700 shown in FIG. 7 begins at block 710, which receives a data packet. The description of block 710 is omitted because it overlaps with the description of block 510 of FIG. 5. The process 700 may continue from block 710 to block 720, which obtains text data. In block 720, the computing device may obtain text data from the data packet received in block 710. In some examples, the received data packet may contain only text data, or may include text data and corrupted voice data corresponding to the text data. In this example, the computing device can obtain text data from the data packet. The process 700 may continue from block 720 to block 730, where it is determined that voice data cannot be obtained.

At block 730, the computing device may determine that text data can be obtained from the data packet but that voice data corresponding to the text data cannot. The process 700 may continue from block 730 to block 740, which generates voice data, and block 760, which displays the text message. In block 740, the computing device may generate voice data based on the acquired text data. The computing device may generate voice data from the text data using a well-known speech synthesis technique. After performing block 740, the process 700 may continue to block 750, which plays a voice message. Descriptions of the block 750 for playing the voice message and the block 760 for displaying the text message overlap with those of blocks 530 and 540 described with reference to FIG. 5, respectively, and are therefore omitted.

In this way, in providing an instant message service, a voice message and a text message can complement each other: by transmitting, receiving, and acquiring a voice message together with a text message, sequentially playing the voice messages, and displaying the corresponding text messages, the content of each voice message becomes easier to understand. In addition, because text data corresponding to a voice message is stored, even if the voice data is erased due to capacity constraints, it remains easy to quickly grasp a conversation that took place over the instant message service and to search and review its contents. Furthermore, a new type of instant message service can be provided by displaying a character together with the voice playback and text display.

FIG. 8 shows an example of using an instant message service according to the present disclosure, and FIG. 9 shows an example of displaying and playing a message on a user's computing device when using the instant message service according to the present disclosure. As illustrated in FIG. 8, the first user 810, the second user 820, and the third user 830 are using the instant message service through the user device 812, the user device 822, and the user device 832, respectively. In the example of FIG. 8, the users 810, 820, and 830 may share an instant message that is transmitted and received by at least one of the users 810, 820, and 830. For example, when the first user 810 transmits an instant message, the second user 820 and the third user 830 may receive the corresponding instant message. As shown in FIG. 8, the first user 810 and the second user 820 may transmit a voice message.

In some examples, the first user 810 can select the character 816 and the second user 820 can select the character 826. The first user 810 may input a voice message 814 with the phrase "Isn't it cold?" When the first user 810 inputs the voice message 814, the user device 812 may detect the facial expression of the first user 810 and determine the facial expression of the character 816. The user device 812 may obtain a text message 814-2 corresponding to the voice message 814 by performing speech recognition on the voice message 814. In addition, the action of the character 816 may be determined by the first user 810 selecting one of a list of predetermined actions, or by the first user 810 selecting text (e.g., “cold”) that the user device 812 has recognized in the voice-recognized text message 814-2. Thereafter, the voice message 814 may be transmitted to the user devices 822 and 832 of the second and third users 820 and 830.

After the voice message 814 is transmitted, the second user 820 may input the voice message 824 with the phrase "I am hot!". When the second user 820 inputs the voice message 824, the user device 822 may detect the facial expression of the second user 820 and determine the facial expression of the character 826. The user device 822 can obtain the text message 824-2 from the voice message 824. In addition, the action of the character 826 may be determined by the second user 820 selecting one of a list of predetermined actions, or by the second user 820 selecting text (e.g., “hot”) that the user device 822 has recognized in the text message 824-2. Thereafter, the voice message 824 may be transmitted to the user devices 812 and 832 of the first and third users 810 and 830.

Thereafter, the third user 830 may access a server (not shown) that provides the instant message service using the user device 832, and the user device 832 may receive a data packet containing voice data corresponding to the voice message 814 and text data corresponding to the text message 814-2, and a data packet containing voice data corresponding to the voice message 824 and text data corresponding to the text message 824-2. As shown in FIGS. 9(a) and 9(b), the text messages 814-2 and 824-2 may be displayed in correspondence with the playback of the voice messages 814 and 824. The third user 830 may input a voice message playback request using the user interface 840 displayed on the user device 832. When a request to play a voice message is input, as shown in FIG. 9(a), the voice message 814 is played together with the display of the character 816. Also, while the voice message 814 is being played, the voice-recognized text message 814-2 may be displayed in correspondence with the playback of the voice message 814. The character 816 may show the facial expression and action determined by the user device 812 of the first user 810. Then, as shown in FIG. 9(b), the voice message 824 is played together with the display of the character 826. Further, while the voice message 824 is being played, the voice-recognized text message 824-2 may be displayed in correspondence with the playback of the voice message 824. The character 826 may show the facial expression and action determined by the user device 822 of the second user 820.

FIG. 9(c) shows an example of a log of instant messages. The user device 832 of the third user 830 may sequentially display the text messages 814-2 and 824-2 received from the first user 810 and the second user 820.

FIG. 10 shows an example computer program product 1000 that can be used to provide an instant message service in accordance with at least some embodiments of the present disclosure. An exemplary embodiment of the computer program product is provided using a signal-bearing medium 1002. In some embodiments, the signal-bearing medium 1002 of the one or more computer program products 1000 may include a computer-readable medium 1006, a recordable medium 1008, and/or a communication medium 1010. The instructions 1004 included in the signal-bearing medium 1002 may be executed by the transmitting terminal 120 and the receiving terminal 130 illustrated in FIG. 1, the transmitting terminal 200 illustrated in FIG. 2, and/or the receiving terminal 400 illustrated in FIG. 4. In some embodiments, the instructions 1004 may include at least one instruction that, when executed, causes a computing device to perform at least one of the processes of FIGS. 3, 5, 6, and 7.

FIG. 11 is a block diagram schematically illustrating an instant message service providing server 1100 according to at least some embodiments of the present disclosure. The instant message service providing server 1100 may be implemented to have the functions of an integrated server and/or a relay server. As illustrated in FIG. 11, the instant message service providing server 1100 may include a communication module 1110, a character module 1120, a voice memory 1130, and a text memory 1140. The communication module 1110 may receive, from the transmitting terminal, a data packet including voice data, text data, and/or character data. In addition, the communication module 1110 may transmit a notification regarding a data packet and/or the data packet itself to a receiving terminal. The character module 1120 may store character data received from the transmitting terminal, for example, information on the type, facial expression, and action of the sender's character. The voice memory 1130 may store voice data received from the transmitting terminal. In some examples, voice data stored in the voice memory 1130 may be deleted according to predetermined conditions. The text memory 1140 may store text data corresponding to a voice message, for example, a voice-recognized text message or a typed text message. In some examples, the text memory 1140 may store text data corresponding to the voice data stored in the voice memory 1130, and the character module 1120 may store character data corresponding to the voice data stored in the voice memory 1130 and/or the text data stored in the text memory 1140.
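A minimal sketch of the storage side of such a server is given below. It is not the disclosed implementation: the class and method names are hypothetical, and the "predetermined condition" for deleting voice data is assumed here to be a simple retention period, which the disclosure does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class MessageServer:
    # Assumed deletion condition: voice data older than this many seconds.
    voice_ttl_seconds: float = 3600.0
    # message id -> (time stored, voice bytes); stands in for the voice memory 1130
    voice_memory: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)
    # message id -> text; stands in for the text memory 1140
    text_memory: Dict[str, str] = field(default_factory=dict)
    # message id -> character info; stands in for the character module 1120
    character_memory: Dict[str, dict] = field(default_factory=dict)

    def store(self, message_id: str, voice: bytes, text: str,
              character: dict, now: float) -> None:
        """Store the voice, text, and character data under one message id."""
        self.voice_memory[message_id] = (now, voice)
        self.text_memory[message_id] = text
        self.character_memory[message_id] = character

    def purge_expired_voice(self, now: float) -> int:
        """Delete voice data past the retention period; text and character data stay."""
        expired = [mid for mid, (stored_at, _) in self.voice_memory.items()
                   if now - stored_at >= self.voice_ttl_seconds]
        for mid in expired:
            del self.voice_memory[mid]
        return len(expired)

    def fetch(self, message_id: str) -> Dict[str, Optional[object]]:
        """Return what is still stored; 'voice' is None once it has been purged."""
        entry = self.voice_memory.get(message_id)
        return {
            "voice": entry[1] if entry is not None else None,
            "text": self.text_memory.get(message_id),
            "character": self.character_memory.get(message_id),
        }
```

Keeping the text and character data after the voice data is purged means a receiving terminal can still display the message, and it leaves room for regenerating speech from the text on the receiving side.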

The claimed subject matter is not limited in scope to the particular embodiments described herein. For example, some implementations may be in hardware, such as employed to operate on a device or combination of devices, whereas other implementations may be in software and/or firmware. Likewise, although the claimed subject matter is not limited in scope in this respect, some implementations may include one or more articles, such as a signal-bearing medium or a storage medium. Such storage media, such as CD-ROMs, computer disks, flash memory, or the like, may store instructions that, when executed by a computing device, such as a computing system, computing platform, or other system, cause a processor to operate in accordance with the claimed subject matter, such as one of the embodiments described above. As one possibility, a computing device may include one or more processing units or processors, one or more input/output devices, such as a display, a keyboard, and/or a mouse, and one or more memories, such as static random access memory, dynamic random access memory, flash memory, and/or a hard drive.

There is little distinction left between hardware and software implementations of aspects of systems; the use of hardware or software is generally (but not always, in that in certain contexts the choice between hardware and software can become significant) a design choice representing a tradeoff between cost and efficiency. There are various vehicles (e.g., hardware, software, and/or firmware) by which the processes and/or systems and/or other technologies described in this disclosure can be effected, and the preferred vehicle will vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, as yet another alternative, the implementer may opt for some combination of hardware, software, and/or firmware.

The foregoing detailed description has set forth various embodiments of devices and/or processes via block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, those skilled in the art will understand that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one embodiment, several portions of the subject matter described in this disclosure may be implemented via an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein can be equivalently implemented, in whole or in part, in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one skilled in the art in light of this disclosure. In addition, those skilled in the art will appreciate that the mechanisms of the subject matter of this disclosure are capable of being distributed as a program product in a variety of forms, and that an example of the subject matter of this disclosure applies regardless of the particular type of signal-bearing medium used to actually carry out the distribution.

While certain exemplary techniques have been described and illustrated herein using various methods and systems, it should be understood by those skilled in the art that various other modifications may be made, and equivalents may be substituted, without departing from the claimed subject matter. Additionally, many modifications may be made to adapt a particular situation to the teachings of the claimed subject matter without departing from the central concept described herein. Accordingly, it is intended that the claimed subject matter not be limited to the particular examples disclosed, but that such claimed subject matter may also include all embodiments falling within the scope of the appended claims and their equivalents.

Claims

A method performed on a transmitting terminal that generates and transmits a message by executing an instant message service application, wherein the transmitting terminal is connected, through a network, to an integrated server that provides an instant message service so as to enable data communication,

the method comprising:

receiving a voice message from a user of the transmitting terminal;

generating corresponding voice data from the input voice message;

generating text data corresponding to the voice message based on the voice data;

generating a data packet by combining the voice data and the text data; and

transmitting the data packet to the integrated server.
A method performed on a transmitting terminal that generates and transmits a message by executing an instant message service application, wherein the transmitting terminal is connected, through a network, to an integrated server that provides an instant message service so as to enable data communication,

the method comprising:

receiving a voice message from a user of the transmitting terminal;

generating corresponding voice data from the input voice message;

generating text data by receiving text corresponding to the voice message from the user;

generating a data packet by combining the voice data and the text data; and

transmitting the data packet to the integrated server.
A method performed on a transmitting terminal that generates and transmits a message by executing an instant message service application, wherein the transmitting terminal is connected to a relay server that provides an instant message service so as to enable data communication, and the transmitting terminal is connected so as to enable direct communication with at least one receiving terminal executing an instant message service application,

the method comprising:

receiving a voice message from a user of the transmitting terminal;

generating corresponding voice data from the input voice message;

generating text data corresponding to the voice message based on the voice data;

generating a data packet by combining the voice data and the text data;

generating a notification message regarding the data packet;

transmitting the generated notification message to the relay server; and

transmitting the data packet to the at least one receiving terminal.
A method performed on a transmitting terminal that generates and transmits a message by executing an instant message service application, wherein the transmitting terminal is connected to a relay server that provides an instant message service so as to enable data communication, and the transmitting terminal is connected so as to enable direct communication with at least one receiving terminal executing an instant message service application,

the method comprising:

receiving a voice message from a user of the transmitting terminal;

generating corresponding voice data from the input voice message;

generating text data by receiving text corresponding to the voice message from the user;

generating a data packet by combining the voice data and the text data;

generating a notification message regarding the data packet;

transmitting the generated notification message to the relay server; and

transmitting the data packet to the at least one receiving terminal.
A transmitting terminal for executing an instant message service application and generating and transmitting an instant message, the transmitting terminal comprising:

a voice input module configured to receive a voice message from a user of the transmitting terminal;

a voice data generation module configured to generate corresponding voice data from the voice message received by the voice input module;

a text data generation module configured to generate text data corresponding to the voice message;

a data packet generation module configured to combine the voice data and the text data to generate a data packet; and

a communication module configured to transmit the data packet to a server or at least one receiving terminal,

wherein the text data generation module includes a speech recognition module configured to perform speech recognition on the voice data from the voice data generation module and a text input module configured to receive text corresponding to the voice message from the user, and

wherein the text data generation module is configured to generate the text data using at least one of the speech recognition module or the text input module.
The transmitting terminal of claim 5, further comprising:

a notification generation module configured to generate a notification message for the data packet.
The transmitting terminal of claim 5, further comprising:

a character module configured to obtain information about a character that can be displayed on the transmitting terminal and the at least one receiving terminal, and to generate character data from the information about the character,

wherein the data packet generation module is configured to combine the voice data, the text data, and the character data to generate the data packet.
A computer-readable storage medium storing a computer program for executing an instant message service application to generate and transmit a message, the computer program comprising one or more computer-executable instructions that, when executed, cause a computing device to perform operations including:

receiving a voice message from a user of the computing device;

generating corresponding voice data from the input voice message;

generating text data corresponding to the voice message based on the voice data;

generating a data packet by combining the voice data and the text data; and

transmitting the data packet to an integrated server providing an instant message service.
A method performed on a receiving terminal that executes an instant message service application to receive a message, wherein the receiving terminal is connected, through a network, to an integrated server providing an instant message service so as to enable data communication,

the method comprising:

receiving, from the integrated server, a data packet transmitted by a transmitting terminal;

obtaining voice data from the data packet;

generating text data corresponding to the voice message based on the voice data;

reproducing a voice message corresponding to the voice data based on the voice data; and

displaying, based on the text data, a text message corresponding to the text data in response to reproduction of the voice message.
A method performed on a receiving terminal that executes an instant message service application to receive a message, wherein the receiving terminal is connected, through a network, to an integrated server providing an instant message service so as to enable data communication,

the method comprising:

receiving, from the integrated server, a data packet transmitted by a transmitting terminal;

obtaining text data from the data packet;

determining that voice data corresponding to the text data cannot be obtained from the data packet;

generating the voice data corresponding to the text data based on the text data;

reproducing a voice message corresponding to the voice data based on the voice data; and

displaying, based on the text data, a text message corresponding to the text data in response to reproduction of the voice message.
A method performed on a receiving terminal that executes an instant message service application to receive a message, wherein the receiving terminal is capable of data communication, through a network, with a relay server providing an instant message service, and the receiving terminal is also connected so as to be able to communicate directly with a transmitting terminal executing the instant message service application,

the method comprising:

receiving, from the relay server, a notification message for a data packet transmitted by the transmitting terminal;

receiving the data packet from the transmitting terminal in response to the notification message;

obtaining voice data from the data packet;

generating text data corresponding to the voice message based on the voice data;

reproducing a voice message corresponding to the voice data based on the voice data; and

displaying, based on the text data, a text message corresponding to the text data in response to reproduction of the voice message.
A method performed on a receiving terminal that executes an instant message service application to receive a message, wherein the receiving terminal is capable of data communication, through a network, with a relay server providing an instant message service, and the receiving terminal is also connected so as to be able to communicate directly with a transmitting terminal executing the instant message service application,

the method comprising:

receiving, from the relay server, a notification message for a data packet transmitted by the transmitting terminal;

receiving the data packet from the transmitting terminal in response to the notification message;

obtaining text data from the data packet;

determining that voice data corresponding to the text data cannot be obtained from the data packet;

generating the voice data corresponding to the text data based on the text data;

reproducing a voice message corresponding to the voice data based on the voice data; and

displaying, based on the text data, a text message corresponding to the text data in response to reproduction of the voice message.
A receiving terminal for executing an instant message service application and receiving an instant message, the receiving terminal comprising:

a communication module configured to receive a data packet transmitted by a transmitting terminal from a server or the transmitting terminal;

a data acquisition module configured to acquire at least one of voice data or text data from the data packet;

a data supplement module including a speech recognition module and a speech generation module; and

an output module including a playback module configured to reproduce, based on the voice data, a voice message corresponding to the voice data, and a display module configured to display, based on the text data, a text message corresponding to the text data in response to reproduction of the voice message,

wherein, when the data acquisition module acquires the voice data from the data packet but is unable to acquire the text data corresponding to the voice data, the data supplement module causes the speech recognition module to generate the text data corresponding to the voice data, and

wherein, when the data acquisition module acquires the text data from the data packet but is unable to acquire the voice data corresponding to the text data, the data supplement module causes the speech generation module to generate the voice data corresponding to the text data.
The receiving terminal of claim 13, wherein the communication module is configured to:

receive a notification message for the data packet from the server; and

receive the data packet from the transmitting terminal in response to the notification message.
The receiving terminal of claim 13, wherein:

the data acquisition module is configured to acquire, from the data packet, character data including information about a character displayable by the display module, and

the output module is configured to cause the display module to display the character, based on the character data, together with the display of the text message and the reproduction of the voice message.
A computer-readable storage medium storing a computer program for executing an instant message service application and receiving a message, the computer program comprising one or more computer-executable instructions that, when executed, cause a computing device to perform operations including:

receiving, from a server or a transmitting terminal, a data packet transmitted by the transmitting terminal;

obtaining text data from the data packet;

determining that voice data corresponding to the text data cannot be obtained from the data packet;

generating the voice data corresponding to the text data based on the text data;

reproducing a voice message corresponding to the voice data based on the voice data; and

displaying, based on the text data, a text message corresponding to the text data in response to reproduction of the voice message.
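
As a non-limiting illustration of the data supplement behavior recited for the receiving terminal, the following sketch fills in whichever of the voice data or text data is missing from a received data packet. The `DataPacket` type, the `supplement` function, and the injected `recognize` and `synthesize` callables are hypothetical stand-ins for the speech recognition module and the speech generation module; no particular speech engine is implied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataPacket:
    voice_data: Optional[bytes] = None  # None when voice data could not be acquired
    text_data: Optional[str] = None     # None when text data could not be acquired


def supplement(packet: DataPacket,
               recognize: Callable[[bytes], str],
               synthesize: Callable[[str], bytes]) -> DataPacket:
    """Return a packet in which both voice data and text data are present.

    recognize:  hypothetical speech-recognition callable (voice -> text)
    synthesize: hypothetical speech-generation callable (text -> voice)
    """
    if packet.voice_data is None and packet.text_data is None:
        raise ValueError("packet carries neither voice data nor text data")
    voice, text = packet.voice_data, packet.text_data
    if text is None:
        # Voice acquired but text missing: generate text by speech recognition.
        text = recognize(voice)
    if voice is None:
        # Text acquired but voice missing: generate voice by speech generation.
        voice = synthesize(text)
    return DataPacket(voice_data=voice, text_data=text)
```

After this step, the output module can always reproduce a voice message and display the corresponding text message, regardless of which part the packet originally carried.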