WO2020145428A1

WO2020145428A1 - Terminal

Info

Publication number: WO2020145428A1
Application number: PCT/KR2019/000304
Authority: WO
Inventors: 양시영; 박용철; 장주영; 채종훈; 한성민
Original assignee: LG Electronics Inc. (엘지전자 주식회사)
Priority date: 2019-01-08
Filing date: 2019-01-08
Publication date: 2020-07-16

Abstract

A terminal is disclosed. A terminal according to an embodiment of the present invention comprises: a communication unit for communicating with an external device; a sound output unit for outputting a voice; and a processor for receiving a message from a transmitting-side device, receiving a synthetic voice to which voice characteristics of a user of the transmitting-side device are applied, and outputting the synthetic voice.

Description

Terminal

The present invention relates to a terminal for outputting a synthesized voice to which a voice characteristic of a user who transmitted a message is applied.

Artificial intelligence is a field of computer science and information technology that studies how to enable computers to perform the thinking, learning, and self-development of which human intelligence is capable. In other words, it means enabling computers to imitate intelligent human behavior.

In addition, artificial intelligence does not exist by itself, but is directly or indirectly associated with other fields of computer science. In particular, in recent years, attempts have been made to actively utilize artificial intelligence elements in various fields of information technology to solve problems in those fields.

Meanwhile, a technology has recently emerged that extracts the characteristics of a specific person's voice and applies the extracted characteristics when converting text into speech, so that the text is read aloud in that person's voice.

However, such techniques are limited to reading text in the user's own voice or in the voice of a celebrity or voice actor.

The present invention is intended to solve the above-described problem, and an object of the present invention is to provide a terminal that outputs a synthesized voice to which a voice characteristic of a user who has transmitted a message is applied.

A terminal according to an embodiment of the present invention includes a communication unit for communicating with an external device, a sound output unit for outputting a voice, and a processor that receives a message from a transmitting-side device, receives a synthesized voice in which the voice characteristics of the user of the transmitting-side device are applied to the message, and outputs the synthesized voice.

According to the prior art, the terminal reads the message in the voice of a voice actor or an entertainer. According to the present invention, however, the terminal reads the message in the voice of the person who sent it, so the user experiences the effect of hearing the sender's voice directly.

In addition, according to the present invention, the user can tell who sent the message from the synthesized voice alone.

In the present invention, because the processor can output the synthesized voice without first outputting a guide announcing whom the message was received from, the user experiences the effect of hearing the sender's voice directly.

According to the present invention, the terminal receives the synthesized voice in advance, stores it in the memory, and outputs the stored synthesized voice when a user input is received, so the synthesized voice can be output without delay.

According to the present invention, even when the message includes symbols and emoticons in addition to letters and numbers, only the text that can be converted into speech is extracted and output in the voice of the user of the transmitting-side device.
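As a minimal sketch of how speakable text might be separated from symbols and emoticons, the following Python function keeps letters, digits, whitespace, and apostrophes and discards everything else. The function name `extract_speakable_text` and the filtering rule based on Unicode general categories are assumptions made for illustration; the specification does not state how the extraction is performed.

```python
import unicodedata


def extract_speakable_text(message: str) -> str:
    """Return only the parts of a message that a TTS engine could read aloud.

    Assumption for illustration: letters (Unicode category L*), numbers (N*),
    whitespace, and apostrophes are speakable; symbols, emoji, and other
    punctuation are not.
    """
    kept = []
    for ch in message:
        category = unicodedata.category(ch)
        if category[0] in ("L", "N") or ch.isspace() or ch == "'":
            kept.append(ch)
        else:
            # Replace dropped characters with a space so words do not merge.
            kept.append(" ")
    # Collapse runs of whitespace left behind by dropped characters.
    return " ".join("".join(kept).split())
```

For a message such as "I'm hungry" followed by an emoji and the emoticon "^^", only the words "I'm hungry" would be passed on for voice synthesis under this rule.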

According to the present invention, even when the power of the transmitting side device is off or the communication with the transmitting side device is poor, there is an advantage of being able to output the synthesized voice to which the voice characteristics of the user of the transmitting side device are applied.

Because a synthesized voice concerns personal privacy, it should not be provided to just anyone. According to the present invention, the transmitting-side device can use the key corresponding to the transmitting-side device 1100 to determine whether the synthesized voice is being requested by the user who received the message, and can thereby counter attempts by a third party to obtain the synthesized voice by hacking.

In addition, according to the present invention, by using a key (R_key) corresponding to the terminal, the synthesized voice can be provided only to other people who are authorized to receive it.

In addition, according to the present invention, by storing the key (R_key) corresponding to the terminal, a list of the other people who have received the synthesized voice can later be provided to the user.
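The key-based behavior described above can be illustrated with a small sketch. It assumes that the transmitting-side device keeps a set of authorized terminal keys and a log of the keys to which a synthesized voice was delivered. The class `VoiceAccessRegistry`, its methods, and the plain membership check are hypothetical; the specification does not define the key format or the verification procedure, and a real system would verify keys cryptographically.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class VoiceAccessRegistry:
    """Hypothetical bookkeeping held by the transmitting-side device."""

    authorized_keys: Set[str] = field(default_factory=set)
    delivered_to: List[str] = field(default_factory=list)

    def grant(self, r_key: str) -> None:
        """Authorize the terminal identified by r_key to receive the voice."""
        self.authorized_keys.add(r_key)

    def request_voice(self, r_key: str, synthesized_voice: bytes) -> Optional[bytes]:
        """Return the synthesized voice only to an authorized terminal."""
        if r_key not in self.authorized_keys:
            return None  # unauthorized requester receives nothing
        # Store the key so a list of recipients can be shown to the user later.
        self.delivered_to.append(r_key)
        return synthesized_voice
```

Keeping `delivered_to` separate from `authorized_keys` reflects the two distinct uses of R_key in the text: deciding who may receive the voice, and recording who actually received it.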

When a plurality of users participate in a conversation, the user cannot know who sent a message without looking at the chat room, and if a voice prompt announcing the sender is output for each message, the terminal cannot keep up with messages that arrive in rapid succession. According to the present invention, however, even when messages are received from a plurality of senders, the user can quickly grasp who sent each message from the voice output alone.

In addition, according to the present invention, another synthesized voice is output after the output of a specific synthesized voice is completed, so the content of each message can be conveyed clearly to the user.

Further, according to the present invention, when a second message is received while the synthesized voice is being output, the terminal outputs the synthesized voice and a second synthesized voice together. Accordingly, the terminal can keep up with messages that arrive in rapid succession, and because the synthesized voices are output at the pace at which the messages are actually received, the user can be given the feeling of participating in a real conversation.
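The specification does not say how two synthesized voices are output together. One plausible reading is additive mixing of the audio samples, sketched below under the assumption that both voices are lists of 16-bit PCM samples at the same sample rate. The function `mix_voices` and its `offset` parameter (the sample index at which the second voice starts) are hypothetical.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_voices(first: List[int], second: List[int], offset: int) -> List[int]:
    """Overlay `second` onto `first`, starting `offset` samples into `first`.

    Samples are summed where the two voices overlap and clipped to the
    16-bit range so the result remains valid PCM data.
    """
    length = max(len(first), offset + len(second))
    mixed = [0] * length
    for i, sample in enumerate(first):
        mixed[i] += sample
    for i, sample in enumerate(second):
        mixed[offset + i] += sample
    return [max(INT16_MIN, min(INT16_MAX, s)) for s in mixed]
```

Setting `offset` to the length of the first voice would instead play the second voice after the first has finished, which corresponds to the sequential output described in the preceding paragraph.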

According to the present invention, the transmitting-side device may set the access authority for the synthesized voice, and a terminal for which the access authority is not set cannot receive the synthesized voice. Accordingly, the user of the transmitting-side device can select the users to whom the characteristics of his or her voice are provided.

FIG. 1 is a block diagram illustrating a terminal related to the present invention.

FIG. 2 is a flowchart illustrating a method of operating a terminal according to an embodiment of the present invention.

FIGS. 3 and 4 are diagrams for explaining a method of outputting a synthesized voice in which a user's voice characteristics are applied to a message according to an embodiment of the present invention.

FIG. 5 is a view for explaining a method of receiving a synthesized voice together with a message from a transmitting device.

FIGS. 6 and 7 are diagrams for explaining a method for a terminal to receive a synthesized voice after extracting and transmitting text from a message.

FIGS. 8 and 9 are diagrams for describing an operation method when a message is received from a plurality of transmission-side devices.

FIG. 10 is a diagram for explaining a method of delivering different messages to a plurality of different users from a transmission-side device perspective according to an embodiment of the present invention.

FIG. 11 is a diagram for describing an operation in which a synthesized voice is not provided to a terminal according to an embodiment of the present invention.

Hereinafter, exemplary embodiments disclosed in the present specification will be described in detail with reference to the accompanying drawings. The same or similar elements are assigned the same reference numerals regardless of the figure in which they appear, and overlapping descriptions thereof will be omitted. The suffixes "module" and "unit" for components used in the following description are given or used interchangeably only for ease of writing the specification, and do not in themselves have distinct meanings or roles. In addition, in describing the embodiments disclosed herein, detailed descriptions of related known technologies will be omitted when it is determined that they may obscure the gist of the embodiments. The accompanying drawings are provided only to facilitate understanding of the embodiments disclosed in the present specification; the technical spirit disclosed in the specification is not limited by the accompanying drawings, and should be understood to include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms including ordinal numbers such as first and second may be used to describe various components, but the components are not limited by the terms. The terms are used only for the purpose of distinguishing one component from other components.

When a component is said to be "connected" or "coupled" to another component, it should be understood that it may be directly connected or coupled to the other component, but that other components may also exist in between. On the other hand, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The terminals described herein may include mobile phones, smart phones, laptop computers, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, slate PCs, tablet PCs, ultrabooks, and wearable devices (for example, watch-type terminals (smartwatches), glass-type terminals (smart glasses), and head mounted displays (HMDs)).

FIG. 1 is a block diagram illustrating a terminal related to the present invention.

The terminal 100 according to the embodiment described in this specification may also be applied to a fixed terminal such as a smart TV, a desktop computer, and a digital signage.

In addition, the terminal 100 according to an embodiment of the present invention may be applied to a fixed or movable robot.

In addition, the terminal 100 according to an embodiment of the present invention may perform the function of a voice agent. The voice agent may be a program that recognizes a user's voice and outputs a response suitable for the recognized user's voice as a voice.

The terminal 100 may include a wireless communication unit 110, an input unit 120, a learning processor 130, a sensing unit 140, an output unit 150, an interface unit 160, a memory 170, a processor 180, and a power supply unit 190.

The wireless communication unit 110 may include at least one of a broadcast reception module 111, a mobile communication module 112, a wireless Internet module 113, a short-range communication module 114, and a location information module 115.

The broadcast receiving module 111 receives a broadcast signal and/or broadcast related information from an external broadcast management server through a broadcast channel.

The mobile communication module 112 transmits and receives wireless signals to and from at least one of a base station, an external terminal, and a server on a mobile communication network constructed according to technical standards or communication methods for mobile communication (e.g., Global System for Mobile Communication (GSM), Code Division Multiple Access (CDMA), Code Division Multiple Access 2000 (CDMA2000), Enhanced Voice-Data Optimized or Enhanced Voice-Data Only (EV-DO), Wideband CDMA (WCDMA), High Speed Downlink Packet Access (HSDPA), High Speed Uplink Packet Access (HSUPA), Long Term Evolution (LTE), and Long Term Evolution-Advanced (LTE-A)).

The wireless Internet module 113 refers to a module for wireless Internet access, and may be built in or external to the terminal 100. The wireless Internet module 113 is configured to transmit and receive wireless signals in a communication network according to wireless Internet technologies.

Wireless Internet technologies include, for example, WLAN (Wireless LAN), Wi-Fi (Wireless-Fidelity), Wi-Fi Direct, DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), WiMAX (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), and LTE-A (Long Term Evolution-Advanced).

The short-range communication module 114 is for short-range communication and can support it by using at least one of Bluetooth™, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wideband (UWB), ZigBee, Near Field Communication (NFC), Wi-Fi (Wireless-Fidelity), Wi-Fi Direct, and Wireless USB (Wireless Universal Serial Bus) technologies.

The location information module 115 is a module for acquiring a location (or current location) of a mobile terminal, and typical examples thereof include a Global Positioning System (GPS) module or a Wireless Fidelity (WiFi) module. For example, if the terminal utilizes a GPS module, the location of the mobile terminal can be obtained by using a signal from a GPS satellite.

The input unit 120 may include a camera 121 for inputting a video signal, a microphone 122 for receiving an audio signal, and a user input unit 123 for receiving information from a user.

The voice data or image data collected by the input unit 120 may be analyzed and processed as a user's control command.

The input unit 120 is for inputting image information (or signals), audio information (or signals), data, or information input from a user. For the input of image information, the terminal 100 may be provided with one or more cameras 121.

The camera 121 processes image frames such as still images or moving pictures obtained by an image sensor in a video call mode or a shooting mode. The processed image frame may be displayed on the display unit 151 or stored in the memory 170.

The microphone 122 processes external sound signals into electrical voice data. The processed voice data may be used in various ways according to the function being performed (or the application program being run) by the terminal 100. Meanwhile, various noise reduction algorithms for removing noise generated in the process of receiving an external sound signal may be implemented in the microphone 122.

The user input unit 123 is for receiving information from a user. When information is input through the user input unit 123, the processor 180 may control the operation of the terminal 100 to correspond to the input information.

The user input unit 123 may include a mechanical input means (or a mechanical key, for example, a button located on the front, rear, or side of the terminal 100, a dome switch, a jog wheel, a jog switch, etc.) and a touch-type input means. As an example, the touch-type input means may consist of a virtual key, a soft key, or a visual key displayed on the touch screen through software processing, or of a touch key disposed on a part other than the touch screen.

The learning processor 130 may be configured to receive, classify, store, and output information to be used for data mining, data analysis, intelligent decision making, and machine learning algorithms and techniques.

The learning processor 130 may include one or more memory units configured to store data that is received, detected, sensed, generated, predefined, or otherwise output by the terminal, or data that is received, detected, sensed, generated, predefined, or otherwise output by another component, device, terminal, or entity in communication with the terminal.

The learning processor 130 may include a memory integrated or implemented in a terminal. In some embodiments, the learning processor 130 may be implemented using the memory 170.

Alternatively or additionally, the learning processor 130 may be implemented using memory associated with the terminal, such as external memory coupled directly to the terminal or memory maintained in a server communicating with the terminal.

In another embodiment, the learning processor 130 may be implemented using memory maintained in a cloud computing environment, or other remote memory location accessible by a terminal through a communication method such as a network.

The learning processor 130 may typically be configured to store data in one or more databases so that the data can be identified, indexed, categorized, manipulated, stored, retrieved, and output for use in supervised or unsupervised learning, data mining, predictive analytics, or other machines.

The information stored in the learning processor 130 can be utilized by the processor 180, or by one or more other controllers of the terminal, using any of a variety of different types of data analysis algorithms and machine learning algorithms.

Examples of such algorithms include k-nearest neighbor systems, fuzzy logic (e.g., probability theory), neural networks, Boltzmann machines, vector quantization, pulsed neural networks, support vector machines, maximum margin classifiers, hill climbing, inductive logic systems, Bayesian networks, Petri nets (e.g., finite state machines, Mealy machines, Moore finite state machines), classifier trees (e.g., perceptron trees, support vector trees, Markov trees, decision tree forests, random forests), stake models and systems, artificial fusion, sensor fusion, image fusion, reinforcement learning, augmented reality, pattern recognition, automated planning, and the like.

The processor 180 may determine or predict at least one executable operation of the terminal based on information determined or generated using data analysis and machine learning algorithms. To this end, the processor 180 may request, search, receive, or utilize data of the learning processor 130, and may control the terminal to perform a predicted operation, or an operation determined to be preferable, among the at least one executable operation.

The processor 180 may perform various functions that implement intelligent emulation (i.e., knowledge-based systems, reasoning systems, and knowledge acquisition systems). This can be applied to various types of systems (e.g., fuzzy logic systems), including adaptive systems, machine learning systems, artificial neural networks, and the like.

The processor 180 may also include sub-modules that enable operations involving speech and natural language speech processing, such as an I/O processing module, an environmental condition module, a speech-to-text (STT) processing module, a natural language processing module, a work flow processing module, and a service processing module.

Each of these sub-modules can have access to one or more systems or data and models at the terminal, or a subset or superset thereof. In addition, each of these sub-modules can provide various functions, including vocabulary index, user data, work flow model, service model, and automatic speech recognition (ASR) system.

In other embodiments, other aspects of the processor 180 or terminal may be implemented with the submodules, systems, or data and models.

In some examples, based on data from the learning processor 130, the processor 180 may be configured to detect and sense requirements based on the user's intention or on contextual conditions expressed in user input or natural language input.

The processor 180 may actively derive and acquire information necessary to completely determine a requirement based on a context condition or a user's intention. For example, the processor 180 may actively derive information necessary to determine a requirement by analyzing historical data including historical input and output, pattern matching, unambiguous words, and input intention.

The processor 180 may determine a task flow for executing a function that responds to a requirement based on a context condition or a user's intention.

To collect information for processing and storage in the learning processor 130, the processor 180 may be configured to collect, sense, extract, detect, and/or receive signals or data used in data analysis and machine learning operations through one or more sensing components in the terminal.

Collecting information may include sensing information through a sensor, extracting information stored in the memory 170, or receiving information from another terminal, entity, or external storage device through communication means.

The processor 180 may collect and store usage history information in the terminal.

The processor 180 may use the stored usage history information and predictive modeling to determine the best match for executing a specific function.

The processor 180 may receive or sense surrounding environment information or other information through the sensing unit 140.

The processor 180 may receive a broadcast signal and/or broadcast-related information, a radio signal, and radio data through the radio communication unit 110.

The processor 180 may receive image information (or a corresponding signal), audio information (or a corresponding signal), data, or user input information from the input unit 120.

The processor 180 may collect information in real time, process or classify the information (for example, into a knowledge graph, command policy, personalization database, conversation engine, etc.), and store the processed information in the memory 170 or the learning processor 130.

When the operation of the terminal is determined based on data analysis and machine learning algorithms and techniques, the processor 180 can control the components of the terminal to perform the determined operation. In addition, the processor 180 may perform the determined operation by controlling the terminal according to the control command.

When a specific operation is performed, the processor 180 may analyze history information indicating the execution of the specific operation through data analysis and machine learning algorithms and techniques, and may update previously learned information based on the analyzed information.

Accordingly, the processor 180 may improve the accuracy of future performance of data analysis and machine learning algorithms and techniques based on the updated information along with the learning processor 130.

The sensing unit 140 may include one or more sensors for sensing at least one of information in the mobile terminal, surrounding environment information surrounding the mobile terminal, and user information.

For example, the sensing unit 140 may include at least one of a proximity sensor 141, an illumination sensor 142, a touch sensor, an acceleration sensor, a magnetic sensor, a gravity sensor (G-sensor), a gyroscope sensor, a motion sensor, an RGB sensor, an infrared sensor (IR sensor), a fingerprint scan sensor, an ultrasonic sensor, an optical sensor (e.g., a camera (see 121)), a microphone (see 122), a battery gauge, an environmental sensor (e.g., a barometer, hygrometer, thermometer, radiation sensor, thermal sensor, gas sensor, etc.), and a chemical sensor (e.g., an electronic nose, health care sensor, biometric sensor, etc.). Meanwhile, the mobile terminal disclosed in the present specification may combine and use information sensed by at least two of these sensors.

The output unit 150 is for generating output related to sight, hearing, or touch, and may include at least one of a display unit 151, an audio output unit 152, a haptic module 153, and a light output unit 154.

The display unit 151 displays (outputs) information processed by the terminal 100. For example, the display unit 151 may display execution screen information of an application program driven by the terminal 100, or UI (User Interface) or GUI (Graphic User Interface) information according to the execution screen information.

The display unit 151 may form a mutual layer structure with the touch sensor or may be integrally formed, thereby realizing a touch screen. The touch screen may function as a user input unit 123 that provides an input interface between the terminal 100 and the user, and at the same time, provide an output interface between the terminal 100 and the user.

The audio output unit 152 may output audio data received from the wireless communication unit 110 or stored in the memory 170 in a call signal reception, call mode or recording mode, voice recognition mode, broadcast reception mode, or the like.

The audio output unit 152 may include at least one of a receiver, a speaker, and a buzzer.

The haptic module 153 generates various tactile effects that the user can feel. A typical example of the tactile effect generated by the haptic module 153 may be vibration.

The light output unit 154 outputs a signal for notifying the occurrence of an event using the light of the light source of the terminal 100. Examples of events generated in the terminal 100 may include receiving messages, receiving call signals, missed calls, alarms, schedule notifications, receiving emails, and receiving information through applications.

The interface unit 160 serves as a passage to the various types of external devices connected to the terminal 100. The interface unit 160 may include at least one of a wired/wireless headset port, an external charger port, a wired/wireless data port, a memory card port, a port for connecting a device equipped with an identification module, an audio input/output (I/O) port, a video input/output (I/O) port, and an earphone port. In response to an external device being connected to the interface unit 160, the terminal 100 may perform appropriate control related to the connected external device.

Meanwhile, the identification module is a chip that stores various information for authenticating the usage rights of the terminal 100, and may include a user identity module (UIM), a subscriber identity module (SIM), a universal subscriber identity module (USIM), and the like. A device provided with the identification module (hereinafter referred to as an 'identification device') may be manufactured in a smart card format. Accordingly, the identification device may be connected to the terminal 100 through the interface unit 160.

The memory 170 stores data supporting various functions of the terminal 100.

The memory 170 may store a plurality of application programs (or applications) running on the terminal 100, data and instructions for the operation of the terminal 100, and data for the operation of the learning processor 130 (e.g., information on at least one algorithm for machine learning).

The processor 180 controls the overall operation of the terminal 100 in addition to the operations related to the application program. The processor 180 may provide or process appropriate information or functions to the user by processing signals, data, and information input or output through the above-described components or by driving an application program stored in the memory 170.

In addition, the processor 180 may control at least some of the components described with reference to FIG. 1 in order to drive an application program stored in the memory 170. Furthermore, the processor 180 may operate at least two of the components included in the terminal 100 in combination with each other in order to drive the application program.

Under the control of the processor 180, the power supply unit 190 receives external power and internal power to supply power to each component included in the terminal 100. The power supply unit 190 includes a battery, and the battery may be a built-in battery or a replaceable battery.

Meanwhile, as described above, the processor 180 controls the operation related to the application program and generally the overall operation of the terminal 100. For example, when the state of the mobile terminal satisfies a set condition, the processor 180 may execute or release a lock state that restricts input of a user's control command to applications.

Meanwhile, the input unit 120 of the terminal 100 may include the sensing unit 140, and may perform all functions performed by the sensing unit 140. For example, the input unit 120 may detect a user touch input.

Meanwhile, the term wireless communication unit 110 may be used interchangeably with the term communication unit 110.

The method of operating a terminal according to an embodiment of the present invention may include receiving a message from a transmitting-side device (S210), receiving a synthesized voice in which a voice characteristic of a user of the transmitting-side device is applied to the message (S230), and outputting the synthesized voice (S250).
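The three steps can be pictured as a simple sequence. The sketch below is illustrative only: the function `handle_incoming_message` and its three callable parameters are hypothetical stand-ins for the communication unit and the sound output unit, and the step labels in the comments are the only part taken from the specification.

```python
from typing import Callable


def handle_incoming_message(
    receive_message: Callable[[], str],
    receive_synthesized_voice: Callable[[str], bytes],
    output_voice: Callable[[bytes], None],
) -> str:
    """Run steps S210, S230, and S250 once and return the received message."""
    message = receive_message()                 # S210: receive the message
    voice = receive_synthesized_voice(message)  # S230: receive the synthesized voice
    output_voice(voice)                         # S250: output the synthesized voice
    return message
```

Passing the three operations in as callables keeps the sketch independent of any particular transport or audio API, none of which the specification names.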

First, synthetic speech will be described.

The synthesized voice in the present specification may be a voice produced by applying voice characteristics to a message.

Specifically, the message can be converted into speech. In this case, a text-to-speech (TTS) technique can be used to convert the message to speech.

In this case, the voice characteristics of a specific person can be applied to the message. Specifically, the device may extract a characteristic of a specific person's voice from a specific person's voice, and convert the message to voice by applying the extracted characteristic. In this case, the terminal can utter the message in the voice of a specific person.

As described above, when a message is converted into voice by applying a voice characteristic of a specific person, the converted voice may be referred to as a synthesized voice.

Meanwhile, in the present specification, the synthesized voice in which the voice characteristic of the user of the transmitting side device is applied to the message may mean that the message is converted into voice by applying the voice characteristic of the user of the transmitting side device.

The specific operation of the present invention will be described with reference to FIGS. 3 and 4.

The transmitting-side device may be a terminal, and in this case the description of the configuration and function of the terminal 100 given with reference to FIG. 1 may apply in full to the transmitting-side device.

The processor 180 of the terminal 100 may receive a message from the transmitting device.

Specifically, the processor of the transmitting side device may receive an input of a message from a user of the transmitting side device. In this case, the processor of the transmitting side device may transmit a message to the terminal 100.

In this case, the processor 180 of the terminal 100 may receive a message through the communication unit 110.

Meanwhile, the statement that the processor 180 of the terminal 100 receives the message from the transmitting-side device covers not only receiving the message directly from the transmitting-side device, but also receiving from a server a message that the transmitting-side device sent to the server.

Meanwhile, the processor 180 of the terminal 100 may receive a synthesized voice in which a voice characteristic of the user of the transmitting side device is applied to the message.

Specifically, the processor 180 of the terminal 100 may receive, from a transmitting device or a server, a synthesized voice in which a voice characteristic of a user of the transmitting device is applied to a message.

On the other hand, when a message is received from the transmitting device, the processor 180 of the terminal 100 may receive a synthesized voice in which a voice characteristic of a user of the transmitting device is applied to the message.

For example, suppose that Hong Gil-dong entered the message "hungry" on his terminal.

In this case, the terminal of Hong Gil-dong may receive a message “hungry” from Hong Gil-dong and transmit a message “hungry” to the terminal 100.

Meanwhile, the processor 180 of the terminal 100 may receive a message “hungry” from the terminal of Hong Gil-dong.

In addition, the processor 180 of the terminal 100 may receive, from the terminal of Hong Gil-dong or a server, a synthesized voice in which the voice characteristic of Hong Gil-dong is applied to the message “hungry”.

In this case, the processor 180 of the terminal 100 may output a voice of “hungry” as the voice of Hong Gil-dong.

For another example, suppose Sung Chun-hyang entered the message “I am hungry” on her terminal.

In this case, the terminal of Sung Chun-hyang may receive the message “I am hungry” from Sung Chun-hyang and transmit the message “I am hungry” to the terminal 100.

Meanwhile, the processor 180 of the terminal 100 may receive the message “I am hungry” from the terminal of Sung Chun-hyang.

In addition, the processor 180 of the terminal 100 may receive, from the terminal of Sung Chun-hyang or a server, a synthesized voice in which the voice characteristic of Sung Chun-hyang is applied to the message “I am hungry”.

In this case, the processor 180 of the terminal 100 may output the voice “I am hungry” as the voice of Sung Chun-hyang.

Meanwhile, the processor 180 of the terminal 100 may store the received synthesized voice in a memory.

In addition, as illustrated in FIG. 3, the processor 180 may output the stored synthesized voice 320.

Meanwhile, when an input for outputting the synthesized voice is received through the input unit, the processor 180 may output the synthesized voice 320 stored in the memory through the sound output unit.

For example, the processor 180 may receive a voice input of “Please read the received message”. In this case, the processor 180 may output the synthesized voice stored in the memory.

Meanwhile, the processor 180 may output a synthesized voice before displaying a message.

In the general case, the terminal 100 displays a message after executing an application for display. For example, when the message is a message input into a chat room of the messenger application, the terminal 100 displays the message by executing the messenger application.

However, in the present invention, when an input for outputting a synthesized voice is received through the input unit, the processor 180 can output the synthesized voice 320 stored in the memory through the sound output unit without executing an application for displaying the message.

For example, as illustrated in FIG. 3, the processor 180 may output a synthesized voice stored in a memory in a state where the standby screen 310 is displayed, that is, in a standby mode.

In addition, when an application for displaying a message is executed as shown in FIG. 4, the processor may display the execution screen 410 of the message application. In addition, the processor may display the message 420 received from the transmission-side device on the execution screen 410 of the message application.

For example, when a message is read with a voice of a voice actor or a celebrity, the user cannot know who the message was from. Therefore, according to the prior art, when a message is read with a voice of a voice actor or a celebrity, the terminal outputs a voice “It is a message received from Hong Gil-dong” and then outputs a voice corresponding to the message.

However, in the present invention, the processor 180 can output the synthesized voice without first outputting a guide about who sent the message, so the user has the impression of listening directly to the voice of the sender of the message.

In addition, according to the present invention, since the terminal 100 receives the synthesized voice in advance and stores it in the memory and outputs the stored synthesized voice when a user input is received, there is an advantage that the synthesized voice can be output without delay.
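The store-then-output behavior described above can be sketched as follows. This is a minimal illustration only; the class `Terminal` and its method names are hypothetical, and plain lists stand in for the memory and the sound output unit.

```python
from dataclasses import dataclass, field


@dataclass
class Terminal:
    """Hypothetical receiving terminal that caches synthesized voices."""
    stored_voices: list = field(default_factory=list)  # stands in for the memory
    played: list = field(default_factory=list)         # stands in for the sound output unit

    def on_voice_received(self, voice: bytes) -> None:
        # Store the synthesized voice as soon as it arrives, before any user input.
        self.stored_voices.append(voice)

    def on_read_request(self) -> int:
        # On user input (for example a "read the received message" command),
        # output everything already in memory; no network round trip is needed.
        count = len(self.stored_voices)
        self.played.extend(self.stored_voices)
        self.stored_voices.clear()
        return count


terminal = Terminal()
terminal.on_voice_received(b"voice-for-hungry")
print(terminal.on_read_request())  # number of voices played
```

Because the voice is already in memory when the input arrives, the only work left at request time is playback.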

Meanwhile, the processor 180 may receive a synthesized voice in which the voice characteristic of the user of the transmitting device is applied to text included in the message.

Here, the message may include 'characters or numbers' and 'emoticons or symbols'.

For example, referring to FIG. 4, the message 420 may include a character 421 and a symbol 422.

Meanwhile, the text corresponding to the message may include letters or numbers.

For example, referring to FIG. 4, text included in the message 420 may include a character 421.

Meanwhile, the processor 180 may receive the message 420 from the transmitting side device, and receive a synthesized voice in which the voice characteristic of the user of the transmitting side device is applied to the text included in the message 420.

For example, the processor 180 may receive a message “I just came home~ ^^;;” from the transmitting device. Also, the processor 180 may receive a synthesized voice in which the voice characteristics of the user of the transmitting device are applied to the text “I just came home”.

As described above, according to the present invention, even when the message includes symbols and emoticons as well as letters and numbers, there is an advantage that only text that can be converted into speech can be extracted and output in the voice of the user of the transmitting device.
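One possible way to extract only the text that can be converted into speech is sketched below. The function name `extract_text` and the filtering rule (keep letters, digits, apostrophes, and spaces) are assumptions made for illustration; the specification does not state how the extraction is performed.

```python
def extract_text(message: str) -> str:
    """Keep letters, digits, apostrophes and spaces; drop symbols and emoticons."""
    kept = "".join(
        ch if (ch.isalnum() or ch.isspace() or ch == "'") else " "
        for ch in message
    )
    # Collapse the runs of spaces left behind by removed symbols.
    return " ".join(kept.split())


print(extract_text("I just came home~ ^^;;"))  # I just came home
```

This simple rule would not remove emoticons written with letters (for example ":D" would leave "D"), so a practical extractor would need an additional emoticon list.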

FIGS. 5 to 7 illustrate methods for generating and receiving a synthesized voice according to various embodiments of the present invention.

Previously, it has been described that the text is converted to speech by applying the voice characteristics of the user of the transmitting device.

In addition, the speech synthesis engine may generate a synthesized voice by converting text into speech while applying the voice characteristics of the user of the transmitting device.

Specifically, the speech synthesis engine may retain speech characteristics extracted from the speech of the user of the transmitting-side device, and convert text to speech using speech characteristics of the user of the transmitting-side device.

When speech conversion is implemented by artificial intelligence technology, the speech synthesis engine may be a learning model generated by training a neural network based on a Hidden Markov Model (HMM) or deep learning.

Specifically, the neural network may be trained by training data including voice and text of the user of the transmitting device based on a hidden Markov Model (HMM) or deep learning. In this case, the parameters of the neural network are updated, and the neural network in which the parameters are set by repeating the above process may be referred to as a learning model.

In addition, when new text is input to the learning model, the learning model may output a synthesized speech in which the voice characteristics of the user of the transmitting device are applied to the new text.

The processor of the terminal 100 may receive, from the transmitting side device 1100 and together with the message, a synthesized voice in which the voice characteristics of the user of the transmitting side device 1100 are applied to the text included in the message.

Here, receiving the synthesized voice together with the message may mean that the transmitting device 1100 transmits the synthesized voice, and the terminal 100 receives it, without the terminal 100 that receives the message sending a request (for example, text or a key) to the transmitting device 1100.

First, an embodiment in which the speech synthesis engine is mounted on the transmission-side device 1100 will be described with reference to FIG. 5A.

The processor of the transmission-side device 1100 may receive an input of a message from a user of the transmission-side device through the input unit (S505).

In this case, the processor of the transmission-side device 1100 may extract text from the received message (S510) and input the extracted text into the speech synthesis engine.

In this case, the voice synthesis engine may generate a synthesized voice in which the voice characteristics of the user of the transmitting side device are applied to the text (S515).

In addition, the processor of the transmitting side device 1100 may transmit a message to the terminal 100. In this case, the processor of the transmitting-side device 1100 may transmit a synthesized voice in which the voice characteristics of the user of the transmitting-side device 1100 are applied to the text together with the message (S520).

In this case, the processor 180 of the terminal 100 may receive a synthesized voice in which a voice characteristic of a user of the transmitting side device is applied to text together with a message from the transmitting side device.

In addition, the processor 180 of the terminal 100 may output the received synthesized voice (S520).
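The sender-side flow of FIG. 5A (input, text extraction, synthesis, and transmission together with the message) can be sketched as follows. The names `Packet`, `extract_text`, `synthesize`, and `send_message` are hypothetical, and `synthesize` is only a placeholder that returns tagged bytes instead of audio.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    message: str   # original message, shown by the message application
    voice: bytes   # synthesized voice sent together with the message


def extract_text(message: str) -> str:
    # S510 (simplified): keep only characters that can be read aloud.
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in message)
    return " ".join(kept.split())


def synthesize(text: str, speaker: str) -> bytes:
    # S515: placeholder engine; a real engine would return audio samples.
    return f"<{speaker}>{text}".encode("utf-8")


def send_message(message: str, speaker: str) -> Packet:
    # S520: the sender pushes the voice with the message; the terminal sends no request.
    return Packet(message, synthesize(extract_text(message), speaker))


packet = send_message("hungry ^^", "Hong Gil-dong")
print(packet.voice)
```

In this arrangement the receiving terminal never contacts the sender, which matches the meaning of receiving the synthesized voice "together with the message".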

Next, an embodiment in which the speech synthesis engine is mounted on the server 2100 communicating with the transmission-side device 1100 will be described with reference to FIG. 5B.

The processor of the transmission-side device 1100 may receive an input of a message from a user of the transmission-side device through the input unit (S555).

In this case, the processor of the transmitting-side device 1100 may extract text from the received message and transmit the extracted text to the server 2100 (S560).

Then, when the server 2100 inputs the received text into the speech synthesis engine, the speech synthesis engine may generate a synthesized speech in which the voice characteristics of the user of the transmitting device are applied to the text.

In this case, the server 2100 may transmit the synthesized voice in which the voice characteristics of the user of the transmitting side device is applied to the text to the transmitting side device 1100 (S565).

In addition, the processor of the transmitting side device 1100 may transmit a message to the terminal 100. In this case, the processor of the transmission-side device 1100 may transmit a synthesized voice in which a voice characteristic of the user of the transmission-side device 1100 is applied to text together with a message (S570).

Then, the processor 180 of the terminal 100 may output the received synthesized voice (S575).

When the message is received, the processor 180 of the terminal 100 may transmit text to a transmitting device or a server, and receive, from the transmitting device or the server, a synthesized voice in which the voice characteristics of the user of the transmitting device are applied to the text.

First, an embodiment in which text is transmitted to a transmission-side device and a speech synthesis engine is mounted on the transmission-side device will be described with reference to FIG. 6A.

The processor of the transmission-side device 1100 may receive an input of a message from a user of the transmission-side device through an input unit.

In this case, the processor of the transmitting side device 1100 may transmit a message to the terminal 100 (S605).

Meanwhile, the processor 180 of the terminal 100 may receive a message and extract text from the received message.

Then, the processor 180 of the terminal 100 may transmit the extracted text to the transmission side device 1100 (S610).

In this case, the processor of the transmission-side device 1100 may receive text and input the received text into a speech synthesis engine.

In this case, the speech synthesis engine may generate a synthesized speech in which the voice characteristics of the user of the transmitting device are applied to the text.

Then, the processor of the transmitting side device 1100 may transmit the synthesized voice to the terminal 100 (S615).

In addition, the processor 180 of the terminal 100 may output the received synthesized voice (S620).

Next, an embodiment in which text is transmitted to a device on a transmission side and a speech synthesis engine is mounted on a server will be described with reference to FIG. 6B.

In this case, the processor of the transmitting side device 1100 may transmit a message to the terminal 100 (S655).

In addition, the processor 180 of the terminal 100 may transmit the extracted text to the transmission side device 1100 (S660).

In this case, the processor of the transmitting side device 1100 may receive text and transmit the received text to the server 2100 (S665).

In this case, the server 2100 may receive text and input the received text into a speech synthesis engine.

In addition, the server 2100 may transmit the synthesized voice to the transmitting device 1100 (S670).

In this case, the processor of the transmitting side device 1100 may receive the synthesized voice and transmit the synthesized voice to the terminal 100 (S675).

Then, the processor 180 of the terminal 100 may output the received synthesized voice (S680).

The terminal 100 may fail to receive the synthesized voice. Examples are when the power of the transmitting device 1100 is off or when communication with the transmitting device 1100 is poor.

In this case, the terminal 100 may output a synthesized voice in which a predetermined voice characteristic (celebrity, voice actor, machine sound, etc.) is applied to the message.
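The fallback described above can be sketched as follows. The function names are hypothetical, and signaling an unreachable transmitting device with `ConnectionError` is an assumption made for the example.

```python
from typing import Callable


def default_voice(text: str) -> bytes:
    # Predetermined voice characteristic (for example a machine sound).
    return f"<default>{text}".encode("utf-8")


def get_voice(text: str, request_sender_voice: Callable[[str], bytes]) -> bytes:
    try:
        return request_sender_voice(text)
    except ConnectionError:
        # Sender powered off or unreachable: fall back to the predetermined voice.
        return default_voice(text)


def unreachable_sender(text: str) -> bytes:
    # Simulates a transmitting-side device whose power is off.
    raise ConnectionError("transmitting-side device is off")


print(get_voice("hungry", unreachable_sender))
```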

Next, an embodiment in which a text is transmitted to a server and a speech synthesis engine is mounted on the server will be described with reference to FIG. 7.

In this case, the processor of the transmission-side device 1100 may transmit a message to the terminal 100 (S710).

Then, the processor 180 of the terminal 100 may transmit the extracted text to the server 2100 (S715).

In this case, the server 2100 may receive text and input it into a speech synthesis engine.

And the server 2100 may transmit the synthesized voice to the terminal 100 (S720).

In this case, the processor 180 of the terminal 100 may receive the synthesized voice and output the synthesized voice (S725).

According to the embodiment of FIG. 7, there is an advantage that the synthesized voice to which the voice characteristics of the user of the transmitting device 1100 are applied can be output even when the power of the transmitting device 1100 is off or the communication with the transmitting device 1100 is poor.

Meanwhile, the process of FIGS. 6A to 7 may be implemented using a key (S_key) corresponding to a transmission-side device and a key (R_key) corresponding to a terminal.

Here, the key S_key corresponding to the transmission-side device may mean identification information unique to the transmission-side device 1100.

In addition, a key (R_key) corresponding to the terminal may mean identification information unique to the terminal 100.

This will be described with reference to FIGS. 6A to 7 again.

Referring to FIG. 6A, the processor of the transmission-side device 1100 may receive an input of a message from a user of the transmission-side device through an input unit.

In this case, the processor of the transmission-side device 1100 may transmit a message and a key (S_key) corresponding to the transmission-side device 1100 to the terminal 100 (S605).

In addition, the processor 180 of the terminal 100 may receive a key S_key corresponding to the transmission-side device 1100 together with a message.

In addition, the processor 180 of the terminal 100 may transmit the extracted text, the key (S_key) corresponding to the transmission-side device 1100, and a key (R_key) corresponding to the terminal 100 to the transmission-side device 1100 (S610). Here, the process of transmitting the key R_key corresponding to the terminal 100 may be omitted.

In this case, the processor of the transmission-side device 1100 may determine whether the key received from the terminal 100 is the same as the key S_key corresponding to the transmission-side device 1100.

In addition, when the key received from the terminal 100 is the same as the key S_key corresponding to the transmission-side device 1100, the processor of the transmission-side device 1100 may input the received text into the speech synthesis engine.

On the other hand, when both the key (S_key) corresponding to the transmission device 1100 and the key (R_key) corresponding to the terminal 100 are received, the processor of the transmission device 1100 may determine whether the key received from the terminal 100 is the same as the key (S_key) corresponding to the transmitting device 1100. Also, the processor of the transmitting side device 1100 may determine whether the terminal 100 has the authority to receive the synthesized voice based on the received key R_key corresponding to the terminal 100.

Specifically, the memory of the transmitting side device 1100 may store information about whether a right to receive the synthesized voice exists for each of a plurality of terminals. For example, when the terminal 100 has the authority to receive the synthesized voice, a key R_key corresponding to the terminal 100 may be stored in the memory of the transmitting device 1100.

In addition, the processor of the transmitting side device 1100 may determine whether the terminal 100 is authorized to receive the synthesized voice based on the received key R_key corresponding to the terminal 100 and information stored in the memory.

And if the key received from the terminal 100 is the same as the key (S_key) corresponding to the transmitting device 1100, and the terminal 100 has the authority to receive the synthesized voice, the processor of the transmitting device 1100 can input the received text into the speech synthesis engine.

Meanwhile, the transmission-side device 1100 may store the received key R_key corresponding to the terminal 100 in a memory.
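The key checks of FIG. 6A can be sketched as follows. The class `SenderDevice`, its fields, and the sample key strings are hypothetical. Recording the R_key only after a successful check is an assumption, since the specification does not say at which point the key is stored.

```python
from typing import Optional


class SenderDevice:
    """Hypothetical transmitting-side device holding the speech synthesis engine."""

    def __init__(self, s_key: str, authorized_r_keys: set):
        self.s_key = s_key
        self.authorized_r_keys = authorized_r_keys  # terminals allowed to receive the voice
        self.received_by = []                       # R_keys of terminals that were served

    def handle_request(self, text: str, s_key: str,
                       r_key: Optional[str] = None) -> Optional[bytes]:
        # Step 1: the key sent back by the terminal must equal this device's S_key.
        if s_key != self.s_key:
            return None
        # Step 2 (only when an R_key is sent): check the terminal's authority.
        if r_key is not None:
            if r_key not in self.authorized_r_keys:
                return None
            self.received_by.append(r_key)
        # Placeholder for the speech synthesis engine.
        return f"<sender-voice>{text}".encode("utf-8")


device = SenderDevice("S-1100", {"R-100"})
print(device.handle_request("hungry", "S-1100", "R-100"))
print(device.handle_request("hungry", "wrong-key", "R-100"))
```

When the R_key is omitted, as the description allows, only the S_key comparison is performed.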

Referring to FIG. 6B, the processor of the transmitting device 1100 may receive an input of a message from a user of the transmitting device through an input unit.

In this case, the processor of the transmission-side device 1100 may transmit a message and a key (S_key) corresponding to the transmission-side device 1100 to the terminal 100 (S655).

Meanwhile, the processor 180 of the terminal 100 may receive a message and extract text from the received message. In addition, the processor 180 of the terminal 100 may receive a key S_key corresponding to the transmission-side device 1100 together with a message.

In addition, the processor 180 of the terminal 100 may transmit the extracted text, the key (S_key) corresponding to the transmission-side device 1100, and a key (R_key) corresponding to the terminal 100 to the transmission-side device 1100 (S660). Here, the process of transmitting the key R_key corresponding to the terminal 100 may be omitted.

In addition, when the key received from the terminal 100 is the same as the key S_key corresponding to the transmission-side device 1100, the processor of the transmission-side device 1100 may transmit the received text to the server (S665).

In this case, the server 2100 transmits the synthesized voice to the transmitting device 1100 (S670), and the processor of the transmitting device 1100 can transmit the synthesized voice to the terminal 100 (S675).

And if the key received from the terminal 100 is the same as the key (S_key) corresponding to the transmitting device 1100, and the terminal 100 has the authority to receive the synthesized voice, the processor of the transmitting device 1100 can transmit the received text to the server (S665).

Referring to FIG. 7, the transmission-side device 1100 may share a key S_key corresponding to the transmission-side device 1100 with the server 2100 in advance (S705).

Specifically, the transmission-side device 1100 may transmit a key S_key corresponding to the transmission-side device 1100 to the server 2100 in advance. In this case, the server 2100 may store a key S_key corresponding to the transmission-side device 1100 in a memory in the server.

Meanwhile, the processor of the transmitting device 1100 may receive an input of a message from a user of the transmitting device through the input unit.

In this case, the processor of the transmission-side device 1100 may transmit a message and a key (S_key) corresponding to the transmission-side device 1100 to the terminal 100 (S710).

In addition, the processor 180 of the terminal 100 may transmit the extracted text, the key (S_key) corresponding to the transmission-side device 1100, and a key (R_key) corresponding to the terminal 100 to the server 2100 (S715). Here, the process of transmitting the key R_key corresponding to the terminal 100 may be omitted.

In this case, the server 2100 may determine whether the key received from the terminal 100 is the same as the key S_key corresponding to the transmission-side device 1100.

In addition, when the key received from the terminal 100 is the same as the key S_key corresponding to the transmission-side device 1100, the server 2100 may input the received text into the speech synthesis engine.

When the speech synthesis engine outputs the synthesized voice, the server 2100 may transmit the synthesized voice to the terminal 100 (S720), and the processor of the terminal 100 may output the synthesized voice (S725).

On the other hand, when the key (S_key) corresponding to the transmitting-side device 1100 and the key (R_key) corresponding to the terminal 100 are received, the server 2100 may store the received key (R_key) corresponding to the terminal 100 in a memory in the server.

Also, when a log request is received from the transmitting device 1100, the server 2100 may transmit the stored key R_key corresponding to the terminal 100 to the transmitting device 1100.
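The server-side flow of FIG. 7 (sharing the S_key in advance, verifying the key, synthesizing, and logging the R_key) can be sketched as follows. The class `VoiceServer`, its method names, and the sample keys are hypothetical, and the synthesis step is a placeholder.

```python
class VoiceServer:
    """Hypothetical server 2100 holding pre-shared sender keys and engines."""

    def __init__(self):
        self.engines = {}    # S_key -> speaker label, shared in advance (S705)
        self.r_key_log = {}  # S_key -> R_keys of terminals that requested a voice

    def register_sender(self, s_key, speaker):
        self.engines[s_key] = speaker
        self.r_key_log[s_key] = []

    def request_voice(self, text, s_key, r_key=None):
        # S715: the key received from the terminal must match a pre-shared S_key.
        if s_key not in self.engines:
            return None
        if r_key is not None:
            self.r_key_log[s_key].append(r_key)
        # S720: placeholder synthesis; a real engine would return audio.
        return f"<{self.engines[s_key]}>{text}".encode("utf-8")

    def get_log(self, s_key):
        # Log request from the transmitting-side device: return the stored R_keys.
        return list(self.r_key_log.get(s_key, []))


server = VoiceServer()
server.register_sender("S-1100", "Hong Gil-dong")
print(server.request_voice("hungry", "S-1100", "R-100"))
print(server.get_log("S-1100"))
```

Because the server holds both the key and the engine, the transmitting-side device does not need to be reachable when the terminal requests the voice.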

A synthesized voice should not be provided to just anyone, because it concerns personal privacy.

Therefore, according to the present invention, the transmitting device 1100 can use the key (S_key) corresponding to the transmitting device 1100 to determine whether the request for the synthesized voice comes from the user who received the message, and can thereby prevent a third party from obtaining the synthesized voice by hacking.

In addition, according to the present invention, using the key (R_key) corresponding to the terminal 100 has the advantage that the synthesized voice can be provided only to others who are authorized to receive it.

In addition, according to the present invention, storing the key (R_key) corresponding to the terminal 100 has the advantage that a list of others who received the synthesized voice can later be provided to the user.

The processor 180 of the terminal 100 may receive the message 920 from the transmission-side device 1100, receive a synthesized voice in which the voice characteristics of the user of the transmission-side device 1100 are applied to the text included in the message, and output the synthesized voice.

In addition, the processor 180 of the terminal 100 may receive the second message 930 from the second transmission-side device 1200, receive a second synthesized voice in which the voice characteristics of the user of the second transmission-side device 1200 are applied to the text included in the second message, and output the second synthesized voice.

In addition, the processor 180 of the terminal 100 may receive the third message 940 from the third transmission-side device 1300, receive a third synthesized voice in which the voice characteristics of the user of the third transmission-side device 1300 are applied to the text included in the third message, and output the third synthesized voice.

An operation related to this will be briefly described with an example in which text is transmitted to a server and a speech synthesis engine is mounted on the server.

Referring to FIG. 8, the transmission-side device 1100 may share the first key S_key corresponding to the transmission-side device 1100 with the server 2100 in advance.

In addition, the server 2100 may be equipped with a first speech synthesis engine that generates synthesized speech by applying the speech characteristics of the transmission-side device 1100 to text.

Meanwhile, the processor of the transmission-side device 1100 may transmit a message and a first key S_key corresponding to the transmission-side device 1100 to the terminal 100 (S805).

Meanwhile, the processor 180 of the terminal 100 may receive a message and extract text from the received message. In addition, the processor 180 of the terminal 100 may receive a first key S_key corresponding to the transmission-side device 1100 together with a message.

In addition, the processor 180 of the terminal 100 may transmit the extracted text, the first key (S_key) corresponding to the transmission-side device 1100, and a key (R_key) corresponding to the terminal 100 to the server 2100 (S810). Here, the process of transmitting the key R_key corresponding to the terminal 100 may be omitted.

In this case, the server 2100 may determine whether the key received from the terminal 100 is the same as the first key S_key corresponding to the transmission-side device 1100.

In addition, when the key received from the terminal 100 is the same as the first key S_key corresponding to the transmission-side device 1100, the server 2100 may input the received text into the speech synthesis engine.

When the voice synthesis engine outputs the synthesized voice, the server 2100 may transmit the synthesized voice to the terminal 100 (S815).

Also, referring to FIG. 8, the second transmission-side device 1200 may share the second key S_key corresponding to the second transmission-side device 1200 with the second server 2200 in advance.

In addition, the second server 2200 may be equipped with a second speech synthesis engine that generates synthesized speech by applying speech characteristics of the second transmission-side device 1200 to text.

Meanwhile, the processor of the second transmission-side device 1200 may transmit the second message and the second key S_key corresponding to the second transmission-side device 1200 to the terminal 100 (S820).

Meanwhile, the processor 180 of the terminal 100 may receive the second message and extract the second text from the received second message. Also, the processor 180 of the terminal 100 may receive a second key S_key corresponding to the second transmission-side device 1200 together with the second message.

Then, the processor 180 of the terminal 100 may transmit the extracted second text, the second key (S_key) corresponding to the second transmission-side device 1200, and the key (R_key) corresponding to the terminal 100 to the second server 2200 (S825). Here, the process of transmitting the key R_key corresponding to the terminal 100 may be omitted.

In this case, the second server 2200 may determine whether the key received from the terminal 100 is the same as the second key S_key corresponding to the second transmission-side device 1200.

And when the key received from the terminal 100 is the same as the second key (S_key) corresponding to the second transmission-side device 1200, the second server 2200 may input the received second text into the second speech synthesis engine.

When the second speech synthesis engine outputs the second synthesized voice, the second server 2200 may transmit the second synthesized voice to the terminal 100 (S830).

Also, referring to FIG. 8, the third transmission-side device 1300 may share a third key S_key corresponding to the third transmission-side device 1300 with the third server 2300 in advance.

In addition, the third server 2300 may be equipped with a third speech synthesis engine that generates synthesized speech by applying speech characteristics of the third transmission-side device 1300 to text.

Meanwhile, the processor of the third transmission-side device 1300 may transmit the third message and the third key (S_key) corresponding to the third transmission-side device 1300 to the terminal 100 (S835).

Meanwhile, the processor 180 of the terminal 100 may receive the third message and extract the third text from the received third message. Also, the processor 180 of the terminal 100 may receive a third key S_key corresponding to the third transmission-side device 1300 together with the third message.

Then, the processor 180 of the terminal 100 may transmit the extracted third text, the third key (S_key) corresponding to the third transmission-side device 1300, and the key (R_key) corresponding to the terminal 100 to the third server 2300 (S840). Here, the process of transmitting the key R_key corresponding to the terminal 100 may be omitted.

In this case, the third server 2300 may determine whether the key received from the terminal 100 is the same as the third key S_key corresponding to the third transmission-side device 1300.

And when the key received from the terminal 100 is the same as the third key (S_key) corresponding to the third transmission-side device 1300, the third server 2300 may input the received third text into the third speech synthesis engine.

When the third speech synthesis engine outputs the third synthesized voice, the third server 2300 may transmit the third synthesized voice to the terminal 100 (S845).

On the other hand, referring to FIG. 9, the user of the transmission-side device 1100, the user of the second transmission-side device 1200, and the user of the third transmission-side device 1300 may be users participating in one chat room 910 of the messenger application.

In this case, the message 920, the second message 930, and the third message 940 may be messages input to one chat room 910.

In addition, the processor 180 of the terminal 100 may display, in the one chat room 910, the message 920 received from the transmitting device 1100, the second message 930 received from the second transmitting device 1200, and the third message 940 received from the third transmitting device 1300.

Meanwhile, when an input for outputting a synthesized voice is received through the input unit, the processor 180 may output a plurality of synthesized voices corresponding to a plurality of messages inputted into one chat room by a plurality of transmission-side devices.

Specifically, when an input for outputting the synthesized voice is received, the processor 180 of the terminal 100 may output the synthesized voice corresponding to the message 920 received from the transmitting device 1100 and then, without any additional input, output the second synthesized voice corresponding to the second message 930 received from the second transmitting device 1200.

Meanwhile, the processor 180 of the terminal 100 may output synthesized voices in the order in which the messages are received.

Specifically, when the message 920 is received first among the message 920 received from the transmission-side device 1100 and the second message 930 received from the second transmission-side device 1200, the processor of the terminal 100 may output the synthesized voice in which the voice characteristic of the user of the transmitting side device 1100 is applied to the text included in the message 920.

Then, after starting to output that synthesized voice, the processor of the terminal 100 may start to output the second synthesized voice, in which the voice characteristics of the user of the second transmission-side device 1200 are applied to the text included in the second message 930 received from the second transmission-side device 1200.

Meanwhile, the processor of the terminal 100 may start outputting the second synthesized voice when the output of the synthesized voice is finished.

For example, the processor of the terminal 100 may output a voice saying “I just came home”. And when the output of the voice “I just came home” is finished, the processor of the terminal 100 may output the voice “I am still going home”.

Meanwhile, the processor of the terminal 100 may start outputting the second synthesized voice while the synthesized voice is being output.

Specifically, when the second message 930 is received while the synthesized voice corresponding to the received message 920 is being output, the processor of the terminal 100 may receive a second synthesized voice in which the voice characteristics of the user of the second transmission-side device 1200 are applied to the text included in the second message 930.

In addition, the processor of the terminal 100 may start outputting the second synthesized voice while the synthesized voice is being output.

That is, the processor of the terminal 100 may output the synthesized voice and the second synthesized voice together.

For example, the processor of the terminal 100 may start outputting the voice “I just came home”. Then, once “I just” has been output, the processor of the terminal 100 may output the remainder “came home” together with the beginning of the second voice, “I am still”. After the output of “I just came home” is finished, the processor of the terminal 100 may output the rest of the second voice, “going home”.
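The difference between sequential and overlapping output can be illustrated with a small timeline calculation. This is a sketch under stated assumptions: each synthesized voice is represented by a hypothetical (arrival time, duration) pair in seconds, and the `schedule` function is not part of the disclosure.

```python
def schedule(voices, overlap):
    """Compute (start, end) times for each voice.

    voices:  list of (arrival_s, duration_s) pairs in arrival order.
    overlap: if True, a voice starts as soon as it arrives, even while an
             earlier voice is still being output; if False, it waits until
             the previous voice has finished.
    """
    timeline = []
    previous_end = 0.0
    for arrival, duration in voices:
        start = arrival if overlap else max(arrival, previous_end)
        end = start + duration
        timeline.append((start, end))
        previous_end = end
    return timeline


# The second message arrives 1 s after the first voice has started.
voices = [(0.0, 2.0), (1.0, 2.0)]
sequential = schedule(voices, overlap=False)
overlapping = schedule(voices, overlap=True)
```

In the sequential timeline the second voice starts when the first ends; in the overlapping timeline both voices are audible between 1 s and 2 s.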

As described above, according to the present invention, the user can identify the sender of a message from the synthesized voice alone. By contrast, if a message is read in the voice of a voice actor or a celebrity, the user cannot tell who sent it.

In particular, when a plurality of users participate in a conversation, the user cannot know who sent a message without looking at the chat room, and a scheme that outputs a voice announcing the sender before each message cannot keep up with messages that are entered in quick succession.

However, according to the present invention, even when messages are received from a plurality of senders, the user can quickly identify the sender of each message from the voice output alone.

Referring to FIG. 10, the transmission-side device 1100 may share, in advance, the 1-1 key (S_key) and the 1-2 key (S_key) corresponding to the transmission-side device 1100 with the server 2100.

Here, the 1-1 key (S_key) is identification information unique to the transmission-side device 1100 and may be a key that authorizes access to the synthesized voice.

On the other hand, the 1-2 key (S_key) is likewise identification information unique to the transmission-side device 1100, but may be a key that does not authorize access to the synthesized voice.

Meanwhile, the processor of the transmission-side device 1100 may set the access authority for the synthesized voice according to the time zone.

For example, the processor of the transmission-side device 1100 may transmit the 1-1 key (S_key) to the terminal from 8 am to 9 pm, and the 1-2 key (S_key) to the terminal from 9 pm to 8 am.
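The time-zone rule in this example can be sketched as follows. The key values and the `key_for_time` function are hypothetical, and the snippet assumes that exactly 9 pm already belongs to the window in which access is not authorized, since the text does not specify the boundary.

```python
from datetime import time

KEY_1_1 = "key-1-1"  # hypothetical value: key that authorizes access
KEY_1_2 = "key-1-2"  # hypothetical value: key that does not authorize access


def key_for_time(now, start=time(8, 0), end=time(21, 0)):
    """Pick the key to send with a message, based on the local time of day."""
    # Assumption: the authorizing window is [start, end), i.e. 8 am inclusive
    # to 9 pm exclusive.
    return KEY_1_1 if start <= now < end else KEY_1_2


afternoon_key = key_for_time(time(15, 30))
night_key = key_for_time(time(23, 0))
```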

Meanwhile, the processor of the transmission-side device 1100 may set access authority for a preset time.

For example, when the processor of the transmission-side device 1100 grants the terminal access for three hours, it may transmit the 1-1 key (S_key) to the terminal. If the 1-1 key (S_key) is received from the terminal within three hours after the 1-1 key (S_key) was transmitted to the terminal, the processor of the transmission-side device 1100 or the server may provide the synthesized voice to the terminal. If the 1-1 key (S_key) is received from the terminal after three hours have elapsed since it was transmitted to the terminal, the processor of the transmission-side device 1100 or the server may not provide the synthesized voice to the terminal.
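The three-hour check in this example amounts to comparing the time at which the key was issued with the time at which it is presented. A minimal sketch, assuming the issuing time is recorded by whichever side performs the check; `is_access_valid` is a hypothetical helper name:

```python
from datetime import datetime, timedelta


def is_access_valid(issued_at, presented_at, lifetime=timedelta(hours=3)):
    """Return True if the key is presented within `lifetime` of being issued."""
    elapsed = presented_at - issued_at
    return timedelta(0) <= elapsed <= lifetime


issued = datetime(2019, 1, 8, 12, 0)
within_window = is_access_valid(issued, datetime(2019, 1, 8, 14, 59))
after_window = is_access_valid(issued, datetime(2019, 1, 8, 15, 1))
```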

Meanwhile, the processor of the transmission-side device 1100 may set access rights to the synthesized voice for each of a plurality of terminals 100, 3200, 3300, and 3400 based on an input received from a user. In this case, the processor of the transmission-side device 1100 may set the access authority in advance based on the user input, or may set the access authority at the time of sending the message based on the user input.

For example, the processor of the transmission-side device 1100 may set the terminal 100 to not allow access to the synthesized voice, and set the other terminals 3200, 3300, and 3400 to allow access to the synthesized voice.
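The per-terminal setting in this example can be pictured as a mapping from each receiving terminal to the key that is sent with the message. The snippet below is only an illustration; the key values and the `keys_for_terminals` function are hypothetical, and the terminals are identified by their reference numerals.

```python
KEY_1_1 = "key-1-1"  # hypothetical value: key that authorizes access
KEY_1_2 = "key-1-2"  # hypothetical value: key that does not authorize access


def keys_for_terminals(terminals, denied):
    """Map each terminal to the key that accompanies the message sent to it."""
    return {t: (KEY_1_2 if t in denied else KEY_1_1) for t in terminals}


# Terminal 100 is denied access; terminals 3200, 3300, and 3400 are allowed.
assignment = keys_for_terminals([100, 3200, 3300, 3400], denied={100})
```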

In this case, the processor of the transmission-side device 1100 may transmit the message and the 1-1 key (S_key) corresponding to the transmission-side device 1100 to the second terminal 3200. The second terminal 3200 may then transmit the text and the 1-1 key (S_key) to the server 2100 and receive the synthesized voice from the server 2100.

Likewise, the processor of the transmission-side device 1100 may transmit the message and the 1-1 key (S_key) corresponding to the transmission-side device 1100 to the third terminal 3300. The third terminal 3300 may then transmit the text and the 1-1 key (S_key) to the server 2100 and receive the synthesized voice from the server 2100.

Meanwhile, the processor of the transmission-side device 1100 may transmit the message and the 1-2 key (S_key) corresponding to the transmission-side device 1100 to the terminal 100. In this case, the terminal 100 may transmit the text and the 1-2 key (S_key) to the server 2100.

However, the server 2100 does not provide the synthesized voice to the terminal 100.

This will be described with reference to FIG. 11.

First, the case in which the terminal 100 receives the 1-2 key (S_key) (a key that does not approve access to the synthesized voice) will be described.

The processor 180 of the terminal 100 may receive the 1-2 key (S_key) and the first message from the transmitting device 1100 (S1105). Here, the 1-2 key S_key is identification information unique to the transmission-side device 1100 and may be a key that does not approve access to the synthesized voice.

Meanwhile, the processor 180 of the terminal 100 may extract the first text from the received first message.

In addition, the processor 180 of the terminal 100 may transmit the extracted first text and the received 1-2 key (S_key) to the server 2100 (S1110).

Meanwhile, information on the 1-1 key (S_key) and the 1-2 key (S_key) may be stored in the memory of the server 2100.

In addition, when the key received from the terminal 100 is a 1-2 key (S_key), the server 2100 may not provide the synthesized voice to the terminal 100 (S1115).

In this case, the terminal 100 fails to receive the synthesized voice. The terminal 100 may then output a synthesized voice in which a predetermined voice characteristic (a celebrity, a voice actor, a machine voice, etc.) is applied to the first message.
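The fallback on the terminal side can be sketched as a simple choice between the voice returned by the server and a predetermined voice. The names below are hypothetical, and a `None` response is an assumed way of representing the case in which the server does not provide the synthesized voice.

```python
from typing import Optional, Tuple

DEFAULT_VOICE = "predetermined-voice"  # hypothetical label for the fallback voice


def select_output(text: str, server_voice: Optional[str]) -> Tuple[str, str]:
    """Return the (voice label, text) pair that the terminal would output."""
    if server_voice is None:
        # The server did not provide the sender's voice: use the predetermined one.
        return (DEFAULT_VOICE, text)
    return (server_voice, text)


denied = select_output("I just came home", None)
granted = select_output("I just came home", "voice-of-user-1100")
```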

Next, the case in which the terminal 100 receives the 1-1 key (S_key) (the key that authorizes access to the synthesized voice) will be described.

The processor 180 of the terminal 100 may receive the 1-1 key (S_key) and the first message from the transmission-side device 1100. Here, the 1-1 key (S_key) is identification information unique to the transmission-side device 1100 and may be a key that authorizes access to the synthesized voice.

Meanwhile, when the key corresponding to the transmission-side device 1100 is the key that authorizes access to the synthesized voice (the 1-1 key), the processor 180 of the terminal 100 may receive the synthesized voice in which the voice characteristic of the user of the transmission-side device 1100 is applied to the text.

Specifically, the processor 180 of the terminal 100 may extract the first text from the received first message.

In addition, the processor 180 of the terminal 100 may transmit the extracted first text and the received 1-1 key (S_key) to the server 2100.

Meanwhile, the server 2100 may determine whether the key received from the terminal 100 is a key that authorizes access to the synthesized voice.

In addition, the server 2100 may transmit the synthesized voice to the terminal 100 when the key received from the terminal 100 is a 1-1 key (S_key) that authorizes access to the synthesized voice.

In this case, the processor of the terminal 100 may receive the synthesized voice in which the voice characteristic of the user of the transmission-side device 1100 is applied to the first message, and output the received synthesized voice.
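The server-side decision described with reference to FIG. 11 can be sketched as a lookup of the received key. This is an illustrative model only: the `VoiceServer` class, its method names, and the key values are hypothetical, and a formatted string stands in for the actual synthesized audio.

```python
from typing import Dict, Optional, Tuple


class VoiceServer:
    """Toy model of the server 2100: stores keys and gates access to voices."""

    def __init__(self) -> None:
        # key -> (device id, whether the key authorizes access)
        self._keys: Dict[str, Tuple[int, bool]] = {}

    def register(self, device_id: int, granting_key: str, denying_key: str) -> None:
        """Store the two keys that a transmission-side device shared in advance."""
        self._keys[granting_key] = (device_id, True)
        self._keys[denying_key] = (device_id, False)

    def synthesize(self, text: str, key: str) -> Optional[str]:
        """Return a synthesized voice only if the key authorizes access."""
        entry = self._keys.get(key)
        if entry is None or not entry[1]:
            return None  # unknown key or non-authorizing key: nothing is provided
        device_id, _ = entry
        return f"<voice of user of device {device_id}> {text}"  # stand-in for audio


server = VoiceServer()
server.register(1100, granting_key="key-1-1", denying_key="key-1-2")
allowed = server.synthesize("I just came home", "key-1-1")
refused = server.synthesize("I just came home", "key-1-2")
```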

Therefore, according to the present invention, the transmission-side device 1100 may set the access authority for the synthesized voice, and a terminal that has not been granted access cannot receive the synthesized voice.

Accordingly, according to the present invention, the user of the transmission-side device can select the users to whom the characteristics of his or her voice are provided.

On the other hand, the processor 180 is a configuration in charge of controlling a device in general, and may be used interchangeably with terms such as a central processing unit, a microprocessor, and a control unit.

Meanwhile, the terminal 100 according to an embodiment of the present invention may be an audio book. In this case, the processor 180 may output a plurality of synthesized voices by applying voice characteristics of a plurality of people to each of the plurality of texts.

The present invention described above can be embodied as computer-readable code on a medium on which a program is recorded. The computer-readable medium includes all kinds of recording devices in which data readable by a computer system is stored. Examples of computer-readable media include a hard disk drive (HDD), a solid state disk (SSD), a silicon disk drive (SDD), ROM, RAM, CD-ROM, magnetic tape, a floppy disk, and an optical data storage device, and also include implementations in the form of a carrier wave (e.g., transmission over the Internet). In addition, the computer may include the control unit 180 of the terminal. Accordingly, the above detailed description should not be construed as limiting in any respect, but should be considered illustrative. The scope of the invention should be determined by reasonable interpretation of the appended claims, and all changes within the equivalent scope of the invention are included in the scope of the invention.

Claims

A terminal comprising:

a communication unit for communicating with an external device;

an audio output unit for outputting voice; and

a processor for receiving a message from a transmitting-side device, receiving a synthesized voice in which a voice characteristic of a user of the transmitting-side device is applied to the message, and outputting the synthesized voice.
The terminal of claim 1, wherein the processor receives the synthesized voice, in which the voice characteristic of the user of the transmitting-side device is applied to text included in the message, and outputs the synthesized voice.
The terminal of claim 2, wherein the message contains ‘letters or numbers’ and ‘emoticons or symbols’, and the text included in the message contains the letters or numbers.
The terminal of claim 1, further comprising:

a memory for storing data; and

an input unit for receiving an input from a user,

wherein the processor stores the synthesized voice in the memory and, when an input for outputting the synthesized voice is received, outputs the synthesized voice.
The terminal of claim 4, further comprising a display unit for displaying an image, wherein the processor outputs the synthesized voice before displaying the message.
The terminal of claim 2, wherein the processor receives a second message from a second transmitting-side device, receives a second synthesized voice in which a voice characteristic of a user of the second transmitting-side device is applied to text included in the second message, and outputs the second synthesized voice.
The terminal of claim 6, wherein the user of the transmitting-side device and the user of the second transmitting-side device are users participating in one chat room of a messenger application, and the message and the second message are messages entered in the one chat room.
The terminal of claim 7, wherein, when the message is received first among the message and the second message, the processor outputs the synthesized voice first.
The terminal of claim 8, wherein the processor starts outputting the second synthesized voice when the output of the synthesized voice is finished, or starts outputting the second synthesized voice while the synthesized voice is being output.
The terminal of claim 8, wherein, when the second message is received while the synthesized voice is being output, the processor starts outputting the second synthesized voice while the synthesized voice is being output.
The terminal of claim 2, wherein the processor receives, from the transmitting-side device together with the message, the synthesized voice in which the voice characteristic of the user of the transmitting-side device is applied to the text.
The terminal of claim 2, wherein, when the message is received, the processor transmits the text to the transmitting-side device or a server, and receives, from the transmitting-side device or the server, the synthesized voice in which the voice characteristic of the user of the transmitting-side device is applied to the text.
The terminal of claim 1, wherein the processor receives a key corresponding to the transmitting-side device together with the message, and transmits the key corresponding to the transmitting-side device together with the text.
The terminal of claim 13, wherein, when the key corresponding to the transmitting-side device is a key that authorizes access to the synthesized voice, the processor receives the synthesized voice in which the voice characteristic of the user of the transmitting-side device is applied to the text.
Receiving a message from a transmitting-side device;

receiving a synthesized voice in which a voice characteristic of a user of the transmitting-side device is applied to the message; and

outputting the synthesized voice.

terminal.