KR20160138613A - Method for auto interpreting using emoticon and apparatus using the same - Google Patents
- Publication number
- KR20160138613A (application KR1020150072656A)
- Authority
- KR
- South Korea
- Prior art keywords
- emoticon
- data
- speaker
- text data
- emoticons
- Prior art date
Classifications
- G06F17/28—
- G06F17/2755—
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/02—Input arrangements using manually operated switches, e.g. using keyboards or dials
- G06F3/023—Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
- G06F3/0233—Character input methods
- G06F3/0235—Character input methods using chord techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
An automatic interpretation method and apparatus using emoticons are disclosed. According to one aspect of the present invention, there is provided an automatic interpretation method using an emoticon, comprising the steps of: obtaining speech data from a speaker and translating the speech data to generate text data; analyzing the text data to extract emotion information; selecting an emoticon corresponding to the emotion information; and outputting output data obtained by combining the text data and the emoticon to the other party.
Description
BACKGROUND OF THE INVENTION
In general, an automatic interpretation system translates the recognized speech and then transmits the translated contents to the other party only as synthesized voice and text. Such a method can deliver the sentences intended by the speaker, but it is difficult for it to accurately convey the speaker's feelings, tone, and intention. As a result, the contents intended by the speaker may not be properly conveyed, which can cause misunderstanding.
Therefore, there is a need for a new automatic interpretation technology that can more accurately convey the intention and feelings of a speaker even during communication through automatic interpretation.
It is an object of the present invention to make it easier for the other party to understand the speech contents intended by the speaker in an automatic interpretation service, and thereby to facilitate communication.
It is also an object of the present invention to improve the accuracy in using emoticons by determining the use of the emoticons through simple interaction with a user using an input tool such as a touch screen included in the user terminal.
It is also an object of the present invention to enable the other party to feel emotions and tone of a speaker as well as contents of a simple communication in an automatic interpretation service.
According to another aspect of the present invention, there is provided an automatic interpretation method using an emoticon, the method comprising: acquiring speech data from a speaker and translating the speech data to generate text data; Analyzing the text data and selecting an emoticon corresponding to the text data; And generating output data by combining the text data and the emoticon, and outputting the output data to the other party.
In this case, the selecting step may include: extracting a unique feature of the text data from a morpheme analysis result of the text data; and extracting emotion information by inputting the unique feature into a learning model for emotion analysis, wherein any one emoticon corresponding to the emotion information, among a plurality of emoticons stored in an emoticon database, may be selected as the emoticon corresponding to the text data.
In this case, the step of outputting may include a step of determining whether or not an emoticon deletion input is generated from the speaker, and when the emoticon deletion input occurs, the emoticon may be deleted from the output data and output.
At this time, the learning model may be generated based on at least one learning data composed of the text data and the emoticons.
In this case, the automatic interpretation method may further include updating the learning model using the deletion history in which the emoticon is deleted from the output data.
At this time, the output data may be output corresponding to at least one of the text and the voice.
At this time, the emoticons database may store the plurality of emoticons by category according to the emotion information.
According to another aspect of the present invention, there is provided an automatic interpretation apparatus using an emoticon, the apparatus comprising: a text data generation unit for acquiring speech data from a speaker and translating the speech data to generate text data; An emoticon selecting unit for analyzing the text data and selecting an emoticon corresponding to the text data; And an output unit for generating output data by combining the text data and the emoticons and outputting the output data to the other party.
In this case, the emoticon selection unit may include: an inherent feature extraction unit that extracts the inherent feature of the text data from the morpheme analysis result of the text data; and an emotion information extraction unit that extracts emotion information by inputting the inherent feature into a learning model for emotion analysis, wherein any one emoticon corresponding to the emotion information, among the plurality of emoticons stored in the emoticon database, may be selected as the emoticon corresponding to the text data.
In this case, the output unit may determine whether or not an emoticon deletion input is generated from the speaker, and when the emoticon deletion input occurs, delete the emoticon from the output data before outputting the output data.
At this time, the learning model may be generated based on at least one learning data composed of the text data and the emoticons.
In this case, the automatic interpretation apparatus may further include a learning model update unit that updates the learning model using the deletion history from which the emoticons are deleted from the output data.
At this time, the output data may be output corresponding to at least one of the text and the voice.
At this time, the emoticons database may store the plurality of emoticons by category according to the emotion information.
According to the present invention, the other party can more easily understand the contents uttered by the speaker in an automatic interpretation service, so that communication can be facilitated.
In addition, the present invention can improve the accuracy when using the emoticons by determining the use of the emoticons through simple interaction with a user using an input tool such as a touch screen included in the user terminal.
Further, in the automatic interpretation service of the present invention, not only the communication contents but also the emotion or tone of the speaker can be felt by the other party.
FIG. 1 is a flowchart illustrating an automatic interpretation method using an emoticon according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the operation of selecting an emoticon shown in FIG. 1.
FIG. 3 is a flowchart illustrating a learning process of a learning model according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a process of generating a learning model according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating an automatic interpretation apparatus using emoticons according to an embodiment of the present invention.
FIG. 6 is a block diagram showing an example of the emoticon selection unit shown in FIG. 5.
FIG. 7 is a view showing an output screen of an automatic interpretation apparatus according to an embodiment of the present invention.
The present invention will now be described in detail with reference to the accompanying drawings. Hereinafter, repeated descriptions and detailed descriptions of known functions and configurations that may obscure the gist of the present invention will be omitted. Embodiments of the present invention are provided to more fully describe the present invention to those skilled in the art. Accordingly, the shapes and sizes of the elements in the drawings may be exaggerated for clarity.
Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a flowchart illustrating an automatic interpretation method using an emoticon according to an embodiment of the present invention.
Referring to FIG. 1, an automatic interpretation method using an emoticon according to an embodiment of the present invention acquires voice data from a speaker and translates voice data to generate text data (S110).
For example, it can be assumed that a user corresponding to the speaker interacts with an automatic interpreter using a personal computer (PC) or a mobile device as an intermediary. Accordingly, the automatic interpreter can recognize the speaker's speech signal to acquire the voice data, and translate the voice data to generate text data.
Further, the text data may correspond to data generated as text by translating the speech data into the language of the speaker's counterpart. For example, supposing that the speaker uses English and the other party uses Korean, the voice data can be acquired in English, translated into Korean, and output as text data in Korean.
At this time, the language setting for the voice data and the language setting for the text data can be set by the user. In addition, the language can be set using a language database included in the existing automatic interpretation technology.
At this time, functions of speech recognition and translation can use existing established techniques.
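As a rough sketch of this front end, the example below chains a speech-recognition step and a translation step to produce text data. Both helper functions are hypothetical placeholders standing in for whichever established recognition and translation engines an implementation actually uses; only the data flow is meant to be illustrative.

```python
# Sketch of step S110: voice data -> recognized sentence -> translated text data.
# The two helpers are hypothetical placeholders for existing speech-recognition
# and machine-translation engines.

def recognize_speech(audio_bytes: bytes, language: str) -> str:
    """Hypothetical wrapper around an existing speech recognizer."""
    raise NotImplementedError("plug in a real speech recognition engine")

def translate(sentence: str, source: str, target: str) -> str:
    """Hypothetical wrapper around an existing machine translator."""
    raise NotImplementedError("plug in a real translation engine")

def generate_text_data(audio_bytes: bytes, speaker_lang: str, listener_lang: str) -> str:
    """Voice data from the speaker -> text data in the other party's language."""
    recognized = recognize_speech(audio_bytes, language=speaker_lang)
    return translate(recognized, source=speaker_lang, target=listener_lang)
```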
In addition, in the automatic interpretation method using emoticons according to an embodiment of the present invention, text data is analyzed and an emoticon corresponding to text data is selected (S120).
Conventional automatic interpretation technology has difficulty conveying the intended meaning completely because it outputs only a synthesized voice or simply conveys the recognized sentence. In other words, although the contents of the speaker's utterance can be conveyed in the form of sentences, the intention carried by the speaker's feelings or tone is not transmitted. As a result, the contents intended by the speaker may not be properly conveyed to the other party, which can cause misunderstanding.
Accordingly, in the present invention, the emoticon corresponding to the speaker's feelings can be transmitted along with the uttered contents, so that the emotions and feelings of the speaker can be felt by the other party.
At this time, the characteristic feature of the text data can be extracted from the result of morpheme analysis of the text data.
In this case, the morpheme analysis may correspond to the process of dividing the sentence uttered by the speaker into morphemes, the smallest meaningful units of a word. For example, the word 'storybook' can be divided into the morphemes 'story' and 'book'.
Therefore, it is possible to divide sentences uttered by the speaker into morpheme units, and then extract unique features based on the divided morphemes.
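The morpheme-splitting idea can be pictured with the minimal sketch below. A tiny hand-made dictionary and a greedy longest-match rule stand in for a real morphological analyzer (for Korean, a dedicated tagger would normally be used), so the vocabulary and strategy are illustrative assumptions only.

```python
# Toy greedy morpheme splitter; the dictionary is an illustrative stand-in
# for a real morphological analyzer.
MORPHEME_DICT = {"story", "book", "hot", "water"}

def split_morphemes(word):
    """Greedily split a word into known morphemes, longest match first."""
    morphemes, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest candidate first
            if word[i:j] in MORPHEME_DICT:
                morphemes.append(word[i:j])
                i = j
                break
        else:                               # no known morpheme: keep one character
            morphemes.append(word[i])
            i += 1
    return morphemes

print(split_morphemes("storybook"))   # ['story', 'book']
```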
At this time, the inherent characteristic may include the form of the sentence, or the tone or atmosphere of the sentence. For example, when analyzing a sentence uttered by the speaker, if the sentence includes a morpheme that typically appears in a particular type of sentence, it can be judged that the sentence has the inherent characteristic of that sentence type.
At this time, the emotion information can be extracted by inputting the unique characteristic into the learning model for emotion analysis. For example, emotion information corresponding to various unique features that can be extracted based on the morpheme may be matched, and matching specific emotion information may be found and provided when specific intrinsic features are input. In addition, there may be a learning model using various methods in addition to a method of matching and extracting unique features and emotion information.
In addition, the learning model can be implemented by a machine learning method such as an SVM (Support Vector Machine) or a DNN (Deep Neural Network); these are only examples, and the model can equally be implemented by a similar method or another approach.
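As one hedged illustration of such a learning model, the sketch below trains a linear SVM over bag-of-token counts with scikit-learn. The training sentences, the emotion labels, and the use of whitespace tokens in place of true morphemes are assumptions made purely for the example.

```python
# Minimal emotion-analysis learning model: token counts -> linear SVM.
# Training data is invented for illustration; a real system would train on
# morpheme-analyzed text paired with emotion categories.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_sentences = [
    "i am so happy to see you",
    "this is wonderful news",
    "i feel very sad today",
    "that makes me really angry",
]
train_emotions = ["joy", "joy", "sadness", "anger"]

model = make_pipeline(
    CountVectorizer(),   # unique features: counts of (pre-split) morphemes or tokens
    LinearSVC(),         # SVM, one of the machine learning methods mentioned above
)
model.fit(train_sentences, train_emotions)

print(model.predict(["i am happy today"])[0])   # likely 'joy' for this toy data
```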
At this time, among the plurality of emoticons stored in the emoticons database, any one emoticon corresponding to the emotion information can be selected as an emoticon corresponding to the text data.
At this time, the learning model can be generated based on at least one learning data composed of text data and emoticons. That is, learning can be performed using the characteristic features extracted when the text data included in the learning data is morpheme analyzed and the emotion information corresponding to the emoticons. In this way, learning using more learning data may improve the accuracy of the learning model.
At this time, the emoticon database can store a plurality of emoticons by category according to emotion information. For example, assuming that categories such as joy, sadness, and anger are defined and emoticons corresponding to each category are stored, the emotion information extracted through the learning model can be compared with the categories, and an emoticon stored in the matching category can be used.
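The category-keyed emoticon database can be pictured as in the sketch below; the category names, the emoticon strings, and the random-choice selection rule are assumptions for illustration, not the patent's actual data.

```python
import random
from typing import Optional

# Illustrative emoticon database keyed by emotion category.
EMOTICON_DB = {
    "joy":     [":-)", ":D", "^_^"],
    "sadness": [":-(", "T_T"],
    "anger":   [">:(", "-_-+"],
}

def select_emoticon(emotion: str) -> Optional[str]:
    """Pick any one emoticon stored under the extracted emotion category."""
    candidates = EMOTICON_DB.get(emotion)
    return random.choice(candidates) if candidates else None

print(select_emoticon("joy"))
```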
In addition, an automatic interpretation method using an emoticon according to an embodiment of the present invention combines the text data and the emoticon to generate output data, and outputs the output data to the other party (S130). That is, an emoticon capable of expressing the emotion is output together with the text data carrying the contents, so that the other party can easily understand the utterance contents intended by the speaker and communication is facilitated. This also differentiates the present invention from other interpretation systems and apparatuses by enabling more friendly communication between the speaker and the other party.
At this time, the emoticon may be inserted in the middle of the text data and output at the point corresponding to where the speaker's emotion was extracted.
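One simple way to realize this combining step is sketched below: the emoticon is appended to the translated text, or spliced into it at a given character position (for example, the point where the emotion was detected). The function name and the append-by-default behavior are assumptions for illustration.

```python
def combine_output(text_data, emoticon, position=None):
    """Combine the translated text data and the selected emoticon into output data."""
    if emoticon is None:
        return text_data
    if position is None:                       # no position given: append at the end
        return f"{text_data} {emoticon}"
    return f"{text_data[:position].rstrip()} {emoticon} {text_data[position:].lstrip()}"

print(combine_output("Nice to meet you!", ":-)"))       # appended at the end
print(combine_output("Nice to meet you!", ":-)", 13))   # spliced mid-sentence
```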
At this time, it is possible to judge whether or not the emoticon deletion input is generated from the speaker.
At this time, if the emoticon delete input occurs, the emoticon can be deleted from the output data and output.
For example, when the emoticon combined with text data is not the emoticon the speaker intended, the speaker can delete the emoticon using an input tool such as a touch screen.
Also, cases where the user deletes an emoticon can be analyzed so that the model learns not to match the deleted emoticon again with sentences having the same unique features in the future.
At this time, the output data may be output corresponding to at least one of the text and the voice. That is, the contents of the output data can be outputted as text through the screen or the contents of the output data can be outputted as a voice by using a synthesized sound. It is also possible to output text and audio simultaneously.
Also, although not shown in FIG. 1, the automatic interpretation method using an emoticon according to an embodiment of the present invention can update the learning model using the deletion history in which emoticons are deleted from the output data. That is, if the speaker deletes an emoticon, this feedback is sent to the learning model used to select emoticons, so that afterwards the same kind of sentence can be prevented from being combined with the wrong emoticon.
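One possible realization of this feedback loop is sketched below: every deletion is logged, and the logged sentences are later folded back into the training data as counter-examples under a 'neutral' label before the model is refit. The 'neutral' label and the explicit refit call are illustrative choices, not the patent's prescribed update rule.

```python
# Illustrative deletion-history feedback for the emotion model.
deletion_history = []   # list of (sentence, rejected_emotion) pairs

def record_deletion(sentence, rejected_emotion):
    """Log a case where the speaker deleted the emoticon chosen for a sentence."""
    deletion_history.append((sentence, rejected_emotion))

def update_learning_model(model, base_sentences, base_labels):
    """Refit the emotion model, treating deleted cases as 'neutral' counter-examples."""
    sentences = list(base_sentences) + [s for s, _ in deletion_history]
    labels = list(base_labels) + ["neutral" for _ in deletion_history]
    model.fit(sentences, labels)
    return model
```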
By carrying out communication between the speaker and the other party by using the automatic interpretation method as described above, it is possible for the other party to easily understand the contents of the speech intended by the speaker, thereby facilitating the communication and also enabling more familiar communication.
In addition, accuracy can be continuously improved by increasing the accuracy of emoticon determination through simple interaction with a user using an input tool such as a touch screen.
FIG. 2 is a flowchart illustrating the operation of selecting an emoticon shown in FIG. 1.
Referring to FIG. 2, the process of selecting the emoticons shown in FIG. 1 analyzes morphemes of text data (S210). At this time, the text data may correspond to the translation result through the translator.
Thereafter, characteristic features of the text data are extracted from the result of the morphological analysis (S220). At this time, the inherent characteristic may be in the form of data for obtaining emotion information corresponding to the text data.
Then, the intrinsic feature is input to the learning model to extract the emotion information (S230). For example, emotion information corresponding to emotions such as joy, sadness, anger, and surprise can be extracted.
Thereafter, an emoticon corresponding to the emotion information is selected from the emoticon database (S240). At this time, the emoticons database can store emoticons for each category corresponding to the emotion information.
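Putting the four sub-steps together, the selection path of FIG. 2 might be orchestrated as in the sketch below. It reuses the illustrative helpers from the earlier sketches (the toy morpheme splitter, the SVM model, and the category-keyed database), so all names are assumptions rather than the patent's actual interfaces.

```python
def select_emoticon_for_text(text_data, model):
    """S210-S240: morpheme analysis -> unique features -> emotion -> emoticon."""
    morphemes = []
    for word in text_data.split():
        morphemes.extend(split_morphemes(word))         # S210/S220: morphemes as features
    emotion = model.predict([" ".join(morphemes)])[0]   # S230: learning model -> emotion info
    return select_emoticon(emotion)                     # S240: look up the emoticon database
```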
FIG. 3 is a flowchart illustrating a learning process of a learning model according to an embodiment of the present invention.
Referring to FIG. 3, the learning process of the learning model according to the embodiment of the present invention inputs learning data composed of text data and emoticons (S310). At this time, learning data including emoticons corresponding to various emotional information may be input so that the learning model can learn various data.
Thereafter, the text data included in the learning data is morpheme-analyzed (S320). In other words, the text constituting the sentence can be divided into morphemes, the smallest meaningful units of a word. For example, the phrase 'hot water' can be divided into the morphemes 'hot' and 'water'.
Thereafter, unique features are extracted from the result of the morpheme analysis (S330), and the unique features are matched with the emotion information corresponding to the emoticons (S340). Accordingly, when a specific inherent characteristic is later input to the learning model, emotion information can be extracted based on the learned matching information and the corresponding emoticon can be selected.
In matching unique features to emoticons, cases where the user deleted an emoticon can also be analyzed, so that the deleted emoticon is later not matched to sentences having the same inherent characteristics.
FIG. 4 is a diagram illustrating a process of generating a learning model according to an embodiment of the present invention.
Referring to FIG. 4, in the process of generating a learning model according to an embodiment of the present invention, learning is performed using learning data 410-1 to 410-N, and a learning model 420 can be generated based on the result.
At this time, the learning data 410-1 to 410-N may be composed of text data capable of morphological analysis and emoticons corresponding to text data.
At this time, by analyzing the morpheme of the text data and extracting the characteristic feature, the characteristic feature information is matched with the emotion information corresponding to the emoticon, so that the matching emotion information can be extracted when the characteristic feature is actually input to the learning model.
At this time, the learning model can be implemented by a machine learning method such as an SVM (Support Vector Machine) or a DNN (Deep Neural Network); these are only examples, and the model can equally be implemented by a similar method or another approach.
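Under the assumption that each emoticon in the learning data can be mapped back to its emotion category, the generation step of FIG. 4 might look like the sketch below; the inverse lookup over the illustrative category database and the single fit call are assumptions, not the patent's actual procedure.

```python
# Illustrative generation of the learning model from learning data 410-1..410-N,
# each item being a (text data, emoticon) pair.
EMOTICON_TO_EMOTION = {e: emo for emo, es in EMOTICON_DB.items() for e in es}

def generate_learning_model(learning_data, model):
    """Fit the emotion model from (text, emoticon) learning data."""
    sentences = [text for text, _ in learning_data]
    emotions = [EMOTICON_TO_EMOTION[emoticon] for _, emoticon in learning_data]
    model.fit(sentences, emotions)
    return model
```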
FIG. 5 is a block diagram illustrating an automatic interpretation apparatus using emoticons according to an embodiment of the present invention.
Referring to FIG. 5, an automatic interpretation apparatus 500 using emoticons according to an embodiment of the present invention includes a text data generation unit 510, an emoticon selection unit 520, an output unit 530, an emoticon database 540, and a learning model update unit 550.
The text data generation unit 510 acquires voice data from a speaker and translates the voice data to generate text data.
For example, it can be assumed that a user corresponding to the speaker interacts with an automatic interpreter using a personal computer (PC) or a mobile device as an intermediary. Accordingly, the automatic interpreter can recognize the speaker's speech signal to acquire the voice data, and translate the voice data to generate text data.
Further, the text data may correspond to data generated as text by translating the speech data into the language of the speaker's counterpart. For example, supposing that the speaker uses English and the other party uses Korean, the voice data can be acquired in English, translated into Korean, and output as text data in Korean.
At this time, the language setting for the voice data and the language setting for the text data can be set by the user. In addition, the language can be set using a language database included in the existing automatic interpretation technology.
At this time, functions of speech recognition and translation can use existing established techniques.
The emoticon selection unit 520 analyzes the text data and selects an emoticon corresponding to the text data.
Conventional automatic interpretation technology has difficulty conveying the intended meaning completely because it outputs only a synthesized voice or simply conveys the recognized sentence. In other words, although the contents of the speaker's utterance can be conveyed in the form of sentences, the intention carried by the speaker's feelings or tone is not transmitted. As a result, the contents intended by the speaker may not be properly conveyed to the other party, which can cause misunderstanding.
Accordingly, in the present invention, the emoticon corresponding to the speaker's feelings can be transmitted along with the uttered contents, so that the emotions and feelings of the speaker can be felt by the other party.
At this time, the characteristic feature of the text data can be extracted from the result of morpheme analysis of the text data.
In this case, the morpheme analysis may correspond to the process of dividing the sentence uttered by the speaker into morphemes, the smallest meaningful units of a word. For example, the word 'storybook' can be divided into the morphemes 'story' and 'book'.
Therefore, it is possible to divide sentences uttered by the speaker into morpheme units, and then extract unique features based on the divided morphemes.
At this time, the inherent characteristic may include the form of the sentence, or the tone or atmosphere of the sentence. For example, when analyzing a sentence uttered by the speaker, if the sentence includes a morpheme that typically appears in a particular type of sentence, it can be judged that the sentence has the inherent characteristic of that sentence type.
At this time, the emotion information can be extracted by inputting the unique characteristic into the learning model for emotion analysis. For example, emotion information corresponding to various unique features that can be extracted based on the morpheme may be matched, and matching specific emotion information may be found and provided when specific intrinsic features are input. In addition, there may be a learning model using various methods in addition to a method of matching and extracting unique features and emotion information.
At this time, among the plurality of emoticons stored in the emoticons database, any one emoticon corresponding to the emotion information can be selected as an emoticon corresponding to the text data.
At this time, the learning model can be generated based on at least one learning data composed of text data and emoticons. That is, learning can be performed using the characteristic features extracted when the text data included in the learning data is morpheme analyzed and the emotion information corresponding to the emoticons. In this way, learning using more learning data may improve the accuracy of the learning model.
In addition, the learning model can be implemented by a machine learning method such as an SVM (Support Vector Machine) or a DNN (Deep Neural Network); these are only examples, and the model can equally be implemented by a similar method or another approach.
At this time, the emoticon database 540 can store the plurality of emoticons by category according to the emotion information.
The output unit 530 generates output data by combining the text data and the emoticon, and outputs the output data to the other party.
At this time, it is possible to judge whether or not the emoticon deletion input is generated from the speaker.
At this time, if the emoticon delete input occurs, the emoticon can be deleted from the output data and output.
For example, when the emoticon combined with text data is not the emoticon the speaker intended, the speaker can delete the emoticon using an input tool such as a touch screen.
At this time, the output data may be output corresponding to at least one of the text and the voice. That is, the contents of the output data can be outputted as text through the screen or the contents of the output data can be outputted as a voice by using a synthesized sound. It is also possible to output text and audio simultaneously.
The learning model update unit 550 updates the learning model using the deletion history in which the emoticon is deleted from the output data.
By using the automatic interpreting apparatus 500 described above for communication between the speaker and the other party, the other party can easily understand the contents of the speech intended by the speaker, which facilitates communication and also enables more familiar communication.
In addition, accuracy can be continuously improved by increasing the accuracy of emoticon determination through simple interaction with a user using an input tool such as a touch screen.
In addition, the automatic interpretation apparatus 500 enables the other party to feel not only the communication contents but also the emotion or tone of the speaker.
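Mirroring FIG. 5, the apparatus can be pictured as a thin composition of the units described above. The class below simply wires together the hypothetical helpers from the method sketches, so it is a structural illustration under those assumptions rather than the patent's actual implementation.

```python
class AutomaticInterpretationApparatus:
    """Illustrative composition mirroring FIG. 5 (units 510, 520, 530)."""

    def __init__(self, model, speaker_lang, listener_lang):
        self.model = model                    # emotion-analysis learning model
        self.speaker_lang = speaker_lang
        self.listener_lang = listener_lang

    def interpret(self, audio_bytes):
        # 510: text data generation (speech recognition + translation)
        text_data = generate_text_data(audio_bytes, self.speaker_lang, self.listener_lang)
        # 520: emoticon selection (morphemes -> emotion -> emoticon)
        emoticon = select_emoticon_for_text(text_data, self.model)
        # 530: output data (text data combined with the selected emoticon)
        return combine_output(text_data, emoticon)
```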
FIG. 6 is a block diagram showing an example of the emoticon selection unit shown in FIG. 5.
Referring to FIG. 6, the emoticon selection unit 520 includes an intrinsic feature extraction unit 610 and an emotion information extraction unit 620.
The intrinsic feature extraction unit 610 extracts the intrinsic feature of the text data from the morpheme analysis result of the text data.
The emotion information extraction unit 620 extracts emotion information by inputting the intrinsic feature into a learning model 630 for emotion analysis.
At this time, any one emoticon corresponding to the emotion information, among the plurality of emoticons stored in the emoticon database 540, can be selected as the emoticon corresponding to the text data.
FIG. 7 is a view showing an output screen of an automatic interpretation apparatus according to an embodiment of the present invention.
Referring to FIG. 7, an output screen 710 of the automatic interpretation apparatus according to an embodiment of the present invention displays the output data 720.
At this time, the output data 720 may include the text data 730 and the emoticons 740 and 741.
At this time, since the
At this time, the
Therefore, if the speaker directly confirms the
At this time, when the emoticon delete button 750 is selected, the corresponding emoticon can be deleted from the output data 720 and the remaining output data can be output.
As described above, the automatic interpretation method using an emoticon according to the present invention and the apparatus using the same are not limited to the configurations and methods of the embodiments described above; all or some of the embodiments may be selectively combined so that various modifications can be made.
410-1 to 410-N: learning data 420: learning model
500: Automatic interpretation device 510: Text data generation unit
520: emoticon selection unit 530: output unit
540: emoticons database 550: learning model update unit
610: Intrinsic feature extraction unit 620: Emotion information extraction unit
630: Learning Model 710: Output Screen
720: output data 730: text data
740, 741: Emoticon 750: Emoticon delete button
Claims (1)
An automatic interpretation method using an emoticon, the method comprising:
obtaining speech data from a speaker and translating the speech data to generate text data;
analyzing the text data to extract emotion information;
selecting an emoticon corresponding to the emotion information; and
outputting output data obtained by combining the text data and the emoticon to the other party.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150072656A KR20160138613A (en) | 2015-05-26 | 2015-05-26 | Method for auto interpreting using emoticon and apparatus using the same |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150072656A KR20160138613A (en) | 2015-05-26 | 2015-05-26 | Method for auto interpreting using emoticon and apparatus using the same |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20160138613A true KR20160138613A (en) | 2016-12-06 |
Family
ID=57576554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150072656A KR20160138613A (en) | 2015-05-26 | 2015-05-26 | Method for auto interpreting using emoticon and apparatus using the same |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20160138613A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100883352B1 (en) | 2006-11-21 | 2009-02-11 | 한국전자통신연구원 | Method for expressing emotion and intention in remote interaction and Real emoticon system therefor |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20200036188A (en) * | 2018-09-28 | 2020-04-07 | 주식회사 솔루게이트 | Virtual Counseling System and counseling method using the same |
KR20210020977A (en) * | 2018-09-28 | 2021-02-24 | 주식회사 솔루게이트 | Virtual Counseling System and counseling method using the same |
US11837251B2 (en) | 2018-09-28 | 2023-12-05 | Solugate Inc. | Virtual counseling system and counseling method using the same |
WO2021134592A1 (en) * | 2019-12-31 | 2021-07-08 | 深圳市欢太科技有限公司 | Speech processing method, apparatus and device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11514886B2 (en) | Emotion classification information-based text-to-speech (TTS) method and apparatus | |
KR102371188B1 (en) | Apparatus and method for speech recognition, and electronic device | |
CN105895103B (en) | Voice recognition method and device | |
JP6251958B2 (en) | Utterance analysis device, voice dialogue control device, method, and program | |
US9070363B2 (en) | Speech translation with back-channeling cues | |
US9484034B2 (en) | Voice conversation support apparatus, voice conversation support method, and computer readable medium | |
KR102191425B1 (en) | Apparatus and method for learning foreign language based on interactive character | |
US20170199867A1 (en) | Dialogue control system and dialogue control method | |
WO2017127296A1 (en) | Analyzing textual data | |
CN104166462A (en) | Input method and system for characters | |
CN110910903B (en) | Speech emotion recognition method, device, equipment and computer readable storage medium | |
WO2019075406A1 (en) | Reading level based text simplification | |
KR101534413B1 (en) | Method and apparatus for providing counseling dialogue using counseling information | |
US20150254238A1 (en) | System and Methods for Maintaining Speech-To-Speech Translation in the Field | |
CN116821290A (en) | Multitasking dialogue-oriented large language model training method and interaction method | |
KR20100068965A (en) | Automatic interpretation apparatus and its method | |
CN115186080A (en) | Intelligent question-answering data processing method, system, computer equipment and medium | |
KR20160138613A (en) | Method for auto interpreting using emoticon and apparatus using the same | |
KR100593589B1 (en) | Multilingual Interpretation / Learning System Using Speech Recognition | |
CN112818096A (en) | Dialog generating method and device | |
JPWO2018198807A1 (en) | Translation equipment | |
CN110908631A (en) | Emotion interaction method, device, equipment and computer readable storage medium | |
KR20210037857A (en) | Realistic AI-based voice assistant system using relationship setting | |
Reddy et al. | Indian sign language generation from live audio or text for tamil | |
JP6538399B2 (en) | Voice processing apparatus, voice processing method and program |