KR20160138613A - Method for auto interpreting using emoticon and apparatus using the same - Google Patents
- Publication number
- KR20160138613A (application KR1020150072656A)
- Authority
- KR
- South Korea
- Prior art keywords
- emoticon
- data
- speaker
- text data
- emoticons
- Prior art date
Classifications
- G06F17/28—
- G06F17/2755—
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/02—Input arrangements using manually operated switches, e.g. using keyboards or dials
- G06F3/023—Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
- G06F3/0233—Character input methods
- G06F3/0235—Character input methods using chord techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
An automatic interpretation method and apparatus using emoticons are disclosed. According to one aspect of the present invention, there is provided an automatic interpretation method using an emoticon, comprising the steps of: obtaining speech data from a speaker and translating the speech data to generate text data; analyzing the text data to extract emotion information; selecting an emoticon corresponding to the emotion information; and outputting output data obtained by combining the text data and the emoticon to the other party.
Description
BACKGROUND OF THE INVENTION
In general, an automatic interpretation system translates the recognized speech and then transmits the translated contents to the other party only as synthesized voice and text. Such a method can deliver the sentences intended by the speaker, but it is difficult for it to accurately convey the speaker's feelings, tone, and intention. As a result, the contents intended by the speaker may not be properly conveyed, which can cause misunderstanding.
Therefore, there is a need for a new automatic interpretation technology that can more accurately convey the intention and feelings of a speaker even during communication through automatic interpretation.
It is an object of the present invention to make it easier for the other party to understand the speech contents intended by the speaker in an automatic interpretation service, and thereby to facilitate communication.
It is also an object of the present invention to improve the accuracy in using emoticons by determining the use of the emoticons through simple interaction with a user using an input tool such as a touch screen included in the user terminal.
It is also an object of the present invention to enable the other party to feel emotions and tone of a speaker as well as contents of a simple communication in an automatic interpretation service.
According to another aspect of the present invention, there is provided an automatic interpretation method using an emoticon, the method comprising: acquiring speech data from a speaker and translating the speech data to generate text data; Analyzing the text data and selecting an emoticon corresponding to the text data; And generating output data by combining the text data and the emoticon, and outputting the output data to the other party.
In this case, the selecting step may include: extracting a unique feature of the text data from a morpheme analysis result of the text data; and extracting emotion information by inputting the unique feature into a learning model for emotion analysis, wherein any one emoticon corresponding to the emotion information, among a plurality of emoticons stored in an emoticon database, may be selected as the emoticon corresponding to the text data.
In this case, the step of outputting may include a step of determining whether or not an emoticon deletion input is generated from the speaker, and when the emoticon deletion input occurs, the emoticon may be deleted from the output data and output.
At this time, the learning model may be generated based on at least one learning data composed of the text data and the emoticons.
In this case, the automatic interpretation method may further include updating the learning model using the deletion history in which the emoticon is deleted from the output data.
At this time, the output data may be output corresponding to at least one of the text and the voice.
At this time, the emoticons database may store the plurality of emoticons by category according to the emotion information.
According to another aspect of the present invention, there is provided an automatic interpretation apparatus using an emoticon, the apparatus comprising: a text data generation unit for acquiring speech data from a speaker and translating the speech data to generate text data; An emoticon selecting unit for analyzing the text data and selecting an emoticon corresponding to the text data; And an output unit for generating output data by combining the text data and the emoticons and outputting the output data to the other party.
In this case, the emoticon selection unit may include: an inherent feature extraction unit that extracts the inherent feature of the text data from the morpheme analysis result of the text data; and an emotion information extraction unit that extracts emotion information by inputting the inherent feature into a learning model for emotion analysis, wherein any one emoticon corresponding to the emotion information, among the plurality of emoticons stored in the emoticon database, may be selected as the emoticon corresponding to the text data.
In this case, the output unit may determine whether or not an emoticon deletion input is generated from the speaker, and when the emoticon deletion input occurs, delete the emoticon from the output data before outputting the output data.
At this time, the learning model may be generated based on at least one learning data composed of the text data and the emoticons.
In this case, the automatic interpretation apparatus may further include a learning model update unit that updates the learning model using the deletion history from which the emoticons are deleted from the output data.
At this time, the output data may be output corresponding to at least one of the text and the voice.
At this time, the emoticons database may store the plurality of emoticons by category according to the emotion information.
According to the present invention, the other party can more easily understand the contents uttered by the speaker in an automatic interpretation service, so that communication can be facilitated.
In addition, the present invention can improve the accuracy when using the emoticons by determining the use of the emoticons through simple interaction with a user using an input tool such as a touch screen included in the user terminal.
Further, in the automatic interpretation service of the present invention, not only the communication contents but also the emotion or tone of the speaker can be felt by the other party.
FIG. 1 is a flowchart illustrating an automatic interpretation method using an emoticon according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the operation of selecting an emoticon shown in FIG. 1.
FIG. 3 is a flowchart illustrating a learning process of a learning model according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a process of generating a learning model according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating an automatic interpretation apparatus using emoticons according to an embodiment of the present invention.
FIG. 6 is a block diagram showing an example of the emoticon selection unit shown in FIG. 5.
FIG. 7 is a view showing an output screen of an automatic interpretation apparatus according to an embodiment of the present invention.
The present invention will now be described in detail with reference to the accompanying drawings. Hereinafter, repeated descriptions and detailed descriptions of known functions and configurations that may obscure the gist of the present invention will be omitted. Embodiments of the present invention are provided to more fully describe the present invention to those skilled in the art. Accordingly, the shapes and sizes of the elements in the drawings may be exaggerated for clarity.
Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a flowchart illustrating an automatic interpretation method using an emoticon according to an embodiment of the present invention.
Referring to FIG. 1, an automatic interpretation method using an emoticon according to an embodiment of the present invention acquires voice data from a speaker and translates voice data to generate text data (S110).
For example, it can be assumed that a user corresponding to the speaker interacts with an automatic interpreter using a personal computer (PC) or a mobile device as an intermediary. Accordingly, the automatic interpreter can recognize the speaker's speech signal to acquire the voice data, and translate the voice data to generate text data.
Further, the text data may correspond to data generated as text by translating the speech data into the language of the speaker's counterpart. For example, supposing that the speaker uses English and the other party uses Korean, the voice data can be acquired in English, translated into Korean, and output as text data in Korean.
At this time, the language setting for the voice data and the language setting for the text data can be set by the user. In addition, the language can be set using a language database included in the existing automatic interpretation technology.
At this time, functions of speech recognition and translation can use existing established techniques.
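As a rough sketch of this front end, the example below chains a speech-recognition step and a translation step to produce text data. Both helper functions are hypothetical placeholders standing in for whichever established recognition and translation engines an implementation actually uses; only the data flow is meant to be illustrative.

```python
# Sketch of step S110: voice data -> recognized sentence -> translated text data.
# The two helpers are hypothetical placeholders for existing speech-recognition
# and machine-translation engines.

def recognize_speech(audio_bytes: bytes, language: str) -> str:
    """Hypothetical wrapper around an existing speech recognizer."""
    raise NotImplementedError("plug in a real speech recognition engine")

def translate(sentence: str, source: str, target: str) -> str:
    """Hypothetical wrapper around an existing machine translator."""
    raise NotImplementedError("plug in a real translation engine")

def generate_text_data(audio_bytes: bytes, speaker_lang: str, listener_lang: str) -> str:
    """Voice data from the speaker -> text data in the other party's language."""
    recognized = recognize_speech(audio_bytes, language=speaker_lang)
    return translate(recognized, source=speaker_lang, target=listener_lang)
```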
In addition, in the automatic interpretation method using emoticons according to an embodiment of the present invention, text data is analyzed and an emoticon corresponding to text data is selected (S120).
Conventional automatic interpretation technology has difficulty conveying the intended meaning completely because it outputs only a synthesized voice or simply conveys the recognized sentence. In other words, although the contents of the speaker's utterance can be conveyed in the form of sentences, the intention carried by the speaker's feelings or tone is not transmitted. As a result, the contents intended by the speaker may not be properly conveyed to the other party, which can cause misunderstanding.
Accordingly, in the present invention, the emoticon corresponding to the speaker's feelings can be transmitted along with the uttered contents, so that the emotions and feelings of the speaker can be felt by the other party.
At this time, the characteristic feature of the text data can be extracted from the result of morpheme analysis of the text data.
In this case, the morpheme analysis may correspond to the process of dividing the sentence uttered by the speaker into morphemes, the smallest meaningful units of a word. For example, the word 'storybook' can be divided into the morphemes 'story' and 'book'.
Therefore, it is possible to divide sentences uttered by the speaker into morpheme units, and then extract unique features based on the divided morphemes.
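The morpheme-splitting idea can be pictured with the minimal sketch below. A tiny hand-made dictionary and a greedy longest-match rule stand in for a real morphological analyzer (for Korean, a dedicated tagger would normally be used), so the vocabulary and strategy are illustrative assumptions only.

```python
# Toy greedy morpheme splitter; the dictionary is an illustrative stand-in
# for a real morphological analyzer.
MORPHEME_DICT = {"story", "book", "hot", "water"}

def split_morphemes(word):
    """Greedily split a word into known morphemes, longest match first."""
    morphemes, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest candidate first
            if word[i:j] in MORPHEME_DICT:
                morphemes.append(word[i:j])
                i = j
                break
        else:                               # no known morpheme: keep one character
            morphemes.append(word[i])
            i += 1
    return morphemes

print(split_morphemes("storybook"))   # ['story', 'book']
```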
At this time, the inherent characteristic may include the form of the sentence, or the tone or atmosphere of the sentence. For example, when analyzing a sentence uttered by the speaker, if the sentence includes a morpheme that typically appears in a particular type of sentence, it can be judged that the sentence has the inherent characteristic of that sentence type.
At this time, the emotion information can be extracted by inputting the unique characteristic into the learning model for emotion analysis. For example, emotion information corresponding to various unique features that can be extracted based on the morpheme may be matched, and matching specific emotion information may be found and provided when specific intrinsic features are input. In addition, there may be a learning model using various methods in addition to a method of matching and extracting unique features and emotion information.
In addition, the learning model can be implemented by a machine learning method such as an SVM (Support Vector Machine) or a DNN (Deep Neural Network); these are only examples, and the model can equally be implemented by a similar method or another approach.
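As one hedged illustration of such a learning model, the sketch below trains a linear SVM over bag-of-token counts with scikit-learn. The training sentences, the emotion labels, and the use of whitespace tokens in place of true morphemes are assumptions made purely for the example.

```python
# Minimal emotion-analysis learning model: token counts -> linear SVM.
# Training data is invented for illustration; a real system would train on
# morpheme-analyzed text paired with emotion categories.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_sentences = [
    "i am so happy to see you",
    "this is wonderful news",
    "i feel very sad today",
    "that makes me really angry",
]
train_emotions = ["joy", "joy", "sadness", "anger"]

model = make_pipeline(
    CountVectorizer(),   # unique features: counts of (pre-split) morphemes or tokens
    LinearSVC(),         # SVM, one of the machine learning methods mentioned above
)
model.fit(train_sentences, train_emotions)

print(model.predict(["i am happy today"])[0])   # likely 'joy' for this toy data
```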
At this time, among the plurality of emoticons stored in the emoticons database, any one emoticon corresponding to the emotion information can be selected as an emoticon corresponding to the text data.
At this time, the learning model can be generated based on at least one learning data composed of text data and emoticons. That is, learning can be performed using the characteristic features extracted when the text data included in the learning data is morpheme analyzed and the emotion information corresponding to the emoticons. In this way, learning using more learning data may improve the accuracy of the learning model.
At this time, the emoticon database can store a plurality of emoticons by category according to emotion information. For example, assuming that categories such as joy, sadness, and anger are defined and emoticons corresponding to each category are stored, the emotion information extracted through the learning model can be compared with the categories, and an emoticon stored in the matching category can be used.
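The category-keyed emoticon database can be pictured as in the sketch below; the category names, the emoticon strings, and the random-choice selection rule are assumptions for illustration, not the patent's actual data.

```python
import random
from typing import Optional

# Illustrative emoticon database keyed by emotion category.
EMOTICON_DB = {
    "joy":     [":-)", ":D", "^_^"],
    "sadness": [":-(", "T_T"],
    "anger":   [">:(", "-_-+"],
}

def select_emoticon(emotion: str) -> Optional[str]:
    """Pick any one emoticon stored under the extracted emotion category."""
    candidates = EMOTICON_DB.get(emotion)
    return random.choice(candidates) if candidates else None

print(select_emoticon("joy"))
```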
In addition, an automatic interpretation method using an emoticon according to an embodiment of the present invention combines the text data and the emoticon to generate output data, and outputs the output data to the other party (S130). That is, an emoticon capable of expressing the emotion is output together with the text data carrying the contents, so that the other party can easily understand the utterance contents intended by the speaker and communication is facilitated. This also differentiates the present invention from other interpretation systems and apparatuses by enabling more friendly communication between the speaker and the other party.
At this time, the emoticon may be inserted in the middle of the text data and output at the point corresponding to where the speaker's emotion was extracted.
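One simple way to realize this combining step is sketched below: the emoticon is appended to the translated text, or spliced into it at a given character position (for example, the point where the emotion was detected). The function name and the append-by-default behavior are assumptions for illustration.

```python
def combine_output(text_data, emoticon, position=None):
    """Combine the translated text data and the selected emoticon into output data."""
    if emoticon is None:
        return text_data
    if position is None:                       # no position given: append at the end
        return f"{text_data} {emoticon}"
    return f"{text_data[:position].rstrip()} {emoticon} {text_data[position:].lstrip()}"

print(combine_output("Nice to meet you!", ":-)"))       # appended at the end
print(combine_output("Nice to meet you!", ":-)", 13))   # spliced mid-sentence
```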
At this time, it is possible to judge whether or not the emoticon deletion input is generated from the speaker.
At this time, if the emoticon delete input occurs, the emoticon can be deleted from the output data and output.
For example, when the emoticon combined with text data is not the emoticon the speaker intended, the speaker can delete the emoticon using an input tool such as a touch screen.
Also, cases where the user deletes an emoticon can be analyzed so that the model learns not to match the deleted emoticon again with sentences having the same unique features in the future.
At this time, the output data may be output corresponding to at least one of the text and the voice. That is, the contents of the output data can be outputted as text through the screen or the contents of the output data can be outputted as a voice by using a synthesized sound. It is also possible to output text and audio simultaneously.
Also, although not shown in FIG. 1, the automatic interpretation method using an emoticon according to an embodiment of the present invention can update the learning model using the deletion history in which emoticons are deleted from the output data. That is, if the speaker deletes an emoticon, this feedback is sent to the learning model used to select emoticons, so that afterwards the same kind of sentence can be prevented from being combined with the wrong emoticon.
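One possible realization of this feedback loop is sketched below: every deletion is logged, and the logged sentences are later folded back into the training data as counter-examples under a 'neutral' label before the model is refit. The 'neutral' label and the explicit refit call are illustrative choices, not the patent's prescribed update rule.

```python
# Illustrative deletion-history feedback for the emotion model.
deletion_history = []   # list of (sentence, rejected_emotion) pairs

def record_deletion(sentence, rejected_emotion):
    """Log a case where the speaker deleted the emoticon chosen for a sentence."""
    deletion_history.append((sentence, rejected_emotion))

def update_learning_model(model, base_sentences, base_labels):
    """Refit the emotion model, treating deleted cases as 'neutral' counter-examples."""
    sentences = list(base_sentences) + [s for s, _ in deletion_history]
    labels = list(base_labels) + ["neutral" for _ in deletion_history]
    model.fit(sentences, labels)
    return model
```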
By carrying out communication between the speaker and the other party by using the automatic interpretation method as described above, it is possible for the other party to easily understand the contents of the speech intended by the speaker, thereby facilitating the communication and also enabling more familiar communication.
In addition, accuracy can be continuously improved by increasing the accuracy of emoticon determination through simple interaction with a user using an input tool such as a touch screen.
FIG. 2 is a flowchart illustrating the operation of selecting an emoticon shown in FIG. 1.
Referring to FIG. 2, the process of selecting the emoticons shown in FIG. 1 analyzes morphemes of text data (S210). At this time, the text data may correspond to the translation result through the translator.
Thereafter, characteristic features of the text data are extracted from the result of the morphological analysis (S220). At this time, the inherent characteristic may be in the form of data for obtaining emotion information corresponding to the text data.
Then, the intrinsic feature is input to the learning model to extract the emotion information (S230). For example, emotion information corresponding to emotions such as joy, sadness, anger, and surprise can be extracted.
Thereafter, an emoticon corresponding to the emotion information is selected from the emoticon database (S240). At this time, the emoticons database can store emoticons for each category corresponding to the emotion information.
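Putting the four sub-steps together, the selection path of FIG. 2 might be orchestrated as in the sketch below. It reuses the illustrative helpers from the earlier sketches (the toy morpheme splitter, the SVM model, and the category-keyed database), so all names are assumptions rather than the patent's actual interfaces.

```python
def select_emoticon_for_text(text_data, model):
    """S210-S240: morpheme analysis -> unique features -> emotion -> emoticon."""
    morphemes = []
    for word in text_data.split():
        morphemes.extend(split_morphemes(word))         # S210/S220: morphemes as features
    emotion = model.predict([" ".join(morphemes)])[0]   # S230: learning model -> emotion info
    return select_emoticon(emotion)                     # S240: look up the emoticon database
```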
FIG. 3 is a flowchart illustrating a learning process of a learning model according to an embodiment of the present invention.
Referring to FIG. 3, the learning process of the learning model according to the embodiment of the present invention inputs learning data composed of text data and emoticons (S310). At this time, learning data including emoticons corresponding to various emotional information may be input so that the learning model can learn various data.
Thereafter, the text data included in the learning data is morpheme-analyzed (S320). In other words, the text constituting the sentence can be divided into morphemes, the smallest meaningful units of a word. For example, the phrase 'hot water' can be divided into the morphemes 'hot' and 'water'.
Thereafter, unique features are extracted from the result of the morpheme analysis (S330), and the unique features are matched with the emotion information corresponding to the emoticons (S340). Accordingly, when a specific inherent characteristic is later input to the learning model, emotion information can be extracted based on the learned matching information and the corresponding emoticon can be selected.
In matching unique features to emoticons, cases where the user deleted an emoticon can also be analyzed, so that the deleted emoticon is later not matched to sentences having the same inherent characteristics.
FIG. 4 is a diagram illustrating a process of generating a learning model according to an embodiment of the present invention.
Referring to FIG. 4, in the process of generating a learning model according to an embodiment of the present invention, learning is performed using learning data 410-1 to 410-N, and a learning model 420 can be generated based on the result.
At this time, the learning data 410-1 to 410-N may be composed of text data capable of morphological analysis and emoticons corresponding to text data.
At this time, by analyzing the morpheme of the text data and extracting the characteristic feature, the characteristic feature information is matched with the emotion information corresponding to the emoticon, so that the matching emotion information can be extracted when the characteristic feature is actually input to the learning model.
At this time, the learning model can be implemented by a machine learning method such as an SVM (Support Vector Machine) or a DNN (Deep Neural Network); these are only examples, and the model can equally be implemented by a similar method or another approach.
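Under the assumption that each emoticon in the learning data can be mapped back to its emotion category, the generation step of FIG. 4 might look like the sketch below; the inverse lookup over the illustrative category database and the single fit call are assumptions, not the patent's actual procedure.

```python
# Illustrative generation of the learning model from learning data 410-1..410-N,
# each item being a (text data, emoticon) pair.
EMOTICON_TO_EMOTION = {e: emo for emo, es in EMOTICON_DB.items() for e in es}

def generate_learning_model(learning_data, model):
    """Fit the emotion model from (text, emoticon) learning data."""
    sentences = [text for text, _ in learning_data]
    emotions = [EMOTICON_TO_EMOTION[emoticon] for _, emoticon in learning_data]
    model.fit(sentences, emotions)
    return model
```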
FIG. 5 is a block diagram illustrating an automatic interpretation apparatus using emoticons according to an embodiment of the present invention.
Referring to FIG. 5, an automatic interpretation apparatus 500 using emoticons according to an embodiment of the present invention includes a text data generation unit 510, an emoticon selection unit 520, an output unit 530, an emoticon database 540, and a learning model update unit 550.
The text data generation unit 510 acquires voice data from a speaker and translates the voice data to generate text data.
For example, it can be assumed that a user corresponding to the speaker interacts with an automatic interpreter using a personal computer (PC) or a mobile device as an intermediary. Accordingly, the automatic interpreter can recognize the speaker's speech signal to acquire the voice data, and translate the voice data to generate text data.
Further, the text data may correspond to data generated as text by translating the speech data into the language of the speaker's counterpart. For example, supposing that the speaker uses English and the other party uses Korean, the voice data can be acquired in English, translated into Korean, and output as text data in Korean.
At this time, the language setting for the voice data and the language setting for the text data can be set by the user. In addition, the language can be set using a language database included in the existing automatic interpretation technology.
At this time, functions of speech recognition and translation can use existing established techniques.
The emoticon selection unit 520 analyzes the text data and selects an emoticon corresponding to the text data.
Conventional automatic interpretation technology has difficulty conveying the intended meaning completely because it outputs only a synthesized voice or simply conveys the recognized sentence. In other words, although the contents of the speaker's utterance can be conveyed in the form of sentences, the intention carried by the speaker's feelings or tone is not transmitted. As a result, the contents intended by the speaker may not be properly conveyed to the other party, which can cause misunderstanding.
Accordingly, in the present invention, the emoticon corresponding to the speaker's feelings can be transmitted along with the uttered contents, so that the emotions and feelings of the speaker can be felt by the other party.
At this time, the characteristic feature of the text data can be extracted from the result of morpheme analysis of the text data.
In this case, the morpheme analysis may correspond to the process of dividing the sentence uttered by the speaker into morphemes, the smallest meaningful units of a word. For example, the word 'storybook' can be divided into the morphemes 'story' and 'book'.
Therefore, it is possible to divide sentences uttered by the speaker into morpheme units, and then extract unique features based on the divided morphemes.
At this time, the inherent characteristic may include the form of the sentence, or the tone or atmosphere of the sentence. For example, when analyzing a sentence uttered by the speaker, if the sentence includes a morpheme that typically appears in a particular type of sentence, it can be judged that the sentence has the inherent characteristic of that sentence type.
At this time, the emotion information can be extracted by inputting the unique characteristic into the learning model for emotion analysis. For example, emotion information corresponding to various unique features that can be extracted based on the morpheme may be matched, and matching specific emotion information may be found and provided when specific intrinsic features are input. In addition, there may be a learning model using various methods in addition to a method of matching and extracting unique features and emotion information.
At this time, among the plurality of emoticons stored in the emoticons database, any one emoticon corresponding to the emotion information can be selected as an emoticon corresponding to the text data.
At this time, the learning model can be generated based on at least one learning data composed of text data and emoticons. That is, learning can be performed using the characteristic features extracted when the text data included in the learning data is morpheme analyzed and the emotion information corresponding to the emoticons. In this way, learning using more learning data may improve the accuracy of the learning model.
In addition, the learning model can be implemented by a machine learning method such as an SVM (Support Vector Machine) or a DNN (Deep Neural Network); these are only examples, and the model can equally be implemented by a similar method or another approach.
At this time, the emoticon database 540 can store the plurality of emoticons by category according to the emotion information.
The output unit 530 generates output data by combining the text data and the emoticon, and outputs the output data to the other party.
At this time, it is possible to judge whether or not the emoticon deletion input is generated from the speaker.
At this time, if the emoticon delete input occurs, the emoticon can be deleted from the output data and output.
For example, when the emoticon combined with text data is not the emoticon the speaker intended, the speaker can delete the emoticon using an input tool such as a touch screen.
At this time, the output data may be output corresponding to at least one of the text and the voice. That is, the contents of the output data can be outputted as text through the screen or the contents of the output data can be outputted as a voice by using a synthesized sound. It is also possible to output text and audio simultaneously.
The learning model update unit 550 updates the learning model using the deletion history in which the emoticon is deleted from the output data.
By using the automatic interpreting apparatus 500 described above for communication between the speaker and the other party, the other party can easily understand the contents of the speech intended by the speaker, which facilitates communication and also enables more familiar communication.
In addition, accuracy can be continuously improved by increasing the accuracy of emoticon determination through simple interaction with a user using an input tool such as a touch screen.
In addition, the automatic interpretation apparatus 500 enables the other party to feel not only the communication contents but also the emotion or tone of the speaker.
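Mirroring FIG. 5, the apparatus can be pictured as a thin composition of the units described above. The class below simply wires together the hypothetical helpers from the method sketches, so it is a structural illustration under those assumptions rather than the patent's actual implementation.

```python
class AutomaticInterpretationApparatus:
    """Illustrative composition mirroring FIG. 5 (units 510, 520, 530)."""

    def __init__(self, model, speaker_lang, listener_lang):
        self.model = model                    # emotion-analysis learning model
        self.speaker_lang = speaker_lang
        self.listener_lang = listener_lang

    def interpret(self, audio_bytes):
        # 510: text data generation (speech recognition + translation)
        text_data = generate_text_data(audio_bytes, self.speaker_lang, self.listener_lang)
        # 520: emoticon selection (morphemes -> emotion -> emoticon)
        emoticon = select_emoticon_for_text(text_data, self.model)
        # 530: output data (text data combined with the selected emoticon)
        return combine_output(text_data, emoticon)
```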
FIG. 6 is a block diagram showing an example of the emoticon selection unit shown in FIG. 5.
Referring to FIG. 6, the emoticon selection unit 520 includes an intrinsic feature extraction unit 610 and an emotion information extraction unit 620.
The intrinsic feature extraction unit 610 extracts the intrinsic feature of the text data from the morpheme analysis result of the text data.
The emotion information extraction unit 620 extracts emotion information by inputting the intrinsic feature into a learning model 630 for emotion analysis.
At this time, any one emoticon corresponding to the emotion information, among the plurality of emoticons stored in the emoticon database 540, can be selected as the emoticon corresponding to the text data.
FIG. 7 is a view showing an output screen of an automatic interpretation apparatus according to an embodiment of the present invention.
Referring to FIG. 7, an output screen 710 of the automatic interpretation apparatus according to an embodiment of the present invention displays the output data 720.
At this time, the output data 720 may include the text data 730 and the emoticons 740 and 741.
At this time, since the
At this time, the
Therefore, if the speaker directly confirms the
At this time, when the emoticon delete button 750 is selected, the corresponding emoticon can be deleted from the output data 720 and the remaining output data can be output.
As described above, the automatic interpretation method using an emoticon according to the present invention and the apparatus using the same are not limited to the configurations and methods of the embodiments described above; all or some of the embodiments may be selectively combined so that various modifications can be made.
410-1 to 410-N: learning data 420: learning model
500: Automatic interpretation device 510: Text data generation unit
520: emoticon selection unit 530: output unit
540: emoticons database 550: learning model update unit
610: Intrinsic feature extraction unit 620: Emotion information extraction unit
630: Learning Model 710: Output Screen
720: output data 730: text data
740, 741: Emoticon 750: Emoticon delete button
Claims (1)
An automatic interpretation method using an emoticon, the method comprising:
obtaining speech data from a speaker and translating the speech data to generate text data;
analyzing the text data to extract emotion information;
selecting an emoticon corresponding to the emotion information; and
outputting output data obtained by combining the text data and the emoticon to the other party.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150072656A KR20160138613A (en) | 2015-05-26 | 2015-05-26 | Method for auto interpreting using emoticon and apparatus using the same |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150072656A KR20160138613A (en) | 2015-05-26 | 2015-05-26 | Method for auto interpreting using emoticon and apparatus using the same |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20160138613A true KR20160138613A (en) | 2016-12-06 |
Family
ID=57576554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150072656A KR20160138613A (en) | 2015-05-26 | 2015-05-26 | Method for auto interpreting using emoticon and apparatus using the same |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20160138613A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100883352B1 (en) | 2006-11-21 | 2009-02-11 | 한국전자통신연구원 | Method for expressing emotion and intention in remote interaction and Real emoticon system therefor |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20200036188A (en) * | 2018-09-28 | 2020-04-07 | 주식회사 솔루게이트 | Virtual Counseling System and counseling method using the same |
KR20210020977A (en) * | 2018-09-28 | 2021-02-24 | 주식회사 솔루게이트 | Virtual Counseling System and counseling method using the same |
US11837251B2 (en) | 2018-09-28 | 2023-12-05 | Solugate Inc. | Virtual counseling system and counseling method using the same |
WO2021134592A1 (en) * | 2019-12-31 | 2021-07-08 | 深圳市欢太科技有限公司 | Speech processing method, apparatus and device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11514886B2 (en) | Emotion classification information-based text-to-speech (TTS) method and apparatus | |
KR102371188B1 (en) | Apparatus and method for speech recognition, and electronic device | |
CN105895103B (en) | Voice recognition method and device | |
JP6251958B2 (en) | Utterance analysis device, voice dialogue control device, method, and program | |
US9070363B2 (en) | Speech translation with back-channeling cues | |
US9484034B2 (en) | Voice conversation support apparatus, voice conversation support method, and computer readable medium | |
KR102191425B1 (en) | Apparatus and method for learning foreign language based on interactive character | |
US20170199867A1 (en) | Dialogue control system and dialogue control method | |
WO2017127296A1 (en) | Analyzing textual data | |
CN104166462A (en) | Input method and system for characters | |
CN110910903B (en) | Speech emotion recognition method, device, equipment and computer readable storage medium | |
WO2019075406A1 (en) | Reading level based text simplification | |
KR101534413B1 (en) | Method and apparatus for providing counseling dialogue using counseling information | |
US20150254238A1 (en) | System and Methods for Maintaining Speech-To-Speech Translation in the Field | |
CN116821290A (en) | Multitasking dialogue-oriented large language model training method and interaction method | |
KR20100068965A (en) | Automatic interpretation apparatus and its method | |
CN115186080A (en) | Intelligent question-answering data processing method, system, computer equipment and medium | |
KR20160138613A (en) | Method for auto interpreting using emoticon and apparatus using the same | |
KR100593589B1 (en) | Multilingual Interpretation / Learning System Using Speech Recognition | |
CN112818096A (en) | Dialog generating method and device | |
JPWO2018198807A1 (en) | Translation equipment | |
CN110908631A (en) | Emotion interaction method, device, equipment and computer readable storage medium | |
KR20210037857A (en) | Realistic AI-based voice assistant system using relationship setting | |
Reddy et al. | Indian sign language generation from live audio or text for tamil | |
JP6538399B2 (en) | Voice processing apparatus, voice processing method and program |