CN112819664A - Apparatus for learning foreign language and method for providing foreign language learning service using the same - Google Patents


Info

Publication number
CN112819664A
CN112819664A
Authority
CN
China
Prior art keywords
learner
conversation
sentence
utterance
foreign language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011153817.4A
Other languages
Chinese (zh)
Inventor
朴悊淳
姜命秀
姜锡台
金昶殷
李春植
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LG CNS Co Ltd
Original Assignee
LG CNS Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LG CNS Co Ltd filed Critical LG CNS Co Ltd
Publication of CN112819664A

Classifications

    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G06Q50/10 Services
    • G06N20/00 Machine learning
    • G06N3/08 Neural networks; learning methods
    • G09B19/06 Teaching of foreign languages
    • G09B5/065 Electrically-operated educational appliances; combinations of audio and video presentations
    • G10L13/02 Methods for producing synthetic speech; speech synthesisers
    • G10L15/26 Speech to text systems
    • G10L25/60 Speech or voice analysis for measuring the quality of voice signals
    • G10L25/69 Speech or voice analysis for evaluating synthetic or decoded voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Quality & Reliability (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An apparatus for learning a foreign language and a method of providing a foreign language learning service using the same. The present invention relates to a foreign language learning apparatus and a method of providing a foreign language learning service using the same. The foreign language learning apparatus may include: an interface module that receives utterance voice information of a learner from a voice artificial intelligence (AI) device, performs voice synthesis, and transmits the synthesized utterance voice information; a learner utterance analysis module that stores and analyzes a log of the utterance history; an evaluation module that calculates a score for each evaluation index through a conversation-level calculation algorithm applied to the learner's utterance voice information and evaluates the conversation level; and a learning conversation module that stores a scene model and, during the learning conversation between the learner and a virtual speaker, determines the virtual speaker's answer utterance sentence according to the intention of the learner's utterance sentence and the conversation level evaluated by the evaluation module, and advances the conversation flow.

Description

Apparatus for learning foreign language and method for providing foreign language learning service using the same
Cross Reference to Related Applications
This application claims priority to and the benefit of Korean patent application No. 10-2019-.
Technical Field
The present invention relates to an apparatus for learning a foreign language, which enables a user to learn a foreign language conversation by talking with a virtual speaker, and a method of providing a foreign language learning service using the same.
Background
With globalization, the demand for and supply of education in various foreign languages, including English, have increased, and research and development on education methods have become increasingly active. In particular, as more people begin learning at an early age or live abroad, and as the number of foreigners living in Korea grows, the importance of language skills, particularly conversational skills, also increases. Conventional rote education has difficulty improving language skills, so an effective education method is required to improve the ability to actually carry on a conversation.
Recently, with the development of computer technology, many language education methods using computers and networks have been developed, but effective learning is difficult because users cannot adequately practice conversation with a computer alone. Further, some language learning service companies provide conversational learning services through applications that apply AI, but these reach only the level of learning words, short sentences, or brief one-way conversations stored in the application, or of imitating pronunciation; due to limitations of the technical implementation level, conversational learning through free conversation resembling human dialogue cannot be performed. Thus, until now, people who wish to practice conversation through free talk have mostly relied on offline classes, tutoring by native speakers, telephone/video English lessons, and so on. However, in such conversation classes, learners practice only during the scheduled time, so sufficient conversation practice is impossible, and the price of sufficient practice imposes an economic burden. In addition, it is difficult to continuously manage learning quality because teacher quality varies and turnover is high.
Disclosure of Invention
The present application is conceived to provide an apparatus for learning a foreign language that enables a user to learn foreign language conversation through free talk with a virtual speaker, and a method of providing a foreign language learning service using the same.
The present application is conceived to provide an apparatus for learning a foreign language, and a method of providing a foreign language learning service using the same, in which a plurality of conversation flows are branched according to the intention of a sentence spoken by the learner, the conversation level, the conversation type, the conversation function, and the like, to carry on a natural conversation.
The present application is conceived to provide an apparatus for learning a foreign language, and a method of providing a foreign language learning service using the same, in which the learner's conversation level is accurately diagnosed through a numerical analysis of that level, and an appropriate conversation flow is generated accordingly, or insufficient expressions can be reviewed or additionally learned by the learner.
An exemplary embodiment of the present application provides a foreign language learning apparatus that provides a foreign language learning service to a learner through a conversation with a virtual speaker, the foreign language learning apparatus including: an interface module which receives utterance voice information of the learner from a voice artificial intelligence (AI) device that performs voice recognition (speech to text (STT)) on an utterance voice received from the learner, performs voice synthesis (text to speech (TTS)) on the utterance voice information of a virtual speaker corresponding to the received utterance voice information of the learner, and transmits the synthesized utterance voice information; a learner utterance analysis module that stores and analyzes a log of the utterance history related to the learner's utterance voice information; an evaluation module that calculates a score for each of one or more evaluation indexes through a conversation-level calculation algorithm applied to the learner's utterance voice information, and evaluates the conversation level; and a learning conversation module that stores a scene model for the learning conversation flow and, during the learning conversation between the learner and the virtual speaker, determines an answer utterance sentence of the virtual speaker based on the scene model according to the intention of the learner's utterance sentence and the conversation level evaluated by the evaluation module, and advances the conversation flow.
Another exemplary embodiment of the present application provides a method of providing, by a foreign language learning apparatus, a foreign language learning service to a learner through a conversation with a virtual speaker, the method including: receiving, through a learner terminal, a selection input for any one of a plurality of preset foreign language learning topics from the learner; displaying a script sentence included in at least one scene model corresponding to the selected topic on a screen through the learner terminal, or providing the script sentence to the learner in the form of an utterance voice of the virtual speaker; receiving utterance voice information spoken by the learner in response to the script sentence; comparing the learner's utterance voice information with an answer set preset for the scene model, calculating a score for each of one or more evaluation indexes through a conversation-level calculation algorithm, and calculating a conversation-level score of the learner's utterance voice information; and determining an answer utterance sentence of the virtual speaker based on the scene model according to the intention of the sentence corresponding to the learner's utterance voice information and the calculated conversation-level score, and providing the determined answer utterance sentence to the learner.
In addition, not all features of the present invention are set forth in the claims. The various features of the present invention, as well as the advantages and benefits attendant thereto, will be more clearly understood by reference to the following specific exemplary embodiments.
According to the foreign language learning apparatus and the method of providing a foreign language learning service using the same according to exemplary embodiments of the present invention, foreign language conversation learning can be performed through free talk with a virtual speaker, so that the learning can be carried out efficiently without being limited by time and space.
According to the foreign language learning apparatus and the method of providing a foreign language learning service using the same according to exemplary embodiments of the present invention, a plurality of conversation flows are branched by conversation type, conversation function, or topic based on the intention and conversation level of the learner's spoken sentence, so that the learner can converse freely with the virtual speaker. Therefore, the learner can have a conversation as if actually talking with a native speaker, improving the effect of conversational learning.
According to the foreign language learning apparatus and the method of providing a foreign language learning service using the same according to exemplary embodiments of the present invention, numerical analysis and diagnosis of the learner's conversation level can be performed by using AI. Therefore, free talk or conversational learning can be provided according to the learner's level, or customized learning contents can be recommended.
Drawings
Fig. 1 is a schematic view illustrating a foreign language learning system according to an exemplary embodiment of the present invention.
Fig. 2 is a block diagram illustrating a foreign language learning apparatus according to an exemplary embodiment of the present invention.
Fig. 3 is a schematic view illustrating a sentence similarity algorithm of a foreign language learning apparatus according to an exemplary embodiment of the present invention.
Fig. 4 is a diagram illustrating selection of an additional recommended sentence from a learning conversation of a learner in the foreign language learning apparatus according to an exemplary embodiment of the present invention.
Fig. 5 is a schematic diagram illustrating a determination of an advancing path of a scene model according to an intention of a spoken sentence of a foreign language learning apparatus according to an exemplary embodiment of the present invention.
Fig. 6 is a schematic view illustrating a method of designing a scene model by using a scene design module of a foreign language learning apparatus according to an exemplary embodiment of the present invention.
Fig. 7 (A) is a schematic view illustrating a scene model designed by setting various functional nodes required for designing a conversation flow using the scene design module of the foreign language learning apparatus according to an exemplary embodiment of the present invention.
Fig. 7 (B) is a schematic diagram illustrating a case where a new scene model is configured by connecting the scene model of Fig. 7 (A) with content from an external content server according to an exemplary embodiment of the present invention.
Fig. 8 and 9 are flowcharts illustrating a method of learning a foreign language according to an exemplary embodiment of the present invention.
Detailed Description
Hereinafter, exemplary embodiments will be described in detail so that those skilled in the art can easily implement the present invention with reference to the accompanying drawings. However, in describing in detail exemplary embodiments of the present invention, a detailed description of known functions and configurations incorporated herein is omitted to avoid unnecessarily obscuring the subject matter of the present invention. Further, throughout the drawings, portions having similar functions or actions are denoted by the same reference numerals.
Throughout the specification, when an element is described as being "coupled" to another element, the element may be "directly coupled" to the other element or "indirectly coupled" to the other element through a third element. Furthermore, unless explicitly described to the contrary, the word "comprise" and variations such as "comprises" or "comprising" will be understood to imply the inclusion of stated elements but not the exclusion of any other elements. In addition, the terms "device", "instrument", and "module" described in the specification denote units for processing at least one of functions and operations, and can be implemented by hardware components, software components, or a combination thereof.
Fig. 1 is a schematic view illustrating a foreign language learning system according to an exemplary embodiment of the present invention. Referring to fig. 1, a foreign language learning system according to an exemplary embodiment of the present invention may include a learner's terminal 1, a voice Artificial Intelligence (AI) device 10, and a foreign language learning device 100.
Hereinafter, a foreign language learning system according to an exemplary embodiment of the present invention will be described with reference to fig. 1.
The learner terminal 1 may execute various applications, including the foreign language learning application, and present the running application to the user visually or aurally. The learner terminal 1 may include a display unit for visually displaying the foreign language learning application and the like, an input unit receiving user input, a communication unit communicating with a server or another terminal through a network, a memory storing at least one program and related data, and a processor for executing the program. Further, the learner terminal 1 may include a microphone to recognize a voice spoken by the user, a speaker to output sound provided from the foreign language learning application, and the like.
As shown, the learner terminal 1 may be a mobile terminal such as a smart phone or tablet PC, and according to an exemplary embodiment may also be a stationary device such as a desktop computer. Specifically, the learner terminal 1 may include a mobile phone, a smart phone, a voice recognition speaker, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a tablet personal computer (PC), an ultrabook, a wearable device (e.g., a watch-type terminal (smart watch), a glasses-type terminal (smart glasses), or a head mounted display (HMD)), and the like.
Meanwhile, the learner terminal 1 may be connected with the foreign language learning apparatus 100 through a communication network. Here, the communication network may include a wired network and a wireless network, and particularly, the communication network may include various networks such as a Local Area Network (LAN), a Metropolitan Area Network (MAN), and a Wide Area Network (WAN). The communication network may also include the well-known World Wide Web (WWW). However, the communication network according to the present invention is not limited to the listed networks, and may include a well-known wireless data network, a well-known telephone network, a well-known wired or wireless television network, and the like.
The foreign language learning apparatus 100 may be a single service providing server or a plurality of service providing servers (or service providers), and according to an exemplary embodiment, the foreign language learning apparatus 100 may be connected with the learner terminal 1 through a foreign language learning application installed in the learner terminal 1. The foreign language learning apparatus 100 may provide a foreign language learning service in response to a request from the learner terminal 1, and may provide the service in the form of a conversation with a virtual speaker so that the learner can naturally improve foreign language conversation skills. Here, the foreign language provided by the foreign language learning apparatus 100 may be English, Japanese, Chinese, or the like, and may be variously selected according to exemplary embodiments.
The foreign language learning apparatus 100 may recognize the learner's voice received from the learner terminal 1 (speech to text (STT)) and convert the voice into text in order to provide the foreign language learning service in the form of the learner talking with a virtual speaker. Then, the foreign language learning apparatus 100 may analyze the learner's intention from the converted text, voice-synthesize (text to speech (TTS)) the script sentence that the virtual speaker is to speak to the learner, and provide the synthesized voice to the learner terminal 1.
The foreign language learning apparatus 100 or the learner terminal 1 may include a voice AI module directly executing STT and TTS. However, according to an exemplary embodiment, a separate voice AI device 10 may be implemented to perform STT and TTS. Here, the voice AI device 10 may also be an external server that provides a voice AI analysis/processing function.
The voice AI device 10 may support Application Programming Interfaces (APIs) for STT and TTS. Therefore, when the learner speaks, the learner terminal 1 may provide the learner's voice to the voice AI device 10 and request conversion of the voice into text. Then, the voice AI device 10 may provide the foreign language learning apparatus 100 with the utterance voice information corresponding to the learner's voice.
The foreign language learning apparatus 100 may determine an answer utterance sentence to be spoken by the virtual speaker in response to the learner's utterance voice information. Then, the foreign language learning apparatus 100 may provide the answer utterance sentence to the voice AI device 10 and request its speech synthesis. In this case, the voice AI device 10 may synthesize a voice corresponding to the answer utterance sentence and provide the synthesized voice to the learner terminal 1.
The voice AI device 10 may generate voice data corresponding to text by synthesizing from a corpus recorded by a voice actor or the like. However, according to exemplary embodiments, artificial intelligence such as machine learning or deep learning may also be utilized. That is, the voice AI device 10 may learn data weights of phonemes by machine learning or deep learning on text and the voice data corresponding to it, and perform voice synthesis using those weights. According to an exemplary embodiment, the voice AI device 10 may further include a cache or the like to minimize voice synthesis requests.
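The caching idea above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `synthesize_speech` is a hypothetical stand-in for a TTS back end, and the cache size is an arbitrary assumption. Repeated script sentences then trigger only one synthesis call.

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def synthesize_speech(sentence: str) -> bytes:
    # Hypothetical stand-in for a TTS back-end call; lru_cache
    # ensures each distinct sentence is synthesized only once,
    # minimizing requests to the voice AI device.
    return b"AUDIO:" + sentence.encode("utf-8")

# The second call for the same script sentence hits the cache
# instead of invoking the synthesizer again.
synthesize_speech("How can I help you?")
synthesize_speech("How can I help you?")
hits = synthesize_speech.cache_info().hits  # one cache hit so far
```

Since scene models reuse the same script sentences across many learners, even a small cache can absorb most synthesis traffic.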
Fig. 2 is a block diagram illustrating a foreign language learning apparatus 100 according to an exemplary embodiment of the present invention. Referring to fig. 2, the foreign language learning apparatus 100 according to an exemplary embodiment of the present invention may include an interface module 110, a learner utterance analysis module 120, an evaluation module 130, a learning conversation module 140, a learning function module 150, and a scene design module 160.
Hereinafter, a foreign language learning apparatus 100 according to an exemplary embodiment of the present invention will be described with reference to fig. 2.
The interface module 110 may be configured to link the foreign language learning apparatus 100 with the learner terminal 1 and the voice AI device 10. The interface module 110 may receive the voice spoken by the learner from the learner terminal 1, or deliver the answer utterance to be spoken by the virtual speaker to the learner terminal 1. In doing so, the interface module 110 may control the voice AI device 10 to perform STT, TTS, intention analysis, and the like to provide the foreign language learning service.
Specifically, when the learner speaks, the interface module 110 receives utterance voice information, including text obtained by speech-recognizing (STT) the learner's spoken voice, from the voice AI device 10 inside or outside the foreign language learning apparatus 100. Further, when the foreign language learning apparatus 100 selects an answer utterance sentence for the virtual speaker and provides it to the learner, the interface module 110 may transmit the answer utterance sentence in text form to the voice AI device 10. In this case, the voice AI device 10 may speech-synthesize (TTS) the answer utterance sentence and transmit the resulting voice to the learner terminal 1.
Meanwhile, the interface module 110 may be implemented to link with various devices in addition to the voice AI device 10. For example, the interface module 110 may be configured to link with a content providing server (not shown). In this case, the interface module 110 may receive various additional learning contents, such as scene-conversation contents provided in conjunction with the scene model and a set of answers to the scene-conversation contents, from the content-providing server. In addition, the interface module 110 may also be configured to link with another AI server and an open Application Programming Interface (API) server.
The learner utterance analysis module 120 may store the learner's utterance voice information input through the interface module 110, and may generate, store, and analyze a log of the utterance history related to that information. Here, the learner utterance analysis module 120 may generate the utterance voice information and the utterance history separately for each learner and for each scene model executed by that learner.
The evaluation module 130 may evaluate the learner's conversational skill by using a conversation-level calculation algorithm. That is, the evaluation module 130 may apply the conversation-level calculation algorithm to the learner's utterance voice information and calculate a score for each of the evaluation indexes included in the algorithm. Thereafter, the evaluation module 130 may calculate a conversation-level score by using the score for each evaluation index. Here, the evaluation module 130 may evaluate the conversation level of the utterance voice information for each sentence spoken by the learner, or evaluate the conversation level of the learner's conversation as a whole.
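As a sketch of this aggregation step, the per-index scores could be combined into a single conversation-level score by a weighted average. The index names, score range, and weights below are illustrative assumptions, not values given in the specification:

```python
def conversation_level_score(index_scores, weights=None):
    # index_scores: a score per evaluation index, each assumed in [0, 1].
    # weights: optional per-index weights; defaults to equal weighting.
    if weights is None:
        weights = {name: 1.0 for name in index_scores}
    total = sum(weights[name] for name in index_scores)
    return sum(score * weights[name]
               for name, score in index_scores.items()) / total

# Equal weights over three hypothetical evaluation indexes.
level = conversation_level_score(
    {"sentence_similarity": 0.8,
     "context_similarity": 0.6,
     "utterance_fluency": 0.7})
```

The same function evaluates a single sentence (one set of index scores) or a whole conversation (index scores averaged over all sentences), matching the two evaluation granularities described above.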
Specifically, the conversation-level calculation algorithm used in the evaluation module 130 may include a sentence similarity algorithm, a context similarity algorithm, an utterance fluency algorithm, and the like.
First, the sentence similarity algorithm may evaluate the sentence similarity between the learner's utterance voice information and the answer set corresponding to each script sentence of the stored scene model. The answer set may include a representative sentence as the answer to each script sentence and a plurality of recommended sentences suitable as substitute answers. Thus, the sentence similarity algorithm can determine how similar the learner's utterance voice information is to the answer set preset in the scene model and evaluate the learner's conversation level.
According to an exemplary embodiment, in order to apply the sentence similarity algorithm, a similarity determination model may first be generated by converting the semantic similarities between the words included in the answer set corresponding to each script sentence of the scene model into vector values and expressing those vector values numerically. As shown in FIG. 3, in an exemplary embodiment, the similarity determination model may be trained by applying the word2vec method to the answer set. When the similarity determination model is applied, the degree of similarity between the learner's utterance voice information and the preset answer set can be expressed numerically. Accordingly, the evaluation module 130 may apply the learner's utterance voice information to the similarity determination model and calculate a sentence similarity score from the vector distance between the learner's utterance voice information and the answer set. In an exemplary embodiment, a sentence similarity score reference value may be preset; when the sentence similarity score is equal to or greater than the reference value, the evaluation module 130 may determine that the learner's utterance voice information is appropriate, and when the score is less than the reference value, that it is inappropriate. In an exemplary embodiment, a plurality of sentence similarity level interval reference values may be preset, and the evaluation module 130 may check which level interval the sentence similarity score falls into and calculate the level of the sentence similarity evaluation index.
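The threshold comparison described above can be sketched in Python. This is a minimal illustration only: the toy word vectors, the bag-of-words averaging, and the 0.8 reference value are hypothetical stand-ins for a trained word2vec similarity determination model, which the patent does not specify in code.

```python
import math

# Toy word vectors standing in for a trained word2vec model (hypothetical values).
WORD_VECS = {
    "i": (0.1, 0.3),
    "am": (0.2, 0.1),
    "fine": (0.9, 0.7),
    "good": (0.8, 0.8),
    "thanks": (0.5, 0.9),
}

def sentence_vector(sentence):
    """Average the vectors of known words in the sentence."""
    vecs = [WORD_VECS[w] for w in sentence.lower().split() if w in WORD_VECS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def sentence_similarity_score(utterance, answer_set):
    """Highest similarity between the utterance and any answer-set sentence."""
    uv = sentence_vector(utterance)
    return max(cosine_similarity(uv, sentence_vector(ans)) for ans in answer_set)

def is_appropriate(score, reference=0.8):
    # The patent presets a reference value; 0.8 here is an arbitrary example.
    return score >= reference
```

In practice the answer set would contain the representative sentence and all recommended sentences for the current script sentence, and the reference value would be tuned per level interval.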
The context similarity algorithm may evaluate the contextual similarity between the learner's utterance voice information and the preceding and following question-and-answer exchanges between the learner and the virtual speaker. That is, while conversing with the virtual speaker, the learner does not know the answer set preset in the scene model and may therefore give an arbitrary answer that is not included in the answer set. In this case, the learner's utterance is not included in the answer set but may nevertheless be appropriate for the surrounding conversation context. Therefore, in order to accurately evaluate the learner's conversation level, the contextual similarity needs to be determined.
The context similarity algorithm may generate a context determination model by connecting the words of the script or conversation sentence that explains the situation or topic in the scene model, the words used in the previous script or conversation sentence that provides guidance to the learner, and the words of the sentence the learner utters in response, converting the semantic similarities between these words into vector values, and expressing the vector values numerically. The evaluation module 130 may then calculate a context similarity score by applying the learner's utterance voice information to the context determination model. In an exemplary embodiment, a context similarity score reference value may be preset; when the context similarity score is equal to or greater than the reference value, the evaluation module 130 may determine that the conversation context is maintained, and when the score is less than the reference value, that the utterance voice information does not suit the conversation context. Further, in an exemplary embodiment, a plurality of context similarity level interval reference values may be preset, and the level of the context similarity evaluation index may be calculated by checking which level interval the context similarity score falls into.
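As a rough illustration of checking an utterance against the surrounding conversation, the sketch below scores word overlap between the learner's utterance and the scene script plus the previous sentence. A real context determination model would use learned vector similarities; the overlap ratio and the 0.3 reference value here are hypothetical.

```python
def context_words(scene_script, previous_sentence):
    """Collect the vocabulary of the conversation context."""
    words = set()
    for text in (scene_script, previous_sentence):
        words.update(text.lower().split())
    return words

def context_similarity_score(utterance, scene_script, previous_sentence):
    """Crude overlap ratio standing in for the trained context model."""
    ctx = context_words(scene_script, previous_sentence)
    spoken = set(utterance.lower().split())
    if not spoken:
        return 0.0
    return len(spoken & ctx) / len(spoken)

def maintains_context(score, reference=0.3):
    # The reference value is preset per the patent; 0.3 is illustrative only.
    return score >= reference
```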
The utterance fluency algorithm may determine the fluency of the learner's utterance voice information by calculating a score from indexes such as pronunciation accuracy and speaking speed.
That is, the utterance fluency algorithm may generate an utterance fluency model by learning the pronunciation information, speaking speed information, and the like of reference utterance voices of one or more native speakers. Then, the pronunciation information, speaking speed, and the like extracted from the learner's utterance voice information may be applied to the utterance fluency model. In this case, index values may be calculated by comparing the learner's pronunciation information, speaking speed, and the like with those of the reference utterance voices, and the learner's utterance fluency score may be calculated therefrom. In an exemplary embodiment, the utterance fluency score may be calculated by comparing the learner's pronunciation speech waveform with that of the reference utterance voice.
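A toy version of combining pronunciation and speed indexes into a fluency score might look as follows. The weights and the speed-ratio penalty are illustrative assumptions; the patent describes comparing against reference utterances without fixing a formula.

```python
def speed_score(learner_wpm, native_wpm):
    """Penalize deviation from the reference speaking speed (ratio-based)."""
    return max(0.0, 1.0 - abs(learner_wpm - native_wpm) / native_wpm)

def fluency_score(pronunciation_accuracy, learner_wpm, native_wpm,
                  w_pron=0.6, w_speed=0.4):
    """Weighted combination of the pronunciation and speed indexes.

    The weights are hypothetical; a trained fluency model would learn
    them from reference utterance voices.
    """
    return (w_pron * pronunciation_accuracy
            + w_speed * speed_score(learner_wpm, native_wpm))
```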
In addition to the sentence similarity algorithm, the context similarity algorithm, and the utterance fluency algorithm, the evaluation module 130 may include other evaluation index algorithms for evaluating the learner's conversation level.
Meanwhile, the evaluation module 130 may generate an evaluation model based on a supervised learning method by inputting the evaluation index scores calculated via the sentence similarity algorithm, the context similarity algorithm, and the utterance fluency algorithm. The evaluation module 130 may then calculate the learner's conversation level score by inputting the learner's utterance voice information into the evaluation model. That is, the conversation level score may be calculated from the individual evaluation index scores by using a machine learning or deep learning method. Here, in order to generate the evaluation model, existing evaluation data in which English teachers assigned conversation level scores based on the evaluation index scores may be used for the supervised learning. The score calculated by the evaluation module 130 may be displayed as a percentage on the screen of the conversation learning application so that the learner can check it immediately. However, when displaying the score would dampen the learner's enthusiasm for learning, the score may be hidden.
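The supervised step can be sketched as a small linear regression trained by gradient descent, mapping the three evaluation index scores to a teacher-assigned conversation level. The training data, learning rate, and linear form are all illustrative assumptions, not the patent's actual model.

```python
def train_evaluation_model(index_scores, teacher_scores, lr=0.1, epochs=5000):
    """Fit conversation_level ~ w . indexes + b by gradient descent on
    teacher-graded examples (a toy stand-in for the evaluation model)."""
    n_feat = len(index_scores[0])
    w, b = [0.0] * n_feat, 0.0
    n = len(index_scores)
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for x, y in zip(index_scores, teacher_scores):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(n_feat):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def conversation_level(model, indexes):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, indexes)) + b

# Toy teacher data: each row is (sentence similarity, context similarity,
# fluency); the teacher's conversation level here happens to be their average.
X = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
     (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5)]
y = [sum(row) / 3 for row in X]
model = train_evaluation_model(X, y)
```

A production system would replace this with a machine learning or deep learning model trained on the English teachers' evaluation data mentioned above.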
According to an exemplary embodiment, as shown in fig. 4, in addition to the answer sets preset in the scene model, the evaluation module 130 may additionally set recommended sentences corresponding to each script sentence based on learners' answers. That is, the evaluation module 130 may select excellent learners who have obtained a preset score or higher from the learning histories of the plurality of learners stored in the learner utterance analysis module 120. The utterance voice information stored in the histories of these excellent learners may then be selected as recommended sentences corresponding to each script sentence of the scene model and added to the corresponding answer sets. In this way, the data of the answer set can be expanded so that, in addition to textbook answers, answers actually used in contemporary spoken language are sufficiently reflected, and the learner's conversation level can be evaluated more accurately.
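A minimal sketch of expanding the answer set from high-scoring learners' histories, assuming each history entry is an (utterance, score) pair; the 0.9 threshold is an arbitrary example of the preset score.

```python
def expand_answer_set(answer_set, learner_histories, score_threshold=0.9):
    """Add utterances of excellent learners as recommended sentences.

    learner_histories: list of (utterance, conversation_level_score) pairs
    for one script sentence. Returns the expanded answer set; the original
    representative and recommended sentences are kept first.
    """
    expanded = list(answer_set)
    for utterance, score in learner_histories:
        if score >= score_threshold and utterance not in expanded:
            expanded.append(utterance)
    return expanded
```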
The learning conversation module 140 may store a preset scene model for the foreign language learning conversation flow. Further, the learning conversation module 140 may advance the conversation flow based on the scene model during a learning conversation between the learner and the virtual speaker. Existing conversation learning applications store a plurality of preset scenes (e.g., A1-B1, A2-B2, A3-B3, or A1-B1-A2, A2-B2-A3 in fig. 5 (a)) that allow only a short conversation flow in short sentences or in one direction, and the learner learns the conversation by using these preset scenes, so that it is difficult to achieve a natural learning conversation.
However, the scene model stored in the learning-conversation module 140 may be configured in units such as a conversation type, a conversation function, a topic, or a task, and configured such that the units are connected to each other according to the learner's selection, and the conversation is expanded or changed in various directions to advance a conversation flow. Thus, similar to conversational lessons with native language speakers, learners can engage in more natural conversational learning with virtual speakers.
Fig. 5 (B) illustrates an exemplary embodiment of the extended scene model. In the present exemplary embodiment, the scene model is configured in units of conversation types and topics, but the scene model may also be configured in other units as needed. Further, the scene model may be preset and stored, or may be designed by the scene design module 160 included in the foreign language learning apparatus 100.
In advancing the conversation flow based on the scene model, the learning conversation module 140 may determine the virtual speaker's answer utterance sentence and the direction of the conversation flow according to the intention of the learner's spoken sentence and the conversation level of the utterance voice information evaluated in the evaluation module 130. For example, in an exemplary embodiment, when the learner's conversation level is 75% or higher, the learning conversation module 140 may determine an answer utterance sentence of the virtual speaker corresponding to the next situation in the scene model to continue advancing the conversation flow. When the learner's conversation level is in the 50% to 75% range, the learning conversation module 140 may extract keywords from the recommended sentences and the learner's spoken sentence, and determine a rephrased follow-up question as the virtual speaker's answer utterance sentence to guide the learner toward an appropriate answer similar to the answer set. Further, when the learner's conversation level is less than 50%, the learning conversation module 140 may determine the virtual speaker's answer utterance sentence so that the learner can learn a recommended sentence suitable for the current conversation situation. For example, the learning conversation module 140 may determine the answer utterance sentence in the form of providing a hint, providing the complete recommended sentence, speaking an appropriate recommended sentence aloud and then guiding the learner to repeat it, providing sentence pattern/expression learning, and the like.
In this way, the learning conversation module 140 can guide learners who have difficulty conversing on the corresponding topic to first become familiar with appropriate sentences.
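The level-based routing described above (the 75% and 50% thresholds are taken from the example in the text) can be sketched as:

```python
def next_virtual_speaker_action(conversation_level):
    """Route the conversation flow by the evaluated conversation level,
    following the example thresholds in the text (75% and 50%)."""
    if conversation_level >= 75:
        return "advance"      # continue to the next situation in the scene model
    if conversation_level >= 50:
        return "re-question"  # rephrase the question using extracted keywords
    return "guide"            # teach a recommended sentence (hint, repeat-after-me)
```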
The learning function module 150 may provide learning content according to the learner's level to the learner terminal 1. That is, the learning function module 150 may be configured to allow the learner to review or additionally learn learning content including the words, sentences, expressions, pronunciations, and the like required to learn the foreign language, according to the conversation level evaluated in the evaluation module 130 for each spoken sentence or for the learner's conversation as a whole, during the learning conversation flow or after it is terminated.
In an exemplary embodiment, the learning function module 150 may be implemented to familiarize the learner with the words, sentences, pronunciations, expressions, and the like related to each script sentence, and its answer set, for which the learner's conversation level score is determined to be below a preset value. For example, the learning function module 150 may play the pronunciation of the corresponding learning content through the virtual speaker and request the learner to repeat it. Alternatively, the learning function module 150 may play back the learner's own spoken voice and request the learner to speak again. In addition to the foregoing methods, the learning function module 150 may provide review or additional learning functions for the learner in various ways.
The scene design module 160 can design, combine, change, and manage a scene model suited to the conversation flow between the learner and the virtual speaker by using conversation type or conversation function nodes. The scene model may be organized in units of conversation type, topic, task, or other classifications on which the learner and the virtual speaker can converse, and includes each script sentence and a preset answer set corresponding to that script sentence. The answer set may include a representative sentence that is an answer to each script sentence and a plurality of recommended sentences that may be suitable as alternative answers. Further, the scene model may be configured to branch into multiple forward paths at each node corresponding to the respective conversation type, conversation function, or other classification, depending on the intention of the sentence spoken by the learner.
In an exemplary embodiment, the scene design module 160 may also provide a user interface (UI) or user experience (UX) for designing a chat robot conversation in order to design a new scene model. In an exemplary embodiment, the UI for designing a chat robot conversation may allow a scene model to be designed without coding, by placing the required function nodes on a web canvas in a drag-and-drop manner and visually configuring the conversation flow of the scene model, and may immediately reflect the designed scene model in the learning service. Furthermore, a natural, self-contained conversation on a topic can be conducted by connecting conversation flows (i.e., scene models).
Fig. 6 is a diagram illustrating a scenario design module 160 configured to design a chat robot session for a scenario model having nine conversation function node types according to an exemplary embodiment of the present invention.
In an exemplary embodiment, the nine conversation function nodes may include a listening node 161, a speaking node 162, a slot node 163, a carousel node 164, a jumping node 165, a splitting node 166, a function node 167, an application programming interface (API) node 168, and a template node 169. Further, in another exemplary embodiment, the scene design module 160 may be configured to include more or fewer than nine conversation function node types.
The listening node 161 may be configured to determine the learner's conversational intention from the learner's utterance voice information, obtained by analyzing the intention of the learner's spoken sentence in the voice AI device 10. The speaking node 162 may be configured to determine and speak the desired answer in response to the learner's spoken sentence. The slot node 163 may be configured to ask the question again in order to obtain additional information from the learner or to guide the learner toward an accurate answer, and the carousel node 164 may be configured to let the learner select from a plurality of options by providing a selection view function. The jumping node 165 may be configured to change or switch the conversation flow by connecting different scene models, and the splitting node 166 may be configured to branch the conversation flow depending on conditions.
The function node 167 may be configured to process data and may also be coded to implement required functionality. The API node 168 can support connection with external APIs and perform data parsing and connection mapping; this configuration can facilitate connecting external learning content with the scene model.
In an exemplary embodiment, the API node 168 may be configured to display information in tree form when a uniform resource locator (URL) parameter is set. The API node 168 can transmit and receive legacy data through a RESTful API and can connect systems with only simple settings during operation. The template node 169 is a function node connected to the individual voice AI devices 10 that perform voice recognition, voice synthesis, and intention analysis during a conversation with the learner; it addresses the problem that each voice AI device 10 supports a different template, so that the learning conversation between the learner and the virtual speaker proceeds smoothly. For each different voice AI device 10, the template node may be configured to perform conversation learning by using the template corresponding to that device. Accordingly, the template node enables the foreign language learning apparatus 100 to execute the same chat robot conversation flow regardless of which voice AI device the scene model was designed for. During scene design, the scene design module 160 according to an exemplary embodiment may design a scene model by placing nodes having the desired functions along the conversation flow on the web canvas in a drag-and-drop manner, inputting the corresponding contents, and connecting the respective nodes.
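A minimal sketch of a scene model as a graph of function nodes, reduced to just listening and speaking nodes for brevity; the node class, branch labels, and flow runner are illustrative, not the patent's implementation.

```python
class Node:
    """One function node of a scene model (e.g. "listen", "speak")."""
    def __init__(self, node_type, content=None):
        self.node_type = node_type
        self.content = content
        self.next_nodes = {}  # branch label -> Node

    def connect(self, label, node):
        """Wire a labeled branch to another node; returns that node."""
        self.next_nodes[label] = node
        return node

def run_flow(start, intents):
    """Walk the node graph: speaking nodes emit their content, listening
    nodes consume one learner intent and branch on it."""
    spoken, node, intents = [], start, list(intents)
    while node is not None:
        if node.node_type == "speak":
            spoken.append(node.content)
            node = node.next_nodes.get("next")
        elif node.node_type == "listen":
            intent = intents.pop(0) if intents else "default"
            node = node.next_nodes.get(intent, node.next_nodes.get("default"))
        else:
            node = node.next_nodes.get("next")
    return spoken

# Build a tiny greeting flow: speak -> listen -> branch on the learner's intent.
greet = Node("speak", "How are you today?")
listen = greet.connect("next", Node("listen"))
listen.connect("positive", Node("speak", "Glad to hear it!"))
listen.connect("negative", Node("speak", "Sorry to hear that."))
```

The drag-and-drop web canvas described above would produce a graph of this shape, with slot, carousel, jumping, splitting, function, API, and template nodes as additional node types.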
The scenario design module 160 may be configured to design a new scenario model by applying additional learning content to each node of the scenario model. The scene design module 160 may also support connection with other AI servers, content providing servers, open API support servers, etc. through the interface module 110 and/or API nodes. In an exemplary embodiment, the scene design module 160 may design a scene model by connecting various contents provided from an external content providing server or a content market platform. For example, the scene design module 160 may design a scene model to perform a conversation flow by connecting weather information provided by a server of a weather bureau, traffic information provided by a server of a highway company, and the like.
Fig. 7 (a) is a schematic view illustrating a scene model designed by setting various functional nodes required for designing a talk stream using a scene design module of a foreign language learning device according to an exemplary embodiment of the present invention. Specifically, fig. 7 (a) illustrates an exemplary embodiment of a scene model in which various function nodes required for designing a conversation flow for greetings are designed, connected, and set in a drag-and-drop fashion on a web canvas of the scene design module 160. The same node type in fig. 7 (a) and 7 (B) is a node having the same function among nine functional node types, and T1 denotes a listening node, T2 denotes a splitting node, T3 denotes a speaking node, T4 denotes a functional node, T5 denotes a jumping node, T6 denotes an API node, T7 denotes a slot node, T8 denotes a carousel node, and T9 denotes a template node.
Fig. 7 (B) is a schematic diagram illustrating a case where a new scene model is configured by connecting the scene model designed in fig. 7 (A) with content from an external content server according to an exemplary embodiment of the present invention. Fig. 7 (B) represents an exemplary embodiment in which a new scene model related to a weather greeting is formed on the web canvas by combining weather content received from a weather bureau's server with the pre-stored scene model for the greeting conversation flow. As shown in fig. 7 (B), the weather content received from the weather bureau's server can be combined into the greeting conversation flow scene model of fig. 7 (A), and even task suggestions can then be connected by configuring the weather greeting scene model on the web canvas.
Fig. 8 and 9 are flowcharts illustrating a method of providing a foreign language learning service by using a foreign language learning apparatus according to an exemplary embodiment of the present invention.
Referring to fig. 8, the foreign language learning apparatus may design a scene model suitable for a conversation flow between a learner and a virtual speaker by using a node for each conversation type or conversation function (S10). The foreign language learning apparatus 100 may provide a scenario model design tool 160 for designing a scenario model, and may design a scenario model by using the scenario model design tool 160 to include each script sentence and a preset answer set corresponding to the script sentence in a node in units of a conversation type, a conversation function, or other classification in which a learner and a virtual speaker can converse. In an exemplary embodiment, the scenario model may be designed to branch into a plurality of forward paths for each node corresponding to a corresponding conversation type and conversation function according to an intention of a sentence spoken by a learner in response to a script sentence input. The answer set may include a representative sentence as an answer to each script sentence and a plurality of recommended sentences that may be suitable as alternative answers.
In an exemplary embodiment, the scene model design tool 160 may provide a UI or UX for designing chat robot conversations, which is used to design new scene models. Here, the UI for designing a chat robot conversation may provide an interface that allows a scene model to be designed without coding: the required function nodes are placed on a web canvas in a drag-and-drop manner, the conversation flow of the scene model is configured visually, and the designed scene model is immediately reflected in the learning service. Further, when the UI for designing the chat robot conversation is used, a natural and sufficient conversation on a topic can be conducted by connecting conversation flows (i.e., scene models). A specific example of generating a scene model by using the scene model design tool 160 has been described above with reference to figs. 6 and 7, and thus a detailed description thereof will be omitted.
Meanwhile, the scene model design tool 160 may support connection with other AI servers, content providing servers, open API support servers, and the like. In an exemplary embodiment, the scenario model design tool 160 may design a scenario model by connecting various contents provided in an external content providing server or a content market platform. For example, the scene design module 160 may design a scene model to perform a conversation flow by connecting weather information provided by a server of a weather bureau, traffic information provided by a server of a highway company, and the like. Further, according to an exemplary embodiment, the scenario model may be provided from an external content providing server, a content marketplace platform, or the like, and in this case, operation S10 of designing the scenario model may be omitted.
The foreign language learning apparatus 100 may receive a selection input for any one of a plurality of preset foreign language learning subjects from the learner through the learner terminal 1 (S20). In an exemplary embodiment, the learner may select one of a plurality of menu items of a conversation type or other reference in addition to a desired foreign language learning topic, and the foreign language learning apparatus 100 may provide foreign language learning corresponding to a selection input of the learner.
Specifically, the foreign language learning apparatus 100 may display the script sentences included in at least one scene model corresponding to the topic selected through the learner terminal 1 on a screen, or provide the script sentences to the learner in the form of the utterance voice of the virtual speaker (S30). The foreign language learning apparatus 100 may explain to the learner the situation in the scene model corresponding to the selected topic and request the learner to converse with the virtual speaker according to that situation. The scene model may include each script sentence and a preset answer set corresponding to the script sentence, in units of conversation type, topic, task, or other classification on which the learner and the virtual speaker can converse. Further, the scene model may be configured to branch into multiple forward paths at each node corresponding to the respective conversation type, conversation function, or other classification, depending on the intention of the learner's spoken sentence. The answer set may include a representative sentence as an answer to each script sentence and a plurality of recommended sentences that may be suitable as substitutes for the representative sentence.
The learner may speak in response to a script sentence of at least one scene model corresponding to the selected topic, displayed on the screen or provided through the virtual speaker, and the foreign language learning apparatus 100 may receive the utterance voice information spoken by the learner in response to the script sentence (S40). According to an exemplary embodiment, the foreign language learning apparatus 100 may receive, through the voice AI device 10, the utterance voice information as text corresponding to the voice spoken by the learner. That is, the voice AI device 10 may be provided as a link between the foreign language learning apparatus 100 and the learner terminal 1; it may recognize the speech spoken by the learner via speech-to-text (STT) or synthesize the answer speech to be spoken by the virtual speaker via text-to-speech (TTS), and transmit the result to the foreign language learning apparatus 100 or the learner terminal 1. In an exemplary embodiment, when the learner speaks, the foreign language learning apparatus 100 may receive, from a voice AI device 10 internal or external to the foreign language learning apparatus 100, utterance voice information including the text obtained by performing speech recognition (STT) on the learner's voice. Further, when the foreign language learning apparatus 100 selects an answer utterance sentence to be spoken by the virtual speaker and provides it to the learner, the foreign language learning apparatus 100 may transmit the answer utterance sentence as text to the voice AI device 10, and the voice AI device 10 may synthesize it into speech (TTS) and transmit the synthesized answer utterance to the learner terminal 1.
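The STT/TTS round trip between the apparatus and the voice AI device can be sketched with a mock device. Both conversions here are identity passthroughs on text, since a real device 10 would perform actual speech recognition and synthesis; the class and function names are hypothetical.

```python
class MockVoiceAIDevice:
    """Hypothetical stand-in for the voice AI device 10."""

    def speech_to_text(self, audio):
        # A real device would run STT on the audio signal here.
        return audio["transcript"]

    def text_to_speech(self, text):
        # A real device would synthesize an audio waveform here.
        return {"transcript": text}

def learner_turn(device, learner_audio, choose_answer):
    """One conversation turn: STT the learner's speech, let the learning
    apparatus choose an answer sentence, then TTS it back to the learner."""
    utterance_text = device.speech_to_text(learner_audio)
    answer_text = choose_answer(utterance_text)
    return device.text_to_speech(answer_text)
```

In the described architecture, `choose_answer` would be the foreign language learning apparatus 100 evaluating the utterance and selecting the virtual speaker's answer from the scene model.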
Here, the foreign language learning apparatus 100 may store the learner's input utterance voice information, and may generate, store, and analyze a log of the utterance history related to the utterance voice information. In this case, the foreign language learning apparatus 100 can generate the utterance voice information and the utterance history separately for each learner and for each scene model executed by that learner.
Thereafter, the foreign language learning apparatus 100 may compare the learner's utterance voice information in response to the scene model with the preset answer set, and calculate a score for each evaluation index through the conversation level calculation algorithm. That is, the foreign language learning apparatus 100 can apply the conversation level calculation algorithm to the learner's utterance voice information and calculate the score of each evaluation index included in the algorithm. Further, the foreign language learning apparatus 100 may evaluate the conversation level of the learner's utterance voice information from the score of each evaluation index and calculate a conversation level score (S50); the foreign language learning apparatus 100 may evaluate the utterance voice information of each sentence spoken by the learner, or evaluate the conversation level of the learner's conversation as a whole.
The conversation level calculation algorithm may include evaluation index algorithms such as the sentence similarity algorithm, the context similarity algorithm, and the utterance fluency algorithm. Each evaluation index algorithm has been described above, and thus a detailed description thereof will be omitted.
Meanwhile, in operation S50 of calculating the conversation level score, an evaluation model may be generated based on a supervised learning method by inputting the evaluation index scores calculated via the sentence similarity algorithm, the context similarity algorithm, and the utterance fluency algorithm, and the learner's conversation level score may then be calculated by inputting the learner's utterance voice information into the evaluation model. According to an exemplary embodiment, the foreign language learning apparatus 100 can calculate the learner's conversation level score by also including other evaluation index algorithms for evaluating the learner's conversation level, in addition to the sentence similarity algorithm, the context similarity algorithm, and the utterance fluency algorithm.
Further, in operation S50 of calculating the conversation level score, the evaluation model generated based on the supervised learning method, with the evaluation index scores calculated through the sentence similarity algorithm, the context similarity algorithm, and the utterance fluency algorithm as inputs, may be utilized, and the learner's conversation level score may be calculated by inputting the learner's utterance voice information into the evaluation model. That is, the conversation level score may be calculated from the individual evaluation index scores using a machine learning or deep learning method. Here, in order to generate the evaluation model, existing evaluation data in which English teachers assigned conversation level scores based on the evaluation index scores may be used for the supervised learning. The score calculated by the foreign language learning apparatus 100 may be displayed as a percentage on the screen of the conversation learning application so that the learner can check it immediately; when displaying the score would dampen the learner's enthusiasm for learning, the score may be hidden.
According to an exemplary embodiment, the foreign language learning apparatus 100 may select excellent learners who have obtained a preset score or higher from the learning histories of the plurality of learners, additionally select sentences stored in the histories of those excellent learners as recommended sentences corresponding to each script sentence of the scene model, and include the selected recommended sentences in the answer set corresponding to each script sentence. That is, in addition to textbook answers, the data pool of the answer set can be expanded based on the answers of excellent learners to reflect answers actually used in spoken language; with this configuration, natural conversation learning similar to a real conversation between people can be provided.
Then, the foreign language learning apparatus 100 may determine the virtual speaker's answer utterance sentence according to the intention of the spoken sentence corresponding to the learner's utterance voice information, based on the scene model and the conversation level score evaluated for the learner's utterance voice information, and provide the determined answer utterance sentence to the learner (S60). Here, the foreign language learning apparatus 100 may store a preset scene model for the foreign language learning conversation flow and advance the conversation flow based on the scene model during the learning conversation between the learner and the virtual speaker. Existing conversation learning applications store a plurality of preset scenes (e.g., A1-B1, A2-B2, A3-B3, or A1-B1-A2, A2-B2-A3 in fig. 5 (a)) that allow only a short conversation flow in short sentences or in one direction, and the learner learns the conversation by using these preset scenes, so that it is difficult to achieve a natural learning conversation. However, the scene model stored in the foreign language learning apparatus 100 may be configured in units of conversation type, topic, task, and the like, and configured such that the units are connected to each other according to the learner's selection and the conversation is expanded or changed in various directions to advance the conversation flow. Thus, similar to a conversation lesson with a native speaker, the learner can engage in more natural conversation learning with the virtual speaker. Fig. 5 (B) illustrates an exemplary embodiment of the extended scene model. In the present exemplary embodiment, the scene model is configured in units of conversation type and conversation function, but the scene model may also be configured in other units as needed.
In the process of advancing the conversation flow based on the scene model, the foreign language learning apparatus 100 may determine the answer utterance sentence of the virtual speaker and the direction of the conversation flow according to the intention of the learner's spoken sentence and the conversation level of the utterance voice information. For example, in an exemplary embodiment, when the learner's conversation level is 75% or more, the foreign language learning apparatus 100 may determine an answer utterance sentence of the virtual speaker corresponding to the next situation in the scene model and continue advancing the conversation flow. When the learner's conversation level is in the range of 50% to 75%, the foreign language learning apparatus 100 may extract keywords from the recommended sentence and the learner's spoken sentence, and determine a re-question rephrased in another form as the answer utterance sentence of the virtual speaker, guiding the learner to speak an appropriate answer similar to the answer set. Further, when the learner's conversation level is less than 50%, the foreign language learning apparatus 100 may determine the answer utterance sentence of the virtual speaker so as to first help a learner who has difficulty conversing on the corresponding topic to recognize the sentence, in forms such as providing a hint, providing the complete recommended sentence, speaking an appropriate recommended sentence aloud and then guiding the learner to repeat it, or providing sentence pattern/expression learning, so that the learner can learn a recommended sentence suitable for the current conversation situation.
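The three conversation-level bands above can be captured in a small decision function. The band boundaries (75% and 50%) mirror the text; the function name and the action labels are assumptions made for illustration.

```python
# Minimal sketch of the tutor's branching behavior by conversation-level band:
# >= 75% advance the flow, 50-75% re-question around keywords, < 50% guide.


def next_tutor_action(conversation_level):
    """conversation_level: evaluated score in percent (0-100)."""
    if conversation_level >= 75:
        return "advance"        # answer utterance for the next situation
    if conversation_level >= 50:
        return "re_question"    # reshape the question around extracted keywords
    return "guide"              # hints, model sentence, repeat-after-me practice
```

The text does not state which band the exact boundary values fall into; here 75% is treated as "advance" and 50% as "re-question", which is one reasonable reading.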
According to an exemplary embodiment, the foreign language learning apparatus 100 may have the learner review or additionally learn learning content including the words, sentences, expressions, pronunciations, and the like required to learn the foreign language, according to the learner's evaluated conversation level, either during the learning conversation flow or after it is terminated. In an exemplary embodiment, the foreign language learning apparatus 100 may be implemented such that the learner becomes sufficiently familiar with the words, sentences, pronunciations, expressions, and the like associated with each script sentence, and with the answer set of each script sentence, for which the learner's conversation level score was determined to be lower than a preset value. In an exemplary embodiment, the foreign language learning apparatus 100 may play the pronunciation of the corresponding learning content through the virtual speaker and ask the learner to repeat it. Alternatively, the foreign language learning apparatus 100 may play back the learner's own spoken voice and ask the learner to speak again, and may perform review or additional learning functions for the learner in various other ways.
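The review-item selection described above reduces to filtering per-sentence scores against a preset value. This is a sketch under assumed names; the threshold value and the data shape are not specified in the text.

```python
# Hypothetical sketch: script sentences whose conversation-level score fell
# below a preset value are queued for review or additional learning.

REVIEW_THRESHOLD = 50  # assumed preset value


def collect_review_items(sentence_scores, threshold=REVIEW_THRESHOLD):
    """sentence_scores: {script_sentence: conversation-level score (0-100)}.
    Returns the script sentences the learner should revisit."""
    return [s for s, score in sentence_scores.items() if score < threshold]
```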
Specifically, referring to FIG. 9, when the foreign language learning apparatus 100 receives the conversation level score from the evaluation module 130 (S61), the foreign language learning apparatus 100 may compare the conversation level score with a reference value (S62). When the conversation level score is equal to or greater than the reference value, the foreign language learning apparatus 100 may extract the next script sentence according to the scene model and continue the conversation with the virtual speaker (S63). In this case, the forward path of the scene model may be determined according to the intention of the spoken sentence corresponding to the learner's utterance voice information. That is, the foreign language learning apparatus 100 may compare the learner's spoken sentence with the answer set preset for each forward path in the scene model, and advance the scene model along the forward path whose answer set is determined to correspond to the learner's spoken sentence.
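The forward-path decision in step S63 can be sketched as matching the learner's sentence against the answer set of each path. The text does not specify the similarity measure, so simple word overlap is used here as a stand-in; the function name and data shapes are assumptions.

```python
# Sketch of forward-path selection: compare the learner's spoken sentence
# against the preset answer set of each forward path and take the path whose
# answers match best. Word overlap stands in for the unspecified similarity.


def choose_forward_path(spoken_sentence, path_answer_sets):
    """path_answer_sets: {path_name: [answer sentences for that path]}.
    Returns the name of the best-matching forward path."""
    spoken_words = set(spoken_sentence.lower().split())
    best_path, best_overlap = None, -1
    for path, answers in path_answer_sets.items():
        for answer in answers:
            overlap = len(spoken_words & set(answer.lower().split()))
            if overlap > best_overlap:
                best_path, best_overlap = path, overlap
    return best_path
```

A production system would replace the overlap count with the sentence-similarity vector distance described for the evaluation module.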
Meanwhile, when the conversation level score is less than the reference value, the virtual speaker may provide utterance guidance to the learner through a re-questioning conversation, which includes giving the learner a keyword hint and having the learner listen to and repeat a sentence based on the response sentence.
Specifically, when the conversation level score is equal to or greater than a limit value and less than the reference value (S64), the foreign language learning apparatus 100 may perform TTS based on the response sentence, find the recommended sentence in the answer set that is most similar to the learner's spoken sentence, and then confirm the intended meaning with the learner again (S65). That is, since the STT result of what the learner said may be incorrect, the correct meaning can be confirmed with the learner once more.
Meanwhile, when the conversation level score is less than the limit value, the learner may learn the set answer to the script sentence (S66). That is, since the learner's conversation level is low, the foreign language learning apparatus 100 can have the learner recognize and learn the corresponding script sentence and the set answer corresponding to it.
According to an exemplary embodiment, as shown in FIG. 8, after the conversation between the learner and the virtual speaker is terminated, the foreign language learning apparatus 100 may recommend learning content of a difficulty level and topic suitable for the learner according to the conversation level score of the learner's utterance voice information (S70).
The present invention can be implemented as computer readable code on a medium in which a program is recorded. The computer-readable medium may store the computer-executable program persistently or temporarily for execution or download. Further, the medium may be any of various recording or storage devices in the form of single or combined pieces of hardware; it is not limited to a medium directly connected to a specific computer system and may be distributed over a network. Examples of the medium include media configured to store program instructions, such as magnetic media (e.g., hard disks, floppy disks, and magnetic tapes), optical recording media (e.g., compact disc read-only memories (CD-ROMs) and digital video discs (DVDs)), magneto-optical media (e.g., floptical disks), read-only memories (ROMs), random access memories (RAMs), and flash memories. Further examples of the medium include recording or storage media managed by an application store that distributes application programs, a site that distributes various kinds of software, a server, and the like. The detailed description is therefore not to be construed as limiting in any respect but should be considered illustrative. The scope of the invention should be determined by reasonable interpretation of the appended claims, and all modifications within the equivalent scope of the invention fall within its scope.
The invention is not limited to the foregoing exemplary embodiments and drawings. It is apparent that those skilled in the art can make various modifications, alterations, or substitutions without departing from the spirit of the present technology.

Claims (27)

1. A foreign language learning apparatus which provides a foreign language learning service to a learner through a conversation with a virtual speaker, the foreign language learning apparatus comprising:
an interface module that receives utterance voice information of the learner from a voice artificial intelligence (AI) device that performs speech-to-text (STT) recognition of an utterance voice received from the learner, performs text-to-speech (TTS) synthesis of the utterance voice information of the virtual speaker corresponding to the learner's utterance voice information received through the AI device, and transmits the synthesized utterance voice information;
a learner utterance analysis module that stores and analyzes a log of utterance history related to the learner's utterance voice information;
an evaluation module that calculates a score for each of one or more evaluation indexes through a conversation level calculation algorithm with respect to the learner's utterance voice information, and evaluates a conversation level; and
a learning conversation module that stores a scene model for a learning conversation flow and, during the learning conversation between the learner and the virtual speaker, determines an answer utterance sentence of the virtual speaker based on the scene model according to the intention of the learner's spoken sentence and the conversation level of the learner's utterance voice information evaluated in the evaluation module, and advances the conversation flow.
2. The foreign language learning apparatus of claim 1, further comprising:
a learning function module configured to cause the learner to review or additionally learn learning content including at least one of words, sentences, expressions, and pronunciations required by the learner, according to the learner's conversation level evaluated in the evaluation module, during the learning conversation flow or after the learning conversation flow is terminated.
3. The foreign language learning apparatus of claim 1, further comprising:
a scene design module that designs a scene model suitable for a conversation flow between the learner and the virtual speaker by using nodes for each conversation type or conversation function,
wherein the scene model includes each script sentence in units of a conversation type or topic and an answer set corresponding to the script sentence, and is configured to branch into a plurality of forward paths at each node corresponding to the respective conversation type or topic according to the intention of the learner's sentence.
4. The foreign language learning apparatus of claim 3, wherein the interface module supports connection with at least one of an AI server, a content providing server, and an open API support server.
5. The foreign language learning apparatus of claim 4, wherein the interface module is configured to receive, from the content providing server, an answer set provided in conjunction with the foreign language learning service and additional learning content including situational conversation content, and the scene design module is configured to design a new scene model by applying the additional learning content to the conversation type or conversation function nodes.
6. The foreign language learning apparatus of claim 1, wherein the conversation level calculation algorithm comprises at least one of the following algorithms:
a sentence similarity algorithm that evaluates the sentence similarity between the learner's utterance voice information and the answer set corresponding to each script sentence of the stored scene model,
a contextual similarity algorithm that evaluates the contextual similarity of the learner's utterance voice information to the back-and-forth question-and-answer conversation between the learner and the virtual speaker; and
an utterance fluency algorithm that calculates the utterance fluency of the learner's utterance voice information as scores of indexes including pronunciation accuracy and speech speed, and determines the utterance fluency.
7. The foreign language learning apparatus of claim 6, wherein the sentence similarity algorithm generates a similarity determination model trained by a method of converting the semantic similarities between words included in the answer set corresponding to each script sentence of the scene model into vector values and converting the vector values into numerical values, applies the learner's utterance voice information to the similarity determination model, and calculates a sentence similarity score of the learner's utterance voice information from the sentence similarity vector distance between the learner's utterance voice information and the answer set.
8. The foreign language learning apparatus of claim 6, wherein the contextual similarity algorithm calculates a conversation contextual similarity score of the learner's spoken sentence by learning with a method of converting semantic similarity between words into vector values and converting the vector values into numerical values, the words being words in a script or conversation sentence explaining a situation or topic in the scene model, words used in a previous script or conversation sentence providing guidance to the learner, and connected words in the learner's spoken sentence configured in response to those words, and determines whether the calculated contextual similarity score is within a preset value range associated with the context of the conversation flow.
9. The foreign language learning apparatus of claim 6, wherein the utterance fluency algorithm learns the speech speed and pronunciation information of a preset reference utterance voice, compares the pronunciation information and speech speed extracted from the learner's utterance voice information with those of the reference utterance voice corresponding to the learner's spoken sentence, and calculates index values of pronunciation accuracy and speech speed.
10. The foreign language learning apparatus of claim 6, wherein the evaluation module generates an evaluation model based on a supervised learning method by inputting evaluation index scores calculated through at least one of the sentence similarity algorithm, the contextual similarity algorithm, and the utterance fluency algorithm, and calculates the learner's conversation level score by inputting the learner's utterance voice information into the evaluation model.
11. The foreign language learning apparatus of claim 6, wherein the evaluation module is configured to select excellent learners who have obtained at least a preset score from among the learning histories of a plurality of learners stored in the learner utterance analysis module, select the sentences stored in the excellent learners' histories as additional recommended sentences corresponding to each script sentence of the scene model, and include the selected additional recommended sentences in the answer set corresponding to each script sentence of the scene model.
12. A method for providing a foreign language learning service to a learner through a conversation with a virtual speaker by a foreign language learning apparatus, the method comprising the steps of:
receiving a selection input for any one of a plurality of preset foreign language learning topics from the learner through a learner terminal;
displaying a script sentence included in at least one scene model corresponding to the selected topic on a screen through the learner terminal or providing the script sentence to the learner in an utterance voice of the virtual speaker;
receiving utterance voice information spoken by the learner in response to the script sentence;
comparing the learner's utterance voice information with the answer set preset in the scene model, calculating a score for each of one or more evaluation indexes through a conversation level calculation algorithm, and calculating a conversation level score of the learner's utterance voice information; and
determining an answer utterance sentence of the virtual speaker according to the intention of the spoken sentence corresponding to the learner's utterance voice information, based on the scene model and the evaluated conversation level score of the learner's utterance voice information, and providing the determined answer utterance sentence to the learner.
13. The method of claim 12, further comprising the steps of:
designing a scene model suitable for a conversation flow between the learner and the virtual speaker by using nodes for each conversation type or each conversation function,
wherein the scene model includes each script sentence in units of a conversation type or topic and an answer set corresponding to the script sentence, and is configured to branch into a plurality of forward paths at each node corresponding to the respective conversation type or topic according to the intention of the learner's sentence.
14. The method of claim 12, further comprising the steps of:
recommending, to the learner, learning content of a difficulty level and topic suitable for the learner according to the conversation level score of the learner's utterance voice information after the conversation between the learner and the virtual speaker is terminated.
15. The method of claim 12, wherein the conversation level calculation algorithm comprises at least one of the following algorithms:
a sentence similarity algorithm that evaluates the sentence similarity between the learner's utterance voice information and the answer set corresponding to each script sentence of the stored scene model,
a contextual similarity algorithm that evaluates the contextual similarity of the learner's utterance voice information to the back-and-forth question-and-answer conversation between the learner and the virtual speaker; and
an utterance fluency algorithm that calculates the utterance fluency of the learner's utterance voice information as scores of indexes including pronunciation accuracy and speech speed, and determines the utterance fluency.
16. The method of claim 15, wherein the sentence similarity algorithm generates a similarity determination model trained by a method of converting the semantic similarities between words included in the answer set corresponding to each script sentence of the scene model into vector values and converting the vector values into numerical values, applies the learner's utterance voice information to the similarity determination model, and calculates a sentence similarity score of the learner's utterance voice information from the sentence similarity vector distance between the learner's utterance voice information and the answer set.
17. The method of claim 15, wherein the contextual similarity algorithm calculates a conversation contextual similarity score of the learner's spoken sentence by learning with a method of converting semantic similarity between words into vector values and converting the vector values into numerical values, the words being words in a script or conversation sentence explaining a situation or topic in the scene model, words used in a previous script or conversation sentence providing guidance to the learner, and connected words in the learner's spoken sentence configured in response to those words, and determines whether the calculated contextual similarity score is within a preset value range associated with the context of the conversation flow.
18. The method of claim 15, wherein the utterance fluency algorithm learns the speech speed and pronunciation information of a preset reference utterance voice, compares the pronunciation information and speech speed extracted from the learner's utterance voice information with those of the reference utterance voice corresponding to the learner's spoken sentence, and calculates index values of pronunciation accuracy and speech speed.
19. The method of claim 15, wherein calculating the conversation level score comprises: generating an evaluation model based on a supervised learning method by inputting evaluation index scores calculated via at least one of the sentence similarity algorithm, the contextual similarity algorithm, and the utterance fluency algorithm, and calculating the learner's conversation level score by inputting the learner's utterance voice information into the evaluation model.
20. The method of claim 15, wherein calculating the conversation level score comprises: selecting excellent learners who have obtained at least a preset score from among the learning histories of a plurality of learners stored in a learner utterance analysis module, selecting the sentences stored in the excellent learners' histories as additional recommended sentences corresponding to each script sentence of the scene model, and including the selected additional recommended sentences in the answer set corresponding to each script sentence of the scene model.
21. The method of claim 12, wherein determining the answer utterance sentence and providing the determined answer utterance sentence to the learner further comprises:
when the conversation level score is equal to or greater than a reference value, extracting a next script statement according to the scene model and continuing the conversation with the virtual speaker; and
performing, by the virtual speaker, utterance guidance for the learner through a re-questioning conversation including giving the learner a keyword hint and having the learner listen to and repeat a sentence based on a response sentence, when the conversation level score is less than the reference value.
22. The method of claim 21, wherein performing the utterance guidance further comprises:
performing speech synthesis based on the response sentence, finding the recommended sentence in the answer set that is most similar to the learner's sentence, and confirming the intended meaning with the learner again, when the conversation level score is equal to or greater than a limit value and less than the reference value; and
causing the learner to learn the set answer of the script sentence when the conversation level score is less than the limit value.
23. The method of claim 21, wherein continuing the conversation comprises: determining a forward path branching from a node for each conversation type or conversation function in the scene model according to an intention of the spoken sentence corresponding to the utterance voice information of the learner.
24. The method of claim 23, wherein continuing the conversation comprises: comparing the learner's sentence with the answer set preset for each forward path branching from a node for each conversation type or conversation function in the scene model, and advancing along the forward path whose answer set is determined to correspond to the learner's sentence among the answer sets of the forward paths.
25. The method of claim 12, wherein the scenario model is generated by using a scenario model design tool provided by the foreign language learning apparatus to include each script sentence and an answer set corresponding to the script sentence, and the scenario model is branched into a plurality of forward paths according to an intention of the sentence inputted by the learner in response to the script sentence.
26. The method of claim 25, wherein the scene model design tool supports connection with a content providing server and is configured to receive, from the content providing server, an answer set provided in conjunction with the foreign language learning service and additional learning content including situational conversation content, and to design a new scene model by applying the additional learning content to nodes for each conversation type or conversation function.
27. A computer program for executing the method for providing a foreign language learning service of claim 12, the computer program being stored in a recording medium.
CN202011153817.4A 2019-10-31 2020-10-26 Apparatus for learning foreign language and method for providing foreign language learning service using the same Pending CN112819664A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2019-0137882 2019-10-31
KR1020190137882A KR102302137B1 (en) 2019-10-31 2019-10-31 Apparatus for studying foreign language and method for providing foreign language study service by using the same

Publications (1)

Publication Number Publication Date
CN112819664A true CN112819664A (en) 2021-05-18

Family

ID=75713083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011153817.4A Pending CN112819664A (en) 2019-10-31 2020-10-26 Apparatus for learning foreign language and method for providing foreign language learning service using the same

Country Status (3)

Country Link
JP (1) JP7059492B2 (en)
KR (1) KR102302137B1 (en)
CN (1) CN112819664A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743418A (en) * 2022-04-26 2022-07-12 简伟广 Japanese teaching system based on AI intelligence

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20220159170A (en) * 2021-05-25 2022-12-02 삼성전자주식회사 Electronic device and operation method thereof
KR102622067B1 (en) * 2021-08-11 2024-01-09 한국과학기술원 Method and system for English conversation skill analysis using dialogue transcript
KR102384573B1 (en) * 2021-09-09 2022-04-11 주식회사 오리진 Terminal for language learning including free talking option based on artificial intelligence and operating method
KR102551296B1 (en) * 2021-09-24 2023-07-05 한국전자통신연구원 Dialogue system and its method for learning to speak foreign language
KR102410110B1 (en) * 2021-11-05 2022-06-22 주식회사 살랑코리아 How to provide Korean language learning service
KR102418558B1 (en) * 2021-11-22 2022-07-07 주식회사 유나이티드어소시에이츠 English speaking teaching method using interactive artificial intelligence avatar, device and system therefor
KR102491978B1 (en) * 2021-12-21 2023-01-27 (주)웅진씽크빅 Apparatus for learning character language using learner voice and method for supporting learning of character language using the same
KR102410644B1 (en) 2022-02-16 2022-06-22 주식회사 알투스 Method, device and system for providing foreign language education contents service based on voice recognition using artificial intelligence
US11875699B2 (en) 2022-04-21 2024-01-16 Columbia College Methods for online language learning using artificial intelligence and avatar technologies
KR102655327B1 (en) * 2022-06-16 2024-04-04 장미화 Educational tool system for improving thinking skills for young children's english and mathematical coding
KR102569339B1 (en) * 2023-03-09 2023-08-22 주식회사 공터영어 Speaking test system
KR102678364B1 (en) * 2024-02-02 2024-06-26 주식회사 공터영어 Online based multi learning system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120290300A1 (en) * 2009-12-16 2012-11-15 Postech Academy- Industry Foundation Apparatus and method for foreign language study
JP2017188039A (en) * 2016-04-08 2017-10-12 Kddi株式会社 Program, device and method for estimating score of text by calculating multiple similarity degrees
KR20180100001A (en) * 2017-02-28 2018-09-06 서울대학교산학협력단 System, method and recording medium for machine-learning based korean language conversation using artificial intelligence
CN110379225A (en) * 2018-04-12 2019-10-25 百度(美国)有限责任公司 The system and method for interactive language acquisition

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101037247B1 (en) * 2009-06-18 2011-05-26 포항공과대학교 산학협력단 Foreign language conversation training method and apparatus and trainee simulation method and apparatus for qucikly developing and verifying the same
KR20110120552A (en) * 2010-04-29 2011-11-04 포항공과대학교 산학협력단 Foreign language learning game system and method based on natural language dialogue technology
KR20120006154A (en) * 2010-07-12 2012-01-18 (주)유비바다 3-d image education system and method
JP2012215645A (en) * 2011-03-31 2012-11-08 Speakglobal Ltd Foreign language conversation training system using computer
JP2017201342A (en) * 2016-05-02 2017-11-09 良一 春日 Language Learning Robot Software
JP6633250B2 (en) * 2017-06-15 2020-01-22 株式会社Caiメディア Interactive robot, interactive system, and interactive program



Also Published As

Publication number Publication date
JP2021071723A (en) 2021-05-06
KR20210051948A (en) 2021-05-10
KR102302137B1 (en) 2021-09-15
JP7059492B2 (en) 2022-04-26

Similar Documents

Publication Publication Date Title
KR102302137B1 (en) Apparatus for studying foreign language and method for providing foreign language study service by using the same
CN108962217B (en) Speech synthesis method and related equipment
CN110489756B (en) Conversational human-computer interactive spoken language evaluation system
WO2007062529A1 (en) Interactive language education system and method
KR101037247B1 (en) Foreign language conversation training method and apparatus and trainee simulation method and apparatus for qucikly developing and verifying the same
Komatani et al. User modeling in spoken dialogue systems to generate flexible guidance
WO2021212954A1 (en) Method and apparatus for synthesizing emotional speech of specific speaker with extremely few resources
KR20160008949A (en) Apparatus and method for foreign language learning based on spoken dialogue
KR20090058320A (en) Example-based communicating system for foreign conversation education and method therefor
Skidmore et al. Using Alexa for flashcard-based learning
KR102410110B1 (en) How to provide Korean language learning service
KR100593589B1 (en) Multilingual Interpretation / Learning System Using Speech Recognition
Ureta et al. At home with Alexa: a tale of two conversational agents
Baur et al. A textbook-based serious game for practising spoken language
KR101873379B1 (en) Language learning system with dialogue
KR102272567B1 (en) Speech recognition correction system
CN114255759A (en) Method, apparatus and readable storage medium for spoken language training using machine
Komatani et al. Flexible spoken dialogue system based on user models and dynamic generation of VoiceXML scripts
Leppik et al. Estoñol, a computer-assisted pronunciation training tool for Spanish L1 speakers to improve the pronunciation and perception of Estonian vowels
CN113192484A (en) Method, apparatus, and storage medium for generating audio based on text
Boroș et al. Rss-tobi-a prosodically enhanced romanian speech corpus
Ross et al. Speaking with your computer: A new way to practice and analyze conversation
Wik Designing a virtual language tutor
Shukla Development of a human-AI teaming based mobile language learning solution for dual language learners in early and special educations
Kasrani et al. A Mobile Cloud Computing Based Independent Language Learning System with Automatic Intelligibility Assessment and Instant Feedback.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination