KR102652008B1

KR102652008B1 - Method and apparatus for providing a multimodal-based English learning service applying native language acquisition principles to a user terminal using a neural network

Info

Publication number: KR102652008B1
Application number: KR1020230119036A
Authority: KR
Inventors: 신은미; 김윤현
Original assignee: 아이보람 주식회사
Priority date: 2023-09-07
Filing date: 2023-09-07
Publication date: 2024-03-27

Abstract

Embodiments present a method and an apparatus by which a server provides a multimodal-based English learning service to a user terminal using a neural network. A method according to one embodiment may include: receiving, from a user terminal, a first request message for English word learning, the first request message including personal information about the user and information about the user terminal; determining progress method information for the user terminal through a method decision model that uses a first neural network, based on the personal information about the user and the information about the user terminal; transmitting a learning start message including the progress method information to the user terminal, whereupon multimodal-based English word learning content displayed on the user terminal proceeds according to the progress method information, the content including information about an English word, information about the definition of the English word, and information about an animation related to the English word; receiving, from the user terminal, information about the progress result of the multimodal-based English word learning content; determining a learning score for the user terminal through a learning achievement evaluation model that uses a second neural network, based on the information about the progress result; and transmitting a learning reward message including the learning score to the user terminal. For example, a plurality of English word games may be activated for the user terminal based on the learning reward message.

Description

Method and apparatus for providing a multimodal-based English learning service to a user terminal using a neural network in English education applying native language acquisition principles {METHOD AND APPARATUS FOR PROVIDING A MULTIMODAL-BASED ENGLISH LEARNING SERVICE APPLYING NATIVE LANGUAGE ACQUISITION PRINCIPLES TO A USER TERMINAL USING A NEURAL NETWORK}

Embodiments of the present disclosure relate to technology for providing a multimodal-based English learning service to a user terminal, and more particularly to a method and apparatus for providing such a service to a user terminal using a neural network.

Online English education services that begin with learning English words generally start with phonics. Those that do not are mostly a simple combination of word, picture, and Korean meaning. Most of these services focus on how to memorize well. In other words, what a word means within a sentence is not treated as important; instead, the services try to match English words one-to-one with Korean translations so that the words can be memorized quickly and easily.

It is widely acknowledged that this phonics-centered English education is ineffective. Nevertheless, phonics continues to be used because no alternative has been found. As a result of this kind of education, Korea, said to be the country that takes the TOEIC and TOEFL the most in the world, ranked only 87th out of 171 countries in 2019 TOEFL scores and only 132nd in TOEFL speaking.

The reason phonics education persists despite this situation is that phonics education itself is centered on memorization. Memorization is an important tool in Korean education. English and most other subjects are taught by memorization, and even mathematics is taught by memorizing formulas in order to solve problems quickly. This differs greatly from the math section of the SAT, the college entrance exam in the United States, which is called a Reasoning Test and is designed to test logic and reasoning skills. Phonics education, being memorization-based, therefore took hold very easily in English education for Korean children.

The memorization-based approach to vocabulary acquisition in Korea focuses on how quickly one can memorize the Korean meaning matched to an English word. Words memorized by matching English to Korean in this way are of little use when their meaning is extended or derived in different situations. For example, Koreans often use the expression "매머드급" ("mammoth-class") to mean "very large" or "large-scale," and call the elephant-like animal that lived during the Ice Age "맘모스." Yet "매머드" and "맘모스" are the same word, "Mammuthus." Because the word is rendered and understood in Korean sometimes as "매머드" and sometimes as "맘모스," the two are perceived as different things. This happens because learners are accustomed to interpreting and memorizing words through Korean. If a learner has formed an image of the fundamental meaning of the word itself, it is easy to infer the meaning even when the word is used in a modified form.

On this theoretical basis, the present disclosure proposes, as the most effective method of early English education, a learning method modeled on native language acquisition that uses a whole-language approach consisting of "word + word definition + short animation."

Accordingly, there is a need for a method and apparatus for providing a multimodal-based English learning service to a user terminal by: determining progress method information for the user terminal through a method decision model that uses a first neural network, based on personal information about the user and information about the user terminal; having multimodal-based English word learning content displayed on the user terminal proceed according to the progress method information; and determining a learning score for the user terminal through a learning achievement evaluation model that uses a second neural network, based on information about the progress result of the multimodal-based English word learning content.

Embodiments of the present disclosure may provide a method and apparatus for providing a multimodal-based English learning service to a user terminal using a neural network.

The technical problems to be solved by the embodiments are not limited to those mentioned above, and other technical problems not mentioned may be considered by those of ordinary skill in the art from the various embodiments described below.

According to one embodiment, a method by which a server provides a multimodal-based English learning service to a user terminal using a neural network may include: receiving, from a user terminal, a first request message for English word learning, the first request message including personal information about the user and information about the user terminal; determining progress method information for the user terminal through a method decision model that uses a first neural network, based on the personal information about the user and the information about the user terminal; transmitting a learning start message including the progress method information to the user terminal, whereupon multimodal-based English word learning content displayed on the user terminal proceeds according to the progress method information, the content including information about an English word, information about the definition of the English word, and information about an animation related to the English word; receiving, from the user terminal, information about the progress result of the multimodal-based English word learning content; determining a learning score for the user terminal through a learning achievement evaluation model that uses a second neural network, based on the information about the progress result; and transmitting a learning reward message including the learning score to the user terminal. For example, a plurality of English word games may be activated for the user terminal based on the learning reward message.
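
The message exchange described above can be sketched roughly as follows. This is only an illustrative outline, not the implementation of the disclosure: the class and field names (`FirstRequest`, `ProgressResult`, `LearningServer`, `age`, `screen_inches`, the game names) are hypothetical, and the two neural-network models are replaced by trivial placeholder functions so that the flow can be followed end to end.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class FirstRequest:
    """First request message: user personal info plus terminal info."""
    user_info: Dict[str, float]      # hypothetical fields, e.g. {"age": 6}
    terminal_info: Dict[str, float]  # hypothetical fields, e.g. {"screen_inches": 10.1}


@dataclass
class LearningStartMessage:
    progress_method: str  # the "progress method information"


@dataclass
class ProgressResult:
    """Progress result reported by the terminal (fields are assumptions)."""
    correct: int
    total: int


@dataclass
class LearningRewardMessage:
    learning_score: float
    activated_games: List[str]


def stub_method_model(user_info: Dict[str, float],
                      terminal_info: Dict[str, float]) -> str:
    """Placeholder for the method decision model (first neural network)."""
    return "guided" if user_info.get("age", 0.0) < 8 else "self_paced"


def stub_score_model(result: ProgressResult) -> float:
    """Placeholder for the learning achievement evaluation model
    (second neural network): here simply the percentage of correct answers."""
    if result.total == 0:
        return 0.0
    return round(100.0 * result.correct / result.total, 1)


class LearningServer:
    def __init__(
        self,
        method_model: Callable[[Dict[str, float], Dict[str, float]], str] = stub_method_model,
        score_model: Callable[[ProgressResult], float] = stub_score_model,
        games: Tuple[str, ...] = ("word_match", "team_quiz"),  # hypothetical game names
    ) -> None:
        self.method_model = method_model
        self.score_model = score_model
        self.games = games

    def handle_first_request(self, req: FirstRequest) -> LearningStartMessage:
        # Decide the progress method, then answer with a learning start message.
        method = self.method_model(req.user_info, req.terminal_info)
        return LearningStartMessage(progress_method=method)

    def handle_progress_result(self, result: ProgressResult) -> LearningRewardMessage:
        # Score the reported result and answer with a learning reward message.
        score = self.score_model(result)
        return LearningRewardMessage(learning_score=score,
                                     activated_games=list(self.games))
```

Passing the models in as callables keeps the message handling separate from whatever networks are actually trained; the text does not specify how the score relates to which games are activated, so this sketch simply lists all of them in the reward message.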

According to one embodiment, the server may receive, from the user terminal, a second request message for one of the plurality of English word games. The second request message is a message requesting the server either to join one of the English word games or to create one. For example, the second request message may include either a value indicating game participation or a value indicating game creation, together with identification information about the game. The identification information about the game may include an identification value indicating the type of game.
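
As a rough illustration of the second request message, the sketch below models it as a small record holding a join-or-create value and a game type identifier. The names `GameAction`, `SecondRequest`, `game_type_id`, and the numeric values 0 and 1 are assumptions made for this example; the disclosure does not specify an encoding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class GameAction(Enum):
    """Value indicating game participation or game creation (values assumed)."""
    JOIN = 0
    CREATE = 1


@dataclass(frozen=True)
class SecondRequest:
    action: GameAction
    game_type_id: int  # identification value indicating the type of game


def encode(req: SecondRequest) -> Dict[str, int]:
    """Flatten the request into a plain dictionary, e.g. for JSON transport."""
    return {"action": req.action.value, "game_type_id": req.game_type_id}


def decode(payload: Dict[str, int]) -> SecondRequest:
    """Rebuild the request; an unknown action value raises ValueError."""
    return SecondRequest(GameAction(payload["action"]), int(payload["game_type_id"]))
```

Using an enumeration for the action makes the "either join or create" constraint explicit: a payload carrying any other value is rejected when decoded.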

According to one embodiment, the server may receive, from the user terminal, a third request message for English word learning. The third request message may be a message by which the user terminal requests English word learning after the learning score for the user terminal has been determined through the learning achievement evaluation model. For example, the third request message may include information about the user terminal.

According to one embodiment, the plurality of animation images that make up the animation included in the multimodal-based English word learning content may be generated through an animation generation model that uses a neural network based on a generative adversarial network.

According to embodiments, by providing multimodal-based English word learning content to the user terminal, the server can improve the user's ability to form associations through English words and short animations. This can improve the user's creativity, strengthen memory for English words, and also improve the ability to infer meaning.

According to embodiments, the server determines and provides progress method information for the user terminal through a method decision model that uses a first neural network, based on personal information about the user and information about the user terminal, so that the multimodal-based English word learning content can proceed in a manner suited to the basic information about the user terminal.

According to embodiments, the server determines and provides a learning score for the user terminal through a learning achievement evaluation model that uses a second neural network, based on information about the progress result of the multimodal-based English word learning content, which can, among other effects, motivate the user to learn.

The effects obtainable from the embodiments are not limited to those mentioned above, and other effects not mentioned can be clearly derived and understood by those of ordinary skill in the art based on the detailed description below.

The accompanying drawings, which are included as part of the detailed description to aid understanding of the embodiments, provide various embodiments and, together with the detailed description, describe technical features of the various embodiments.
Figure 1 is a diagram showing the configuration of an electronic device according to an embodiment.
Figure 2 is a diagram showing the configuration of a program according to an embodiment.
Figure 3 shows a method by which a server provides a multimodal-based English learning service to a user terminal using a neural network according to an embodiment.
Figure 4 is an example of multimodal-based English word learning content according to an embodiment.
Figure 5 is an example of a settings screen for multimodal-based English word learning content according to an embodiment.
Figure 6 is an example of a reward screen for multimodal-based English word learning content according to an embodiment.
Figure 7 is an example of a team-based game for multimodal-based English word learning content according to an embodiment.
Figure 8 is a diagram showing an example of an animation generation model using a neural network based on a generative adversarial network according to an embodiment.
Figure 9 is a block diagram showing the configuration of a server according to an embodiment.

The following embodiments combine components and features of the embodiments in predetermined forms. Each component or feature may be considered optional unless explicitly stated otherwise. Each component or feature may be implemented without being combined with other components or features. In addition, various embodiments may be configured by combining some of the components and/or features. The order of the operations described in the various embodiments may be changed. Some components or features of one embodiment may be included in another embodiment, or may be replaced with corresponding components or features of another embodiment.

In the description of the drawings, procedures or steps that could obscure the gist of the various embodiments are not described, and procedures or steps that would be understood by a person of ordinary skill in the art are likewise not described.

Throughout the specification, when a part is said to "comprise" or "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit," "...er," and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented as hardware, software, or a combination of hardware and software. Furthermore, "a" or "an," "one," "the," and similar terms may be used, in the context of describing the various embodiments (particularly in the context of the claims below), to cover both the singular and the plural unless otherwise indicated herein or clearly contradicted by context.

Hereinafter, implementations according to various embodiments are described in detail with reference to the accompanying drawings. The detailed description set forth below together with the accompanying drawings is intended to describe exemplary implementations of the various embodiments and is not intended to represent the only implementation.

In addition, the specific terms used in the various embodiments are provided to aid understanding of those embodiments, and such terms may be replaced with other forms without departing from the technical spirit of the various embodiments.

Figure 1 is a diagram showing the configuration of an electronic device according to an embodiment.

Figure 1 is a block diagram of an electronic device 101 in a network environment 100 according to various embodiments. Referring to Figure 1, in the network environment 100, the electronic device 101 may communicate with an electronic device 102 through a first network 198 (e.g., a short-range wireless communication network), or with at least one of an electronic device 104 or a server 108 through a second network 199 (e.g., a long-range wireless communication network). According to one embodiment, the electronic device 101 may communicate with the electronic device 104 through the server 108. According to one embodiment, the electronic device 101 may include a processor 120, a memory 130, an input module 150, a sound output module 155, a display module 160, an audio module 170, a sensor module 176, an interface 177, a connection terminal 178, a haptic module 179, a camera module 180, a power management module 188, a battery 189, a communication module 190, a subscriber identification module 196, or an antenna module 197. In some embodiments, at least one of these components (e.g., the connection terminal 178) may be omitted from the electronic device 101, or one or more other components may be added. In some embodiments, some of these components (e.g., the sensor module 176, the camera module 180, or the antenna module 197) may be integrated into a single component (e.g., the display module 160). The electronic device 101 may also be referred to as a client, a terminal, or a peer.

The processor 120 may, for example, execute software (e.g., a program 140) to control at least one other component (e.g., a hardware or software component) of the electronic device 101 connected to the processor 120, and may perform various data processing or computation. According to one embodiment, as at least part of the data processing or computation, the processor 120 may store commands or data received from another component (e.g., the sensor module 176 or the communication module 190) in a volatile memory 132, process the commands or data stored in the volatile memory 132, and store the resulting data in a non-volatile memory 134. According to one embodiment, the processor 120 may include a main processor 121 (e.g., a central processing unit or an application processor) or an auxiliary processor 123 (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of or together with the main processor. For example, when the electronic device 101 includes the main processor 121 and the auxiliary processor 123, the auxiliary processor 123 may be configured to use less power than the main processor 121 or to be specialized for a designated function. The auxiliary processor 123 may be implemented separately from the main processor 121 or as part of it.

The auxiliary processor 123 may, for example, control at least some of the functions or states related to at least one of the components of the electronic device 101 (e.g., the display module 160, the sensor module 176, or the communication module 190), either in place of the main processor 121 while the main processor 121 is in an inactive (e.g., sleep) state, or together with the main processor 121 while the main processor 121 is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor 123 (e.g., an image signal processor or a communication processor) may be implemented as part of another functionally related component (e.g., the camera module 180 or the communication module 190). According to one embodiment, the auxiliary processor 123 (e.g., a neural processing unit) may include a hardware structure specialized for processing artificial intelligence models.

An artificial intelligence model may be created through machine learning. Such learning may be performed, for example, on the electronic device 101 itself, on which the artificial intelligence model runs, or through a separate server (e.g., the server 108). Learning algorithms may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but are not limited to these examples. The artificial intelligence model may include a plurality of artificial neural network layers. The artificial neural network may be one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), deep Q-networks, or a combination of two or more of these, but is not limited to these examples. In addition to, or instead of, a hardware structure, the artificial intelligence model may include a software structure.
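
To make the phrase "a plurality of artificial neural network layers" concrete, here is a minimal sketch of a forward pass through a small fully connected network in plain Python. The layer sizes, the ReLU and softmax choices, and all function names are assumptions made for illustration; the disclosure does not fix any particular architecture.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, bias) of one dense layer


def dense(x: Sequence[float], weights: Matrix, bias: Vector) -> Vector:
    """One fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(v: Sequence[float]) -> Vector:
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: Sequence[float], layers: Sequence[Layer]) -> Vector:
    """Apply each layer in turn, with ReLU between layers and softmax at the end."""
    out = list(x)
    for i, (weights, bias) in enumerate(layers):
        out = dense(out, weights, bias)
        if i < len(layers) - 1:
            out = relu(out)
    return softmax(out)
```

The output is a probability vector over classes, which is one common way a network such as a decision or evaluation model could expose its result; training the weights is outside the scope of this sketch.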

The memory 130 may store various data used by at least one component of the electronic device 101 (e.g., the processor 120 or the sensor module 176). The data may include, for example, software (e.g., the program 140) and input data or output data for commands related to it. The memory 130 may include the volatile memory 132 or the non-volatile memory 134.

The program 140 may be stored as software in the memory 130 and may include, for example, an operating system 142, middleware 144, or an application 146.

The input module 150 may receive, from outside the electronic device 101 (e.g., from a user), commands or data to be used by a component of the electronic device 101 (e.g., the processor 120). The input module 150 may include, for example, a microphone, a mouse, a keyboard, keys (e.g., buttons), or a digital pen (e.g., a stylus pen).

The sound output module 155 may output sound signals to the outside of the electronic device 101. The sound output module 155 may include, for example, a speaker or a receiver. The speaker may be used for general purposes such as multimedia playback or playback of recordings. The receiver may be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part of it.

The display module 160 may visually provide information to the outside of the electronic device 101 (e.g., to a user). The display module 160 may include, for example, a display, a hologram device, or a projector, and a control circuit for controlling the corresponding device. According to one embodiment, the display module 160 may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the strength of the force generated by the touch.

The audio module 170 may convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module 170 may acquire sound through the input module 150, or may output sound through the sound output module 155 or through an external electronic device (e.g., the electronic device 102, such as a speaker or headphones) connected directly or wirelessly to the electronic device 101.

The sensor module 176 may detect an operating state of the electronic device 101 (e.g., power or temperature) or an external environmental state (e.g., a user state), and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module 176 may include, for example, a gesture sensor, a gyro sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an IR (infrared) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or an illuminance sensor.

The interface 177 may support one or more designated protocols that can be used to connect the electronic device 101 directly or wirelessly to an external electronic device (e.g., the electronic device 102). According to one embodiment, the interface 177 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

The connection terminal 178 may include a connector through which the electronic device 101 can be physically connected to an external electronic device (e.g., the electronic device 102). According to one embodiment, the connection terminal 178 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

The haptic module 179 may convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that the user can perceive through the tactile or kinesthetic senses. According to one embodiment, the haptic module 179 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.

The camera module 180 may capture still images and videos. According to one embodiment, the camera module 180 may include one or more lenses, image sensors, image signal processors, or flashes.

The power management module 188 may manage the power supplied to the electronic device 101. According to one embodiment, the power management module 188 may be implemented, for example, as at least part of a power management integrated circuit (PMIC).

The battery 189 may supply power to at least one component of the electronic device 101. According to one embodiment, the battery 189 may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.

The communication module 190 may support establishing a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 101 and an external electronic device (e.g., the electronic device 102, the electronic device 104, or the server 108), and performing communication through the established channel. The communication module 190 may include one or more communication processors that operate independently of the processor 120 (e.g., an application processor) and support direct (e.g., wired) or wireless communication. According to one embodiment, the communication module 190 may include a wireless communication module 192 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 194 (e.g., a local area network (LAN) communication module or a power line communication module). The corresponding one of these communication modules may communicate with the external electronic device 104 through a first network 198 (e.g., a short-range communication network such as Bluetooth, wireless fidelity (WiFi) direct, or infrared data association (IrDA)) or a second network 199 (e.g., a long-range communication network such as a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into a single component (e.g., a single chip) or implemented as a plurality of separate components (e.g., multiple chips). The wireless communication module 192 may identify or authenticate the electronic device 101 within a communication network such as the first network 198 or the second network 199 using subscriber information (e.g., an international mobile subscriber identity (IMSI)) stored in the subscriber identification module 196.

The wireless communication module 192 may support a 5G network, which follows the 4G network, and next-generation communication technology, for example, new radio (NR) access technology. NR access technology may support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and access by a large number of terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module 192 may support a high frequency band (e.g., the mmWave band), for example, to achieve a high data rate. The wireless communication module 192 may support various technologies for securing performance in the high frequency band, for example, beamforming, massive multiple-input and multiple-output (massive MIMO), full dimensional MIMO (FD-MIMO), array antennas, analog beamforming, or large scale antennas. The wireless communication module 192 may support various requirements specified for the electronic device 101, an external electronic device (e.g., the electronic device 104), or a network system (e.g., the second network 199). According to one embodiment, the wireless communication module 192 may support a peak data rate for realizing eMBB (e.g., 20 Gbps or more), loss coverage for realizing mMTC (e.g., 164 dB or less), or U-plane latency for realizing URLLC (e.g., 0.5 ms or less for each of downlink (DL) and uplink (UL), or 1 ms or less round trip).

The antenna module 197 may transmit signals or power to the outside (e.g., to an external electronic device) or receive them from the outside. According to one embodiment, the antenna module 197 may include an antenna that includes a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module 197 may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for the communication method used in a communication network such as the first network 198 or the second network 199 may be selected from the plurality of antennas, for example, by the communication module 190. Signals or power may be transmitted or received between the communication module 190 and an external electronic device through the at least one selected antenna. According to some embodiments, a component other than the radiator (e.g., a radio frequency integrated circuit (RFIC)) may additionally be formed as part of the antenna module 197.

According to various embodiments, the antenna module 197 may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board; an RFIC disposed on or adjacent to a first side (e.g., the bottom side) of the printed circuit board and capable of supporting a designated high frequency band (e.g., the mmWave band); and a plurality of antennas (e.g., an array antenna) disposed on or adjacent to a second side (e.g., the top or a lateral side) of the printed circuit board and capable of transmitting or receiving signals in the designated high frequency band.

At least some of the above components may be connected to each other through a communication method used between peripheral devices (e.g., a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)) and may exchange signals (e.g., commands or data) with each other.

According to one embodiment, commands or data may be transmitted or received between the electronic device 101 and the external electronic device 104 through the server 108 connected to the second network 199. Each of the external electronic devices 102 and 104 may be a device of the same type as, or a different type from, the electronic device 101. According to one embodiment, all or part of the operations executed on the electronic device 101 may be executed on one or more of the external electronic devices 102, 104, or 108. For example, when the electronic device 101 needs to perform a function or service automatically, or in response to a request from a user or another device, the electronic device 101 may, instead of or in addition to executing the function or service itself, request one or more external electronic devices to perform at least part of that function or service. The one or more external electronic devices that receive the request may execute at least part of the requested function or service, or an additional function or service related to the request, and deliver the result of the execution to the electronic device 101. The electronic device 101 may provide the result, as is or after further processing, as at least part of the response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device 101 may provide an ultra-low latency service using, for example, distributed computing or mobile edge computing. In another embodiment, the external electronic device 104 may include an Internet of Things (IoT) device. The server 108 may be an intelligent server that uses machine learning and/or neural networks. According to one embodiment, the external electronic device 104 or the server 108 may be included in the second network 199. The electronic device 101 may be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.
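The offloading flow described above (execute locally, or request an external device and then use the result as is or after further processing) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `perform`, `fake_server`, and the `"score"` function are hypothetical and do not come from the specification, and the remote call is a plain in-process function standing in for the server 108.

```python
from typing import Any, Callable, Dict, Optional

Handler = Callable[[Any], Any]


def perform(function_name: str,
            payload: Any,
            local: Dict[str, Handler],
            remote: Optional[Callable[[str, Any], Any]] = None,
            postprocess: Optional[Handler] = None) -> Any:
    """Run a function locally if possible, otherwise request it remotely.

    The result is returned as is, or after additional processing when a
    postprocess step is supplied.
    """
    if function_name in local:
        result = local[function_name](payload)
    elif remote is not None:
        # Hypothetical stand-in for a request to an external device or server.
        result = remote(function_name, payload)
    else:
        raise LookupError(f"no handler for {function_name!r}")
    return postprocess(result) if postprocess is not None else result


def fake_server(function_name: str, payload: Any) -> Any:
    """Hypothetical remote side: only knows how to compute a 'score'."""
    if function_name == "score":
        return {"score": len(payload) * 10}
    raise LookupError(function_name)
```

For example, `perform("score", [1, 2, 3], {}, remote=fake_server, postprocess=lambda r: r["score"])` delegates the work to the stand-in server and then extracts the numeric score from the returned result.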

The electronic device 101 connects to the server 108, and the server 108 may provide services to the connected electronic device 101. The server 108 may also carry out a membership registration procedure, store and manage various information on users registered as members, and provide various purchase and payment functions related to the service. In addition, the server 108 may share, in real time, execution data of the service application running on each of a plurality of electronic devices 101 so that the service can be shared among users. In terms of hardware, the server 108 may have the same configuration as a typical web server or service server. In terms of software, however, it may include program modules that are implemented in any language, such as C, C++, Java, Python, Golang, or Kotlin, and that perform various functions. The server 108 generally refers to a computer system that is connected to an unspecified number of clients and/or other servers through an open computer network such as the Internet, receives requests from clients or other servers to perform work, and derives and provides the results of that work, together with the computer software (server program) installed for that purpose. Beyond the server program described above, the server 108 should be understood as a broad concept that includes a series of application programs running on the server 108 and, in some cases, various databases (hereinafter "DB") built inside or outside it. Accordingly, the server 108 classifies membership registration information and various information and data about games, stores them in the DB, and manages them; this DB may be implemented inside or outside the server 108. The server 108 may be implemented on general server hardware using server programs provided for various operating systems such as Windows, Linux, UNIX, and Macintosh. Representative examples that may be used to implement web services include IIS (Internet Information Server) in a Windows environment, and CERN, NCSA, Apache, and Tomcat in a UNIX environment. The server 108 may also interwork with an authentication system and a payment system for user authentication or for payment of service-related purchases.

The first network 198 and the second network 199 refer to a connection structure that enables information exchange among nodes such as terminals and servers, or to a network that connects the server 108 and the electronic devices 101 and 104. The first network 198 and the second network 199 include, but are not limited to, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), 3G, 4G, LTE, 5G, and Wi-Fi. The first network 198 and the second network 199 may be closed networks such as a LAN or WAN, but are preferably open networks such as the Internet. The Internet refers to the worldwide open computer network structure that provides protocols such as TCP/IP, TCP, and UDP (User Datagram Protocol), and the various services that exist in the layers above them, namely HTTP (HyperText Transfer Protocol), Telnet, FTP (File Transfer Protocol), DNS (Domain Name System), SMTP (Simple Mail Transfer Protocol), SNMP (Simple Network Management Protocol), NFS (Network File Service), and NIS (Network Information Service).

The database may have a general data structure implemented in the storage space (hard disk or memory) of a computer system using a database management system (DBMS). The database may have a data storage format in which data can be freely searched (extracted), deleted, edited, and added. The database may be implemented to suit the purpose of an embodiment of the present disclosure using a relational database management system (RDBMS) such as Oracle, Informix, Sybase, or DB2; an object-oriented database management system (OODBMS) such as GemStone, Orion, or O2; or an XML native database such as Excelon, Tamino, or Sekaiju, and may have appropriate fields or elements to achieve its function.
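The four operations named above (add, edit, delete, search) can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module purely for convenience; SQLite is not one of the database systems listed in the specification, and the `words` table, its columns, and the function name `demo_word_store` are hypothetical choices made for this example.

```python
import sqlite3


def demo_word_store():
    """Add, edit, delete, and search rows in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one English word and its definition per row.
    conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, definition TEXT)")

    # Add
    conn.execute("INSERT INTO words VALUES (?, ?)", ("apple", "a fruit"))
    conn.execute("INSERT INTO words VALUES (?, ?)", ("run", "to move fast"))

    # Edit
    conn.execute("UPDATE words SET definition = ? WHERE word = ?",
                 ("a round fruit", "apple"))

    # Delete
    conn.execute("DELETE FROM words WHERE word = ?", ("run",))

    # Search (extract)
    rows = conn.execute(
        "SELECT word, definition FROM words ORDER BY word").fetchall()
    conn.close()
    return rows
```

After the four steps, only the edited `apple` row remains in the table.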

FIG. 2 is a diagram showing the configuration of a program according to one embodiment.

FIG. 2 is a block diagram 200 illustrating the program 140 according to various embodiments. According to one embodiment, the program 140 may include an operating system 142 for controlling one or more resources of the electronic device 101, middleware 144, or an application 146 executable on the operating system 142. The operating system 142 may include, for example, Android™, iOS™, Windows™, Symbian™, Tizen™, or Bada™. At least some of the program 140 may, for example, be preloaded on the electronic device 101 at the time of manufacture, or be downloaded or updated from an external electronic device (e.g., the electronic device 102 or 104, or the server 108) when used by the user. All or part of the program 140 may include a neural network.

The operating system 142 may control the management (e.g., allocation or reclamation) of one or more system resources (e.g., processes, memory, or power) of the electronic device 101. The operating system 142 may additionally or alternatively include one or more driver programs for driving other hardware devices of the electronic device 101, for example, the input module 150, the sound output module 155, the display module 160, the audio module 170, the sensor module 176, the interface 177, the haptic module 179, the camera module 180, the power management module 188, the battery 189, the communication module 190, the subscriber identification module 196, or the antenna module 197.

The middleware 144 may provide various functions to the application 146 so that functions or information provided by one or more resources of the electronic device 101 can be used by the application 146. The middleware 144 may include, for example, an application manager 201, a window manager 203, a multimedia manager 205, a resource manager 207, a power manager 209, a database manager 211, a package manager 213, a connectivity manager 215, a notification manager 217, a location manager 219, a graphics manager 221, a security manager 223, a telephony manager 225, or a voice recognition manager 227.

The application manager 201 may, for example, manage the life cycle of the application 146. The window manager 203 may, for example, manage one or more GUI resources used on the screen. The multimedia manager 205 may, for example, identify one or more formats required for playing media files and encode or decode a given media file using a codec suited to the format selected for it. The resource manager 207 may, for example, manage the source code of the application 146 or the memory space of the memory 130. The power manager 209 may, for example, manage the capacity, temperature, or power of the battery 189, and use the corresponding information to determine or provide related information needed for the operation of the electronic device 101. According to one embodiment, the power manager 209 may interwork with a basic input/output system (BIOS) (not shown) of the electronic device 101.

The database manager 211 may, for example, create, search, or modify a database to be used by the application 146. The package manager 213 may, for example, manage the installation or update of applications distributed in the form of package files. The connectivity manager 215 may, for example, manage a wireless or direct connection between the electronic device 101 and an external electronic device. The notification manager 217 may, for example, provide a function for notifying the user of the occurrence of a designated event (e.g., an incoming call, a message, or an alarm). The location manager 219 may, for example, manage location information of the electronic device 101. The graphics manager 221 may, for example, manage one or more graphic effects to be provided to the user or a user interface related to them.

The security manager 223 may, for example, provide system security or user authentication. The telephony manager 225 may, for example, manage a voice call function or a video call function provided by the electronic device 101. The voice recognition manager 227 may, for example, transmit the user's voice data to the server 108 and receive from the server 108 a command corresponding to a function to be performed on the electronic device 101 based at least in part on that voice data, or text data converted based at least in part on that voice data. According to one embodiment, the middleware 144 may dynamically remove some existing components or add new ones. According to one embodiment, at least part of the middleware 144 may be included as part of the operating system 142, or may be implemented as software separate from the operating system 142.

The application 146 may include, for example, a home 251, dialer 253, SMS/MMS 255, instant message (IM) 257, browser 259, camera 261, alarm 263, contacts 265, voice recognition 267, email 269, calendar 271, media player 273, album 275, watch 277, health 279 (e.g., measuring biometric information such as the amount of exercise or blood sugar), or environmental information 281 (e.g., measuring atmospheric pressure, humidity, or temperature information) application. According to one embodiment, the application 146 may further include an information exchange application (not shown) that can support information exchange between the electronic device 101 and an external electronic device. The information exchange application may include, for example, a notification relay application configured to deliver designated information (e.g., a call, message, or alarm) to an external electronic device, or a device management application configured to manage an external electronic device. The notification relay application may, for example, deliver to an external electronic device notification information corresponding to a designated event (e.g., receipt of mail) that occurred in another application of the electronic device 101 (e.g., the email application 269). Additionally or alternatively, the notification relay application may receive notification information from an external electronic device and provide it to the user of the electronic device 101.
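The relay behavior described above, forwarding only designated events to an external device, can be sketched as follows. The class and field names (`Event`, `NotificationRelay`, `source_app`, `kind`, `detail`) and the message format are hypothetical and chosen for illustration; the external device is represented by any callable that accepts a string.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    source_app: str  # application in which the event occurred
    kind: str        # event type, e.g. "mail_received"
    detail: str      # human-readable description


class NotificationRelay:
    """Forwards designated event kinds to an external device."""

    def __init__(self, send: Callable[[str], None],
                 relayed_kinds: Iterable[str]) -> None:
        self._send = send                 # stand-in for the external device link
        self._kinds = set(relayed_kinds)  # only these kinds are relayed

    def on_event(self, event: Event) -> bool:
        """Relay the event if its kind is designated; report whether it was sent."""
        if event.kind not in self._kinds:
            return False
        self._send(f"[{event.source_app}] {event.kind}: {event.detail}")
        return True
```

With `outbox = []` and `relay = NotificationRelay(outbox.append, {"mail_received"})`, a mail-received event from the email application is appended to `outbox`, while an event of any other kind is ignored.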

The device management application may, for example, control the power (e.g., turn-on or turn-off) or a function (e.g., brightness, resolution, or focus) of an external electronic device that communicates with the electronic device 101, or of some of its components (e.g., the display module or camera module of the external electronic device). The device management application may additionally or alternatively support the installation, deletion, or update of applications running on the external electronic device.

Throughout this specification, the terms "neural network," "neural net," and "network function" may be used interchangeably. A neural network may consist of a set of interconnected computational units, generally referred to as "nodes." These nodes may also be referred to as "neurons." A neural network includes at least two nodes. The nodes (or neurons) that make up a neural network may be interconnected by one or more "links."

Within a neural network, two or more nodes connected through a link may form a relative input-node and output-node relationship. The concepts of input node and output node are relative: a node that is an output node with respect to one node may be an input node with respect to another node, and vice versa. As described above, an input-node-to-output-node relationship is defined around a link. One input node may be connected to one or more output nodes through links, and vice versa.

In an input-node and output-node relationship connected through one link, the value of the output node may be determined based on the data input to the input node. Here, the edge or link that interconnects the input node and the output node has a weight. The weight is variable and may be changed by a user or by an algorithm so that the neural network performs a desired function. For example, when one or more input nodes are each connected to one output node by a respective link, the value of the output node may be determined based on the values input to the input nodes connected to that output node and the weights set on the links corresponding to each input node.
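As a minimal sketch of the weighted-link relationship described above, the following Python computes an output node's value from its input nodes and link weights. The function name `output_node_value` and the use of a plain weighted sum (with no bias term or activation function) are assumptions made for illustration; the specification states only that the output is determined from the input values and the link weights.

```python
from typing import Sequence


def output_node_value(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Combine input-node values using the weight set on each link.

    Assumption: a plain weighted sum. The specification does not fix
    a particular formula.
    """
    if len(inputs) != len(weights):
        raise ValueError("each input node needs exactly one link weight")
    return sum(x * w for x, w in zip(inputs, weights))


# Three input nodes feed one output node through three weighted links.
value = output_node_value([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(value)  # 1.0*0.5 + 2.0*(-1.0) + 3.0*2.0 = 4.5
```

Changing any single weight changes the output for the same inputs, which is the sense in which the weights are "variable" parameters that a user or a training algorithm can adjust.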

As described above, in a neural network, two or more nodes are interconnected through one or more links to form input-node and output-node relationships. The characteristics of a neural network may be determined by the number of nodes and links, the connectivity between the nodes and links, and the value of the weight assigned to each link. For example, two neural networks that have the same number of nodes and links but different link weight values may be regarded as different networks.
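The point that two networks with identical topology but different weights are different networks can be illustrated with a small sketch. The two-input, two-hidden-node, one-output layout, the ReLU activation, and all weight values below are assumptions chosen for illustration and do not come from the specification.

```python
def forward(x, w_hidden, w_out):
    """Run one forward pass through a tiny 2-2-1 network.

    Assumptions: ReLU on the hidden nodes, a plain weighted sum at the output.
    """
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, row))) for row in w_hidden]
    return sum(h * w for h, w in zip(hidden, w_out))


x = [1.0, 2.0]

# Network A and network B share the same nodes and links; only weights differ.
out_a = forward(x, w_hidden=[[1.0, 0.0], [0.0, 1.0]], w_out=[1.0, 1.0])
out_b = forward(x, w_hidden=[[0.5, 0.5], [1.0, -1.0]], w_out=[1.0, 1.0])

print(out_a, out_b)  # 3.0 1.5
```

Both networks have six links, yet the same input yields different outputs, so they behave as different functions.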

Meanwhile, in various embodiments of the present disclosure, the server may be a server that provides a multimodal-based English learning service to a user terminal using a neural network. For example, the server may include the server 108 of FIG. 1.

For example, to provide a multimodal-based English learning service suited to the user, the server may determine a progress method for the user terminal through a neural network. For example, the server may provide multimodal-based English word learning content to the user terminal according to the progress method. For example, the server may determine a new progress method for the user terminal through the neural network by reflecting the progress results of the multimodal-based English word learning content received from the user terminal.

The user terminal may be the terminal of a learner who uses the multimodal-based English learning service. For example, the user terminal may include the electronic device 101 of FIG. 1.

Hereinafter, the multimodal-based English learning service is described.

The multimodal-based English learning service may include an English teaching method based on native language acquisition, known in Korea as the "mom-style" (eomma-pyo) English teaching method. This method maps the process by which a Korean child learns Korean from birth directly onto the process of learning English, so that the learner acquires English naturally as a language, in the same way a native language is acquired. Accordingly, describing native-language-acquisition-based English education as being "based on reading" or "based on the grammar system" may be logically contradictory.

Native language acquisition may proceed as follows. The process appears almost identically in Korean, English, Japanese, Chinese, and most other languages; only the terms used to describe it differ. When acquiring a native language, a learner passes through six stages: the pre-linguistic stage, the first-word infant syntax stage, the multi-word infant syntax stage, the language system acquisition stage, the language settlement stage, and the language learning stage (school age).

In the pre-linguistic stage, the learner only hears sounds and may not recognize human voices as meaningful speech. The learner (e.g., a baby) begins to make sounds other than crying and tries to imitate the sounds heard most often. This is called babbling, and the native-language-acquisition approach to English education refers to this process as "teojapgi" (laying the groundwork). The pre-linguistic stage is the period in which the child tries to imitate sounds, and watching and listening may simply be repeated without limit.

The first-word infant syntax stage is when proper speech begins. Typically, the learner starts speaking with words such as 엄마 (eomma, "mom") and 아빠 (appa, "dad"). The learner begins with sounds like "...ma" and "...ppa" and gradually comes to pronounce the words clearly. Some research reports that a learner hears these words about 10,000 times before being able to pronounce them and understand their meaning, and children around the world sound similar at this stage. Korean 엄마 and 아빠, English "mama" and "papa," Japanese ママ and パパ (mama, papa), and the corresponding Chinese words all develop from sounds like "mm-ma" and "mm-pa." In this stage, the learner mainly utters single words, such as names for familiar people or exclamations.

The multi-word infant syntax stage is when the learner begins to combine words. Utterances such as "Mom, food" and "Dad, play" gradually become "Mom, I'm hungry, give me food" and "Dad, let's play together," as the learner strings together simple meaningful words to express increasingly complex intentions. From this point the learner's language grows in earnest; this stage is also called the immersive speaking period.

In the language system acquisition stage, the learner's spoken communication gradually extends to reading. Reading may start with picture books read aloud by a parent and progress to storybooks. At this stage, the learner reads words as whole units. The most important point is that the learner can by now communicate with parents quite fluently in speech. Phonics is not used in early book reading and need not be taught. In the United States, reading to children at bedtime, known as the bedtime story, is considered very important. It nurtures children's creativity and imagination and motivates them to take an interest in books and keep books close. Learners who pass through the bedtime story period well come to enjoy reading, and phonics is taught, starting from the basics, only toward the end of this period, when the desire to read independently grows strong. The main goal of this stage is therefore to make reading enjoyable. Reading books requires practice in reading words: names are written on paper and attached to objects such as the television, refrigerator, washing machine, and desk, and children practice reading them. This method may be used in the same way in Korea and in English-speaking countries.

The language settlement stage may be when the learner begins writing a diary. At this stage the learner communicates naturally in speech and can read basic books. The learner asks about unknown words and gradually begins to write. Writing progresses from writing letters to writing sentences, and sentence writing in turn progresses from copying to independent writing. The learner begins keeping a diary at this stage; in Korea this is called 일기 (ilgi) or a "diary," whereas in the United States it is usually called a "journal." Korea and the English-speaking world differ slightly at this stage, and the difference in terminology is reflected in how writing is actually taught. In Korea, where the terms 일기 (a record of each day) and "diary" (a daily record) are used, early writing consists of recording each day's events, and schools check whether children have written in their diaries every day. In English-speaking countries, as the term "journal" (a record of opinions on a specific topic) suggests, children are guided to write about something out of the ordinary when it happens; everyday routine is not a topic for journal writing. Nevertheless, the two are alike in that writing is taught in earnest during the language settlement stage of native language acquisition.

In the language learning stage (school age), the concept of learning is introduced in earnest. Learning means consciously or intentionally studying and mastering something. In native language acquisition, learning at this stage may involve studying the grammar system of the native language in earnest.

Several points should be considered when following this acquisition process. At birth, a child has no understanding or cognition of objects and begins learning the native language from that state. Children who encounter English in an environment such as Korea, however, can already communicate well in Korean and have already formed an understanding and cognition of the corresponding objects. Therefore, from the point at which sounds that once seemed as meaningless as noise begin to be heard as meaningful, learning words can proceed in parallel with listening. In other words, the word acquisition that would otherwise begin in earnest in the language system acquisition stage proceeds alongside listening from the first-word stage onward. Children are not forced to read books on their own during this period. The main purpose of learning words at this time is to understand the meaning of words that are heard, rather than to read books. Reading at this time should mean being read to, not being made to read, and when being read to, children should be able to understand much of the book from the sound alone. Several studies are said to show that children find reading comfortable when the chosen book is one in which they can understand more than 90% of the words by ear. During the word-learning period, therefore, books should be read to children rather than read by them, which corresponds to the bedtime story in English-speaking countries. It is a mistake to argue that, just as listening a great deal leads to understanding speech, simply reading a great deal leads to understanding text. People communicated in speech before writing existed. People who cannot read can still communicate in speech, and even animals, which have no writing, can communicate minimally through sound. Sounds and the images of objects exist naturally and are perceived through innate abilities, whereas letters are a functional invention deliberately created by humans and must be mastered through a separate learning process. For this reason, the text in a book can be understood and written only after words have been deliberately learned. Children should therefore not be made to read a lot; they should be read to a lot so that they can match sounds to words, and only after that process should they approach reading on their own.

Ultimately, expressions such as "a native language acquisition method based on reading" or "a native language acquisition method based on learning the grammar system" are misnomers that arise from not understanding the principle of native language acquisition itself. These expressions are used because terms such as "reading" and "grammar" offer a kind of vicarious reassurance. Parents tend to believe that reading and grammar study are the surest way to raise their children's grades, so promises of reading or grammar instruction are persuasive to them. As a result, "English acquisition by the native language acquisition method" and "education based on reading and grammar" are used together even though they are entirely different approaches, and parents are misled even though the combination is logically inconsistent. What is accurate is that reading should be treated as important during the reading stage of native language acquisition. In other words, the correct view is not "a native language acquisition method based on reading" but "reading within the process of native language acquisition."

The multimodal-based English learning service of the present disclosure centers on "how things are remembered naturally." Material learned by techniques aimed at memorizing well is quickly remembered and quickly forgotten, and the learner struggles to cope with even a slight shift in meaning. Material remembered naturally is retained by inferring meaning through association, so once remembered it is not easily forgotten and is readily applied to varied meanings.

An idiomatic expression is a group of words that, in a particular context, carries a meaning different from the original meanings of the words. Idioms are usually learned by memorization. However, the same word sequence is sometimes used with the literal meanings of its words. As a simple example, "big fish" is an idiom meaning a boss or leader, but in a sentence it may literally mean a large fish. A learner who has memorized the expression only as "boss" will then misread the sentence entirely. In other words, learning by memorization is limited when it comes to understanding meaning in actual speech or writing.

Phonics, meanwhile, aims to build word-reading ability by practicing pronunciation through the sound values of English letters. It was developed as a method for teaching children in English-speaking countries to read books after their listening and speaking abilities had already formed. It is therefore not appropriate to apply it from the outset to children in ESL (English as a Second Language) or EFL (English as a Foreign Language) environments who have not yet developed the ability to listen and speak in English. Doing so means having children who cannot yet understand or speak English memorize letters and the sounds the letters make, and then start by practicing reading words from a book aloud, which does not match the original purpose of phonics instruction. Nevertheless, most private education institutions treat phonics as the gateway to English and begin speaking instruction with it, and most online English education sites for children likewise begin with phonics. This practice is problematic in three respects. First, phonics was not developed as a pronunciation drill for practicing speech; it was developed to teach how to pronounce an unfamiliar word encountered while reading a book. A program developed for reading books aloud has thus been mistakenly repurposed as a program for building the basic skills needed for speaking practice. Second, phonics is a stage of instruction for children whose native language is English, given after their listening and speaking abilities have formed; applying it by rote to children whose native language is not English and who have not formed those abilities is unlikely to be effective. Third, native language learning does not begin with learning the sound value of each letter, as phonics does. This is common to all countries. Take the Korean word 사과 (sagwa, "apple"). In Korea, children are first taught to read the written form 사과 as a whole. Later, once reading ability has formed to some degree, they are taught that 사과 decomposes into ㅅ (seu), ㅏ (a), ㄱ (geu), ㅗ (o), and ㅏ (a), and that saying "seu-a-geu-o-a" quickly produces "sagwa." This is taught mainly during the school-age stage of native language acquisition. The same is true of English. The word "apple" is first learned as a whole. Then, at school age, children learn the sound values of its spelling, "a" (ae), "pp" (p), "l" (l), and "e" (silent), and how to pronounce the word from them. In other words, in both Korean and English, native language acquisition follows a process that connects the "form" of the word as a whole with its "meaning." This is called the Whole Word Approach, which is part of the Whole Language Approach. It is therefore not right to claim to offer English education based on native language acquisition while starting with a phonics practice program.

It is known that even after learning phonics, a learner can read only about 80 to 85% of words. Since 15 to 20% of words remain unreadable, roughly one word in a five-word sentence cannot be read by phonics rules. Some argue that, as grade level rises, fewer than about 50% of words can in practice be read with phonics. In addition, more and more loanwords from languages such as French and German are becoming established as English words. With the exception of Spanish, which is pronounced as spelled, English words derived from other languages are in most cases not pronounced according to phonics rules. This means that the number of words that cannot be pronounced by applying phonics keeps growing.

When learning the meanings of words, the traditional approach of writing out English words and memorizing their definitions is very inefficient. According to the forgetting curve proposed by the German psychologist Hermann Ebbinghaus, memory of newly learned information declines sharply over time, and up to 80% of the information is said to be forgotten within the first 24 hours. Consistent with this, learners try to improve retention by repeatedly reviewing material before it is forgotten.

One theory addressing this problem is the Dual Coding Theory proposed by Allan Paivio in 1971. It holds that learning is far more effective when the left brain, which handles verbal material such as text and speech, and the right brain, which handles non-verbal material such as images and emotions, are used together. The theory defines verbal elements such as text as one code and non-verbal elements such as images as another, and holds that learning is more effective when the two codes are combined. For example, memorizing a word (text) and its meaning (text) while murmuring them (speech) uses only one code, the verbal one. When a word (text), a verbal element, is studied together with an image, a non-verbal element that directly or indirectly explains its meaning, both the learning effect and its persistence are higher. According to a study by Chun and Plass (1996), learners who studied words with pictures recalled the words better than learners who studied them without pictures.

Multimodal Learning Theory extends the Dual Coding Theory. It holds that learners understand and remember information more easily when it is provided through a variety of sensory channels, adding channels such as kinesthetic sense to the visual and auditory channels that carry the verbal and non-verbal codes. Kinesthetic sense here does not mean watching a video (audiovisual input); it means that the learner's own physical activity is involved, for example copying the actions in a favorite video or imitating a song and its movements. Multimodal learning is being actively studied by many researchers in the field of artificial intelligence, building on human action recognition and emotion recognition, and it applies equally to children's learning.

A study published in the Journal of Science Education and Technology (2009), "The effects of animated and static visuals on students' understanding and recall of chemical experiments," found that animated videos depicting scientific experiments were more effective than static images or text-based explanations at improving students' understanding and recall of the experiments.

Another relevant theory is the Cognitive Load Theory proposed by John Sweller. According to this theory, working memory capacity is limited, so presenting too much information at once overloads memory and can hinder learning. On this view, short videos are more effective for learning because they reduce cognitive load and make it easier for learners to process and retain information. Short videos can thus lead to better learning outcomes than long videos, particularly in terms of information retention and engagement.

FIG. 3 shows a method in which a server provides a multimodal-based English learning service to a user terminal using a neural network according to an embodiment. FIG. 4 is an example of multimodal-based English word learning content according to an embodiment. FIG. 5 is an example of a settings screen for the multimodal-based English word learning content according to an embodiment. FIG. 6 is an example of a reward screen of the multimodal-based English word learning content according to an embodiment. FIG. 7 is an example of a team-based game of the multimodal-based English word learning content according to an embodiment. The embodiments of FIGS. 3 to 7 may be combined with various embodiments of the present disclosure.

Referring to FIG. 3, in step S310, the server may receive a first request message for English word learning from the user terminal.

The first request message is a message by which the user terminal requests the server to provide the multimodal-based English word learning content. For example, the first request message may include personal information about the user and information about the user terminal. The personal information about the user may include the user's name, date of birth, and gender. The information about the user terminal may include identification information of the terminal, information about the model of the terminal, and information about the capacity of the terminal.

For example, the identification information of the terminal may include at least one of an identifier (ID) of the user terminal or the international mobile equipment identity (IMEI) of the user terminal. The ID of the user terminal may be the ID registered with the English learning service. The IMEI is a 15-digit number assigned by the manufacturer when the terminal is produced, and may consist of a certification authority's unique number, the terminal manufacturer, the model name, and the terminal serial number.

For example, the information about the model of the terminal indicates the terminal model and may include a value for the product name of the terminal.

For example, the information about the capacity of the terminal describes the space available for storing data on the terminal and may include the capacity currently available for storage.
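The fields of the first request message described above can be sketched as a small data structure. This is only an illustrative layout: the class names, field names, and the `is_plausible_imei` helper are hypothetical and do not appear in the disclosure, which does not specify a wire format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class UserInfo:
    name: str
    birth_date: date
    gender: str  # hypothetical encoding: "male" or "female"


@dataclass
class TerminalInfo:
    service_id: str       # ID registered with the English learning service
    imei: str             # 15-digit international mobile equipment identity
    model_name: str       # product name of the terminal
    free_storage_mb: int  # capacity currently available for storage


@dataclass
class FirstRequestMessage:
    user: UserInfo
    terminal: TerminalInfo


def is_plausible_imei(imei: str) -> bool:
    """Check only the length and character set stated in the text (15 digits)."""
    return len(imei) == 15 and imei.isascii() and imei.isdigit()


request = FirstRequestMessage(
    user=UserInfo(name="Kim", birth_date=date(2020, 6, 5), gender="female"),
    terminal=TerminalInfo(
        service_id="learner01",
        imei="490154203237518",
        model_name="ExamplePad",
        free_storage_mb=2048,
    ),
)
```

The server would read `request.user` and `request.terminal` when it later builds the user vector.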

Referring to FIG. 4, the screen 400 of the multimodal-based English word learning content may include an alphabet thumbnail menu 410, an English word 420, a definition 430 of the English word, an output button 440, an animation 450, a reward area 460, and a menu bar 470.

The alphabet thumbnail menu 410 may be a menu that groups English words by their initial letter (e.g., A to Z). The output button 440 may be a button that plays a recorded voice for at least one of the English word 420 or the definition 430 of the English word. For example, the definition 430 of the English word is an area in which the definition of the English word 420 is written in English, and may include hypertext 431, in which important words are shown in a different color, and a highlight 432 marking the text for which the recorded voice is being played. The animation 450 may be an area in which a dynamic image explaining the English word 420 is displayed. For example, when the animation 450 is activated, the image may move for a specific time. The specific time may differ for each animation 450 matched to an English word 420, and may fall within a range of 3 seconds to 7 seconds, inclusive. For example, when the animation 450 is deactivated, the image may be displayed in a stopped state. The image displayed in the stopped state may be referred to as a representative image. The reward area 460 may be an area that displays the reward the user terminal obtains according to the learning progress rate for the corresponding English word. For example, the reward may be a sticker depicting the English word 420, and the act of obtaining the reward may be referred to as a sticker hunt. The menu bar 470 may be an area containing a menu for controlling the multimodal-based English word learning content. For example, the menu bar 470 may include a button for going to the previous step, a button for going to the next step, a record/play button, a settings button, a home button, and an exit button.

Additionally, for example, the order in which English words are studied in the multimodal-based English word learning content may be selected by the user terminal. The study order may include at least one of frequency order, alphabetical order, or classification by topic. For example, when the server collects a plurality of English words from source information such as English movie subtitles, English textbooks, and English children's books, it may count the number of times each English word appears. The server may take the number of appearances as the frequency of the English word and set the study order from the most frequent word to the least frequent. For example, the frequency of each English word may be displayed in the multimodal-based English word learning content. For example, the alphabetical order may be the order from A to Z. For example, when both frequency order and alphabetical order are selected, the server may proceed through the letters from A to Z and, within each letter, present the English words from the most frequent to the least frequent. For example, when classification by topic is selected, learning may proceed over English words grouped into topics such as home, school, and vehicles. Classification by topic may be combined with frequency order and alphabetical order.
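The frequency-based and combined orderings described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the whitespace tokenization, and the alphabetical tie-break between equally frequent words are hypothetical choices, not details given in the disclosure.

```python
from collections import Counter


def word_frequencies(sources):
    """Count how many times each word appears across the source texts."""
    counts = Counter()
    for text in sources:
        for token in text.lower().split():
            word = token.strip(".,!?\"'")
            if word.isalpha():
                counts[word] += 1
    return counts


def study_order(counts, by_alphabet=False):
    """Order words by frequency, optionally grouped by initial letter first."""
    if by_alphabet:
        # A to Z by first letter, most frequent first within each letter.
        return sorted(counts, key=lambda w: (w[0], -counts[w], w))
    # Most frequent first; ties are broken alphabetically (an assumption).
    return sorted(counts, key=lambda w: (-counts[w], w))


sources = ["The cat sat.", "A cat and a dog.", "The dog ran."]
counts = word_frequencies(sources)
print(study_order(counts))
print(study_order(counts, by_alphabet=True))
```

A topic filter could be applied before either ordering by restricting `counts` to the words of the selected topic.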

Referring to FIG. 5, the settings 500 for the multimodal-based English word learning content may include basic settings 510, a progress method 520, a learning time 530, a language pronunciation 540, and detailed settings 550.

For example, through the basic settings 510, the user terminal may turn each of the following functions on or off for the multimodal-based English word learning content: auto-play, text highlighting, repeated reading, user interface sound, automatic page turning, and dictation. For example, when auto-play is turned on, each learning step of the content may proceed automatically. For example, when text highlighting is turned on, the highlight 432 on the definition of the English word may be displayed in time with the voice reading the definition. For example, when repeated reading is turned on, the voice for the English word itself or for its definition may be played repeatedly. For example, when user interface sound is turned on, the sound configured for the user interface may be output for preset interactions. For example, when automatic page turning is turned on, the content may move to the next English word automatically after all learning steps for one English word are finished. For example, when dictation is turned on, a window for entering the spelling or the definition of the English word may be displayed.

For example, through the progress method 520, the user terminal may turn the fast progress, basic progress, and full progress functions on or off for the multimodal-based English word learning content. For example, fast progress may be a flash-style mode in which the learner sees only the word and the image of the animation before moving on to the next word. The steps used in fast progress may be selected from among seeing the word, hearing the word, watching the animation as a video, seeing only the image of the animation, seeing the definition, and hearing the definition.

For example, basic progress is a mode set to proceed through seeing the word, hearing the word, watching the animation, seeing the definition, and hearing the definition.

For example, full progress is a mode set to proceed through all of seeing the word, hearing the word, recording and playing back the word, watching the animation, seeing the definition, hearing the definition, and recording and playing back the definition, with tracing (copy-writing) available as an additional option.
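The three progress modes above can be sketched as step lists. The step identifiers, the function name `steps_for_mode`, and the default fast selection (word plus animation image, following the flash-style description) are hypothetical labels chosen for illustration.

```python
BASIC_STEPS = [
    "see_word", "hear_word", "watch_animation",
    "see_definition", "hear_definition",
]

FULL_STEPS = [
    "see_word", "hear_word", "record_word", "watch_animation",
    "see_definition", "hear_definition", "record_definition",
]

# Steps that may be selected for fast progress, in presentation order.
FAST_OPTIONS = [
    "see_word", "hear_word", "watch_animation_video",
    "see_animation_image", "see_definition", "hear_definition",
]


def steps_for_mode(mode, fast_selection=None, tracing=False):
    """Return the ordered learning steps for a progress mode."""
    if mode == "fast":
        chosen = set(fast_selection or ["see_word", "see_animation_image"])
        unknown = chosen - set(FAST_OPTIONS)
        if unknown:
            raise ValueError(f"unknown fast steps: {sorted(unknown)}")
        return [step for step in FAST_OPTIONS if step in chosen]
    if mode == "basic":
        return list(BASIC_STEPS)
    if mode == "full":
        # Tracing is an optional extra step in full progress.
        return FULL_STEPS + (["trace_word"] if tracing else [])
    raise ValueError(f"unknown mode: {mode}")
```

For instance, `steps_for_mode("full", tracing=True)` yields the seven full-progress steps followed by the optional tracing step.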

For example, through the learning time 530, the user terminal may check the time spent learning so far.

For example, through the language pronunciation 540, the user terminal may set the voice that is played to either English pronunciation or British pronunciation.

For example, through the detailed settings 550, the user terminal may select either automatic progress or manual progress. For example, when manual progress is selected, the user terminal may configure the operation of each of the English word, the definition of the English word, and the animation through the detailed settings 550.

In step S320, the server may determine progress method information for the user terminal through a method decision model that uses a first neural network, based on the personal information about the user and the information about the user terminal.

The server may generate a user vector by preprocessing the personal information about the user and the information about the user terminal. The user vector may include a value for the user's age, a value for the user's gender, and a value related to the capacity of the user terminal.

The value for the user's age represents the user's age. For example, it may be determined from the user's date of birth and the current date. For example, when the user's date of birth is June 5, 2020 and the current date is July 17, 2023, the value may be calculated by subtracting the birth year from the current year, multiplying the result by 12, and adding the current month minus the birth month, which gives (2023 - 2020) × 12 + (7 - 6) = 37. That is, for example, the value for the user's age may be expressed in months.
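The age calculation in the paragraph above can be written directly. The function name `age_in_months` is a hypothetical label; the formula is the one stated in the text, which ignores the day of the month.

```python
from datetime import date


def age_in_months(birth: date, today: date) -> int:
    """(current year - birth year) * 12 + (current month - birth month)."""
    return (today.year - birth.year) * 12 + (today.month - birth.month)


# Worked example from the text: born 2020-06-05, evaluated on 2023-07-17.
print(age_in_months(date(2020, 6, 5), date(2023, 7, 17)))  # 37
```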

The value for the user's gender represents the user's gender. For example, it may be 1 for a male user and 2 for a female user.

The value related to the capacity of the user terminal represents the storage capacity available on the user terminal. For example, it may be expressed in megabytes.
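Putting the three components together, a user vector could be assembled as in the sketch below. The function name, the string keys for gender, and the component order are assumptions made for illustration; the text only states which three values the vector may contain.

```python
GENDER_CODES = {"male": 1, "female": 2}  # encoding given in the text


def build_user_vector(age_months: int, gender: str, free_storage_mb: int):
    """Return [age in months, gender code, available storage in MB]."""
    if gender not in GENDER_CODES:
        raise ValueError(f"unsupported gender value: {gender!r}")
    return [float(age_months), float(GENDER_CODES[gender]), float(free_storage_mb)]


user_vector = build_user_vector(37, "female", 2048)
print(user_vector)  # [37.0, 2.0, 2048.0]
```

The components have very different scales (months, a category code, megabytes), so an implementation would probably normalize them before any distance-based grouping; the disclosure does not describe such a step.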

Among the plurality of groups configured on the server, the server may select the group most similar to the user vector and use that group's average achievement vector as the basic achievement vector for the user terminal. For example, the average achievement vector may be obtained by averaging the sincerity values, speech rates, pronunciation accuracies, and performance values of the members of the group most similar to the user vector.

Additionally, for example, over a preset period the server may obtain the value for the user's age, the value for the user's gender, the value related to the capacity of the user terminal, the sincerity, the speech rate, the pronunciation accuracy, and the performance from each of a plurality of user terminals that have used the multimodal-based English word learning content. Here, the sincerity, speech rate, pronunciation accuracy, and performance may be values output by the learning achievement evaluation model described later. That is, the server may store a user vector and a learning achievement vector for each of the user terminals that used the multimodal-based English learning service during the preset period. For example, the preset period may be six months or longer. For example, the server may update the plurality of groups at intervals of the preset period.

For example, the server may determine n groups from the plurality of user vectors through a clustering technique using a neural network. Clustering may refer to unsupervised learning that groups data with similar attributes into a certain number of clusters. For example, the n groups may be determined from the plurality of user vectors using DBSCAN (Density-Based Spatial Clustering of Applications with Noise). For example, DBSCAN assumes that an element (point) belonging to a cluster must lie close to many other elements of that cluster, and uses a radius and a minimum number of points for this calculation. For example, the radius is measured from a given data element, and the region it encloses may be referred to as a dense area. For example, the minimum number of points indicates how many elements must lie around an element for it to be designated a core point. In addition, each element of the data set may be classified as a core point, a border point, or an outlier point.

For example, the server may examine the radius around each element and count how many neighboring elements it contains. If m or more elements lie within the radius, the server may designate that element a core element. The server may designate an element that lies within the radius of a core element as a border element. The server may designate an element that does not lie within the radius of any core element as an outlier element, and the outlier element may be excluded from the cluster. In addition, when the distance between core elements is smaller than the radius, the server may assign those elements to the same cluster. In this way, the server may determine n groups of user vectors classified by age, gender, and capacity of the user terminal.
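The core, border, and outlier rules above correspond to the standard DBSCAN procedure, which can be sketched in plain Python as follows. The function names and the parameter names `eps` (radius) and `min_pts` (the threshold m) are conventional labels, not names from the disclosure, and a point is counted as its own neighbor here.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for outliers."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = noise  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: expand
    return labels


# Toy vectors of (age in months, gender code); the last one is isolated.
points = [(36, 1), (37, 1), (38, 1), (60, 2), (61, 2), (62, 2), (200, 1)]
print(dbscan(points, eps=1.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The number of groups n is not chosen in advance; it follows from the data and from the two parameters.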

For example, the server may set an average achievement vector for each of the n groups by averaging the learning achievement vectors of the user terminals included in that group.

For example, the group most similar to the user terminal may be the group, among the plurality of groups, whose center vector is the shortest distance from the user vector of the user terminal. Here, the center vector may be the average of the user vectors included in the group.
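The group lookup described above (nearest center vector, then that group's average achievement vector) can be sketched as follows. The dictionary layout, the function names, and the use of Euclidean distance are assumptions; the text says only "distance".

```python
import math


def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def basic_achievement_vector(user_vector, groups):
    """Average achievement vector of the group whose center is nearest."""
    nearest = min(
        groups,
        key=lambda g: math.dist(user_vector, mean_vector(g["user_vectors"])),
    )
    return mean_vector(nearest["achievement_vectors"])


# Achievement vectors: [sincerity, speech rate, pronunciation accuracy, performance]
groups = [
    {
        "user_vectors": [[36, 1, 1000], [38, 1, 3000]],
        "achievement_vectors": [[0.5, 1.0, 0.75, 0.5], [1.0, 2.0, 0.25, 0.5]],
    },
    {
        "user_vectors": [[60, 2, 2000], [62, 2, 2000]],
        "achievement_vectors": [[0.25, 1.0, 0.5, 0.5], [0.25, 1.0, 0.5, 0.5]],
    },
]
print(basic_achievement_vector([37, 1, 2000], groups))  # [0.75, 1.5, 0.5, 0.5]
```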

For example, the progress method information for the user terminal may be output when the user vector and the basic achievement vector are input to the method decision model. Here, the progress method may be the initial progress method of the multimodal-based English word learning content for the user terminal. For example, the progress method information may include the number of learning steps, the number of repetitions for each learning step, the total duration of each learning step, and, for each learning step, values for at least one operation related to each of the English word, the definition of the English word, and the animation. It may further include, for each learning step, the playback time for the English word, the maximum recording time for the English word, the playback time for the definition of the English word, and the maximum recording time for the definition of the English word.

The number of learning steps may be the number of learning steps to be carried out for one English word in the multimodal-based English word learning content. The number of repetitions for each learning step may indicate how many times that learning step is repeated. The total duration of each learning step may indicate how long that learning step runs.

The values for at least one operation related to each of the English word, the definition of the English word, and the animation at each learning step may indicate how the English word, the definition, and the animation each behave in the multimodal-based English word learning content.

For example, the values for at least one operation related to the English word may include at least one of a value for the display operation, a value for the playback operation, a value for the reading operation, or a value for the recording operation of the English word. For example, the values for at least one operation related to the definition of the English word may include at least one of a value for the display operation, a value for the playback operation, a value for the reading operation, a value for the highlight display operation, or a value for the recording operation of the definition. For example, the values for at least one operation related to the animation may include at least one of a value for the display operation or a value for the activation operation.

Here, the display operation may be an operation of displaying an item in the multimodal-based English word learning content. For example, a display value of 1 may indicate displaying the English word or the definition of the English word in the content, and a display value of 2 may indicate displaying a window for entering text.

Here, the playback operation may be an operation of playing the voice for the English word or for the definition of the English word in the multimodal-based English word learning content. The voice for the English word or its definition may be recorded in advance and stored on the server. For example, the value for the playback operation may be 0 or 1. For example, a value of 1 may indicate playing the voice for the English word or its definition, and a value of 0 may indicate that the voice is not played.

Here, the reading operation may be an operation that prompts the learner to read the English word or the definition of the English word displayed in the multimodal-based English word learning content. For example, the value for the reading operation may be 0 or 1. That is, for example, a value of 1 may indicate performing the operation. For example, prompting the learner to read may include displaying the phrase "읽어보세요" ("Please read") or "read" near the English word or its definition. For example, a value of 0 may indicate that the learner is not prompted to read the displayed English word or definition.

Here, the highlight display operation may be an operation of highlighting, in real time, the word currently being spoken while the voice for the definition of the English word is played in the multimodal-based English word learning content. For example, the value for the highlight display operation may be 0 or 1. For example, a value of 1 may indicate performing the operation, and a value of 0 may indicate not performing it.

Here, the display operation for the animation may be an operation of displaying the animation in the multimodal-based English word learning content. For example, its value may be 0 or 1. For example, a value of 1 may indicate performing the operation, and a value of 0 may indicate not performing it.

Here, the activation operation may be an operation of activating the animation displayed in the multimodal-based English word learning content. For example, an activation value of 1 may indicate that the animation moves, and an activation value of 2 may indicate that the animation remains stopped.

Here, the recording operation may be an operation of recording and listening back to the English word or the definition of the English word displayed in the multimodal-based English word learning content. For example, the value for the recording operation may be 0 or 1. That is, for example, a value of 1 may indicate performing the operation, and a value of 0 may indicate not performing it.

The playback time for the English word at each learning step may be the time taken by the playback operation for the English word. The maximum recording time for the English word at each learning step may be the maximum time allotted to the recording operation for the English word. The playback time for the definition of the English word at each learning step may be the time taken by the playback operation for the definition. The maximum recording time for the definition of the English word at each learning step may be the maximum time allotted to the recording operation for the definition.
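The progress method information described in the preceding paragraphs can be pictured as a nested record. The sketch below is an assumed layout: every class and field name is hypothetical, and the check covers only the flags the text explicitly describes as taking the values 0 or 1.

```python
from dataclasses import dataclass


@dataclass
class StepActions:
    word_display: int = 1          # 1: show the text, 2: show a text-entry window
    word_play: int = 0
    word_read: int = 0
    word_record: int = 0
    definition_display: int = 1    # same coding as word_display
    definition_play: int = 0
    definition_read: int = 0
    definition_highlight: int = 0
    definition_record: int = 0
    animation_display: int = 1
    animation_activation: int = 1  # 1: moving, 2: stopped


# Flags described in the text as taking the value 0 or 1.
BINARY_FLAGS = (
    "word_play", "word_read", "word_record", "definition_play",
    "definition_read", "definition_highlight", "definition_record",
    "animation_display",
)


@dataclass
class LearningStep:
    repetitions: int
    total_seconds: float
    actions: StepActions
    word_play_seconds: float = 0.0
    word_max_record_seconds: float = 0.0
    definition_play_seconds: float = 0.0
    definition_max_record_seconds: float = 0.0


@dataclass
class ProgressMethodInfo:
    steps: list

    @property
    def step_count(self) -> int:
        return len(self.steps)


def binary_flags_valid(actions: StepActions) -> bool:
    return all(getattr(actions, name) in (0, 1) for name in BINARY_FLAGS)


listen = LearningStep(2, 10.0, StepActions(word_play=1), word_play_seconds=1.5)
info = ProgressMethodInfo(steps=[listen])
```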

For example, the first neural network may include a first input layer, one or more first hidden layers, and a first output layer. The one or more first hidden layers may include one or more convolutional layers and one or more pooling layers.

For example, the user vector and the basic achievement vector may each be filtered in the convolutional layer, and a feature map may be formed through the convolutional layer.

For example, the pooling layer may select fixed vectors related to features for dimensionality reduction based on the formed feature map and perform sub-sampling on the feature map, thereby extracting features representing a coloring pattern from the vectorized time-series data. For example, the pooling layer may be a max pooling layer that extracts the largest value. For example, the pooling layer may be an average pooling layer that extracts the average value. For example, the parameters of the first neural network may include parameters related to the convolutional layer and the pooling layer (feature map size, filter size, depth, stride, and zero padding).
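To make the convolution and pooling steps concrete, the sketch below applies a one-dimensional filter to a short input and then sub-samples the resulting feature map. It is only an illustration of the two layer types named above: the function names, the filter values, and the input are invented, and the filter is applied as a sliding dot product without zero padding.

```python
def conv1d(signal, kernel, stride=1):
    """Slide the kernel over the signal and take a dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def pool1d(feature_map, size, mode="max"):
    """Sub-sample with non-overlapping windows, keeping the max or the mean."""
    windows = [
        feature_map[i:i + size]
        for i in range(0, len(feature_map) - size + 1, size)
    ]
    if mode == "max":
        return [max(w) for w in windows]
    return [sum(w) / len(w) for w in windows]


signal = [1, 3, 2, 5, 4, 6, 0]         # e.g. concatenated input features
feature_map = conv1d(signal, [1, 1])   # [4, 5, 7, 9, 10, 6]
print(pool1d(feature_map, 2, "max"))   # [5, 9, 10]
print(pool1d(feature_map, 2, "mean"))  # [4.5, 8.0, 8.0]
```

Pooling halves the length of the feature map here, which is the dimensionality reduction the paragraph refers to.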

Each item of training data for the first neural network, consisting of a plurality of user vectors, a plurality of learning achievement vectors, and a plurality of pieces of correct-answer progress method information, may be input to the first input layer of the first neural network and passed through the one or more first hidden layers and the first output layer to output a first output vector. The first output vector may be input to a first loss function layer connected to the first output layer, the first loss function layer may output a first loss value using a first loss function that compares the first output vector with a first correct-answer vector for each item of training data, and the parameters of the first neural network may be trained in the direction that reduces the first loss value.

For example, the plurality of user vectors, the plurality of learning achievement vectors, and the plurality of pieces of correct-answer progress method information may be obtained in advance by the server. For example, a learning achievement vector may include sincerity, speech rate, pronunciation accuracy, and performance. For example, one user vector and one learning achievement vector used as training data may form one set together with one piece of correct-answer progress method information. For example, a plurality of such sets may be pre-stored on the server.

Through this, the server can train the method decision model to determine a suitable learning progress method according to the initial information of the user terminal.

In step S330, the server may transmit a learning start message including the progress method information to the user terminal.

The learning start message may be a message that permits the user terminal to begin learning with the multimodal-based English word learning content.

For example, the multimodal-based English word learning content displayed on the user terminal may proceed based on the progress method information. The user terminal may transmit the first request message to the server while the installed program for the multimodal-based English learning service is running. Once the user terminal receives the learning start message from the server, the multimodal-based English word learning content may be executed according to the progress method information.

For example, the multimodal-based English word learning content may include information about a basic interface, information about English words, information about the definitions of English words, information about animations related to English words, a plurality of pieces of reward information, and information about a plurality of English word games. The information about the basic interface may include information about the basic interface of the multimodal-based English word learning content. The plurality of pieces of reward information may include information about rewards granted to the user terminal according to the learning achievement rate for the multimodal-based English word learning content. The information about the plurality of English word games may include information about the English word games that the user terminal plays in order to learn English words. The information about English words may include a plurality of English words and audio information for each of them. The information about the definitions of English words may include the definitions of the plurality of English words, audio information for each definition, and highlight information for each definition. The information about animations related to English words may include a plurality of animation images, each matched to one of the plurality of English words.

In step S340, the server may receive, from the user terminal, information about the progress results of the multimodal-based English word learning content.

The information about the progress results of the multimodal-based English word learning content may include the results of the learning that the user terminal carried out through that content. For example, it may include the achievement rate of each learning stage for each English word and a plurality of recorded voice files for each English word. Here, the achievement rate of a learning stage may be the number of times the user terminal repeated that learning stage divided by the number of repetitions specified in the progress method information. The plurality of recorded voice files for each English word may include a file recording the utterance of the English word itself and a file recording the utterance of the definition of the English word, each produced by the recording operation carried out in a learning stage. That is, a plurality of voice files may be recorded for a single English word.
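
As a minimal sketch of the achievement rate described above, the helper below divides the repetitions the user terminal actually performed in a learning stage by the repetitions required by the progress method information. The function and parameter names are hypothetical.

```python
def achievement_rate(performed_repetitions: int, required_repetitions: int) -> float:
    """Achievement rate of one learning stage: performed / required repetitions."""
    if required_repetitions <= 0:
        raise ValueError("required_repetitions must be positive")
    if performed_repetitions < 0:
        raise ValueError("performed_repetitions must not be negative")
    return performed_repetitions / required_repetitions


print(achievement_rate(3, 5))  # 0.6 (stage not yet completed)
print(achievement_rate(6, 5))  # 1.2 (user repeated more often than required)
```

Under this reading the rate is never negative and exceeds 1 when the user repeats a stage more often than required.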

In step S350, the server may determine the learning achievement of the user terminal through a learning achievement evaluation model that uses a second neural network, based on the information about the progress results of the multimodal-based learning content.

For example, the server may preprocess the information about the progress results of the multimodal-based learning content to generate, for each English word, a first evaluation vector composed of the achievement rates of the learning stages. For example, the achievement rate of a learning stage may have a value of 0 or more.

For example, the server may preprocess the plurality of recorded voice files to generate, for each English word, a plurality of second evaluation vectors and a plurality of third evaluation vectors. The plurality of second evaluation vectors for one English word may be arranged in the order in which the voice files were recorded, earliest first. The plurality of third evaluation vectors for one English word may likewise be arranged in the order in which the voice files were recorded, earliest first.

For example, the server may divide the waveform of a voice file into frames of a fixed length and transform each frame from the time domain to the frequency domain, thereby expressing it as a sum of components at different frequencies. For example, the server may preprocess the voice files using any of various voice feature extraction techniques, such as linear predictive coding (LPC), which predicts the current sample value as the sum of a fixed number of past sample values each multiplied by a coefficient, or per-band energy analysis, which uses the energy of the output of each band-pass filter as a voice feature.
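
The framing and time-to-frequency conversion described above can be sketched as follows. This is an illustrative pure-Python example using a naive discrete Fourier transform; the frame size, hop size, and function names are assumptions, and a production system would more likely use an optimized FFT and a window function.

```python
import cmath
import math
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_size: int, hop: int) -> List[List[float]]:
    """Cut the waveform into fixed-length frames, advancing by `hop` samples."""
    return [
        list(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2 of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


# Hypothetical test signal: a sine wave completing 2 cycles per 8-sample frame.
samples = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
frames = split_into_frames(samples, frame_size=8, hop=8)
spectra = [magnitude_spectrum(frame) for frame in frames]
dominant_bins = [max(range(len(s)), key=s.__getitem__) for s in spectra]
print(dominant_bins)  # [2, 2]
```

Each frame's spectrum peaks at bin 2, matching the two cycles per frame of the input sine wave.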

At this time, the server may determine, through data preprocessing of each voice file, a value related to pitch, a value related to amplitude, a value related to formants, a value related to harmonic energy, and a value related to the harmonics-to-noise energy ratio. Pitch refers to the fundamental frequency, F0, and represents the periodic characteristics of the voice. Amplitude is the factor that determines loudness. Formants represent the resonances of the vocal tract and correspond to the lines connecting the peaks of frequency energy when the voice signal is converted to the frequency domain. They are denoted F1, F2, and F3 in order from the lowest-frequency peak, and vowels generally show high energy in the F1 to F3 frequency range. Harmonics are frequency components at integer multiples of the fundamental frequency, and harmonic energy is the energy at those multiples.

For example, the value related to pitch may include the average rate of change of the pitch period. The value related to amplitude may include the average of the amplitude change rate. The value related to formants may include the formant amplitude and the formant bandwidth. The value related to harmonic energy may include the rate at which harmonic energy decreases as frequency increases, the ratio of harmonic energy at low frequencies to energy at high frequencies, and the amplitude of the harmonics. The harmonics-to-noise energy ratio may include the harmonics-to-noise energy ratio for each of a plurality of frequency ranges. The utterance time may be the portion of the recorded time during which the user uttered the English word. The average number of syllables per second may be determined by dividing the total number of syllables by the utterance time for the definition of the English word.

For example, the second evaluation vector may consist of, for a voice file in which the user recorded the English word itself, a value related to the pitch period, a value related to amplitude, a value related to the fundamental frequency, a value related to harmonic energy, a value related to the harmonics-to-noise energy ratio, and the utterance time. For example, if the English word is "airplane", the second evaluation vector may include those values for the voice file in which the user recorded "airplane".

For example, the third evaluation vector may include, for a voice file in which the user recorded the definition of the English word, a value related to the pitch period, a value related to amplitude, a value related to the fundamental frequency, a value related to harmonic energy, a value related to the harmonics-to-noise energy ratio, and the average number of syllables per second. For example, if the English word is "airplane", the third evaluation vector may include those values for the voice file in which the user recorded the definition of "airplane".

For example, when the first evaluation vector, the plurality of second evaluation vectors, and the plurality of third evaluation vectors are input to the learning achievement evaluation model using the second neural network, the sincerity, speech rate, pronunciation accuracy, and performance of the user terminal may be output for each English word.

For example, the second neural network may include a second input layer, one or more second hidden layers, and a second output layer. Each item of training data for the second neural network, consisting of a plurality of first evaluation vectors, a plurality of second evaluation vectors, a plurality of third evaluation vectors, a plurality of reference vectors, and a plurality of correct-answer learning achievement vectors, may be input to the second input layer of the second neural network and passed through the one or more second hidden layers and the second output layer to output a second output vector. The second output vector may be input to a second loss function layer connected to the second output layer, the second loss function layer may output a second loss value using a second loss function that compares the second output vector with a second correct-answer vector for each item of training data, and the parameters of the second neural network may be trained in the direction that reduces the second loss value.

The plurality of first evaluation vectors, second evaluation vectors, and third evaluation vectors used for training may be obtained from data collected through the multimodal-based English word learning content. Among the vectors used for training, one first evaluation vector may be configured per English word, and a plurality of second evaluation vectors and a plurality of third evaluation vectors may be configured for each English word. Here, a reference vector is the standard against which the second and third evaluation vectors are compared in order to evaluate speech rate and pronunciation accuracy, and the plurality of reference vectors may be pre-stored on the server as, for each English word, a reference vector for the English word itself and a reference vector for the definition of the English word. The reference vector for the English word itself may include a reference value related to the pitch period, a reference value related to amplitude, a reference value related to the fundamental frequency, a reference value related to harmonic energy, a reference value related to the harmonics-to-noise energy ratio, and a reference utterance time. The reference vector for the definition of the English word may include a reference value related to the pitch period, a reference value related to amplitude, a reference value related to the fundamental frequency, a reference value related to harmonic energy, a reference value related to the harmonics-to-noise energy ratio, and a reference number of syllables per second. For example, one first evaluation vector used for training may form one set together with a plurality of second evaluation vectors and a plurality of third evaluation vectors. For example, a plurality of such sets may be pre-stored on the server.

The correct-answer learning achievement vector may include sincerity, speech rate, pronunciation accuracy, and performance. For example, a correct-answer learning achievement vector may be set for each English word.

For example, sincerity may be determined as the average of the achievement rates of the learning stages included in the first evaluation vector.
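
A minimal sketch of the sincerity value, assuming the first evaluation vector is represented as a plain list of per-stage achievement rates (the representation and the function name are assumptions):

```python
from typing import Sequence


def sincerity(stage_achievement_rates: Sequence[float]) -> float:
    """Average the per-stage achievement rates of one English word."""
    if not stage_achievement_rates:
        raise ValueError("at least one learning stage is required")
    return sum(stage_achievement_rates) / len(stage_achievement_rates)


# Hypothetical first evaluation vector with three learning stages.
print(sincerity([1.0, 0.5, 1.5]))  # 1.0
```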

For example, the speech rate may be determined by comparing the utterance time of the second evaluation vector and the average number of syllables per second of the third evaluation vector with the reference vectors. For example, the speech rate may be the value obtained by subtracting, from a base value for the speech rate, the sum of a first absolute value and a second absolute value, where the first absolute value is the absolute difference between the reference utterance time of the reference vector and the utterance time of the second evaluation vector, and the second absolute value is the absolute difference between the reference number of syllables per second of the reference vector and the average number of syllables per second of the third evaluation vector.
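
The speech rate rule above reduces to one line of arithmetic: the base value minus the sum of two absolute differences. The sketch below is illustrative only; the base value and all sample numbers are hypothetical, since the patent does not state them.

```python
def speech_rate_score(base_value: float,
                      utterance_time: float, ref_utterance_time: float,
                      syllables_per_sec: float, ref_syllables_per_sec: float) -> float:
    """Base value minus the summed absolute deviations from the reference vector."""
    first_abs = abs(ref_utterance_time - utterance_time)         # word recording
    second_abs = abs(ref_syllables_per_sec - syllables_per_sec)  # definition recording
    return base_value - (first_abs + second_abs)


# Hypothetical values: the user took 1.5 s against a 1.0 s reference and spoke
# 3.0 syllables/s against a 4.0 syllables/s reference.
print(speech_rate_score(10.0, 1.5, 1.0, 3.0, 4.0))  # 8.5
```

A user who matches both reference values exactly receives the full base value; any deviation, faster or slower, lowers the score.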

For example, pronunciation accuracy may be determined by comparing, with the reference vectors, the last-positioned second evaluation vector among the plurality of second evaluation vectors for one English word and the last-positioned third evaluation vector among the plurality of third evaluation vectors for that English word.

For example, performance may be a third value obtained by adding a first value and a second value. The first value is the pronunciation accuracy for the last-positioned second evaluation vector among the plurality of second evaluation vectors for one English word divided by the pronunciation accuracy for the first-positioned second evaluation vector. The second value is the pronunciation accuracy for the last-positioned third evaluation vector among the plurality of third evaluation vectors for that English word divided by the pronunciation accuracy for the first-positioned third evaluation vector. The last-positioned second evaluation vector for an English word may be the vector generated from the most recently recorded voice file of the English word itself. The last-positioned third evaluation vector for an English word may be the vector generated from the most recently recorded voice file of the definition of the English word.
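
As an illustrative sketch of the performance value, the function below takes two lists of pronunciation accuracies, ordered from the earliest to the latest recording, and adds the last-to-first ratio of each. The list representation, the names, and the sample values are assumptions.

```python
from typing import Sequence


def performance(word_accuracies: Sequence[float],
                definition_accuracies: Sequence[float]) -> float:
    """Sum of last/first accuracy ratios for word and definition recordings."""
    if not word_accuracies or not definition_accuracies:
        raise ValueError("each list needs at least one recording")
    if word_accuracies[0] == 0 or definition_accuracies[0] == 0:
        raise ValueError("the first accuracy must be non-zero to form a ratio")
    first_value = word_accuracies[-1] / word_accuracies[0]
    second_value = definition_accuracies[-1] / definition_accuracies[0]
    return first_value + second_value


# Hypothetical accuracies in recording order (earliest first).
print(performance([0.5, 0.75, 1.0], [0.5, 0.75]))  # 3.5
```

A ratio above 1 for either recording type indicates that the latest attempt scored higher than the first, so the value rewards improvement rather than absolute accuracy.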

Additionally, for example, pronunciation accuracy may be determined by Equation 1 below.

In Equation 1, AP_score is the pronunciation accuracy, diff1 is the comparison value between the reference vector and the last-positioned second evaluation vector among the plurality of second evaluation vectors for the English word, diff2 is the comparison value between the reference vector and the last-positioned third evaluation vector among the plurality of third evaluation vectors for the English word, AP1 is the base value for the second evaluation vector, and AP2 is the base value for the third evaluation vector.

For example, the base value for the second evaluation vector and the base value for the third evaluation vector may be pre-stored on the server.

For example, the comparison value between the reference vector and the last-positioned second evaluation vector among the plurality of second evaluation vectors for the English word may be determined by Equation 2 below.

In Equation 2, diff1 is the comparison value between the last-positioned second evaluation vector and the reference vector; F0_diff is the highest F0 value of the second evaluation vector minus its lowest F0 value; F0_ref is the reference value for the maximum change in F0 included in the reference vector matched to the second evaluation vector; F21_diff is the F2 value of the second evaluation vector minus its F1 value; F21_ref is the reference value for the difference between the F2 value and the F1 value included in the reference vector matched to the second evaluation vector; v_add is the sum of the average rate of change of the pitch period and the average of the amplitude change rate of the second evaluation vector; v_ref is the sum of the reference value for the average rate of change of the pitch period and the reference value for the average of the amplitude change rate included in the reference vector matched to the second evaluation vector; t_s is the utterance time of the second evaluation vector; and t_ref is the reference utterance time included in the reference vector matched to the second evaluation vector.

For example, the comparison value between the reference vector and the last-positioned third evaluation vector among the plurality of third evaluation vectors for the English word may be determined by Equation 3 below.

In Equation 3, diff2 is the comparison value between the last-positioned third evaluation vector and the reference vector; F0_diff is the highest F0 value of the third evaluation vector minus its lowest F0 value; F0_ref is the reference value for the maximum change in F0 included in the reference vector matched to the third evaluation vector; F21_diff is the F2 value of the third evaluation vector minus its F1 value; F21_ref is the reference value for the difference between the F2 value and the F1 value included in the reference vector matched to the third evaluation vector; v_add is the sum of the average rate of change of the pitch period and the average of the amplitude change rate of the third evaluation vector; v_ref is the sum of the reference value for the average rate of change of the pitch period and the reference value for the average of the amplitude change rate included in the reference vector matched to the third evaluation vector; n_s is the average number of syllables per second of the third evaluation vector; and n_ref is the reference number of syllables per second included in the reference vector matched to the third evaluation vector.

Through this, the server can train the learning achievement evaluation model to determine pronunciation accuracy more precisely, based on the difference between the voice file recorded by the user and the reference values.

For example, the learning score for the user terminal may be determined based on the sincerity, speech rate, pronunciation accuracy, and performance output for each English word.

For example, the learning score for the user terminal may be obtained by summing the sincerity, speech rate, pronunciation accuracy, and performance for each English word and then averaging those sums over the English words.
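
The learning score computation can be sketched as a per-word sum followed by an average over words. The `WordAchievement` type and the sample values are hypothetical; the patent does not prescribe a data structure or value ranges.

```python
from typing import Dict, NamedTuple


class WordAchievement(NamedTuple):
    """Per-word outputs of the learning achievement evaluation model."""
    sincerity: float
    speech_rate: float
    pronunciation_accuracy: float
    performance: float


def learning_score(per_word: Dict[str, WordAchievement]) -> float:
    """Sum the four metrics for each word, then average the sums over all words."""
    if not per_word:
        raise ValueError("at least one English word is required")
    totals = [sum(achievement) for achievement in per_word.values()]
    return sum(totals) / len(totals)


scores = {
    "dog": WordAchievement(1.0, 8.5, 7.0, 3.5),       # sums to 20.0
    "airplane": WordAchievement(0.5, 9.0, 6.5, 2.0),  # sums to 18.0
}
print(learning_score(scores))  # 19.0
```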

For example, when the sincerity for a specific English word is equal to or greater than a preset first reference score, the state of a first object related to that English word and displayed on the user terminal may be changed. Here, the first object related to the specific English word may be a silhouette of the object that the word denotes. For example, when the sincerity for the specific English word is equal to or greater than the preset first reference score, the silhouette for that English word may be changed to a preset image.

Referring to FIG. 6, the reward collection area 600 may include a plurality of silhouette objects. Here, the reward collection area 600 is an area in which English words are classified by topic and the images corresponding to the English words of each topic are collected, and it may be referred to as a Sticker Hunt. For an English word that belongs to the Sticker Hunt, an image icon indicating the Sticker Hunt (e.g., animation 450 or reward area 460) may appear on the screen 400 of the multimodal-based English word learning content. When the number of repetitions required for each learning stage of the English word is satisfied, the user terminal may acquire the image corresponding to the image icon for that English word. The reward collection area 600 initially contains only silhouette objects for the English words, and each time the image for an English word is acquired, the corresponding silhouette object may be changed to that image. For example, when the specific English word is 'dog' and the sincerity for 'dog' is equal to or greater than the preset first reference score, the silhouette object 610 for 'dog' may be changed to the preset image 620 for 'dog'.
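
The silhouette-to-image transition in the reward collection area can be modeled as a small state update. This is only a sketch: the `RewardSlot` type, the state names, and the threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardSlot:
    """One entry in the reward collection area (hypothetical representation)."""
    word: str
    state: str = "silhouette"  # every slot starts as a silhouette object


def update_reward_slot(slot: RewardSlot, sincerity: float,
                       first_reference_score: float) -> RewardSlot:
    """Reveal the preset image once sincerity reaches the first reference score."""
    if sincerity >= first_reference_score:
        slot.state = "image"
    return slot


dog = RewardSlot("dog")
update_reward_slot(dog, sincerity=0.8, first_reference_score=1.0)
print(dog.state)  # silhouette
update_reward_slot(dog, sincerity=1.0, first_reference_score=1.0)
print(dog.state)  # image
```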

In step S360, the server may transmit to the user terminal a learning reward message including the learning score for the user terminal.

The learning reward message may be a message that notifies the user terminal of the reward earned from its results on the multimodal-based English word learning content. For example, the learning reward message may further include the learning achievement vector for each English word and the files recording the user's utterances in each learning stage.

For example, a plurality of English word games may be activated for the user terminal based on the learning reward message. For example, based on the server having received the information about the progress result of the multimodal-based English word learning content, the server may activate the plurality of English word games for that user terminal.

Here, the plurality of English word games may be games for learning various English words. For example, the plurality of English word games may include a one-on-one game, a survival game, and a team game. For example, the difficulty of each of the plurality of English word games may be determined based on the learning score for the user terminal: the higher the learning score, the higher the difficulty of each game may be. The plurality of English word games may be played through separate menus for sight words, interjections, conjunctions, prepositions, interrogatives, and relative pronouns. Here, sight words may be a collection of the words that appear most often in English-language children's publications, listed in order of frequency.

For example, the team game may be a game in which a plurality of teams participate and play, team by team, a bingo composed of a plurality of English words (hereinafter, the bingo game).

Referring to FIG. 7, the bingo game shown is an example in which the menu of prepositions indicating position and direction has been selected. The bingo game 700 may consist of a 5*5 grid of boxes. The 5*5 grid can express all of the following: the very center, in front of, behind, above, below, between, next to, the frontmost, the rearmost, the highest, the lowest, on the right, on the left, in, and on. The shape of a box itself can express both into and out of. A user whose learning score is lower than a preset score, or a younger user, may play the bingo game on a 3*3 grid. The bingo game enables a variety of instruction about position, and the letters or numbers placed in each cell may be either fixed or changing. Examples of expressions that use the boxes of the bingo game include "J is in the middle", "3 is next to 2", "K is above L", and "F is between C and K", so expressions for a variety of positions can be applied.

In the bingo game, each team may consist of a minimum of 1 and a maximum of 5 members, and a minimum of 2 and a maximum of 4 teams may participate in a game. For example, the words placed in the bingo game may be drawn at random from 1,000 preset English words. For example, the same 25 preset English words may be placed in different random arrangements. For example, 100 English words may be offered as options, the users take turns selecting one word at a time in order, and each selected word disappears from the screen.
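
The first two placement options above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the helper name `make_board`, the `seed` parameter, and the placeholder vocabulary are all hypothetical.

```python
import random

def make_board(words, size=5, seed=None):
    """Return a size x size bingo board filled with randomly placed words.

    Hypothetical helper: draws size*size distinct words from `words`, so it
    covers both "pick 25 out of 1,000" and "shuffle the same 25 words".
    """
    cells = size * size
    distinct = sorted(set(words))
    if len(distinct) < cells:
        raise ValueError("not enough distinct words for the board")
    rng = random.Random(seed)
    picked = rng.sample(distinct, cells)
    # Split the flat list of picked words into rows.
    return [picked[r * size:(r + 1) * size] for r in range(size)]

# Placeholder vocabulary standing in for the preset word list (assumption).
vocab = [f"word{i}" for i in range(25)]
board_a = make_board(vocab, seed=1)
board_b = make_board(vocab, seed=2)
```

Passing `size=3` gives the smaller 3*3 board described for younger or lower-scoring users.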

The game score may be determined by one of the following methods: 1) each participating team submits a score in the range of 10 to 100, and the average of the submitted scores is taken as the game score; 2) a roller spins through the values from 10 to 100 in steps of 10, and one value is chosen at random; or 3) the person who created the room sets the game score in advance.
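
The three ways of fixing the game score can be sketched as below. The function names are hypothetical, and the validation of submitted values is an assumption added for illustration.

```python
import random

def score_by_average(team_bids):
    """Method 1: average of the scores (10 to 100) submitted by the teams."""
    if not team_bids or any(not 10 <= b <= 100 for b in team_bids):
        raise ValueError("each team must submit a score between 10 and 100")
    return sum(team_bids) / len(team_bids)

def score_by_roller(rng=random):
    """Method 2: a roller stops at random on one of 10, 20, ..., 100."""
    return rng.choice(range(10, 101, 10))

def score_by_host(host_score):
    """Method 3: the room creator fixes the score before the game starts."""
    return host_score
```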

A plurality of weights may be applied in the team game. The plurality of weights may include a team member weight, a team weight, a progress speed weight, and a victory weight. For example, a team member weight is given to each individual according to the level of each team member. Here, the level is determined according to the learning score and may be one of 10 levels from 1 to 10; that is, the higher the learning score, the higher the level. The team member weight may be calculated per individual. For example, the team member weight may be determined as 1 with respect to a team member of the same level. When a user teams up with members of a lower level, the team member weight may be calculated by multiplying together the weight differences between the user and each lower-level member. The team member weight may not be calculated with respect to a member of a higher level; that is, with respect to a higher-level member the team member weight may be determined as 1. Therefore, a high-level user should team up with users of as low a level as possible to earn more points in the final result. A team may consist of a minimum of 1 and a maximum of 5 members, and the more members a team has, the higher the team weight may be. To keep the game moving quickly, a higher progress speed weight may be given to a team that selects words quickly. For example, if the user whose turn it is takes more than 20 seconds during the team's turn, the right to select a word may pass to the next team. A higher victory weight may be given to teams in the order in which they complete a bingo.
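
One possible reading of the team member weight is sketched below. It assumes that the "weight difference" between two members is the difference between their levels; the text does not define it further, so this interpretation and the function name `team_member_weight` are assumptions.

```python
from math import prod

def team_member_weight(own_level, teammate_levels):
    """Multiply the level gaps to every lower-level teammate.

    Teammates at the same or a higher level contribute a factor of 1,
    as described in the text. Interpreting "weight difference" as the
    level difference is an assumption.
    """
    factors = [own_level - lvl if lvl < own_level else 1
               for lvl in teammate_levels]
    return prod(factors)  # prod([]) == 1, so a solo player has weight 1
```

For a level-8 user teamed with members at levels 5, 6, and 10, this reading gives 3 * 2 * 1 = 6. Note that under this reading a gap of exactly one level contributes the same factor as an equal level.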

In addition, separately from the team score, each user may accumulate an individual score.

For example, the one-on-one game may be a game between two players in which the winner is the one who first selects the correct answer from a plurality of options related to an English word. For example, the one-on-one game may be won by the player who answers more questions correctly within one minute. For example, the one-on-one game may be built from five item types: 1) the text of the English word, 2) the audio of the English word, 3) an image, 4) the definition of the English word, and 5) the audio of the definition. For example, the user terminal selects an answer from the options for the displayed word, and the options may be given as one of the remaining four item types. For fairness, questions that combine 1) the English word text and 3) the image may be presented to a lower-level participant (user) more often than a preset frequency. Likewise, questions that combine 5) the audio of the definition and 3) the image may be presented to the highest-level participant more often than a preset frequency. That is, the difficulty may be adjusted automatically according to the level difference between the participants. The question types of the one-on-one game may mix the five item types. For example, a word text may be given with four images as options; a definition sentence may be given with four words as options; a definition sentence may be given with images as options; or the audio of an English word may be given with definitions as options. In the one-on-one game, (number of questions answered correctly - number of questions answered incorrectly) may be the factor that decides victory. The user terminal that creates a one-on-one game may set the score to be earned, that is, a target score, and the game may proceed without that terminal knowing anything about the opposing user terminal. The purpose of the one-on-one game is to bring the user into a state of immersion through intense concentration for one minute, so that even three rounds yield three minutes of immersion. In this case, a greater educational effect may be obtained than from one hour of conventional English study.
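
The question format and the win rule described above can be sketched as follows. The item-type identifiers, the function names, and the handling of a tie are assumptions; the level-dependent weighting of question types is left out of this sketch.

```python
import random

# The five item types named in the text; the identifiers are hypothetical.
ITEM_TYPES = ("word_text", "word_audio", "image", "definition", "definition_audio")

def pick_question_format(rng=random):
    """Choose a prompt type and a different type for the answer options."""
    prompt = rng.choice(ITEM_TYPES)
    options = rng.choice([t for t in ITEM_TYPES if t != prompt])
    return prompt, options

def net_score(answers):
    """Correct answers minus wrong answers; `answers` holds booleans."""
    answers = list(answers)
    correct = sum(answers)
    return correct - (len(answers) - correct)

def winner(answers_a, answers_b):
    """Compare net scores; returning "draw" on a tie is an assumption."""
    a, b = net_score(answers_a), net_score(answers_b)
    return "A" if a > b else "B" if b > a else "draw"
```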

For example, the survival game may be played by two or more teams, each consisting of one or more user terminals. For example, fewer than 100 participants in total may take part in a survival game. The survival game proceeds until a single final winner emerges, and all members of the winning team may earn the same score.

For example, the survival game may include a first game in which the team that completes the spelling of an English word and gets the word right survives; a second game in which game characters corresponding to the one or more user terminals collect letters by exploring a specific virtual space, and the team that forms the English word from the collected letters survives; and a third game in which the team that is first to match all of a plurality of objects hidden in a specific picture with options consisting of a plurality of English words survives. For example, in the survival game, team members may confer with one another by chat or voice.

For example, the first game may be an Assembly game, in which team members together complete the spelling of words during a preset time and keep getting words right. In the first game, team members may confer with one another by chat or voice, and team members may be required to take part in the game in turn.

For example, the second game may be a treasure hunt game, in which a game character moves around a specific space, finds the letters that appear, and builds a given word.

For example, the third game may be an Eye Spy game, in which various objects hidden in a picture are found by matching them with the words given as options. For example, participants in the third game may be required to match objects and words in turn.

For example, the user terminal may create a character with which to play the games, or may select one of the characters provided by default. In addition, for example, the user terminal may create a character rendered from an image of the user's own face (from an uploaded or newly taken photo). The user terminal may also use points earned through games to decorate or develop the game character.

For example, in the multimodal-based English word learning content, the server may provide the user terminal with a motion in which the game character created by the user turns the page at each transition from one word to the next. In addition, motivating actions may be performed by the game character at intervals during the game. When motivation is given, video, sound, and voice expressing various interjections may be provided.

According to one embodiment, the server may receive, from the user terminal, a second request message for any one of the plurality of English word games.

The second request message is a message requesting the server either to join or to create one of the English word games. For example, the second request message may include either a value indicating game participation or a value indicating game creation, together with identification information about the game. The identification information about the game may include an identification value indicating the type of the game.

For example, the server may transmit an access address for the English word game to the user terminal. For example, the user terminal may join the requested English word game based on the access address. For example, when the value indicating game participation is received, the server may determine, from among the plurality of English word games of the type requested by the user terminal, the access address of an English word game that the user terminal can join according to its learning score. For example, when the value indicating game creation is received, the server may create an English word game of the requested type according to the learning score of the user terminal and determine the access address of the created game.
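
The join/create handling of the second request message can be sketched as below. Everything concrete here is an assumption made for illustration: the class and field names, the address format, the score-to-level mapping (scores taken as 0 to 100), and the rule that a game is joinable only at the same level.

```python
from dataclasses import dataclass, field
from itertools import count

def score_to_level(learning_score):
    """Hypothetical mapping of a 0-100 learning score to levels 1-10."""
    return min(10, max(1, int(learning_score // 10) + 1))

@dataclass
class Game:
    game_id: int
    game_type: str   # identification value for the kind of game
    level: int       # level the game was created for
    address: str     # access address returned to the terminal

@dataclass
class GameServer:
    games: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def handle_second_request(self, action, game_type, learning_score):
        level = score_to_level(learning_score)
        if action == "join":
            # Return the address of an existing game of the requested type
            # that matches the terminal's level, if there is one.
            for game in self.games:
                if game.game_type == game_type and game.level == level:
                    return game.address
            return None
        if action == "create":
            gid = next(self._ids)
            game = Game(gid, game_type, level, f"/games/{game_type}/{gid}")
            self.games.append(game)
            return game.address
        raise ValueError(f"unknown action: {action}")
```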

For example, the server may receive, from the user terminal, result information about the English word game. Here, the result information may include the total score that the user terminal obtained in the English word game and the English words that were played in it. For example, when the English word game in which the user terminal took part ends, the user terminal may transmit the result of the ended game to the server. The server may increase the achievement rate per learning stage for each English word played in the game in proportion to the total score obtained in that game.

For example, the server may identify the user terminal based on identification information about the terminal. For example, the server may obtain the previously stored personal information about the user and the learning achievement vector for the identified user terminal.

For example, the server may generate a user vector by performing data preprocessing on the personal information about the user and the information about the user terminal included in the third request message.

For example, the server may obtain new progress method information for the user terminal by inputting the user vector and the learning achievement vector into the method decision model. Here, the new progress method may be the progress method applied after the user terminal has completed learning through the multimodal-based English word learning content. That is, the new progress method may be the progress method for the case in which a progress result for the multimodal-based English word learning content exists for the user terminal.

For example, the server may transmit a learning start message including the new progress method information to the user terminal.

For example, the multimodal-based English word learning content displayed on the user terminal may proceed based on the new progress method information.

For example, the server may determine new progress method information each time it receives information about the progress result of the multimodal-based learning content from the user terminal and determines a learning score.

For example, the learning start message may further include review information about English words. Here, the review information may include information about all English words that the user terminal previously studied. For example, the multimodal-based learning content may provide a review function to the user terminal according to the review information. For example, the review function may display a screen consisting only of the combination of an English word and its animation image on the user terminal for between 1 and 1.5 seconds per word. The review may proceed automatically through all previously studied English words. For example, if the user terminal studied 30 English words in the previous session, the previous material can be reviewed in a short time of between 30 and 45 seconds, one word-and-animation combination at a time. Because the time is short, the user looks over the English words in a highly immersed state, which may help the user remember the words longer.
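
The review-time figures follow directly from the per-word display time. A small sketch of that arithmetic, with a hypothetical function name:

```python
def review_duration_bounds(word_count, min_per_word=1.0, max_per_word=1.5):
    """Return (shortest, longest) total review time in seconds.

    Each word-and-animation screen is shown for 1 to 1.5 seconds.
    """
    return word_count * min_per_word, word_count * max_per_word

low, high = review_duration_bounds(30)  # 30 previously studied words
```

For 30 words this gives 30 * 1.0 = 30 seconds at the short end and 30 * 1.5 = 45 seconds at the long end, matching the range stated above.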

For example, for each of a plurality of English words, the server may collect a plurality of picture images associated with the English word through web crawling. For example, the server may transmit the plurality of picture images to a plurality of user terminals and receive, from the plurality of user terminals, a preference for each of the picture images. The server may classify the plurality of user terminals into n groups based on age and gender and, for each group and each English word, determine the picture image with the highest preference among the plurality of picture images as the preferred picture image. For example, the n groups may be the n groups classified based on the plurality of user vectors described above. In this case, the plurality of user terminals may include the user terminals corresponding to the user vectors that make up the n groups.
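
Selecting the preferred picture image per group and per word can be sketched as below. The text does not say how individual preferences are aggregated, so averaging them is an assumption, as are the function name and the tuple layout of the input.

```python
from collections import defaultdict

def preferred_images(ratings):
    """Pick, per (group, word), the image with the highest mean preference.

    ratings: iterable of (group, word, image_id, preference) tuples.
    Aggregating by the mean is an assumption.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (sum, count) per image
    for group, word, image_id, pref in ratings:
        entry = totals[(group, word, image_id)]
        entry[0] += pref
        entry[1] += 1
    best = {}
    for (group, word, image_id), (total, n) in totals.items():
        mean = total / n
        key = (group, word)
        if key not in best or mean > best[key][1]:
            best[key] = (image_id, mean)
    return {key: image_id for key, (image_id, _) in best.items()}
```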

For example, for each of a plurality of English words, the server may collect a plurality of dynamic images related to the English word through web crawling. For example, the server may determine an animation playback time for each of the plurality of dynamic images. For example, for each of the plurality of English words, the server may determine a reference playback time for the animation according to the playback time of the definition of the English word and the text length of the definition. Here, the reference playback time may be between 3 and 7 seconds, and it may be set closer to 7 seconds the longer the playback time of the definition and the longer its text. For example, the server may determine, for each English word, the dynamic image whose animation playback time is closest to the reference playback time as the standard dynamic image. If more than one dynamic image is closest to the reference playback time, the server may determine, among them, the dynamic image composed of the larger number of images as the standard dynamic image.
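
The selection of a standard dynamic image can be sketched as follows. The text only states that the reference playback time lies between 3 and 7 seconds and grows with the definition's audio length and text length; the linear blend and the caps `max_audio` and `max_text` below are assumptions, as are the function names.

```python
def reference_playback_time(audio_seconds, text_length,
                            max_audio=10.0, max_text=100):
    """Map definition audio length and text length to a time in [3, 7] s.

    The linear blend and the caps max_audio / max_text are assumptions.
    """
    a = min(audio_seconds / max_audio, 1.0)
    t = min(text_length / max_text, 1.0)
    return 3.0 + 4.0 * (a + t) / 2

def pick_standard_image(candidates, reference):
    """Pick the image whose playback time is closest to `reference`.

    candidates: list of (image_id, playback_seconds, frame_count).
    Ties on distance are broken in favor of the larger frame count.
    """
    return min(candidates, key=lambda c: (abs(c[1] - reference), -c[2]))[0]
```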

For example, the server may generate, for each English word, the animation images that make up the animation included in the multimodal-based English word learning content by inputting a plurality of preferred picture images and a plurality of standard dynamic images into an animation generation model that uses a neural network based on a generative adversarial network. In this case, one preferred picture image and one standard dynamic image may be matched to each English word. An animation image may be an image in which the preferred picture image is synthesized with the motion of the standard dynamic image.

For example, for each preferred picture image, the server may input each of the plurality of constituent images of the corresponding standard dynamic image into the animation generation model that uses the neural network based on a generative adversarial network, thereby obtaining images in which the preferred picture image is synthesized with the motion of the standard dynamic image.

FIG. 8 is a diagram showing an example of an animation generation model that uses a neural network based on a generative adversarial network according to one embodiment. The embodiment of FIG. 8 may be combined with various embodiments of the present disclosure.

Referring to FIG. 8, the animation generation model may use a neural network 800 based on a generative adversarial network. As a backbone network, the neural network based on a generative adversarial network can effectively separate the data distribution through unsupervised learning alone and construct a latent vector. A latent vector may refer to a pair of independent latent variables. A model that generates high-quality images from the disentangled latent vector may be used.

In addition, the neural network based on a generative adversarial network may be configured with an AdaIN (adaptive instance normalization) network structure that performs style transfer. Here, style transfer refers to extracting content from a first input image, extracting style from a second input image, and synthesizing the style with the content. For example, the neural network based on a generative adversarial network may be a style transfer network consisting of a VGG (Visual Geometry Group) encoder, an AdaIN layer, and a decoder. Here, the VGG encoder is a basic CNN composed of convolutional layers and pooling layers, and may have 16 or 19 layers. For example, the AdaIN layer may, in feature space and through Equation 4 below, remove the style of the first input image from the first input image, which contains the content, and synthesize the style of the second input image.

In Equation 4, σ(x) is the mean of the second input image containing the style, σ(y) is the standard deviation of the second input image containing the style, μ(x) is the mean of the first input image containing the content, and μ(y) may be the standard deviation of the first input image containing the content.

For example, the feature t generated by the AdaIN layer may be determined by Equation 5 below.

In Equation 5, f(c) denotes the output obtained when the first input image containing the content is input to the VGG encoder, and f(s) denotes the output obtained when the second input image containing the style is input to the VGG encoder.
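
Equations 4 and 5 themselves do not appear in this text. For orientation only, adaptive instance normalization as commonly published normalizes the content features by their own statistics and rescales them with the style statistics. The form below uses generic notation (mean, std, c for content, s for style) rather than the symbols defined above, and it is an assumption that Equations 4 and 5 take this shape:

```latex
\[
\operatorname{AdaIN}(c, s)
  = \operatorname{std}(s)\,\frac{c - \operatorname{mean}(c)}{\operatorname{std}(c)}
  + \operatorname{mean}(s),
\qquad
t = \operatorname{AdaIN}\bigl(f(c),\, f(s)\bigr)
\]
```

Under the definitions given above, mean(c) and std(c) would correspond to μ(x) and μ(y), and mean(s) and std(s) to σ(x) and σ(y).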

Thereafter, the two feature maps encoded by the VGG encoder are input to a randomly initialized decoder and the decoder is trained, so that the decoder can generate the style-synthesized image T(c, s).

For example, the plurality of first input images 810 may be the plurality of preferred picture images. For example, the plurality of second input images 820 may include the plurality of standard dynamic images. For example, the server may train the generative adversarial network model, using the plurality of first input images and the plurality of second input images as training data, to obtain an image 830 in which a preferred picture image is synthesized with the motion of a standard dynamic image.

In this way, the animation generation model can be trained to naturally generate images in which a preferred picture image is synthesized with the motion of a standard dynamic image.

That is, for each preferred picture image, the server may input each of the plurality of constituent images of the corresponding standard dynamic image into the animation generation model that uses the neural network based on a generative adversarial network, thereby obtaining images in which the preferred picture image is synthesized with the motion of the standard dynamic image.

FIG. 9 is a block diagram showing the configuration of a server according to one embodiment. The embodiment of FIG. 9 may be combined with various embodiments of the present disclosure.

As shown in FIG. 9, the server 900 may include a processor 910, a communication unit 920, and a memory 930. However, not all of the components shown in FIG. 9 are essential components of the server 900. The server 900 may be implemented with more components than those shown in FIG. 9, or with fewer. For example, the server 900 according to some embodiments may further include a user input interface (not shown), an output unit (not shown), and the like, in addition to the processor 910, the communication unit 920, and the memory 930.

The processor 910 typically controls the overall operation of the server 900. The processor 910 may include one or more processors and may control the other components included in the server 900. For example, the processor 910 may control the communication unit 920, the memory 930, and the like as a whole by executing programs stored in the memory 930. In addition, the processor 910 may perform the functions of the server 900 described with reference to FIGS. 3 to 8 by executing programs stored in the memory 930.

The communication unit 920 may include one or more components that allow the server 900 to communicate with other devices (not shown) and servers (not shown). The other device (not shown) may be a computing device such as the server 900 or a sensing device, but is not limited thereto. The communication unit 920 may receive, through a network, user input from another electronic device, or may receive from an external device data stored in that external device.

For example, the communication unit 920 may transmit and receive messages for establishing a connection with at least one device. The communication unit 920 may transmit information generated by the processor 910 to at least one device connected to the server. The communication unit 920 may receive information from at least one device connected to the server. In response to information received from at least one device, the communication unit 920 may transmit information related to the received information.

The memory 930 may store programs for the processing and control performed by the processor 910. For example, the memory 930 may store information input to the server or information received from another device through a network. In addition, the memory 930 may store data generated by the processor 910. The memory 930 may also store information input to or output from the server 900.

The memory 930 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk.

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware components and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. In addition, a processing device may access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used; however, those skilled in the art will understand that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, those skilled in the art can apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims below.

Claims

In a method for a server to provide a multimodal-based English learning service to a user terminal using a neural network,
Receiving a first request message for learning English words from a user terminal;
The first request message includes personal information about the user and information about the user terminal,
determining progress method information for the user terminal through a method decision model using a first neural network based on the personal information about the user and the information about the user terminal;
Transmitting a learning start message including the progress method information to the user terminal;
Multimodal-based English word learning content displayed on the user terminal is progressed based on the progress method information,
The multimodal-based English word learning content includes information about English words, information about definitions of English words, and information about animations related to English words,
Receiving information about the progress of the multimodal-based English word learning content from the user terminal;
determining a learning score for the user terminal through a learning achievement evaluation model using a second neural network based on information about the progress of the multimodal-based English word learning content; and
transmitting, to the user terminal, a learning reward message including the learning score for the user terminal, wherein
A plurality of English word games are activated for the user terminal based on the learning reward message,
A user vector is generated through data preprocessing of personal information about the user and information about the user terminal,
The user vector includes a value for the user's age, a value for the user's gender, and a value related to the capacity of the user terminal,
Among the plurality of average achievement vectors set in the server, an average achievement vector matching the user vector is determined as a basic achievement vector for the user terminal,
Progress method information for the user terminal is output based on the user vector and the basic achievement vector being input to the method decision model,
The progress method information includes the number of learning stages, the number of repetitions for each learning stage, the total progress time for each learning stage, a value for at least one action related to each of the English word, the definition of the English word, and the animation for each learning stage, a playback time related to the English word for each learning stage, a maximum recording time related to the English word for each learning stage, a playback time related to the definition of the English word for each learning stage, and a maximum recording time related to the definition of the English word for each learning stage,
Information on the progress results of the multimodal-based learning content includes the achievement rate of each learning stage for each English word and a plurality of recorded voice files for each English word,
Through data preprocessing of information on the progress results of the multimodal-based learning content, a first evaluation vector consisting of the achievement rate of each learning stage is generated for each English word,
A plurality of second evaluation vectors and a plurality of third evaluation vectors are generated for each English word through data preprocessing of the plurality of recorded voice files,
Based on the first evaluation vector, the plurality of second evaluation vectors, and the plurality of third evaluation vectors being input to the learning achievement evaluation model using the second neural network, sincerity, speech rate, pronunciation accuracy, and performance for the user terminal are output for each English word,
The learning score for the user terminal is determined based on the sincerity output for each English word, the speech rate, the pronunciation accuracy, and the performance,
Based on the sincerity for a specific English word being more than a preset first standard score, the state of the first object related to the specific English word displayed on the user terminal is changed,
method.

delete

According to claim 1,
Receiving a second request message for one English word game among the plurality of English word games from the user terminal;
Transmitting an access address for the one English word game to the user terminal;
The user terminal participates in the one English word game through the access address,
Further comprising receiving result information for the one English word game from the user terminal,
The plurality of English word games include one-on-one games, survival games, and team games,
The difficulty level for each of the plurality of English word games is determined based on the learning score for the user terminal,
method.

According to claim 4,
The one-on-one game is a game in which two participants compete to be the first to select the correct answer from a plurality of examples related to an English word,
The team game is a game in which a plurality of teams participate and play bingo made up of a plurality of English words on a team-by-team basis,
The survival game is played by two or more teams, each consisting of one or more user terminals,
The survival game includes a first game in which the team that completes the spelling of an English word survives, a second game in which the team that collects letters of the alphabet by exploring a specific virtual space using game characters corresponding to the one or more user terminals and forms an English word using the collected letters survives, and a third game in which the team that first matches all of a plurality of objects hidden in a specific picture with examples of a plurality of English words survives,
method.