KR101743230B1

KR101743230B1 - Apparatus and method for providing realistic language learning contents based on virtual reality and voice recognition

Info

Publication number: KR101743230B1
Application number: KR1020160046954A
Authority: KR
Inventors: 심재웅
Original assignee: (주)케이디엠티
Priority date: 2016-04-18
Filing date: 2016-04-18
Publication date: 2017-06-05

Abstract

The present invention relates to an apparatus and method for providing realistic language learning content based on virtual reality and speech recognition. The apparatus includes a storage module, a projection module, an audio module, a speech recognition module, and a content providing module. The storage module stores content including video and audio for reproducing a virtual reality. The projection module projects the video, and the audio module receives or outputs audio. The speech recognition module performs speech recognition on a user's answer or question about an object in the virtual reality, received from the audio module, and outputs a speech recognition result. The content providing module evaluates the user's answer or question according to the speech recognition result and, in accordance with the evaluation result, provides content including audio and video in which the object responds to the user's answer or question, so that the projection module and the audio module output it.

Description

Apparatus and method for providing realistic language learning contents based on virtual reality and voice recognition

The present invention relates to technology for providing language learning content, and more particularly to an apparatus and method for providing realistic language learning content based on virtual reality and speech recognition.

Virtual reality (VR) refers to a specific environment or situation that resembles reality but is not real, created artificially using computers or similar technology, or to that technology itself. The virtual environment or situation created in this way stimulates the user's five senses and provides spatial and temporal experiences similar to real ones, letting the user move freely across the boundary between reality and imagination. In addition, the user is not merely immersed in the virtual reality but can also interact with things implemented in it, for example by manipulating them or issuing commands through a real device.

An object of the present invention is to provide an apparatus and method that can offer, through virtual reality, the experience of meeting a foreigner in person and, through speech recognition, the experience of conversing with an actual foreigner.

To achieve the above object, an apparatus for providing learning content according to an embodiment of the present invention includes: a storage module that stores content including video and audio for reproducing a virtual reality; a projection module for projecting video; an audio module for receiving or outputting audio; a speech recognition module that performs speech recognition on a user's answer or question about an object in the virtual reality, received from the audio module, and outputs a speech recognition result; and a content providing module that evaluates the user's answer or question according to the speech recognition result and, in accordance with the evaluation result, provides content including audio and video in which the object responds to the user's answer or question, so that the projection module and the audio module output it.

To achieve the above object, an apparatus for providing learning content according to an embodiment of the present invention includes: a projection module for outputting video; an audio module for outputting audio; a sensor module that senses movement of a virtual reality reproducing unit worn on the user's head and outputs coordinates according to the sensed movement; and a content providing module that detects the user's field of view from the coordinates, composes content including video and audio corresponding to the detected field of view, and provides the composed content so that the projection module and the audio module output it.

To achieve the above object, a method for providing learning content according to an embodiment of the present invention includes: a step in which an audio module receives a user's answer or question about an object in a virtual reality; a step in which a speech recognition module performs speech recognition on the user's answer or question and outputs a speech recognition result; a step in which a content providing module evaluates the user's answer or question according to the speech recognition result and, in accordance with the evaluation result, composes content including video and audio in which the object responds to the user's answer or question; and a step in which a projection module and the audio module output the content.

To achieve the above object, a method for providing learning content according to an embodiment of the present invention includes: a step in which a sensor module senses movement of a virtual reality reproducing unit worn on the user's head and outputs coordinates according to the sensed movement; a step in which a content providing module detects the user's field of view from the coordinates and composes content including video and audio corresponding to the detected field of view; and a step in which a projection module and an audio module output the video and audio of the composed content.

According to the present invention, virtual reality and speech recognition give the user the experience of meeting and conversing with a foreigner in person, so the effect of language learning can be maximized even without going abroad.

FIG. 1 is a diagram illustrating the configuration of a content providing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the configuration of a content providing unit and a virtual reality reproducing unit according to an embodiment of the present invention.
FIGS. 3 to 6 are diagrams illustrating images according to an embodiment of the present invention.
FIGS. 7 to 12 are diagrams illustrating object images according to an embodiment of the present invention.
FIG. 13 is a flowchart illustrating a method for providing content according to an embodiment of the present invention.
FIG. 14 is a flowchart illustrating a method for providing learning content according to an embodiment of the present invention.
FIG. 15 is a flowchart illustrating a method for providing learning content using speech recognition according to an embodiment of the present invention.
FIG. 16 is an example screen illustrating a listening object for providing learning content according to an embodiment of the present invention.
FIG. 17 is an example screen illustrating a voice command according to an embodiment of the present invention.
FIG. 18 is a diagram illustrating a method of providing learning content according to interaction, according to an embodiment of the present invention.
FIG. 19 is a diagram illustrating a method of providing learning content according to interaction, according to another embodiment of the present invention.
FIG. 20 is a diagram illustrating an image composition method according to an embodiment of the present invention.
FIG. 21 shows an example of a graph for comparing the pronunciation, stress, and intonation of a user and a native speaker according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

The present invention provides a virtual reality that lets a user who wants to learn a foreign language, for example English, experience conversing in that language with a virtual counterpart. An apparatus for providing virtual-reality-based learning content according to an embodiment of the present invention is described first. FIG. 1 is a diagram illustrating the configuration of a learning content providing apparatus according to an embodiment of the present invention. Referring to FIG. 1, the learning content providing apparatus includes a content providing unit 100 and a virtual reality reproducing unit 200. The learning content providing apparatus may optionally further include a display unit 300.

The content providing unit 100 provides learning content according to an embodiment of the present invention to the virtual reality reproducing unit 200. Here, the content includes video that represents the virtual reality (VR) visually and audio that represents it audibly. The content providing unit 100 may also provide the display unit 300 with the same content that it provides to the virtual reality reproducing unit 200. According to one embodiment, the content providing unit 100 may be implemented as a set-top box; in this case, the content providing unit 100 may use an HDMI splitter to provide the display unit 300 with the same content that it provides to the virtual reality reproducing unit 200. According to another embodiment, the virtual reality reproducing unit 200 may provide the display unit 300 with the same content that it receives from the content providing unit 100.

The virtual reality reproducing unit 200 receives the user's voice and passes it to the content providing unit 100, and the content providing unit 100 provides content corresponding to the received voice. The virtual reality reproducing unit 200 then outputs the content so that the user can experience the virtual reality. A head-mounted display (HMD) is a representative example of the virtual reality reproducing unit 200. The user can therefore wear the virtual reality reproducing unit 200 on the head and experience the content as virtual reality.

The display unit 300 outputs the content provided by the content providing unit 100. The display unit 300 may be a display device that includes a speaker, for example a monitor. The display unit 300 may be formed of a liquid crystal display (LCD), organic light emitting diodes (OLED), active matrix organic light emitting diodes (AMOLED), or the like.

The configuration of the content providing unit 100 and the virtual reality reproducing unit 200 according to an embodiment of the present invention is described below. FIG. 2 is a block diagram illustrating the configuration of the content providing unit and the virtual reality reproducing unit according to an embodiment of the present invention. Referring to FIG. 2, the content providing unit 100 includes a speech recognition module 110, a content providing module 120, and a storage module 130. The virtual reality reproducing unit 200 includes a projection module 210, an audio module 220, and a sensor module 230.

The speech recognition module 110 receives the user's voice from the audio module 220 of the virtual reality reproducing unit 200 and recognizes it. When voice is input from the audio module 220, the speech recognition module 110 extracts feature vectors from the voice to generate a voice vector. The speech recognition module 110 then performs speech recognition on the voice vector in a search space formed on the basis of an acoustic model, a pronunciation dictionary, and a language model. This speech recognition is performed in units of phonemes. The result of speech recognition may be a 1-best or an N-best recognition result, although an N-best result is preferable.
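The difference between a 1-best and an N-best result can be illustrated with a minimal sketch. The patent does not specify how hypotheses are scored, so the `Hypothesis` type, the log-probability fields, the `lm_weight` parameter, and the sample values below are all hypothetical; the sketch only assumes that each candidate carries an acoustic score and a language-model score that are combined into one ranking score.

```python
import heapq
from typing import NamedTuple


class Hypothesis(NamedTuple):
    """One recognition candidate (hypothetical structure, not from the patent)."""
    text: str
    acoustic_logp: float  # log-probability assigned by the acoustic model
    language_logp: float  # log-probability assigned by the language model


def n_best(hypotheses, n=3, lm_weight=1.0):
    """Return the n highest-scoring hypotheses, best first."""
    def score(h):
        # Assumed scoring rule: weighted sum of the two log-probabilities.
        return h.acoustic_logp + lm_weight * h.language_logp
    return heapq.nlargest(n, hypotheses, key=score)


# Made-up candidates for a single utterance.
candidates = [
    Hypothesis("where is the subway", -12.0, -4.0),
    Hypothesis("wear is the subway", -11.5, -9.0),
    Hypothesis("where is the sub way", -12.5, -6.0),
    Hypothesis("there is the subway", -14.0, -5.0),
]

top3 = n_best(candidates, n=3)  # N-best result with N = 3
best = top3[0]                  # the 1-best result is simply the first entry
```

Keeping several candidates lets a later evaluation stage compare the user's utterance against more than one plausible transcription, which may be why the text prefers the N-best result.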

When the content providing module 120 receives the user's voice input through the microphone (MIC), or receives the coordinates of the movement of the virtual reality reproducing unit 200 sensed by the sensor module 230, that is, the movement of the user's head, it provides the virtual reality reproducing unit 200 with content corresponding to that voice or to those coordinates. The content providing module 120 includes an encoder or a decoder for encoding or decoding video and audio. The encoder and decoder may be implemented in hardware or in software.

The storage module 130 stores the programs and data needed for the operation of the content providing unit 100 and may be divided into a program area and a data area. The program area may store a program that controls the overall operation of the content providing unit 100, an operating system (OS) that boots the content providing unit 100, application programs, and the like. The data area stores user data generated through use of the content providing unit 100. The storage module 130 also stores content according to an embodiment of the present invention; this content includes video and audio. The various data stored in the storage module 130 may be deleted, changed, or added to according to the user's operation.

The speech recognition module 110 and the content providing module 120 may be implemented with, for example, a central processing unit (CPU), an application processor (AP), a microcontroller, a graphics processing unit (GPU), or a digital signal processor (DSP). The storage module 130 may be implemented with a hard disk drive (HDD), a solid state drive (SSD), random access memory (RAM), read only memory (ROM), flash memory, EEPROM, or the like.

Although not shown, the projection module 210 includes a projector and an optical system. The projector emits an image through a micro display panel, and the optical system reflects or refracts that image to project it onto the user's eye (pupil). The user thereby sees an enlarged image representing the virtual reality.

The audio module 220 includes a microphone (MIC) and a speaker (SPK). The audio module 220 receives the user's voice through the microphone (MIC) and provides it to the speech recognition module 110. The audio module 220 also outputs, through the speaker (SPK), the audio input from the content providing module 120.

The sensor module 230 senses the movement of the virtual reality reproducing unit 200. Because the virtual reality reproducing unit 200 is worn on the user's head, the user's field of view can be derived by sensing the movement of the virtual reality reproducing unit 200 through the sensor module 230. To this end, the sensor module 230 senses the movement of the virtual reality reproducing unit 200 and provides the sensed movement to the content providing module 120 as coordinates. The coordinates that the sensor module 230 provides to the content providing module 120 include x, y, and z in a three-dimensional Cartesian coordinate system, together with yaw, pitch, and roll. The sensor module 230 may be implemented with one or more sensors, such as an accelerometer, a gyroscope, or a magnetometer.
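The six coordinates described above can be held in a small value type. The following is a minimal sketch only: the `HeadPose` name, the use of degrees, the angle-wrapping convention, and the clamping of pitch are assumptions made for illustration and are not specified in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadPose:
    """Position and orientation of the headset (hypothetical type)."""
    x: float
    y: float
    z: float
    yaw: float    # degrees, rotation about the vertical axis
    pitch: float  # degrees, looking up (+) or down (-)
    roll: float   # degrees, tilting the head sideways


def wrap_degrees(angle):
    """Wrap an angle into the half-open range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def apply_rotation(pose, d_yaw=0.0, d_pitch=0.0, d_roll=0.0):
    """Return a new pose after a head rotation given as angle deltas."""
    # Assumption: pitch is clamped so the gaze cannot pass the zenith or nadir.
    pitch = max(-90.0, min(90.0, pose.pitch + d_pitch))
    return HeadPose(
        pose.x, pose.y, pose.z,
        wrap_degrees(pose.yaw + d_yaw),
        pitch,
        wrap_degrees(pose.roll + d_roll),
    )


start = HeadPose(0.0, 0.0, 0.0, yaw=170.0, pitch=0.0, roll=0.0)
turned = apply_rotation(start, d_yaw=20.0)  # crosses the +/-180 degree seam
```

Wrapping yaw keeps the value bounded when the user keeps turning in one direction, which simplifies any later lookup into an omnidirectional image.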

Next, the background image V that represents the virtual reality according to an embodiment of the present invention is described. FIGS. 3 to 6 are diagrams illustrating images according to an embodiment of the present invention.

In FIG. 3, reference symbol S denotes a virtual space. The virtual reality according to an embodiment of the present invention is provided so that a user wearing the virtual reality reproducing unit 200 has the virtual experience of being at position U in the virtual space S. To this end, the background image V is generated by capturing the virtual space S in all directions from position U. For realism, only the part of the virtual space S that falls within the user's field of view is presented as the background image V. In the real world, a user's field of view changes when the user shifts their gaze. Likewise, according to the present invention, a part of the virtual space S is provided as the background image V in accordance with the user's changing field of view.

The present invention lets a user who wishes to learn a language experience that language through virtual reality in the virtual space S. Various virtual spaces S are provided according to theme, and each theme may be defined by a particular place or situation. For example, a theme defined by place may be an airport, a restaurant, a concert hall, a hotel, a subway, a city, an amusement facility, or a scenic spot. As another example, a theme may be a parody of a scene from a particular movie or drama. That is, the learning content according to an embodiment of the present invention has a virtual space S corresponding to each theme in order to reproduce a particular place or situation as virtual reality. A part of this virtual space S is provided as the background image V according to the field of view. The user can therefore recognize the theme intuitively from the background image V, that is, what place or situation it is. FIG. 4 shows an example of a background image V used to reproduce a subway as virtual reality. From the subway train and the subway platform in the background image V, the user can recognize that they are about to experience a virtual reality related to the subway.

According to an embodiment of the present invention, because the virtual reality reproducing unit 200 is worn on the user's head, its movement is sensed and the change in the user's gaze (field of view) is estimated from that movement. The sensor module 230 built into the virtual reality reproducing unit 200 is used for this purpose. As shown in FIG. 5, the sensor module 230 senses the movement of the virtual reality reproducing unit 200 and outputs it as coordinates (x, y, z, yaw, pitch, roll). The content providing module 120 then detects the gaze direction from the coordinates (x, y, z, yaw, pitch, roll) and composes the background image V according to the field of view along that gaze. In more detail, the horizontal viewing angle of the human eye is about 120 degrees, but the background image V provided by the virtual reality reproducing unit 200 follows the viewing angle that the unit itself provides. This viewing angle depends on the specifications of the projection module 210, that is, of the projector and the optical system. Accordingly, the content providing module 120 uses the coordinates (x, y, z, yaw, pitch, roll) received from the sensor module 230 to obtain the direction of view (DoV) from the focus point (FP), and extracts the background image V from the virtual space S according to the field-of-view angle (FoV) that the projection module 210 can provide, centered on that gaze. For example, as shown in FIG. 5, assume that the specifications of the projection module 210 give a horizontal FoV of about 115 degrees and a vertical FoV of 90 degrees. The content providing module 120 derives the DoV from the coordinates (x, y, z, yaw, pitch, roll) and, taking yaw, pitch, and roll into account, extracts the portion of the virtual space S that lies within a field of view of 115 degrees horizontally and 90 degrees vertically about the DoV, and composes it as the background image V.
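The extraction of the visible region can be sketched as follows. This is a simplified illustration, not the patent's algorithm: it assumes the omnidirectional capture of the virtual space S is stored as an equirectangular panorama, it ignores roll, and it approximates the view as a rectangular angular window rather than a true perspective projection. The function names and axis conventions are hypothetical; the 115 and 90 degree defaults come from the example in the text.

```python
import math


def direction_of_view(yaw_deg, pitch_deg):
    """Unit gaze vector; assumes yaw turns about the vertical (y) axis."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def view_window(yaw_deg, pitch_deg, h_fov=115.0, v_fov=90.0):
    """Angular bounds (left, right, bottom, top) of the visible window, in degrees."""
    left = yaw_deg - h_fov / 2.0
    right = yaw_deg + h_fov / 2.0
    bottom = max(-90.0, pitch_deg - v_fov / 2.0)
    top = min(90.0, pitch_deg + v_fov / 2.0)
    return left, right, bottom, top


def to_pixel(yaw_deg, pitch_deg, width, height):
    """Map a direction to (column, row) in an equirectangular panorama."""
    col = int(((yaw_deg + 180.0) % 360.0) * width / 360.0) % width
    row = min(height - 1, int((90.0 - pitch_deg) * height / 180.0))
    return col, row


# Looking straight ahead: the window spans 115 x 90 degrees around the gaze.
window = view_window(0.0, 0.0)
center = to_pixel(0.0, 0.0, width=3600, height=1800)
```

A production renderer would normally reproject the panorama through a camera model and apply roll as a rotation of the image plane; the crop above only shows how yaw and pitch select which part of S becomes V.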

After composing the background image V, the content providing module 120 provides it to the projection module 210, and the projection module 210 projects it. Thus, when a user wearing the virtual reality reproducing unit 200 turns their head and changes their gaze, the user sees the background image V that falls within the field of view corresponding to the new gaze. For example, as shown in FIG. 6, when the field of view is at the initial image (V_initial) and the user turns their head to the left, the sensor module 230 outputs coordinates with a changed yaw value; the content providing module 120 accordingly composes a background image with changed yaw (V_yaw), and the projection module 210 projects it. Likewise, when the user tilts their head to the left, the sensor module 230 outputs coordinates with a changed roll value; the content providing module 120 composes a background image with changed roll (V_roll), and the projection module 210 projects it. When the user bows their head forward, the sensor module 230 outputs coordinates with a changed pitch value; the content providing module 120 composes a background image with changed pitch (V_pitch), and the projection module 210 projects it. In this way the user can experience virtual reality as if present in the virtual space S.

Meanwhile, the video of the content according to an embodiment of the present invention includes an object image O in addition to the background image V. In other words, the video is the background image V with the object image O composited onto it. The object image O is described next. FIGS. 7 to 12 are diagrams illustrating the object image O according to an embodiment of the present invention.

The object image O represents an object according to an embodiment of the present invention. As shown in FIG. 7, the object image O is shot separately from the background image V using the chroma key technique, and the captured object image O is composited onto the background image V to form the video of the content. In an embodiment of the present invention, objects include conversation objects, word objects, and movement objects.
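The chroma key compositing step can be sketched as below. This is a minimal illustration under stated assumptions: images are represented as nested lists of RGB tuples, the key colour is pure green, and a simple per-channel tolerance decides which pixels are transparent. None of these details come from the patent, and real pipelines typically work in a different colour space and soften the matte edges.

```python
def is_key_color(pixel, key=(0, 255, 0), tolerance=60):
    """True when every channel of the pixel is within `tolerance` of the key colour."""
    return all(abs(c - k) <= tolerance for c, k in zip(pixel, key))


def composite(background, foreground, key=(0, 255, 0), tolerance=60):
    """Overlay the object image on the background image.

    Key-coloured foreground pixels are treated as transparent, so the
    background shows through; every other foreground pixel is kept.
    Both images must have the same dimensions.
    """
    return [
        [bg if is_key_color(fg, key, tolerance) else fg
         for bg, fg in zip(bg_row, fg_row)]
        for bg_row, fg_row in zip(background, foreground)
    ]


# A 1 x 2 example: the first foreground pixel is green screen, the second is the object.
background_v = [[(10, 10, 10), (20, 20, 20)]]
object_o = [[(0, 255, 0), (200, 30, 30)]]
frame = composite(background_v, object_o)
```

Shooting the object separately means the same background image V can be reused with different objects, and the same object can be placed into different themes.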

A conversation object is a virtual character that exists in the virtual space S and can converse with the user according to an embodiment of the present invention. That is, the conversation object is a virtual conversation partner. A conversation object is usually a person, but it may also be an animal or a personified inanimate object. For example, when the theme is a subway, a passenger, a station attendant, or a police officer may be a conversation object.

FIGS. 8 and 9 show the background image V for the subway theme, with example screens in which an object image O of a police officer 51, a conversation object, has been composited. The user can converse with such a conversation object according to an embodiment of the present invention. The conversation method is described in more detail below.

A word object is a character that explains the meaning of a word. Word objects help the user understand the meaning of a word intuitively. When the word is a noun naming a thing, the word object may be that thing. For example, as shown in FIGS. 8 and 9, the word object may be 'signs' 53, and the word object in FIG. 10 is a 'turnstile' 55. When the word denotes something intangible, such as a verb, adjective, or adverb, the word object may be an agent that acts out the meaning of the word. In FIG. 11, the word object is a character 57 that explains the meaning of the word "read". Here, a virtual person 57 reading a book is used as the word object to explain the meaning of "read".

A moving object lets the user move within the virtual space (S). It takes the form of a marker indicating a direction in which the user can move, and the user moves through the virtual space (S) in the direction the moving object indicates. For example, the moving object in FIG. 12 is an arrow (59) pointing upward; when this moving object is activated, the user can move upward. The user does not physically move; instead, the background image (V) is shifted to provide a virtual experience such as walking up a flight of stairs.

Next, based on the foregoing, a method of providing learning content according to the movement of a user wearing the virtual reality reproducing unit (200) on the head is described. FIG. 13 is a flowchart illustrating a method for providing learning content according to an embodiment of the present invention.

Referring to FIGS. 3 to 13, the user is wearing the virtual reality reproducing unit (200), for example an HMD, on the head. When the user moves in this state, for example by turning, tilting back, or lowering the head, the virtual reality reproducing unit (200) moves as well. The sensor module (230) installed in the virtual reality reproducing unit (200) can therefore detect the movement of the unit in step S110. In step S120, the sensor module (230) transmits the detected movement to the content providing module (120) as coordinates (x, y, z, yaw, pitch, roll).

Next, in step S130, the content providing module (120) determines the field of view from the coordinates (x, y, z, yaw, pitch, roll) representing the movement of the virtual reality reproducing unit (200). Note that this field of view is not the human field of view but the field of view that the projection module (210) can provide. More specifically, the horizontal viewing angle of the human eye is about 120 degrees, whereas the background image (V) provided by the virtual reality reproducing unit (200) follows the viewing angle that the unit provides. That viewing angle depends on the specifications of the projection module (210), that is, of the projector and the optical system. Accordingly, as shown in FIG. 5, the content providing module (320) uses the coordinates (x, y, z, yaw, pitch, roll) received from the sensor module (230) to obtain the direction of view (DoV) from the focus point (FP), and then obtains the field of view around the center of that line of sight according to the field-of-view angle (FoV) that the projection module (210) can provide.
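
The field-of-view determination of step S130 can be illustrated with a minimal sketch. The axis convention, the cone-shaped approximation of the field of view, and the function names below are assumptions made for illustration; the description specifies only that the direction of view is derived from the pose coordinates and that the field of view is bounded by the FoV angle of the projection module.

```python
import math

def direction_of_view(yaw_deg, pitch_deg):
    """Unit vector of the line of sight for a yaw and pitch given in degrees.

    Assumed convention: yaw 0 / pitch 0 looks along +z, yaw turns toward +x,
    pitch tilts toward +y. Roll rotates the image but not the direction.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))

def angle_between(u, v):
    """Angle in degrees between two 3D vectors (0 if either has zero length)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    cosine = sum(a * b for a, b in zip(u, v)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def in_field_of_view(pose, target, fov_deg):
    """True if target (x, y, z) lies within fov_deg / 2 of the line of sight.

    pose is the tuple (x, y, z, yaw, pitch, roll) reported by the sensor.
    """
    x, y, z, yaw, pitch, _roll = pose
    to_target = (target[0] - x, target[1] - y, target[2] - z)
    return angle_between(direction_of_view(yaw, pitch), to_target) <= fov_deg / 2.0
```

With a 90-degree FoV, a point straight ahead of an unrotated headset is inside the field of view, while the same point falls outside once the head has turned 90 degrees to the side.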

Then, in step S140, the content providing module (120) composes content comprising an image, which includes the background image (V) and the object image (O) corresponding to the field of view obtained above, and the audio required for that image.

Next, in step S150, the content providing module (120) provides the image and audio to the projection module (210) and the audio module (220). In step S160, the projection module (210) and the audio module (220) output the image and audio: the audio module (220) outputs the audio so that the user can hear it, and the projection module (210) projects the image so that the user can see it.

As described above, the image includes the background image (V) and the object image (O). According to an embodiment of the present invention, an object image (O) has an active mode and an inactive mode, and the object image (O) for each mode is provided selectively depending on a condition.

As described above, a conversation object holds a virtual conversation with the user, a word object explains the meaning of a word to the user, and a moving object lets the user move within the virtual space (S). These functions are provided in the active mode; in the inactive mode, a default image showing only the appearance of the object is played. By default, the object image (O) is provided in the inactive mode and switches to the active mode when a preset condition is satisfied. The following describes a method of providing learning content after switching an inactive object to the active mode. FIG. 14 is a flowchart illustrating a learning content providing method according to an embodiment of the present invention. The procedure of FIG. 14 is carried out within step S140 described above with reference to FIG. 13.

As described above, in step S130 the content providing module (120) determines the field of view using the coordinates of the movement of the virtual reality reproducing unit (200) received from the sensor module (230). In step S210, the content providing module (120) then determines whether an object is present at the center of the field of view (C). Here, the center of the field of view (C) is derived from the direction of view (DoV). FIGS. 8 to 12 show images (V) that include object images (O) of objects (51, 53, 55, 57, 59) in the inactive mode. In FIGS. 8 and 9, the police officer (51), a conversation object, and 'signs' (53), a word object, are present. In FIG. 8 no object lies at the center of the field of view (C), whereas in FIG. 9 the police officer (51) lies at the center (C). The object images (O) of all objects (51, 53) in FIG. 8 therefore remain in the inactive mode, while the object image (O) of the police officer (51) in FIG. 9 will switch to the active mode. Likewise, in FIGS. 10, 11, and 12 the 'turnstile' (55), the virtual person (57), and the arrow (59), respectively, lie at the center of the field of view (C), so their object images (O) will switch to the active mode.

When an object switches to the active mode, the content providing module (120) extracts, in step S220, the active-mode object image (O) for the object type from the storage module (130) and, in step S230, composites the extracted active-mode object image (O) onto the background image (V) to compose the image. When the inactive mode is maintained, the content providing module (120) extracts, in step S240, the inactive-mode object image (O) for the object type from the storage module (130) and, in step S230, composites the extracted inactive-mode object image (O) onto the background image (V) to compose the image.
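
Steps S210 to S240 can be sketched as follows. This is a simplified illustration, not the patented implementation: representing each object by a rectangle in view coordinates, placing the center of the field of view (C) at the origin, and keying stored images by (kind, mode) are all assumptions, as are the names `SceneObject`, `choose_modes`, and `compose_image`.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    number: int    # reference numeral used in the figures, e.g. 51
    kind: str      # "conversation", "word", or "moving"
    bounds: tuple  # (left, bottom, right, top) in view coordinates

def covers(bounds, point):
    """True if the rectangle bounds contains the 2D point."""
    left, bottom, right, top = bounds
    return left <= point[0] <= right and bottom <= point[1] <= top

def choose_modes(objects, center=(0.0, 0.0)):
    """S210: an object becomes active only if it covers the center of the view."""
    return {obj.number: "active" if covers(obj.bounds, center) else "inactive"
            for obj in objects}

def compose_image(objects, image_store, center=(0.0, 0.0)):
    """S220/S240 look up the stored image per mode; S230 stacks it on the background."""
    modes = choose_modes(objects, center)
    layers = ["background"]
    for obj in objects:
        layers.append(image_store[(obj.kind, modes[obj.number])])
    return layers
```

In a FIG. 9-like arrangement, where the police officer (51) covers the center and the signs (53) do not, only the officer's active-mode image would be selected.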

Next, the differences between the inactive and active modes for each type of object are described by example with reference to FIGS. 8 to 12. As one example, for word objects whose word is a noun naming a thing, such as 'signs' (53) in FIGS. 8 and 9 and the 'turnstile' (55) in FIG. 10, the inactive mode simply shows the appearance of the object. In the active mode, by contrast, the spelling of the word ('signs' or 'turnstile') may be provided as text and its pronunciation as audio. The meaning of the word may also be provided, as selected, in Korean or English, as either audio or text. Example sentences for the word may likewise be provided as either audio or text.

As another example, the virtual person (57), the word object of FIG. 11, corresponds to an intangible word such as the verb 'read'. In this case, the inactive-mode object image (O) shows the virtual person (57) reading a book. In the active mode, the object image (O) shows the virtual person (57) writing out the spelling of the word ('read') and includes audio of the virtual person (57) pronouncing it. Optionally, the active-mode object image (O) may show the virtual person (57) writing out the meaning of the word ('read') in Korean or English and reading that meaning aloud, in which case the corresponding audio is also included. Also optionally, the object image (O) may include the virtual person (57) writing out and reading an example sentence for the word ('read'), together with the corresponding audio.

As yet another example, the upward arrow (59), the moving object of FIG. 12, simply displays its own appearance in the inactive mode. When it switches to the active mode, the object image (O) provides the view the user would see while climbing the stairs. The user does not physically move; rather, this provides the virtual experience of walking up the stairs.

Meanwhile, for the police officer (51), the conversation object of FIGS. 8 and 9, the inactive-mode object image (O) shows only the appearance of the police officer (51). In the active mode, the object image (O) lets the police officer (51), a virtual character, interact and converse with the user. For conversation in the active mode, separate object images (O) are prepared for various situations, and in each situation the corresponding object image (O) is composited onto the background image (V). These object images (O) include, for example, an image of the conversation object waiting for an answer, an image for prompting a quick answer, an image for when the user's speech was not understood, and an image for when it was understood. The conversation is based on speech recognition. The following describes a method for providing learning content using speech recognition according to an embodiment of the present invention. FIG. 15 is a flowchart illustrating that method, and FIG. 16 is an example screen illustrating the listening object used to provide learning content according to an embodiment of the present invention.

As in FIG. 9, assume that, as shown in FIG. 16, the police officer (51), a conversation object, is located at the center of the field of view (C) and has been activated. The police officer (51) and the user can then converse in the language being learned, for example English. Depending on the preset scenario, either the police officer (51) or the user may speak first. For example, the police officer (51) may first ask the user a question, or the user may ask the police officer (51) a question. The user can thus ask or answer questions.

When it is the user's turn to ask or answer during the conversation, the content providing module (120) provides, in step S310, a background image (V) onto which the listening object (60) shown in FIG. 16 has been composited. In step S320, the projection module (210) outputs the image including the listening object (60). The listening object (60) is displayed as a microphone-shaped icon together with the waveform of the voice signal.

After the listening object (60) is provided, a standby mode that waits for the user's response is performed in step S330. In the standby mode, the content providing module (120) by default provides content including an image and audio indicating that the conversation object is waiting for an answer, and the projection module (210) and the audio module (220) can output that image and audio. If the user has not answered for a preset time or longer, the content providing module (120) provides content including an image and audio in which the conversation object urges a reply, in order to prompt a quick answer, and the projection module (210) and the audio module (220) output that image and audio.
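
The standby behavior of step S330 amounts to a choice between two prepared contents based on elapsed time. A minimal sketch follows; the function name, the content labels, and the 5-second default are illustrative assumptions, since the description speaks only of "a preset time".

```python
def standby_content(seconds_waited, prompt_after=5.0):
    """Pick the standby content for the conversation object.

    Returns "urge_answer" once the user has been silent for prompt_after
    seconds or longer, and "wait_for_answer" before that.
    prompt_after stands in for the preset time; 5.0 is an arbitrary value.
    """
    if seconds_waited >= prompt_after:
        return "urge_answer"
    return "wait_for_answer"
```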

When the user responds during the standby mode, the audio module (220) receives the user's voice (for example, a question or answer) through the microphone in step S340 and passes it to the speech recognition module (310) in step S350. The speech recognition module (310) then recognizes the user's voice in step S360 and outputs the speech recognition result in step S370.

When the speech recognition result is delivered to the content providing module (120), the module can evaluate the user's question or answer according to that result in step S380. Speech recognition may fail for various reasons, for example when the user's voice is input at a level below a threshold. In that case, an error may be output as the speech recognition result. The content providing module (320) therefore first evaluates, in step S380, whether a speech recognition result exists. If an error was output, it evaluates that speech recognition did not take place. If a result exists, it evaluates whether the user's question or answer is correct. The storage module (330) stores a plurality of possible questions, and a plurality of corresponding answers, for various situations. The content providing module (120) can thus compare the speech recognition result with the situation-specific questions and answers stored in the storage module (330) to evaluate whether the user's question or answer is correct. In summary, in step S380 the content providing module (320) reaches one of three evaluations: speech recognition did not take place, the user asked or answered correctly, or the user asked or answered incorrectly.
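
The three-way evaluation of step S380 can be sketched as below. The description does not say how a recognition error is represented or how the result is compared with the stored questions and answers; here an error is assumed to arrive as `None`, and the comparison is assumed to be an exact match after lowercasing and stripping punctuation. `Evaluation`, `normalize`, and `evaluate` are hypothetical names.

```python
from enum import Enum

class Evaluation(Enum):
    NOT_RECOGNIZED = "not recognized"
    CORRECT = "correct"
    INCORRECT = "incorrect"

def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def evaluate(recognition_result, expected_utterances):
    """Classify a recognition result against the stored utterances.

    recognition_result is the recognized text, or None when the recognizer
    reported an error (an assumed convention).
    """
    if recognition_result is None:
        return Evaluation.NOT_RECOGNIZED
    expected = {normalize(u) for u in expected_utterances}
    if normalize(recognition_result) in expected:
        return Evaluation.CORRECT
    return Evaluation.INCORRECT
```

A practical system would likely tolerate paraphrases and recognition noise rather than require an exact match; the exact comparison is used here only to keep the sketch short.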

Then, in step S390, the content providing module (320) composes content including an image and audio corresponding to the evaluation result, that is, an image and audio that respond to the user's question or answer. As one example, when speech recognition did not take place or the user asked or answered incorrectly, the content providing module (320) can compose an image of the conversation object with an expression and gestures showing that it did not hear or understand the question or answer, together with audio (AUDIO) of something one might say in that situation (for example, 'Pardon me?', 'Can you say that again, please?', 'Could you repeat that?', 'Sorry, I didn't get that.', 'I didn't hear what you said. Could you repeat it?', or 'I'm sorry, but I couldn't hear you.'). As another example, when the user is evaluated as having asked or answered correctly, the content providing module (320) can compose an image of the conversation object with an expression and gestures showing that it understood, together with audio of a reply to that question or answer (for example, 'It's two thirty.' when the user asks the conversation object what time it is, or 'Is it that late already?' after the conversation object asks the user the time and the user answers).
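
The mapping from evaluation result to response content in step S390 might look like the following sketch. The clarification phrases are the examples given in the description; the evaluation labels, the image labels, the `attempt` parameter used to rotate through the phrases, and the function name are assumptions for illustration.

```python
CLARIFICATION_PHRASES = [
    "Pardon me?",
    "Can you say that again, please?",
    "Could you repeat that?",
    "Sorry, I didn't get that.",
    "I didn't hear what you said. Could you repeat it?",
    "I'm sorry, but I couldn't hear you.",
]

def compose_response(evaluation, reply_text=None, attempt=0):
    """Return the image label and spoken line for one evaluation result.

    evaluation is "not_recognized", "incorrect", or "correct" (assumed labels).
    attempt counts repeated failures so that the phrase varies between retries.
    """
    if evaluation in ("not_recognized", "incorrect"):
        phrase = CLARIFICATION_PHRASES[attempt % len(CLARIFICATION_PHRASES)]
        return {"image": "did_not_understand", "audio": phrase}
    return {"image": "understood", "audio": reply_text}
```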

Next, in step S370, the content providing module (320) provides the content including the image and audio composed above to the audio module (220) and the projection module (210). Then, in step S380, the audio module (220) outputs the audio and the projection module (210) projects the image.

Thus, according to an embodiment of the present invention, speech recognition lets the user converse with a conversation object acting as a virtual conversation partner. This provides a virtual experience like meeting and talking with a foreigner in person, so the effect of language learning can be maximized without going abroad.

Meanwhile, according to an embodiment of the present invention, the user can perform various controls by voice. The voice control method is described next. FIG. 17 is an example screen illustrating a voice command according to an embodiment of the present invention.

As described above, virtual spaces (S) can be organized by learning topic. Through voice commands the user can select a topic and enter the corresponding virtual space (S), start (Play) or stop (Stop) playback of the content for the virtual reality experience in that space, or end (Close) the virtual reality experience. The present invention can also provide a vocabulary list, an idiom list, and lecture videos for each learning topic, and the user can issue a voice command to have them provided. When such a voice command is given, the audio module (220) receives it through the microphone and passes it to the speech recognition module (310). The speech recognition module (310) recognizes the voice command and outputs the speech recognition result. The content providing module (120) then compares the result with the preset voice commands to identify the command and carries it out. For example, when the user says 'help', the speech recognition result will be 'help'; if 'help' has been preset as the voice command for providing a lecture video, the content providing module (120) can provide an image in which the lecture video (70) is composited onto the background image (V), as shown in FIG. 17.
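
Identifying a voice command by comparison with preset commands can be sketched as a table lookup. The command words 'play', 'stop', 'close', and 'help' follow the examples in the description; the action labels and the function name are hypothetical, and exact matching after trimming and lowercasing is an assumption.

```python
# Preset voice commands mapped to actions (action labels are illustrative).
PRESET_COMMANDS = {
    "play": "start_playback",
    "stop": "stop_playback",
    "close": "end_experience",
    "help": "show_lecture_video",
}

def identify_command(recognition_result, commands=PRESET_COMMANDS):
    """Return the action for a recognized command, or None if it is not preset."""
    key = recognition_result.strip().lower()
    return commands.get(key)
```

Returning `None` for unknown input leaves the caller free to ignore the utterance or to treat it as part of the conversation instead.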

According to an embodiment of the present invention, content corresponding to the user's voice input must be provided; that is, content is provided through interaction with the user. A method of providing content through such interaction is described next. FIG. 18 is a diagram illustrating a method of providing content through interaction according to one embodiment of the present invention.

Referring to FIG. 18, the storage module (130) combines the plurality of contents that correspond to the evaluations of questions or answers into a single piece of content data that is continuous in time, and stores it.

Time information indicating the temporal position of each of the plurality of contents is recorded as metadata and stored in the storage module (130). For example, the first content may be the image and audio in which the conversation object asks the first question according to a preset conversation scenario, the second content the image and audio for the case where the user's answer to the first question is appropriate, and the third content the image and audio for the case where the user's answer to the first question is not understood. The fourth content may be the image and audio in which, according to the scenario, the conversation object asks a second question when the user fails to answer the first question properly, the fifth content the image and audio for the case where the user's answer to the second question is appropriate, and the sixth content the image and audio for the case where the user's answer to the second question is not understood. The seventh content may be the image and audio in which, according to the scenario, the conversation object asks a third question when the user has answered the second question properly. The first to seventh contents are joined consecutively in time into a single piece of content data, and the position of each on the timeline, that is, on the time axis, is recorded as time information (t1 to t7).

As described above, after the user's question or answer is evaluated according to the speech recognition result in step S380, the content providing module (120), when composing the image corresponding to the evaluation result in step S390, jumps to the corresponding content according to its time information and provides it. For example, the content providing module (120) first provides the first content, which asks the first question, according to the scenario. When the user answers, the content providing module (120) evaluates the answer according to the speech recognition result in step S380. If the answer is evaluated as not understood, the content providing module (120) refers to the time information stored in the storage module (130) in step S390, jumps to the third content, and provides it.
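
The jump within a single concatenated content file can be sketched with two small tables: the time information (t1 to t7) and a branch table from an evaluation to the content that answers it. The offsets in seconds, the evaluation labels, and all names below are hypothetical; only the branches for the first question are filled in.

```python
# Hypothetical start offsets (seconds) of the first to seventh contents, t1..t7.
TIME_INFO = {1: 0.0, 2: 6.0, 3: 10.0, 4: 14.0, 5: 20.0, 6: 24.0, 7: 28.0}

# Which content responds to which evaluation of the answer to a question.
RESPONSE_FOR = {
    (1, "appropriate"): 2,
    (1, "not_understood"): 3,
}

def jump_target(question_content, evaluation):
    """Return (content number, seek position) for the evaluated answer."""
    content = RESPONSE_FOR[(question_content, evaluation)]
    return content, TIME_INFO[content]
```

A player would seek to the returned position in the combined content data and play until the start of the next segment.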

Next, a method of providing content through interaction according to another embodiment of the present invention is described. FIG. 19 is a diagram illustrating a method of providing learning content through interaction according to another embodiment of the present invention.

According to this other embodiment, the storage module (130) stores each of the plurality of contents corresponding to the evaluation results of questions and answers separately. For each of these contents, correspondence information indicating which evaluation result the content corresponds to is recorded as metadata and stored in the storage module (130).

For example, the first content may be the image and audio in which the conversation object asks the first question according to a preset conversation scenario, the second content the image and audio for the case where the user's answer to the first question is appropriate, and the third content the image and audio for the case where the user's answer to the first question is not understood. In this case, the correspondence information of the second content indicates that it is the content for the case where the answer to the first content is evaluated as appropriate, and the correspondence information of the third content indicates that it is the content for the case where the answer to the first content is evaluated as not understood.

As described above, after the user's question or answer is evaluated according to the speech recognition result in step S380, the content providing module (120), when composing the content corresponding to the evaluation result in step S390, extracts the relevant content from the storage module (130) according to its correspondence information and provides the extracted content. For example, the content providing module (120) first provides the first content, which asks the first question, according to the scenario. When the user answers, the content providing module (120) evaluates the answer according to the speech recognition result in step S380. If the answer is evaluated as not understood, the content providing module (120) refers to the correspondence information in step S390, extracts the third content from the storage module (130), and provides it.
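
When each content is stored separately, the lookup uses the correspondence information instead of a seek position. A minimal sketch follows, assuming the metadata records which content is being answered and under which evaluation; the field names, labels, and file-like names are hypothetical.

```python
# Separately stored contents with correspondence information as metadata.
CONTENT_STORE = [
    {"name": "content_1", "responds_to": None, "evaluation": None},
    {"name": "content_2", "responds_to": "content_1", "evaluation": "appropriate"},
    {"name": "content_3", "responds_to": "content_1", "evaluation": "not_understood"},
]

def extract_content(store, responds_to, evaluation):
    """Return the name of the content matching the evaluation, or None."""
    for item in store:
        if item["responds_to"] == responds_to and item["evaluation"] == evaluation:
            return item["name"]
    return None
```

Compared with the single-timeline embodiment, this arrangement trades a seek operation for a separate load per response, which makes it easier to add or replace individual contents.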

Next, an image composition method according to an embodiment of the present invention will be described. FIG. 20 is a diagram for explaining the image composition method according to an embodiment of the present invention. As described with reference to FIG. 7, the object image (O) representing an object is captured separately from the background image (V) using the chroma key technique, and the captured object image (O) is registered onto the background image (V). FIG. 20(a) shows the background image (V), FIG. 20(b) shows the object image (O), and FIG. 20(c) shows icons. According to the embodiment of the present invention, separate layers such as those of FIG. 20(a), (b), and (c) are merged to form a single image as in FIG. 20(d). Although FIG. 20 shows three layers being merged, more layers may be used as needed. For example, the lecture video of FIG. 17 may be merged into the image using a further layer.
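The layer merging of FIG. 20 can be illustrated with a small sketch. It assumes images are nested lists of RGB tuples, that the chroma key colour is pure green, and that a simple per-channel distance threshold decides which pixels are keyed out; none of these details are specified in the source, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B)
Layer = List[List[Optional[Pixel]]]   # None marks a transparent pixel

GREEN: Pixel = (0, 255, 0)            # assumed chroma key colour


def chroma_key(image: List[List[Pixel]], key: Pixel = GREEN,
               tolerance: int = 60) -> Layer:
    """Turn pixels close to the key colour into transparent ones."""
    def is_key(p: Pixel) -> bool:
        return sum(abs(a - b) for a, b in zip(p, key)) <= tolerance
    return [[None if is_key(p) else p for p in row] for row in image]


def merge_layers(layers: List[Layer]) -> Layer:
    """Merge layers bottom to top; opaque pixels of upper layers win.

    All layers are assumed to have the same size, and the first
    (background) layer is assumed to be fully opaque.
    """
    merged = [list(row) for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                if pixel is not None:
                    merged[y][x] = pixel
    return merged


# (a) background image V, (b) object image O shot against green, (c) icons
background = [[(10, 10, 10), (20, 20, 20)], [(30, 30, 30), (40, 40, 40)]]
object_shot = [[GREEN, (200, 0, 0)], [GREEN, GREEN]]
icons = [[None, None], [(255, 255, 255), None]]

# (d) the single merged image
frame = merge_layers([background, chroma_key(object_shot), icons])
print(frame)
```

Because `merge_layers` takes a list, a further layer such as a lecture video frame could be appended without changing the function.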

Next, the present invention can present the user's pronunciation, stress, and intonation in a graph alongside those of a native speaker for comparison. FIG. 21 shows an example of a graph for comparing the pronunciation, stress, and intonation of a user and a native speaker according to an embodiment of the present invention.

In addition to performing speech recognition, whenever the user's answer or question is input from the audio module 220 (S350), that is, whenever the user's voice is input, the speech recognition module 110 compares the pronunciation, stress, and intonation of the user with those of a native speaker using three representations: a time waveform (ㄱ, A) that shows the voltage change caused by the input speech signal on a linear scale over time; an amplitude envelope (ㄴ, B), which is the logarithmic envelope of the time waveform obtained by processing it logarithmically so that the voltage change is expressed in decibels (dB); and an FFT spectrum (ㄷ, C) that shows the power magnitude of the input signal at each frequency, obtained by FFT (Fast Fourier Transform) analysis of a selected data group. The speech recognition module 110 outputs the comparison result as a graph so that the user can perceive the differences. The content providing module 120 then composes the graph as an image, registers it onto the background image (V), and provides it for output by the projection module 210 and the audio module 220. The user can correct his or her pronunciation by referring to the graph.
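The amplitude envelope and FFT spectrum mentioned above can be computed as in the following sketch. It is an illustration under stated assumptions only: samples are floats in the range -1.0 to 1.0, the envelope is taken as the peak per fixed-size frame (the source does not say how the envelope is derived), the frame size of 64 is arbitrary, and the synthetic sine tones stand in for real recordings of a native speaker and a user. The function names are hypothetical.

```python
import cmath
import math


def amplitude_envelope_db(samples, frame=64, floor=1e-6):
    """Peak amplitude per frame in dB relative to full scale (1.0)."""
    envelope = []
    for start in range(0, len(samples), frame):
        peak = max(abs(s) for s in samples[start:start + frame])
        envelope.append(20.0 * math.log10(max(peak, floor)))
    return envelope


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum(samples):
    """Power magnitude per frequency bin, up to the Nyquist bin."""
    spectrum = fft(samples)
    return [abs(c) ** 2 for c in spectrum[: len(samples) // 2 + 1]]


def peak_bin(samples):
    """Index of the frequency bin with the largest power."""
    ps = power_spectrum(samples)
    return max(range(len(ps)), key=ps.__getitem__)


# Synthetic stand-ins: a "native" tone and a quieter, higher "user" tone.
N = 256
native = [math.sin(2 * math.pi * 8 * i / N) for i in range(N)]
user = [0.5 * math.sin(2 * math.pi * 10 * i / N) for i in range(N)]

print("envelope (dB):", amplitude_envelope_db(native)[0],
      amplitude_envelope_db(user)[0])
print("peak bins:", peak_bin(native), peak_bin(user))
```

Plotting the two envelopes and the two spectra on shared axes would give the kind of side-by-side comparison the paragraph describes; how the differences are scored or drawn is left open here because the source does not specify it.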

The present invention has been described above using several preferred embodiments, but these embodiments are illustrative and not restrictive. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications may be made under the doctrine of equivalents without departing from the spirit of the invention and the scope of rights set forth in the appended claims.

Claims

An apparatus for providing learning content, comprising:
a storage module for storing content including video, which comprises an object image and a background image for reproducing a virtual reality, and audio required for the video;
a projection module for projecting the video;
an audio module for receiving or outputting the audio;
a speech recognition module for performing speech recognition on a user's answer or question, received from the audio module, directed to an object of the virtual reality; and
a content providing module that evaluates whether the user's answer or question has been recognized and, when it has been recognized, evaluates the user's answer or question by comparing the speech recognition result with situation-specific answers and questions stored in the storage module, and provides content including audio and video in which the object responds to the user's answer or question according to the evaluation result, for output by the projection module and the audio module,
wherein the content providing module:
if the object is not at the center of the user's field of view, extracts an inactive-mode object image in which the outline of the object is rendered to indicate an inactive mode, and registers the extracted inactive-mode object image onto the background image to compose the video; and
if the object is at the center of the user's field of view, switches from the inactive mode to an active mode, extracts an active-mode object image that provides the learning content corresponding to the object, and registers the extracted active-mode object image onto the background image to compose the video.

The apparatus of claim 1,
wherein the storage module stores a single piece of content data in which a plurality of contents corresponding to the evaluation result are concatenated consecutively in time, and
wherein each of the plurality of contents has time information indicating its temporal position within the single piece of content data.

The apparatus of claim 2,
wherein the content providing module
provides the content corresponding to the evaluation result by jumping to that content according to the time information.

The apparatus of claim 1,
wherein the storage module stores each of a plurality of contents corresponding to the evaluation result individually, and
wherein each of the plurality of contents has correspondence information indicating that it is the content corresponding to the evaluation result.

The apparatus of claim 4,
wherein the content providing module
detects the content corresponding to the evaluation result according to the correspondence information and provides the detected content.

The apparatus of claim 1,
wherein the speech recognition module, whenever the user's answer or question is input by voice, outputs a graph comparing the user's pronunciation, stress, and intonation in the input speech with those of a native speaker, and
wherein the content providing module composes the graph as an image and provides it for output.

The apparatus of claim 1,
wherein the speech recognition module, upon receiving a voice command of the user from the audio module, outputs a speech recognition result for the voice command, and
wherein the content providing module composes content corresponding to the voice command according to the speech recognition result and provides the content for output by the projection module and the audio module.

An apparatus for providing learning content, comprising:
a projection module worn on a user's head for outputting video;
an audio module for outputting audio;
a sensor module for sensing movement of the projection module and outputting coordinates according to the sensed movement; and
a content providing module that detects the user's field of view according to the coordinates, composes content including video, which comprises an object image and a background image for an object at a position corresponding to the detected field of view, and audio required for the video, and provides the content for output by the projection module and the audio module,
wherein the content providing module:
if the object is not at the center of the field of view, extracts an inactive-mode object image in which the outline of the object is rendered to indicate an inactive mode, and registers the extracted inactive-mode object image onto the background image to compose the video; and
if the object is at the center of the field of view, switches from the inactive mode to an active mode, extracts an active-mode object image that provides the learning content corresponding to the object, and registers the extracted active-mode object image onto the background image to compose the video.

The apparatus of claim 8,
wherein the content providing module
detects the center of the user's field of view according to the coordinates and, if an object is at the center of the field of view, composes video and audio for the object and provides the composed video and audio for output by the projection module and the audio module.

The apparatus of claim 9,
wherein the content providing module,
if the object is a dialogue object, composes video and audio in which the dialogue object asks the user a question or gives the user an answer, and provides the composed video and audio for output by the projection module and the audio module.

The apparatus of claim 9,
wherein the content providing module,
if the object is a moving object, moves the background image so as to provide a virtual reality in which the user moves in the direction indicated by the moving object.

The apparatus of claim 9,
wherein the content providing module, if the object is a word object, provides video and audio explaining the meaning of the word corresponding to the word object.

A method for providing learning content, performed by an apparatus that presents a virtual reality through video and audio to provide learning content, the method comprising:
outputting, by a projection module and an audio module, first content of a virtual reality that includes an object;
receiving, by the audio module, a user's answer or question directed to the object of the virtual reality;
performing, by a speech recognition module, speech recognition on the user's answer or question;
evaluating, by a content providing module, whether the user's answer or question has been recognized and, when it has been recognized, evaluating the user's answer or question by comparing the speech recognition result with situation-specific answers and questions stored in a storage module, and composing second content including video and audio in which the object responds to the user's answer or question according to the evaluation result; and
outputting, by the projection module and the audio module, the second content,
wherein the composing of the second content comprises:
if the object is not at the center of the user's field of view, extracting an inactive-mode object image in which the outline of the object is rendered to indicate an inactive mode, and registering the extracted inactive-mode object image onto a background image to compose the video; and
if the object is at the center of the user's field of view, switching from the inactive mode to an active mode, extracting an active-mode object image that provides the learning content corresponding to the object, and registering the extracted active-mode object image onto the background image to compose the video.

The method of claim 13,
further comprising, before the receiving of the user's answer or question, storing a plurality of contents corresponding to the evaluation result as a single piece of content data concatenated consecutively in time,
wherein each of the plurality of contents has time information indicating its temporal position within the single piece of content data.

The method of claim 14,
wherein the composing of the content comprises
jumping to the content corresponding to the evaluation result according to the time information.

The method of claim 13,
further comprising, before the receiving of the user's answer or question, storing each of a plurality of contents corresponding to the evaluation result individually,
wherein each of the plurality of contents has correspondence information indicating that it is the content corresponding to the evaluation result.

The method of claim 16,
wherein the composing of the content comprises
detecting the content corresponding to the evaluation result according to the correspondence information.

A method for providing learning content, performed by an apparatus that presents a virtual reality through video and audio to provide learning content, the method comprising:
sensing, by a sensor module, movement of a projection module worn on a user's head and outputting coordinates according to the sensed movement;
detecting, by a content providing module, the user's field of view according to the coordinates and composing content including video, which comprises an object image and a background image for an object at a position corresponding to the detected field of view, and audio required for the video; and
outputting, by the projection module and an audio module, the video and audio of the composed content,
wherein the composing of the content comprises:
if the object is not at the center of the field of view, extracting an inactive-mode object image in which the outline of the object is rendered to indicate an inactive mode, and registering the extracted inactive-mode object image onto the background image to compose the video; and
if the object is at the center of the field of view, switching from the inactive mode to an active mode, extracting an active-mode object image that provides the learning content corresponding to the object, and registering the extracted active-mode object image onto the background image to compose the video.

The method of claim 18,
wherein the composing of the content comprises
detecting the center of the user's field of view according to the coordinates and, if an object is at the center of the field of view, composing video and audio for the object.

The method of claim 19,
wherein the composing of the content comprises,
if the object is a dialogue object, composing video and audio in which the dialogue object asks the user a question or gives the user an answer.

The method of claim 19,
wherein the composing of the content comprises,
if the object is a moving object, moving the background image so as to provide a virtual reality in which the user moves in the direction indicated by the moving object.

The method of claim 19,
wherein the composing of the content comprises,
if the object is a word object, composing content that provides video and audio explaining the meaning of the word corresponding to the word object.