KR102179452B1

KR102179452B1 - Device and voice recognition server for providing sound effect of story contents

Info

Publication number: KR102179452B1
Application number: KR1020180033455A
Authority: KR
Inventors: 박광현; 김종주; 정철범; 홍순천; 황재문
Original assignee: 주식회사 케이티
Priority date: 2018-03-22
Filing date: 2018-03-22
Publication date: 2020-11-16
Also published as: KR20190111395A

Abstract

A media playback device that provides sound effects for story content includes: an input unit that receives a voice in which a user speaks the content of pre-stored story content; a transmission unit that transmits the input voice data to a voice recognition server; a receiving unit that receives, from the voice recognition server, text data converted from the voice data; a sound effect extraction unit that compares the pre-stored story content with the received text data and extracts, from the pre-stored story content, a sound effect corresponding to the text data; and a sound effect providing unit that provides the extracted sound effect.

Description

Media playback device and voice recognition server that provide sound effects for story content {DEVICE AND VOICE RECOGNITION SERVER FOR PROVIDING SOUND EFFECT OF STORY CONTENTS}

The present invention relates to a media playback device and a voice recognition server that provide sound effects for story content.

An electronic book (e-book) is a digital book in which information such as text or images is recorded on an electronic medium so that it can be used like a printed book. Compared with paper books, e-books are cheaper, let the reader watch video material or listen to background music while reading, and can be read easily anytime and anywhere.

As prior art relating to such e-books, Korean Patent Laid-Open Publication No. 2014-0037824 discloses an e-book interface system and method.

Recently, children's storybooks have been offered as e-books. According to several studies, however, having a parent read a storybook aloud is more effective for a child's emotional and creative development than having the child read the storybook alone on an e-book or listen to its content in a machine-generated voice on a TV.

Accordingly, there is a need for a way for parents to read storybooks aloud themselves while helping the child become more immersed in the story.

The present invention aims to provide a media playback device and a voice recognition server that provide sound effects for story content, offering various effects that raise the concentration of listeners of the story content. It also aims to provide a media playback device and a voice recognition server that convert the voice of a user speaking story content into text and, when the text contains an error, correct the text using an error correction algorithm. It further aims to provide a media playback device and a voice recognition server that recognize the contextual flow of the story content spoken by the user and provide sound effects, vibration effects, and lighting effects suited to that flow. However, the technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may exist.

As a means of achieving the above technical objectives, one embodiment of the present invention may provide a media playback device including: an input unit that receives a voice in which a user speaks the content of pre-stored story content; a transmission unit that transmits the input voice data to a voice recognition server; a receiving unit that receives, from the voice recognition server, text data converted from the voice data; a sound effect extraction unit that compares the pre-stored story content with the received text data and extracts, from the pre-stored story content, a sound effect corresponding to the text data; and a sound effect providing unit that provides the extracted sound effect.

Another embodiment of the present invention may provide a media playback device including: an input unit that receives, from a user, a voice in which the user speaks the content of story content; a transmission unit that transmits voice data for the input voice to a voice recognition server; a receiving unit that receives, from the voice recognition server, a sound effect corresponding to the transmitted voice data; and a sound effect providing unit that provides the received sound effect, wherein the voice data is converted into text data by the voice recognition server, and the sound effect is the one extracted as corresponding to the text data by comparing the converted text data with the story content.

Yet another embodiment of the present invention may provide a voice recognition server including: a receiving unit that receives, from a media playback device, voice data in which a user speaks the content of story content; a conversion unit that converts the received voice data into text data; a sound effect extraction unit that compares the story content with the converted text data and extracts, from the story content, a sound effect corresponding to the text data; and a sound effect providing unit that provides the extracted sound effect to the media playback device.

The above means of solving the problems are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to any one of the above means of solving the problems, it is possible to provide a media playback device and a voice recognition server that provide sound effects for story content, offering various effects that raise the concentration of listeners of the story content. It is also possible to provide a media playback device and a voice recognition server that convert the voice of a user speaking story content into text and, when the text contains an error, correct the text using an error correction algorithm. It is further possible to provide a media playback device and a voice recognition server that recognize the contextual flow of the story content spoken by the user and provide sound effects, vibration effects, and lighting effects suited to that flow.

FIG. 1 is a block diagram of a sound effect providing system according to an embodiment of the present invention.
FIG. 2 is a block diagram of a media playback device according to an embodiment of the present invention.
FIG. 3 is a flowchart of a method of providing sound effects for story content in a media playback device according to an embodiment of the present invention.
FIG. 4 is a flowchart of a method of providing sound effects for story content in a media playback device according to another embodiment of the present invention.
FIG. 5 is a block diagram of a voice recognition server according to an embodiment of the present invention.
FIG. 6 is a flowchart of a method of providing sound effects for story content in a voice recognition server according to an embodiment of the present invention.
FIGS. 7A and 7B are exemplary diagrams illustrating a process of correcting errors in text data by comparing story content with converted text data according to an embodiment of the present invention.
FIGS. 8A and 8C are exemplary diagrams illustrating a process of extracting a sound effect from story content according to an embodiment of the present invention.
FIGS. 9A and 9B are exemplary diagrams illustrating a process of providing a lighting effect from story content according to an embodiment of the present invention.
FIG. 10 is an exemplary diagram comparing context-based effect sounds and keyword-based effect sounds for story content according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice the invention. The present invention may, however, be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. Also, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by a single piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a sound effect providing system according to an embodiment of the present invention. Referring to FIG. 1, the sound effect providing system 1 may include a media playback device 110 and a voice recognition server 120. The media playback device 110 and the voice recognition server 120 are shown as examples of components that can be controlled by the sound effect providing system 1.

The components of the sound effect providing system 1 of FIG. 1 are generally connected through a network. For example, as shown in FIG. 1, the media playback device 110 may be connected to the voice recognition server 120 simultaneously or at time intervals.

A network refers to a connection structure that enables information exchange between nodes such as terminals and servers, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

According to one embodiment, the media playback device 110 may itself recognize the context of the story content and provide a sound effect.

The media playback device 110 may receive a request from the user 100 for one of a plurality of story contents, and may download the requested story content from a content providing server (not shown) and manage it. The story content may be organized hierarchically into at least one paragraph in which the story is written and at least one sentence included in each paragraph.

The media playback device 110 may receive a voice in which the user 100 speaks the content of the pre-stored story content, and may transmit the input voice data to the voice recognition server 120. For example, when the story content is "Jack and the Beanstalk", the user 100 reads aloud the script of "Jack and the Beanstalk" (for example, a script in document or e-book form), and the media playback device 110 receives the voice uttered by the user 100.

The media playback device 110 may receive, from the voice recognition server 120, text data converted from the voice data.

The media playback device 110 may correct errors in the text data by comparing the text data with the pre-stored story content. Specifically, the media playback device 110 may apply an error correction algorithm to search for a sentence of the pre-stored story content whose similarity to the sentence of the text data is at or above a threshold. For example, when one such sentence is found, the media playback device 110 may correct the errors in the text data based on the retrieved sentence of the story content. As another example, when a plurality of such sentences are found, the media playback device 110 may select one of them using a context tracking algorithm and correct the errors in the text data based on the selected sentence.

The media playback device 110 may track, based on the hierarchically organized story content, the context within the story content that corresponds to the text data. Here, when the story content contains two or more sentences corresponding to the text data, the media playback device 110 may apply a similarity weighting parameter and a distance weighting parameter to those sentences to extract the sentence corresponding to the text data.

The media playback device 110 may compare the pre-stored story content with the received text data, extract from the pre-stored story content a sound effect corresponding to the text data, and provide the extracted sound effect. For example, the media playback device 110 may extract from the pre-stored story content the sound effect corresponding to the extracted sentence and provide it.

The media playback device 110 may work with a plurality of linked devices to control them so that a vibration effect or a lighting effect corresponding to the text data is provided.

The voice recognition server 120 may receive, from the media playback device 110, voice data in which the user 100 speaks the content of the story content.

The voice recognition server 120 may convert the received voice data into text data.

The voice recognition server 120 may transmit the converted text data to the media playback device 110.

That is, according to one embodiment, the media playback device 110 uses the text data of the voice uttered by the user 100 to directly recognize the corresponding content and context of the story content and to extract and provide sound effects, while the voice recognition server 120 converts the voice data of the story content spoken by the user 100 into text data.

According to another embodiment, the voice recognition server 120 may use the text data of the voice uttered by the user 100 to recognize the corresponding content and context of the story content, and may transmit the sound effect to the media playback device 110.

The media playback device 110 may receive, from the user 100, a voice in which the user speaks the content of the story content.

The media playback device 110 may transmit voice data for the input voice to the voice recognition server 120.

The media playback device 110 may receive, from the voice recognition server 120, a sound effect corresponding to the transmitted voice data.

The media playback device 110 may provide the received sound effect.

Such a media playback device 110 includes, but is not limited to, an artificial intelligence speaker, a smartphone, or a tablet PC running the Android or iOS operating system. The voice recognition server 120 may store and manage story content in a database. The story content may be organized hierarchically into at least one paragraph in which the story is written and at least one sentence included in each paragraph. The story content may also have an extended hierarchical structure in which the paragraphs, together with the sentences included in each paragraph, are in turn included in a given higher-level paragraph.

The voice recognition server 120 may convert the received voice data into text data.

The voice recognition server 120 may correct errors in the text data by comparing the text data with the story content. Specifically, the voice recognition server 120 may apply an error correction algorithm to search for a sentence of the story content whose similarity to the sentence of the text data is at or above a threshold. For example, when one such sentence is found, the voice recognition server 120 may correct the errors in the text data based on the retrieved sentence of the story content. As another example, when a plurality of such sentences are found, the voice recognition server 120 may select one of them using a context tracking algorithm and correct the errors in the text data based on the selected sentence.

The voice recognition server 120 may compare the story content with the converted text data and extract, from the story content, a sound effect corresponding to the text data.

The voice recognition server 120 may provide the extracted sound effect to the media playback device 110.

That is, according to another embodiment, the media playback device 110 receives a sound effect from the voice recognition server 120 and outputs it, while the voice recognition server 120 converts the voice data of the story content spoken by the user 100 into text data, directly recognizes the context of the story content, extracts the sound effect, and provides it to the media playback device 110.
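The division of roles in the two embodiments described so far can be contrasted with a short sketch. Everything here is illustrative and not taken from the patent: the `Server` class, the `match_effect` helper, the exact-match lookup, and the fake speech-to-text function are assumptions chosen to keep the example small.

```python
from typing import Callable, Dict, Optional

SpeechToText = Callable[[bytes], str]


def match_effect(text: str, effects_by_sentence: Dict[str, str]) -> Optional[str]:
    """Look up the effect for a recognized sentence (exact match, for brevity)."""
    return effects_by_sentence.get(text)


class Server:
    """Hypothetical stand-in for the voice recognition server 120."""

    def __init__(self, stt: SpeechToText,
                 effects_by_sentence: Optional[Dict[str, str]] = None) -> None:
        self.stt = stt
        self.effects_by_sentence = effects_by_sentence or {}

    def to_text(self, audio: bytes) -> str:
        # First embodiment: the server only converts voice data to text data.
        return self.stt(audio)

    def to_effect(self, audio: bytes) -> Optional[str]:
        # Second embodiment: the server converts and also matches the story.
        return match_effect(self.stt(audio), self.effects_by_sentence)


def device_first_embodiment(audio: bytes, server: Server,
                            local_effects: Dict[str, str]) -> Optional[str]:
    # The device matches the returned text against its own stored story.
    return match_effect(server.to_text(audio), local_effects)


def device_second_embodiment(audio: bytes, server: Server) -> Optional[str]:
    # The device only receives and plays whatever the server extracted.
    return server.to_effect(audio)
```

In the first function the story data lives on the device; in the second it lives on the server, so the device needs no matching logic of its own.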

FIG. 2 is a block diagram of a media playback device according to an embodiment of the present invention. According to one embodiment, the media playback device 110 may include a management unit 210, an input unit 220, a transmission unit 230, a receiving unit 240, an error correction unit 250, a context tracking unit 260, a sound effect extraction unit 270, a sound effect providing unit 280, and a control unit 290.

The management unit 210 may receive a request from the user 100 for one of a plurality of story contents, and may download the requested story content from a content providing server (not shown) and manage it. Here, the story content may be organized hierarchically into at least one paragraph in which the story is written and at least one sentence included in each paragraph. That is, one or more sentences together form a paragraph, and one or more paragraphs together form the story content.
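The sentence, paragraph, and story hierarchy described above might be modeled as follows. This is only a sketch: the class names, the per-sentence `sound_effect` field, the file names, and the sample sentences are assumptions made for illustration, not structures given in the patent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Sentence:
    text: str
    sound_effect: Optional[str] = None  # hypothetical effect identifier


@dataclass
class Paragraph:
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class StoryContent:
    title: str
    paragraphs: List[Paragraph] = field(default_factory=list)

    def iter_sentences(self) -> Iterator[Tuple[int, int, Sentence]]:
        """Yield (paragraph_index, sentence_index, sentence) in reading order."""
        for p_idx, paragraph in enumerate(self.paragraphs):
            for s_idx, sentence in enumerate(paragraph.sentences):
                yield p_idx, s_idx, sentence


# Made-up sample content; only the title comes from the patent's example.
story = StoryContent(
    title="Jack and the Beanstalk",
    paragraphs=[
        Paragraph([Sentence("Jack planted the beans."),
                   Sentence("It began to rain.", "rain.wav")]),
        Paragraph([Sentence("The giant roared.", "roar.wav")]),
    ],
)
```

Keeping the paragraph index alongside each sentence is what later allows a repeated sentence to be told apart by where it occurs in the story.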

The input unit 220 may receive a voice in which the user 100 speaks the content of the pre-stored story content.

The transmission unit 230 may transmit the input voice data to the voice recognition server 120.

The receiving unit 240 may receive, from the voice recognition server 120, text data converted from the voice data.

The error correction unit 250 may correct errors in the text data by comparing the text data with the pre-stored story content.

Specifically, the error correction unit 250 may apply an error correction algorithm to search for a sentence of the pre-stored story content whose similarity to the sentence of the text data is at or above a threshold. The error correction algorithm is an algorithm that, even when the text data contains errors, corrects those errors so that the matching sentence of the story content can be recognized. For example, when one sentence at or above the threshold is found, the error correction unit 250 may correct the errors in the text data based on the retrieved sentence of the story content. As another example, when a plurality of sentences at or above the threshold are found, the error correction unit 250 may select one of them using a context tracking algorithm and correct the errors in the text data based on the selected sentence. The context tracking algorithm is an algorithm that, when the same sentence is repeated in several paragraphs of the story content, tracks the paragraph that actually corresponds so that the appropriate sound effect can be extracted.
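The threshold search described above can be sketched as follows. The patent does not say how similarity is computed or what the threshold is, so the use of Python's `difflib.SequenceMatcher`, the 0.8 threshold, and the function names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; one possible measure among many."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_candidates(recognized: str, story_sentences: List[str],
                    threshold: float = 0.8) -> List[Tuple[int, str, float]]:
    """Return (index, sentence, score) for story sentences at or above the threshold."""
    results = []
    for idx, sentence in enumerate(story_sentences):
        score = similarity(recognized, sentence)
        if score >= threshold:
            results.append((idx, sentence, score))
    return results


def correct(recognized: str, story_sentences: List[str],
            threshold: float = 0.8) -> Optional[str]:
    """Replace recognized text with the story sentence when exactly one matches.

    Returns None when nothing matches or when several sentences match; the
    latter case is the one the text hands over to context tracking.
    """
    candidates = find_candidates(recognized, story_sentences, threshold)
    if len(candidates) == 1:
        return candidates[0][1]
    return None
```

For instance, a misrecognized "Jack climbed the bean stock." would be replaced by the stored sentence "Jack climbed the beanstalk.", while a line repeated in the story yields several candidates and is left unresolved at this stage.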

The context tracking unit 260 may track, based on the hierarchically organized story content, the context within the story content that corresponds to the text data.

When the story content contains two or more sentences corresponding to the text data, the context tracking unit 260 may apply a similarity weighting parameter and a distance weighting parameter to those sentences to extract the sentence corresponding to the text data.
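One way to combine a similarity weight and a distance weight is sketched below. The patent names the two parameters but this section gives no formula, so the scoring function, the default weights, and the idea of measuring distance from the sentence expected next are assumptions made purely for illustration.

```python
from typing import List, Tuple


def pick_by_context(candidates: List[Tuple[int, float]], last_index: int,
                    w_sim: float = 0.5, w_dist: float = 0.5) -> int:
    """Choose one sentence index among several matching candidates.

    candidates: (sentence_index, similarity) pairs that passed the threshold.
    last_index: index of the sentence matched most recently.
    The score rewards high similarity and closeness to the expected next
    sentence (last_index + 1); both weights are hypothetical tuning knobs.
    """
    expected = last_index + 1

    def score(candidate: Tuple[int, float]) -> float:
        idx, sim = candidate
        distance = abs(idx - expected)
        return w_sim * sim + w_dist / (1.0 + distance)

    return max(candidates, key=score)[0]
```

With two identical sentences at indices 2 and 10, a reader who has just finished sentence 8 is resolved to index 10, while a reader who has just finished sentence 0 is resolved to index 2.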

The sound effect extraction unit 270 may compare the pre-stored story content with the received text data and extract, from the pre-stored story content, a sound effect corresponding to the text data. For example, the sound effect extraction unit 270 may extract from the pre-stored story content the sound effect corresponding to the extracted sentence. Sound effects may include, for example, background sounds and effect sounds.
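Once a sentence position is known, extracting its effects can be a simple lookup. The sketch below assumes, hypothetically, that background sounds are attached per paragraph and effect sounds per sentence; the patent only states that both kinds of sound may be included, and the file names are made up.

```python
from typing import Dict, List, Tuple


def extract_effects(position: Tuple[int, int],
                    background_by_paragraph: Dict[int, str],
                    effect_by_sentence: Dict[Tuple[int, int], str]) -> List[str]:
    """Return the sounds to play for a (paragraph_index, sentence_index) position.

    The background sound of the paragraph comes first, followed by the
    effect sound tied to the specific sentence, when either exists.
    """
    p_idx, _ = position
    effects: List[str] = []
    background = background_by_paragraph.get(p_idx)
    if background is not None:
        effects.append(background)
    effect = effect_by_sentence.get(position)
    if effect is not None:
        effects.append(effect)
    return effects
```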

음향 효과 제공부(280)는 추출된 음향 효과를 제공할 수 있다. The sound effect providing unit 280 may provide the extracted sound effect.

제어부(290)는 텍스트 데이터에 대응하는 진동 효과 또는 조명 효과를 제공하도록 복수의 기기와의 연동을 통해 제어할 수 있다. The controller 290 may control through interworking with a plurality of devices to provide a vibration effect or a lighting effect corresponding to text data.

즉, 일 실시예에 따르면, 미디어 재생 장치(110)는 스토리 컨텐츠의 변환된 텍스트 데이터의 오류 보정 및 텍스트 데이터에 대응하는 음향 효과를 추출하여 이를 출력하는 역할을 수행할 수 있다. That is, according to an embodiment, the media playback apparatus 110 may perform a role of correcting errors of converted text data of story content and extracting sound effects corresponding to the text data and outputting them.

다른 실시예에 따르면, 미디어 재생 장치(110)는 입력부(220), 전송부(230), 수신부(240) 및 음향 효과 제공부(280)를 포함할 수 있다. According to another embodiment, the media playback device 110 may include an input unit 220, a transmission unit 230, a reception unit 240, and a sound effect providing unit 280.

The input unit 220 may receive, from the user 100, a voice uttering the content of the story content.

전송부(230)는 입력된 음성에 관한 음성 데이터를 음성 인식 서버(120)로 전송할 수 있다. The transmission unit 230 may transmit voice data related to the input voice to the voice recognition server 120.

수신부(240)는 음성 인식 서버(120)로부터 전송된 음성 데이터에 대응하는 음향 효과를 수신할 수 있다. The receiver 240 may receive a sound effect corresponding to the voice data transmitted from the voice recognition server 120.

음향 효과 제공부(280)는 수신한 음향 효과를 제공할 수 있다. The sound effect providing unit 280 may provide the received sound effect.

즉, 다른 실시예에 따르면, 미디어 재생 장치(110)는 음성 인식 서버(120)로부터 음향 효과를 수신하여 이를 출력하는 역할만을 수행할 수도 있다. That is, according to another embodiment, the media playback device 110 may only perform a role of receiving sound effects from the voice recognition server 120 and outputting them.

FIG. 3 is a flowchart of a method of providing a sound effect for story content in a media playback device according to an embodiment of the present invention. The method shown in FIG. 3 includes steps processed in time series by the sound effect providing system 1 according to the embodiments shown in FIGS. 1 and 2. Accordingly, even where omitted below, the description given above with reference to FIGS. 1 and 2 also applies to the method of providing a sound effect for story content in the media playback device 110.

단계 S310에서 미디어 재생 장치(110)는 기저장된 스토리 컨텐츠의 내용에 대해 사용자(100)가 발화한 음성을 입력받을 수 있다. In step S310, the media playback device 110 may receive a voice uttered by the user 100 for the content of the previously stored story content.

단계 S320에서 미디어 재생 장치(110)는 입력된 음성 데이터를 음성 인식 서버(120)로 전송할 수 있다. In step S320, the media playback device 110 may transmit the input voice data to the voice recognition server 120.

단계 S330에서 미디어 재생 장치(110)는 음성 인식 서버(120)로부터 음성 데이터에 기초하여 변환된 텍스트 데이터를 수신할 수 있다. In step S330, the media playback device 110 may receive text data converted based on the voice data from the voice recognition server 120.

단계 S340에서 미디어 재생 장치(110)는 기저장된 스토리 컨텐츠 및 수신된 텍스트 데이터를 비교하여 텍스트 데이터에 대응하는 음향 효과를 기저장된 스토리 컨텐츠로부터 추출할 수 있다. In operation S340, the media playback device 110 may compare the previously stored story content and the received text data to extract a sound effect corresponding to the text data from the previously stored story content.

단계 S350에서 미디어 재생 장치(110)는 추출된 음향 효과를 제공할 수 있다. In step S350, the media playback device 110 may provide the extracted sound effect.

상술한 설명에서, 단계 S310 내지 S350은 본 발명의 구현예에 따라서, 추가적인 단계들로 더 분할되거나, 더 적은 단계들로 조합될 수 있다. 또한, 일부 단계는 필요에 따라 생략될 수도 있고, 단계 간의 순서가 전환될 수도 있다.In the above description, steps S310 to S350 may be further divided into additional steps or combined into fewer steps, according to an embodiment of the present invention. In addition, some steps may be omitted as necessary, and the order between steps may be switched.
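Steps S310 to S350 can be sketched as a single function. This is only an illustration: the callables standing in for the microphone, the voice recognition server 120, and the speaker are hypothetical, and step S340 is reduced to an exact dictionary lookup rather than the similarity-based comparison described elsewhere in this document.

```python
from typing import Callable, Dict, Optional


def provide_sound_effect(
    capture_voice: Callable[[], bytes],
    send_to_server: Callable[[bytes], str],
    effects_by_sentence: Dict[str, str],
    play: Callable[[str], None],
) -> Optional[str]:
    """Run one pass of the device-side flow and return the effect played, if any."""
    voice_data = capture_voice()            # S310: receive the user's utterance
    text = send_to_server(voice_data)       # S320 and S330: send voice, receive text
    effect = effects_by_sentence.get(text)  # S340: look up the matching effect
    if effect is not None:
        play(effect)                        # S350: provide the sound effect
    return effect
```

Passing the collaborators in as parameters keeps the sketch runnable without audio hardware or a network connection.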

FIG. 4 is a flowchart of a method of providing a sound effect for story content in a media playback device according to another embodiment of the present invention. The method shown in FIG. 4 includes steps processed in time series by the sound effect providing system 1 according to the embodiments shown in FIGS. 1 to 3. Accordingly, even where omitted below, the description given above with reference to FIGS. 1 to 3 also applies to the method of providing a sound effect for story content in the media playback device 110.

In step S410, the media playback device 110 may receive, from the user 100, a voice uttering the content of the story content.

단계 S420에서 미디어 재생 장치(110)는 입력된 음성에 관한 음성 데이터를 음성 인식 서버(120)로 전송할 수 있다. In step S420, the media playback device 110 may transmit voice data related to the input voice to the voice recognition server 120.

단계 S430에서 미디어 재생 장치(110)는 음성 인식 서버(120)로부터 전송된 음성 데이터에 대응하는 음향 효과를 수신할 수 있다. In operation S430, the media playback device 110 may receive a sound effect corresponding to the voice data transmitted from the voice recognition server 120.

단계 S440에서 미디어 재생 장치(110)는 수신한 음향 효과를 제공할 수 있다. In step S440, the media playback device 110 may provide the received sound effect.

상술한 설명에서, 단계 S410 내지 S440은 본 발명의 구현예에 따라서, 추가적인 단계들로 더 분할되거나, 더 적은 단계들로 조합될 수 있다. 또한, 일부 단계는 필요에 따라 생략될 수도 있고, 단계 간의 순서가 전환될 수도 있다.In the above description, steps S410 to S440 may be further divided into additional steps or combined into fewer steps, according to an embodiment of the present invention. In addition, some steps may be omitted as necessary, and the order between steps may be switched.

FIG. 5 is a block diagram of a voice recognition server according to an embodiment of the present invention. According to an embodiment, the voice recognition server 120 may include a receiving unit 510, a conversion unit 520, an error correction unit 530, a sound effect extraction unit 540, and a sound effect providing unit 550.

The receiving unit 510 may receive, from the media playback device 110, voice data in which the content of the story content is uttered by the user 100. Here, the story content may be organized hierarchically into at least one paragraph and at least one sentence included in each paragraph, and may further have an extended hierarchical structure in which the layered paragraphs and their sentences are included in a given upper-level paragraph.

변환부(520)는 수신한 음성 데이터를 텍스트 데이터로 변환할 수 있다. The conversion unit 520 may convert the received voice data into text data.

오류 보정부(530)는 텍스트 데이터를 스토리 컨텐츠와 비교하여 텍스트 데이터에 포함된 오류를 보정할 수 있다. The error correction unit 530 may compare text data with story content to correct an error included in the text data.

Specifically, the error correction unit 530 may apply an error correction algorithm to search the story content for a sentence whose similarity to a sentence of the text data is equal to or greater than a threshold. The error correction algorithm refers to an algorithm that recognizes the intended sentence of the story content by correcting errors in the text data, even when the text data contains errors. For example, when one such sentence is found, the error correction unit 530 may correct the errors in the text data based on the retrieved sentence of the story content. As another example, when a plurality of such sentences are found, the error correction unit 530 may select one of them using a context tracking algorithm and correct the errors in the text data based on the selected sentence. The context tracking algorithm refers to an algorithm that, when the same sentence is repeated in a plurality of paragraphs of the story content, tracks the paragraph to which the utterance corresponds so that an appropriate sound effect can be extracted.

The sound effect extraction unit 540 may compare the story content with the converted text data to extract a sound effect corresponding to the text data from the story content. The sound effect may include, for example, a background sound and an effect sound.

음향 효과 제공부(550)는 추출된 음향 효과를 미디어 재생 장치(110)로 제공할 수 있다. The sound effect providing unit 550 may provide the extracted sound effect to the media playback device 110.

That is, according to an embodiment, the voice recognition server 120 may correct errors in the text data converted from the utterance of the story content, extract the sound effect corresponding to the text data, and provide it to the media playback device 110.

다른 실시예에 따르면, 음성 인식 서버(120)는 수신부(510), 변환부(520) 및 전송부(미도시)를 포함할 수 있다. According to another embodiment, the voice recognition server 120 may include a receiving unit 510, a conversion unit 520, and a transmission unit (not shown).

수신부(510)는 미디어 재생 장치(110)로부터 사용자(100)에 의해 스토리 컨텐츠의 내용이 발화된 음성 데이터를 수신할 수 있다. The receiver 510 may receive voice data in which the content of the story content is uttered by the user 100 from the media playback device 110.

변환부(520)는 수신한 음성 데이터를 텍스트 데이터로 변환할 수 있다.The conversion unit 520 may convert the received voice data into text data.

전송부(미도시)는 변환된 텍스트 데이터를 미디어 재생 장치(110)로 전송할 수 있다. The transmission unit (not shown) may transmit the converted text data to the media playback device 110.

즉, 다른 실시예에 따르면, 음성 인식 서버(120)는 미디어 재생 장치(110)로부터 수신한 음성 데이터를 텍스트 데이터로 변환하는 기능만을 수행할 수도 있다. That is, according to another embodiment, the voice recognition server 120 may only perform a function of converting voice data received from the media playback device 110 into text data.

FIG. 6 is a flowchart of a method of providing a sound effect for story content in a voice recognition server according to an embodiment of the present invention. The method shown in FIG. 6 includes steps processed in time series by the sound effect providing system 1 according to the embodiments shown in FIGS. 1 to 5. Accordingly, even where omitted below, the description given above with reference to FIGS. 1 to 5 also applies to the method of providing a sound effect for story content in the voice recognition server 120.

단계 S610에서 음성 인식 서버(120)는 미디어 재생 장치(110)로부터 사용자(100)에 의해 스토리 컨텐츠의 내용이 발화된 음성 데이터를 수신할 수 있다. In operation S610, the voice recognition server 120 may receive voice data in which the content of the story content is uttered by the user 100 from the media playback device 110.

단계 S620에서 음성 인식 서버(120)는 수신한 음성 데이터를 텍스트 데이터로 변환할 수 있다. In step S620, the voice recognition server 120 may convert the received voice data into text data.

단계 S630에서 음성 인식 서버(120)는 스토리 컨텐츠 및 상기 변환된 텍스트 데이터를 비교하여 텍스트 데이터에 대응하는 음향 효과를 스토리 컨텐츠로부터 추출할 수 있다. In step S630, the speech recognition server 120 may compare the story content and the converted text data to extract a sound effect corresponding to the text data from the story content.

단계 S640에서 음성 인식 서버(120)는 추출된 음향 효과를 미디어 재생 장치(110)로 제공할 수 있다. In step S640, the voice recognition server 120 may provide the extracted sound effect to the media playback device 110.

상술한 설명에서, 단계 S610 내지 S640은 본 발명의 구현예에 따라서, 추가적인 단계들로 더 분할되거나, 더 적은 단계들로 조합될 수 있다. 또한, 일부 단계는 필요에 따라 생략될 수도 있고, 단계 간의 순서가 전환될 수도 있다.In the above description, steps S610 to S640 may be further divided into additional steps or combined into fewer steps, according to an embodiment of the present invention. In addition, some steps may be omitted as necessary, and the order between steps may be switched.

도 7a 및 도 7b는 본 발명의 일 실시예에 따른 스토리 컨텐츠 및 변환된 텍스트 데이터를 비교하여 텍스트 데이터에 포함된 오류를 보정하는 과정을 설명하기 위한 예시적인 도면이다. 7A and 7B are exemplary diagrams for explaining a process of compensating an error included in text data by comparing story content and converted text data according to an embodiment of the present invention.

FIG. 7A is an exemplary diagram comparing story content with text data converted from the user's spoken reading of that story content, according to an embodiment of the present invention. Referring to FIG. 7A, the voice recognition server 120 may receive, from the media playback device 110, voice data of the content of the story content uttered by the user 100 and convert the received voice data into text data 710. The voice recognition server 120 may compare the story content 700 with the converted text data 710 to correct errors included in the text data 710. The voice recognition server 120 may determine the words shown in red in the text data 710 to be errors.

To track which part of the story content the user 100 is reading and provide the corresponding sound effect, the media playback device 110 or the voice recognition server 120 must be able to infer the content of the story content through a correction algorithm even when the text data 710 contains errors.

FIG. 7B is an exemplary diagram for explaining a process of correcting an error included in text data by comparing story content and text data according to an embodiment of the present invention. Referring to FIG. 7B, when the user 100 utters the portion of the story content 720 that reads "나무꾼은 호랑이가 무서웠지만 아파하는 호랑이를 내버려둘 수 없었어요" ("The woodcutter was afraid of the tiger, but could not leave the suffering tiger alone"), the voice recognition server 120 may fail to recognize it accurately and instead convert it into text data 730 reading "호랑이가 무서워지만 아빠는 호랑이를 내버려 둘 수 없어요" (roughly, "The tiger is scary, but Dad cannot leave the tiger alone").

If a sentence is searched for in the story content 720 by a simple string comparison against the error-containing text data 730, as in conventional methods, the desired search result cannot be obtained. The voice recognition server 120 may therefore use the O(ND) Diff algorithm as the error correction algorithm to find the sentence of the story content 720 with the highest similarity to the text data 730 and correct the error (740). When the similarity is equal to or greater than a threshold, the voice recognition server 120 selects that sentence. When a plurality of sentences with similarity at or above the threshold are found (for example, when the same sentence is found in the first, third, and fifth paragraphs), it may select the most suitable sentence using the context tracking algorithm and correct the error (740).
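For reference, the core of an O(ND) difference computation (Myers' algorithm) can be sketched as below. The function returns the length of the shortest edit script, counting insertions and deletions, between two strings. The normalization into a similarity score is an assumption made for illustration, since the text does not state how similarity is derived from the diff.

```python
def myers_distance(a: str, b: str) -> int:
    """Length of the shortest insert/delete edit script turning a into b."""
    n, m = len(a), len(b)
    # furthest[k] is the furthest x reached so far on diagonal k (where k = x - y).
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # step down: insert a character of b
            else:
                x = furthest[k - 1] + 1    # step right: delete a character of a
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x += 1                     # follow the diagonal of matching characters
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


def diff_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, 0.0 for disjoint ones."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 1.0 - myers_distance(a, b) / total
```

The running time grows with the number of differences D rather than with the full product of the two lengths, which suits the case here where the recognized text is expected to be close to a stored sentence.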

FIGS. 8A to 8C are exemplary diagrams for explaining a process of extracting a sound effect from story content according to an embodiment of the present invention.

FIG. 8A is an exemplary diagram illustrating sound effects provided by a media playback device according to an embodiment of the present invention. Referring to FIG. 8A, the sound effects provided by the media playback device 110 may include a background sound 800 and an effect sound 810. The background sound 800 is a sound effect corresponding to the content of a paragraph and includes, for example, a crowd sound 801 and a jungle sound 802. The effect sound 810 is a sound effect corresponding to the content of a specific sentence and may include, for example, a bicycle bell ("ttareung-ttareung") sound 811 and a rowing sound 812.

FIG. 8B is an exemplary diagram for explaining a process of extracting a sound effect from story content configured in a hierarchical structure according to an embodiment of the present invention. Referring to FIG. 8B, in the XML file of the story content, Class may be defined with a ClassName parameter identifying the story content and a Language parameter indicating its language. Because the voice recognition server 120 cannot dynamically change the recognition language on its own, it compares the title of the story content spoken by the user 100 with ClassName to prepare the corresponding story content, and sets or changes the recognition language through the Language parameter defined in that story content. For example, the Language parameter may be set to "ko" for Korean story content and to "en" for English story content. Using the Language parameter, the interactive sound effect service can thus be extended to English story content, which may expand the story content service market.
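A minimal sketch of the title lookup follows. Only the names Class, ClassName, and Language come from the description; the `Stories` wrapper element, the choice to model the two parameters as XML attributes, the story titles, and the function name are assumptions, because the actual schema of FIG. 8B is not reproduced in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the real file layout may differ.
CATALOG_XML = """
<Stories>
  <Class ClassName="The Woodcutter and the Tiger" Language="ko"/>
  <Class ClassName="Jack and the Beanstalk" Language="en"/>
</Stories>
""".strip()


def recognition_language(spoken_title, catalog_xml=CATALOG_XML):
    """Return the Language of the story whose ClassName matches the spoken title."""
    root = ET.fromstring(catalog_xml)
    wanted = spoken_title.strip().casefold()
    for story in root.iter("Class"):
        if story.get("ClassName", "").casefold() == wanted:
            return story.get("Language")
    return None  # no story with that title
```

The returned code would then be handed to whatever speech recognizer is in use before the reading session starts.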

In addition, the similarity level required between the English story content and the voice recognition result of the user 100 may be changed to adjust the difficulty of reading English story content, which may gradually build a child's interest in reading it. For example, a learning effect may be provided by comparing the voice recognition result of the user 100 with a sentence of the story content one-to-one and using the required similarity, between 0% and 100%, as the difficulty setting.
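The difficulty setting can be illustrated as a required similarity percentage. This is a sketch under assumptions: `difflib.SequenceMatcher` stands in for the similarity measure, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(recognized: str, target: str) -> float:
    """One-to-one similarity between the recognition result and the story sentence."""
    return 100.0 * SequenceMatcher(None, recognized, target).ratio()


def accepts(recognized: str, target: str, required_percent: float) -> bool:
    """True when the reading is close enough for the chosen difficulty (0 to 100)."""
    if not 0 <= required_percent <= 100:
        raise ValueError("required_percent must be between 0 and 100")
    return similarity_percent(recognized, target) >= required_percent
```

Raising `required_percent` over time makes the same sentence harder to pass, which is one way to read the gradual progression the paragraph describes.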

The media playback device 110 or the voice recognition server 120 may manage the story content by dividing it hierarchically into paragraphs and sentences in order to play background sounds and effect sounds effectively. In this case, the XML file in which the paragraphs and sentences are expressed in a hierarchical structure indicates which sound should be extracted at which point, so that the corresponding sound effect can be extracted.

For example, while the paragraph "Little Bear is riding a bicycle. / Little Bear, where are you going? / I'm going to the market." (830) is being uttered by the user 100 and input to the media playback device 110, the 'crowd sound' (820) may be played as the background sound, and while the sentence "Little Bear is riding a bicycle." (831) is being uttered by the user 100 and input to the media playback device 110, the 'bicycle bell sound' (832) may be played.

As another example, while the paragraph "Little Bear is riding a raft. / Little Bear, where are you going? / I'm going to the jungle." (850) is being uttered by the user 100 and input to the media playback device 110, the 'jungle sound' (840) may be played as the background sound, and while the sentence "Little Bear is riding a raft." (851) is being uttered by the user 100 and input to the media playback device 110, the 'rowing sound' (852) may be played.

By dividing and managing the paragraphs and sentences of the story content in a hierarchical structure in this way, the present invention can obtain the following effects. For example, even when an exceptional situation occurs in which a specific sentence is not recognized, the background sound of the corresponding paragraph can still be extracted. In a conventional method, the background sound "군중소리.wav" (crowd sound) is assigned to the sentence "Little Bear is riding a bicycle"; if that sentence is not recognized because the converted text contains an error, the moment to start "군중소리.wav" is missed, and it is not played during the following sentences "Little Bear, where are you going?" and "I'm going to the market." With the method of the present invention, because the paragraph is organized hierarchically, even if the sentence "Little Bear is riding a bicycle" is missed, recognizing "Little Bear, where are you going?" or "I'm going to the market." is enough to determine that the user 100 is reading the first paragraph, so "군중소리.wav" can be played.
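The paragraph-level fallback can be sketched as follows. The element and attribute names (`Story`, `Paragraph`, `Sentence`, `BackgroundSound`, `EffectSound`), the class name, and every file name other than "군중소리.wav" are hypothetical, the sentences are English renderings of the example, and lookup is by exact text for brevity.

```python
import xml.etree.ElementTree as ET

STORY_XML = """
<Story>
  <Paragraph BackgroundSound="군중소리.wav">
    <Sentence EffectSound="bicycle_bell.wav">Little Bear is riding a bicycle.</Sentence>
    <Sentence>Little Bear, where are you going?</Sentence>
    <Sentence>I'm going to the market.</Sentence>
  </Paragraph>
  <Paragraph BackgroundSound="jungle.wav">
    <Sentence EffectSound="rowing.wav">Little Bear is riding a raft.</Sentence>
    <Sentence>I'm going to the jungle.</Sentence>
  </Paragraph>
</Story>
""".strip()


class StoryPlayer:
    def __init__(self, xml_text):
        # sentence text -> (paragraph number, background sound, effect sound)
        self.index = {}
        for number, paragraph in enumerate(ET.fromstring(xml_text).iter("Paragraph")):
            for sentence in paragraph.iter("Sentence"):
                self.index[sentence.text.strip()] = (
                    number,
                    paragraph.get("BackgroundSound"),
                    sentence.get("EffectSound"),
                )
        self.current_paragraph = None

    def on_sentence(self, text):
        """Return the sounds to start for a recognized sentence."""
        entry = self.index.get(text)
        if entry is None:
            return []  # unrecognized sentence: leave playback unchanged
        number, background, effect = entry
        sounds = []
        if number != self.current_paragraph:
            # Any sentence of a paragraph is enough to start its background sound.
            self.current_paragraph = number
            if background:
                sounds.append(background)
        if effect:
            sounds.append(effect)
        return sounds
```

Because the background sound belongs to the paragraph rather than to its first sentence, a missed opening sentence delays the background sound by one sentence instead of losing it.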

It also has the advantage that the flow of context can be followed even when the same sentence is repeated within the story content. For example, suppose the same sentence, "Little Bear, where are you going?", appears in both a first paragraph and a second paragraph of the story content. Even if the error correction algorithm selects two candidate sentences for the utterance "Little Bear, where are you going?" by the user 100, the candidate that best fits the context can be selected by determining whether the previously recognized sentence was "Little Bear is riding a bicycle" in the first paragraph or "Little Bear is riding a raft" in the second paragraph.

To this end, a similarity weight parameter and a distance weight parameter may be used. The similarity weight parameter measures the similarity of a sentence, and the distance weight parameter determines how far a candidate position is from the last recognized sentence. For example, when the voice recognition result is "Little Bear, where are you going?", the similarity to the identical sentences in the first and second paragraphs is the same, but the distance weight differs depending on whether the immediately preceding recognized sentence was "Little Bear is riding a bicycle" in the first paragraph or "Little Bear is riding a raft" in the second paragraph, so the sentence that best fits the context can be selected.
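One way to combine the two parameters is a weighted score, sketched below. The scoring formula, the default weights, and all names are assumptions for illustration; the text only states that similarity and distance from the last recognized sentence are both taken into account.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    position: int      # index of the sentence within the whole story
    similarity: float  # 0.0 to 1.0


def select_by_context(
    candidates: List[Candidate],
    last_position: Optional[int],
    similarity_weight: float = 1.0,
    distance_weight: float = 0.05,
) -> Optional[Candidate]:
    """Pick the candidate with the best similarity, penalized by distance."""
    if not candidates:
        return None

    def score(candidate: Candidate) -> float:
        if last_position is None:
            distance = 0  # nothing recognized yet, so distance cannot be used
        else:
            distance = abs(candidate.position - last_position)
        return similarity_weight * candidate.similarity - distance_weight * distance

    return max(candidates, key=score)
```

With the repeated sentence at positions 1 and 4 and equal similarity, a last recognized position of 0 selects position 1, while a last recognized position of 3 selects position 4.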

즉, 스토리 컨텐츠가 문단 및 문장으로 계층화됨으로써, 문맥 추적 알고리즘을 이용하여 사용자(100)가 어느 문단을 읽고 있는 지를 판단할 수 있게 된다. That is, as the story content is layered into paragraphs and sentences, it is possible to determine which paragraph the user 100 is reading using a context tracking algorithm.

FIG. 8C is an exemplary diagram for explaining a process of extracting a sound effect from story content configured in an extended hierarchical structure according to an embodiment of the present invention. Referring to FIG. 8C, story content configured in an extended hierarchical structure may include, within upper-level paragraphs 860 and 880, a plurality of paragraphs 870, 875, 890, and 895 that each define a background sound. The upper-level paragraphs 860 and 880 may each include a command that sets a different lighting brightness, so that day (for example, LightLevel="90") and night (for example, LightLevel="15") are distinguished. Beyond this, story content configured in an extended hierarchical structure allows various other effects to be included in the content.
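The inheritance implied by the extended hierarchy can be sketched as a tree walk in which each sentence collects the attributes of all enclosing paragraphs. Only the `LightLevel` attribute and its values 90 and 15 come from the description; the other element names, attribute names, file names, and sentences are hypothetical.

```python
import xml.etree.ElementTree as ET

STORY_XML = """
<Story>
  <Paragraph LightLevel="90">
    <Paragraph BackgroundSound="crowd.wav">
      <Sentence>The market is busy at noon.</Sentence>
    </Paragraph>
    <Paragraph BackgroundSound="jungle.wav">
      <Sentence>The jungle is bright and loud.</Sentence>
    </Paragraph>
  </Paragraph>
  <Paragraph LightLevel="15">
    <Paragraph BackgroundSound="crickets.wav">
      <Sentence>At night everyone falls asleep.</Sentence>
    </Paragraph>
  </Paragraph>
</Story>
""".strip()


def effects_for(root, text):
    """Return the merged attributes of every level enclosing the given sentence."""

    def walk(node, inherited):
        merged = {**inherited, **node.attrib}  # inner levels override outer ones
        if node.tag == "Sentence" and (node.text or "").strip() == text:
            return merged
        for child in node:
            found = walk(child, merged)
            if found is not None:
                return found
        return None

    return walk(root, {})
```

Adding another kind of effect then only requires a new attribute at the appropriate level, with no change to the lookup.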

도 9a 및 도 9b는 본 발명이 일 실시예에 따른 스토리 컨텐츠로부터 조명 효과를 제공하는 과정을 설명하기 위한 예시적인 도면이다. 9A and 9B are exemplary diagrams for explaining a process of providing a lighting effect from story content according to an exemplary embodiment of the present invention.

FIG. 9A is an exemplary diagram for explaining a process of extracting a lighting effect when a sentence of the text data matches the story content according to an embodiment of the present invention. Referring to FIG. 9A, for the first sentence, "The door clattered open and an enormous giant appeared." (900), the media playback device 110 or the voice recognition server 120 may extract 'LEDAction=twinkle', 'LEDColor=255.0.0', and 'LEDRepeat=3', and for the third sentence, "The giant untied his sack and took out a hen." (910), it may extract 'LEDAction=loop', 'LEDColor=255.255.0', and 'LEDRepeat=2'.
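The three lighting parameters can be parsed into a structured command as sketched below. The reading of `LEDColor` as dot-separated red, green, and blue values in the range 0 to 255 is an assumption based on the example values, and the type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LedCommand:
    action: str
    color: Tuple[int, int, int]  # assumed to be (red, green, blue)
    repeat: int


def parse_led(attributes: Dict[str, str]) -> LedCommand:
    """Convert the LEDAction, LEDColor, and LEDRepeat strings into a command."""
    red, green, blue = (int(part) for part in attributes["LEDColor"].split("."))
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("color channel out of range: %d" % channel)
    return LedCommand(attributes["LEDAction"], (red, green, blue), int(attributes["LEDRepeat"]))
```

For the first sentence of the example this yields a red twinkle repeated three times, which a lighting device could then execute.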

FIG. 9B is an exemplary diagram for explaining a process of providing a lighting effect according to an embodiment of the present invention. Referring to FIG. 9B, when the sentence uttered by the user 100 matches the content of the story content 930, the media playback device 110 may provide the corresponding sound effect 940 and lighting effect 945.

For example, suppose the passage uttered by the user 100 is "Neigh! Clip-clop, clip-clop! Oh, a carriage! Ah! Look out! The girl tried to dodge the speeding carriage, but she fell down." and it matches the content 935 of the story content. In that case, the media playback device 110 may output the sounds of a horse neighing and of hoofbeats as sound effects for "Neigh! Clip-clop, clip-clop!". The media playback device 110 may also output the sound of a carriage as a sound effect for "Oh, a carriage! Ah! Look out! The girl tried to dodge the speeding carriage". Further, for "but she fell down", the media playback device 110 may output the sound of a person falling as a sound effect and may either make a light blink directly or control another device 950, through interworking with it, so that the light of that device blinks.

FIG. 10 is an exemplary diagram comparing a context-based effect sound and a keyword-based effect sound for story content according to an embodiment of the present invention. Referring to FIG. 10, the keyword-based effect sound 1010 is driven only by isolated words extracted from the text, such as 'tiger' or 'gather firewood', and may therefore be limited in providing sound effects that suit the situation.

In contrast, the context-based sound effect 1000 proposed by the present invention provides an effect that fits the content, meaning, and context of a sentence, and can thereby provide a more natural sound effect.
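The difference between the two approaches can be illustrated with a small sketch. The sentences, keywords, and file names below are hypothetical examples chosen for illustration and are not taken from FIG. 10; the sketch assumes that context-based effects are stored per story sentence.

```python
from typing import Dict, List, Optional

# Keyword-based: one effect per isolated word, regardless of what the sentence means.
KEYWORD_EFFECTS: Dict[str, str] = {
    "tiger": "tiger_roar.wav",
    "firewood": "axe_chop.wav",
}

# Context-based: each story sentence carries an effect chosen for its meaning.
SENTENCE_EFFECTS: Dict[str, str] = {
    "The tiger fell fast asleep under the tree.": "soft_snoring.wav",
    "The woodcutter was gathering firewood when a tiger leapt out.": "tiger_roar.wav",
}


def keyword_effects(sentence: str) -> List[str]:
    """Return the effect of every keyword that appears in the sentence."""
    lowered = sentence.lower()
    return [effect for keyword, effect in KEYWORD_EFFECTS.items() if keyword in lowered]


def context_effect(sentence: str) -> Optional[str]:
    """Return the effect annotated on the whole sentence, if any."""
    return SENTENCE_EFFECTS.get(sentence)
```

For the sleeping-tiger sentence, the keyword lookup still returns a roar because it sees only the word 'tiger', whereas the sentence-level lookup returns the quieter effect that fits the scene.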

The method of providing sound effects for story content in the media playback device and the voice recognition server described with reference to FIGS. 1 to 10 may also be implemented in the form of a computer program stored in a medium and executed by a computer, or in the form of a recording medium containing instructions executable by a computer.

A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. A computer-readable medium may also include a computer storage medium. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical spirit or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

110: media playback device
120: speech recognition server
210: management unit
220: input unit
230: transmission unit
240: receiver
250: error correction unit
260: context tracking unit
270: sound effect extraction unit
280: sound effect providing unit
290: control unit
510: receiver
520: conversion unit
530: error correction unit
540: sound effect extraction unit
550: sound effect providing unit

Claims

A media playback device for providing sound effects for story content, comprising:
an input unit configured to receive a voice uttered by a user for the content of pre-stored story content;
a transmission unit configured to transmit the input voice data to a voice recognition server;
a receiving unit configured to receive, from the voice recognition server, text data converted based on the voice data;
a sound effect extraction unit configured to compare the pre-stored story content with the received text data and to extract, from the pre-stored story content, a sound effect corresponding to the text data; and
a sound effect providing unit configured to provide the extracted sound effect.

The media playback device of claim 1,
further comprising an error correction unit configured to compare the text data with the pre-stored story content and to correct an error included in the text data.

The media playback device of claim 2,
wherein the error correction unit applies an error correction algorithm to search for a sentence of the pre-stored story content whose similarity to a sentence of the text data exceeds a threshold value.

The media playback device of claim 3,
wherein, when a sentence of the pre-stored story content whose similarity to the sentence of the text data exceeds the threshold value is found, the error correction unit corrects the error included in the text data based on the found sentence of the story content.

The media playback device of claim 3,
wherein, when a plurality of sentences of the pre-stored story content whose similarity to the sentence of the converted text data exceeds the threshold value are found, the error correction unit selects one of the plurality of sentences using a context tracking algorithm and corrects the error included in the text data based on the selected sentence.
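As an illustration only, and not as part of the claims, the threshold-based sentence search used for error correction could be sketched as follows. The source does not specify a similarity measure; this sketch assumes the character-level ratio from Python's `difflib.SequenceMatcher` as a stand-in, and the function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the unspecified measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_candidates(
    recognized: str, story_sentences: List[str], threshold: float = 0.8
) -> List[Tuple[int, str, float]]:
    """Return (index, sentence, score) for story sentences whose score exceeds the threshold."""
    scored = [(i, s, similarity(recognized, s)) for i, s in enumerate(story_sentences)]
    return [candidate for candidate in scored if candidate[2] > threshold]


def correct(recognized: str, story_sentences: List[str], threshold: float = 0.8) -> str:
    """Replace recognized text with the best-matching story sentence, if one qualifies."""
    candidates = find_candidates(recognized, story_sentences, threshold)
    if not candidates:
        return recognized  # nothing similar enough; leave the text unchanged
    # With several candidates a context tracking step would choose among them;
    # this sketch simply takes the highest score.
    return max(candidates, key=lambda candidate: candidate[2])[1]
```

A misrecognized "The girl fell done." would thus be replaced by the stored sentence "The girl fell down.", while text that resembles no stored sentence is returned unchanged.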

The media playback device of claim 1,
further comprising a management unit configured to receive, from the user, a request for any one of a plurality of story contents, and to download the requested story content from a content providing server and manage the downloaded story content.

The media playback device of claim 6,
wherein the story content is hierarchically structured into at least one paragraph in which the story content is described and at least one sentence included in each paragraph.

The media playback device of claim 7,
further comprising a context tracking unit configured to track, based on the hierarchically structured story content, the context within the story content that corresponds to the text data.

The media playback device of claim 8,
wherein, when the story content includes two or more sentences corresponding to the text data, the context tracking unit applies a similarity weighting parameter and a distance weighting parameter to the two or more sentences to extract the sentence corresponding to the text data.
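As an illustration only, and not as part of the claims, one way to combine the two weighting parameters is a weighted sum of each candidate's similarity and its proximity to the most recently matched sentence. The source names the parameters but gives no formula, so the scoring function, the proximity term `1 / (1 + distance)`, and the default weights below are all assumptions.

```python
from typing import List, Tuple


def select_sentence(
    candidates: List[Tuple[int, float]],
    last_index: int,
    w_sim: float = 0.5,
    w_dist: float = 0.5,
) -> int:
    """Pick the index of the candidate sentence with the best combined score.

    candidates: (sentence_index, similarity) pairs that all match the text data.
    last_index: index of the sentence matched by the previous utterance.
    """
    def score(candidate: Tuple[int, float]) -> float:
        index, sim = candidate
        # Proximity is 1.0 at the previous position and decays with distance from it.
        proximity = 1.0 / (1 + abs(index - last_index))
        return w_sim * sim + w_dist * proximity

    return max(candidates, key=score)[0]
```

With the default weights, a nearby sentence with slightly lower similarity wins over a distant one, which suits a refrain that recurs throughout a story; setting `w_dist` to 0 reduces the choice to similarity alone.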

The media playback device of claim 9,
wherein the sound effect extraction unit extracts, from the pre-stored story content, a sound effect corresponding to the extracted sentence.

The media playback device of claim 1,
further comprising a control unit configured to interwork with a plurality of devices and to control the devices so that a vibration effect or a lighting effect corresponding to the text data is provided.

A media playback device for providing sound effects for story content, comprising:
an input unit configured to receive, from a user, a voice uttering the content of story content;
a transmission unit configured to transmit voice data of the input voice to a voice recognition server;
a receiving unit configured to receive, from the voice recognition server, a sound effect corresponding to the transmitted voice data; and
a sound effect providing unit configured to provide the received sound effect,
wherein the voice data is converted into text data by the voice recognition server, and
the sound effect is a sound effect corresponding to the text data that is extracted by comparing the converted text data with the story content.

A voice recognition server for providing sound effects for story content, comprising:
a receiving unit configured to receive, from a media playback device, voice data in which a user utters the content of story content;
a conversion unit configured to convert the received voice data into text data;
a sound effect extraction unit configured to compare the story content with the converted text data and to extract, from the story content, a sound effect corresponding to the text data; and
a sound effect providing unit configured to provide the extracted sound effect to the media playback device.

The voice recognition server of claim 13,
further comprising an error correction unit configured to compare the text data with the story content and to correct an error included in the text data.

The voice recognition server of claim 14,
wherein the error correction unit applies an error correction algorithm to search for a sentence of the story content whose similarity to a sentence of the text data is greater than or equal to a threshold value.

The voice recognition server of claim 15,
wherein, when a sentence of the story content whose similarity to the sentence of the text data is greater than or equal to the threshold value is found, the error correction unit corrects the error included in the text data based on the found sentence of the story content.

The voice recognition server of claim 16,
wherein, when a plurality of sentences of the story content whose similarity to the sentence of the converted text data is greater than or equal to the threshold value are found, the error correction unit selects one of the plurality of sentences using a context tracking algorithm and corrects the error included in the text data based on the selected sentence.

The voice recognition server of claim 13,
wherein the story content is hierarchically structured into at least one paragraph in which the story content is described and at least one sentence included in each paragraph.

The voice recognition server of claim 18,
wherein the story content has an extended hierarchical structure in which the at least one hierarchically structured paragraph and the at least one sentence included in each paragraph belong to a predetermined upper-level paragraph.
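As an illustration only, and not as part of the claims, the hierarchical structure of story content (upper-level paragraph, paragraph, sentence) could be represented with nested records. All class and field names below are hypothetical, and attaching the sound effect to the sentence is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Sentence:
    text: str
    sound_effect: Optional[str] = None  # effect annotated on this sentence, if any


@dataclass
class Paragraph:
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class UpperParagraph:
    """Upper-level grouping of paragraphs, e.g. a chapter or scene."""
    title: str
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class StoryContent:
    title: str
    upper_paragraphs: List[UpperParagraph] = field(default_factory=list)

    def iter_sentences(self) -> Iterator[Tuple[Tuple[int, int, int], Sentence]]:
        """Yield ((upper, paragraph, sentence) position, sentence) in reading order."""
        for u, upper in enumerate(self.upper_paragraphs):
            for p, paragraph in enumerate(upper.paragraphs):
                for s, sentence in enumerate(paragraph.sentences):
                    yield (u, p, s), sentence
```

Keeping the position tuple alongside each sentence gives a context tracking step a notion of where in the story a matched sentence lies.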