KR20200072575A

KR20200072575A - Apparatus and method for providing e-book service

Info

Publication number: KR20200072575A
Application number: KR1020180150863A
Authority: KR
Inventors: 박미화; 정달영; 두일철; 정의정; 장유진
Original assignee: 동국대학교 산학협력단
Priority date: 2018-11-29
Filing date: 2018-11-29
Publication date: 2020-06-23
Also published as: KR102201153B1

Abstract

The present invention relates to a technology for providing an e-book service, and more specifically, to a technology for providing an e-book service with image captioning and page summarization functions. According to an embodiment of the present invention, images that visually impaired users find difficult to understand from alternative text alone can be conveyed to them effectively.

Description

Apparatus and method for providing an e-book service {APPARATUS AND METHOD FOR PROVIDING E-BOOK SERVICE}

The present invention relates to an e-book service providing technology, and more particularly, to an e-book service providing technology that provides image captioning and page summarization functions.

With the advent of various smart devices and the ongoing digitization of documents, the use of e-books is increasing rapidly. In particular, the text-to-speech (TTS) read-aloud function of e-books lets users read a book without looking at it. As a result, visually impaired people, who previously lacked access to many kinds of information and materials, can now freely listen to e-books. Moreover, many e-book technologies aimed at improving convenience for the visually impaired are being studied.

The read-aloud function outputs a page as speech from its first character to its last. However, because the output is spoken, audio that has already played cannot simply be heard again; to hear it again, the user must restart playback from the beginning of the page. In other words, a visually impaired user who wants to hear the content of a particular page again must spend considerable time re-listening to many pages until the desired page is found.

Prior art for the present invention includes Korean Patent Registration No. 10-1789057.

The present invention provides an apparatus and method for providing an e-book service that effectively conveys to visually impaired users images that are difficult to understand from alternative text alone.

The present invention also provides an apparatus and method for providing an e-book service that improves the user's access to each page by automatically summarizing each page and presenting the summary by voice.

According to one aspect of the present invention, an apparatus for providing an e-book service is provided.

An apparatus for providing an e-book service according to an embodiment of the present invention includes: an input/output unit that receives a user voice signal; a memory including a voice recognition database; and a control unit that accesses a page according to the user voice signal by using the voice recognition database and generates at least one of summary data and an image caption corresponding to the page. The control unit may output at least one of the summary data and the image caption as speech through the input/output unit.

The voice recognition database includes voice commands, and when the user voice signal corresponds to a voice command, the control unit may access the page corresponding to that voice command.

When the user voice signal does not correspond to a voice command, the control unit may search for and access a page containing a search term corresponding to the user voice signal.

When the page includes an image, the control unit may generate the image caption.

The control unit may generate a feature vector by inputting the image into a convolutional neural network (CNN), and may generate the image caption by inputting the feature vector into a recurrent neural network (RNN)/long short-term memory (LSTM) network.

The control unit may calculate a TextRank score for each sentence in the text of the page and select the sentence with the highest TextRank score as the summary data.

According to another aspect of the present invention, there are provided a method for providing an e-book service in an e-book service providing apparatus, and a computer-readable recording medium storing a computer program that executes the method.

A method for providing an e-book service in an e-book service providing apparatus according to an embodiment of the present invention, and a computer-readable recording medium storing a computer program that executes it, may include: receiving a user voice signal; accessing a page according to the user voice signal by using a voice recognition database; generating at least one of summary data and an image caption corresponding to the page; and outputting at least one of the summary data and the image caption as speech.

The voice recognition database includes voice commands, and accessing the page according to the user voice signal by using the voice recognition database may comprise, when the user voice signal corresponds to a voice command, accessing the page corresponding to that voice command.

Accessing the page according to the user voice signal by using the voice recognition database may comprise, when the user voice signal does not correspond to a voice command, searching for and accessing a page containing a search term corresponding to the user voice signal.

Generating at least one of the summary data and the image caption corresponding to the page may comprise generating the image caption when the page includes an image.

Generating at least one of the summary data and the image caption corresponding to the page may comprise generating a feature vector by inputting the image into a convolutional neural network (CNN), and generating the image caption by inputting the feature vector into a recurrent neural network (RNN)/long short-term memory (LSTM) network.

Generating at least one of the summary data and the image caption corresponding to the page may comprise calculating a TextRank score for each sentence in the text of the page and selecting the sentence with the highest TextRank score as the summary data.

As described above, according to an embodiment of the present invention, images that visually impaired users find difficult to understand from alternative text alone can be conveyed to them effectively.

In addition, according to an embodiment of the present invention, automatically summarizing each page and presenting the summary by voice improves the user's access to each page.

FIG. 1 is a block diagram illustrating an e-book service providing apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart briefly illustrating a process in which the e-book service providing apparatus according to an embodiment of the present invention provides an e-book service.
FIG. 3 is a flowchart illustrating a process in which the e-book service providing apparatus according to an embodiment of the present invention generates summary data.
FIG. 4 is a diagram conceptually illustrating a process in which the e-book service providing apparatus according to an embodiment of the present invention generates summary data.
FIG. 5 is a flowchart illustrating a process in which the e-book service providing apparatus according to an embodiment of the present invention performs image captioning.
FIG. 6 is a diagram conceptually illustrating a process in which the e-book service providing apparatus according to an embodiment of the present invention performs image captioning.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art to which the present invention pertains can easily practice them. However, the present invention can be implemented in many different forms and is not limited to the embodiments described herein. In addition, when a part is said to include a certain component, this means that it may further include other components rather than excluding them, unless otherwise specified.

FIG. 1 is a block diagram illustrating an e-book service providing apparatus according to an embodiment of the present invention, and FIG. 2 is a flowchart briefly illustrating a process in which the e-book service providing apparatus according to an embodiment of the present invention provides an e-book service.

Referring to FIG. 1, an apparatus for providing an e-book service according to an embodiment of the present invention includes an input/output unit 110, a control unit 120, and a memory 130.

The input/output unit 110 receives a voice signal from the user (hereinafter referred to as a user voice signal), and outputs text either as converted speech through a speaker or as text on a display. To this end, the input/output unit 110 may be electrically connected to a voice input device such as a microphone, a voice output device such as a speaker, and a display.

When a user voice signal is received from the input/output unit 110, the control unit 120 searches the voice recognition database to determine whether the user voice signal corresponds to a voice command or a search term. The voice recognition database is a database that stores voice commands. When the user voice signal corresponds to one of the voice commands shown in Table 1 below, the control unit performs the system operation associated with that voice command.

Table 1
Voice command | System operation | Remark
시작, 처음 (start, first) | Go to the first page |
끝, 마지막 (end, last) | Go to the last page |
이전, 앞 (previous, front) | Go to the previous page |
다음, 뒤 (next, back) | Go to the next page |
예, 네, 응 (yes) | Execute the image caption function | After announcing that the page contains an image, the system asks whether the user wants it described and interprets the image if the answer is "yes"
아니아, 아니 (no) | Do not execute the image caption function |
Other (search term) | Pass to the search processing unit and search |

The control unit 120 accesses a specific page according to a voice command or a search term, generates summary data and an image script for that page, and outputs at least one of the summary data and the image script through the input/output unit 110. For example, when the voice command is "처음" (first), the control unit may access the first page of the e-book and output the summary data of that page. When a search term rather than a command is input, the control unit 120 searches for and accesses a page containing the search term.
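The navigation commands of Table 1 can be illustrated with a short Python sketch. `NAV_COMMANDS` and `navigate` are hypothetical names, pages are indexed from 0 for convenience, and clamping at the first and last pages is an assumption: the patent does not say what happens when "이전" is spoken on the first page.

```python
# Hypothetical mapping from Table 1 navigation keywords to actions.
NAV_COMMANDS = {
    "시작": "first", "처음": "first",
    "끝": "last", "마지막": "last",
    "이전": "previous", "앞": "previous",
    "다음": "next", "뒤": "next",
}


def navigate(command: str, current: int, page_count: int) -> int:
    """Return the 0-based page index reached by a navigation command."""
    action = NAV_COMMANDS.get(command)
    if action is None:
        raise ValueError(f"not a navigation command: {command!r}")
    if action == "first":
        return 0
    if action == "last":
        return page_count - 1
    if action == "previous":
        return max(current - 1, 0)  # assumed: stay on the first page
    return min(current + 1, page_count - 1)  # "next", assumed to stop at the last page
```

For example, `navigate("처음", 5, 10)` jumps to page 0, and `navigate("다음", 3, 10)` moves to page 4.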

The memory 130 includes a voice recognition database and a page information database. The page information database stores the text and images of each page, and additionally stores the summary data and image captions generated by the control unit 120.
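One way to picture the two databases in memory 130 is as plain records. The sketch below is an assumed layout, not the patent's schema; `PageRecord` and `Memory` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PageRecord:
    """One entry of the page information database (hypothetical layout)."""
    number: int
    text: str
    images: List[bytes] = field(default_factory=list)  # raw image data found on the page
    summary: Optional[str] = None  # filled in by summary generation (S260)
    captions: List[str] = field(default_factory=list)  # filled in by image captioning (S280)


@dataclass
class Memory:
    """Memory 130: a voice recognition DB plus a page information DB."""
    voice_commands: Dict[str, str]  # spoken keyword -> action name
    pages: Dict[int, PageRecord] = field(default_factory=dict)

    def add_page(self, record: PageRecord) -> None:
        self.pages[record.number] = record
```

Keeping `summary` optional lets step S250 test `record.summary is None` to decide whether a summary still has to be generated.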

Hereinafter, the process by which the e-book service providing apparatus provides the e-book service will be described in detail with reference to FIG. 2. Each step described below is performed by the functional units that make up the e-book service providing apparatus, but for a concise and clear description they are collectively referred to as the e-book service providing apparatus.

In step S210, the e-book service providing apparatus receives a user voice signal through the input/output unit 110. The user voice signal may contain a predefined voice command or a search term.

In step S220, the e-book service providing apparatus determines whether the user voice signal is a voice command. For example, the apparatus may convert the user voice signal into text using a speech-to-text (STT) function. If the converted text matches any of the voice commands stored in the voice command list of the voice recognition database, the apparatus recognizes the user voice signal as a voice command. If the converted text is not a voice command in the list, the apparatus recognizes the user voice signal as a search term.
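Once STT has produced text, step S220 reduces to a lookup. This minimal sketch assumes the STT output is already available as a string, strips surrounding whitespace and ASCII punctuation (an assumption), and uses a hypothetical `COMMANDS` table built from the Table 1 keywords; `classify` is likewise a hypothetical name.

```python
import string

# Hypothetical voice command list: Table 1 keywords mapped to action names.
COMMANDS = {
    "시작": "first", "처음": "first", "끝": "last", "마지막": "last",
    "이전": "previous", "앞": "previous", "다음": "next", "뒤": "next",
    "예": "caption_yes", "네": "caption_yes", "응": "caption_yes",
    "아니아": "caption_no", "아니": "caption_no",
}


def classify(stt_text: str):
    """Return ("command", action) for a listed command, else ("search", term)."""
    # Assumed normalization: drop surrounding spaces and ASCII punctuation.
    normalized = stt_text.strip().strip(string.punctuation + " ")
    if normalized in COMMANDS:
        return ("command", COMMANDS[normalized])
    return ("search", normalized)
```

Anything not in the list falls through to the search branch, which matches the "Other (search term)" row of Table 1.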

If the user voice signal is determined to be a search term in step S220, then in step S230 the e-book service providing apparatus searches for and accesses a page containing the search term. If no page containing the search term is found, the apparatus may repeat the process from step S210. In addition, if any previously generated summary data contains the search term, the apparatus may retrieve the page corresponding to that summary data.
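Step S230 can be sketched as a substring search over the stored summaries and page text. `find_page` is a hypothetical helper; checking summaries before full text is an assumed ordering, and returning `None` stands for "no page found, go back to S210".

```python
from typing import Dict, Optional


def find_page(query: str, texts: Dict[int, str],
              summaries: Dict[int, str]) -> Optional[int]:
    """Return the first page whose summary or text contains query, else None."""
    # Assumption: a hit in a stored summary is the stronger signal,
    # so summaries are checked before the full page text.
    for number in sorted(summaries):
        if query in summaries[number]:
            return number
    for number in sorted(texts):
        if query in texts[number]:
            return number
    return None  # caller returns to step S210
```

A real system would likely normalize spelling and tokenize Korean text before matching; plain substring matching keeps the sketch short.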

If the user voice signal is a voice command in step S220, then in step S240 the e-book service providing apparatus accesses the page indicated by the voice command.

In step S250, the e-book service providing apparatus determines whether summary data exists for the accessed page. For example, the apparatus may check whether summary data for the accessed page exists in the page information database provided in the memory 130.

If no summary data exists in step S250, then in step S260 the e-book service providing apparatus generates summary data for the full text of the accessed page. The summary generation process is described in detail later with reference to FIGS. 3 and 4.

If summary data exists in step S250, then in step S270 the e-book service providing apparatus determines whether the page contains an image.

If an image exists in step S270, then in step S280 the e-book service providing apparatus performs image captioning for the user. The image captioning process is described in detail later with reference to FIG. 5.

In step S290, the e-book service providing apparatus outputs at least one of the summary data and the image caption for the page. For example, when the user requests image captioning, the apparatus outputs both the summary data and the image caption.
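Steps S250 to S290 amount to caching the summary and gating the caption on the user's answer. In the sketch below, `serve_page` and its callable parameters are hypothetical stand-ins: `summarize` for the FIG. 3 process, `caption` for FIG. 5, and `wants_caption` for the yes/no prompt of S510.

```python
from typing import Callable, Dict, List


def serve_page(
    page: int,
    texts: Dict[int, str],
    images: Dict[int, List[str]],
    summaries: Dict[int, str],
    summarize: Callable[[str], str],
    caption: Callable[[str], str],
    wants_caption: Callable[[], bool],
) -> List[str]:
    """Return the utterances to speak for one page (steps S250 to S290 of FIG. 2)."""
    # S250/S260: reuse a cached summary, otherwise generate and cache it.
    if page not in summaries:
        summaries[page] = summarize(texts[page])
    outputs = [summaries[page]]
    # S270/S280: caption only if the page has images and the user agrees.
    page_images = images.get(page, [])
    if page_images and wants_caption():
        outputs.extend(caption(img) for img in page_images)
    # S290: the caller would pass these strings to TTS.
    return outputs
```

Because `summaries` is mutated in place, revisiting a page skips the summarizer, which is the point of the S250 check.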

FIG. 3 is a flowchart illustrating a process in which the e-book service providing apparatus according to an embodiment of the present invention generates summary data, and FIG. 4 is a diagram conceptually illustrating that process. Each step described below corresponds to step S260 of FIG. 2.

Referring to FIG. 3, in step S310, the e-book service providing apparatus extracts the text of the page.

In step S320, the e-book service providing apparatus splits the extracted text into sentences, as shown in FIG. 4, and applies natural language processing to build a TF-IDF model.
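Step S320 and the start of step S330 can be sketched in plain Python. Assumptions: sentences are split after `.`, `!`, or `?` (Korean text would need a proper sentence splitter and morphological analysis), the IDF is the smoothed `log((1+n)/(1+df)) + 1` variant, and the correlation matrix is taken to be pairwise cosine similarity between sentence vectors. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_vectors(sentences: List[str]) -> List[Dict[str, float]]:
    """Treat each sentence as a document and build sparse TF-IDF vectors."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_matrix(vectors: List[Dict[str, float]]) -> List[List[float]]:
    """Pairwise cosine similarity; the diagonal is left at 0 (no self-links)."""
    norms = [math.sqrt(sum(w * w for w in v.values())) for v in vectors]
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or norms[i] == 0 or norms[j] == 0:
                continue
            dot = sum(w * vectors[j].get(t, 0.0) for t, w in vectors[i].items())
            matrix[i][j] = dot / (norms[i] * norms[j])
    return matrix
```

The resulting symmetric matrix can serve as the edge weights of the sentence graph used by TextRank.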

In step S330, the e-book service providing apparatus generates a correlation matrix from the TF-IDF model, computes a weighted graph over its rows and columns, and calculates a TextRank score for each sentence from the weighted graph (see FIG. 4). The TextRank score can be calculated according to Equation 1 below.

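[Equation 1]

$$TR(S_i) = (1 - d) + d \sum_{S_j \in In(S_i)} \frac{r_{ji}}{\sum_{S_k \in Out(S_j)} r_{jk}} \, TR(S_j)$$

TR(S_i): TextRank score for sentence S_i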

r_ij: weight between sentences (or words) i and j

d: damping factor; in PageRank, the probability that a web surfer, not satisfied with the current page, moves on to another page

In step S340, the e-book service providing apparatus selects the sentence with the highest TextRank score as the summary data of the page.
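Equation 1 and step S340 can be computed by iterating until the scores stop changing. The sketch below takes a symmetric sentence weight matrix `weights[i][j]` (the r_ij of Equation 1, for example a cosine similarity matrix from S330), uses d = 0.85 (a common choice, assumed here), and returns the top-scoring sentence. `textrank` and `best_sentence` are hypothetical names.

```python
from typing import List, Sequence


def textrank(weights: Sequence[Sequence[float]], d: float = 0.85,
             iterations: int = 200, tol: float = 1e-10) -> List[float]:
    """Iterate TR(i) = (1 - d) + d * sum_j [r_ji / sum_k r_jk] * TR(j) (Equation 1)."""
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]  # sum_k r_jk for each sentence j
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = [
            (1 - d) + d * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if j != i and out_sum[j] > 0
            )
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def best_sentence(sentences: List[str], weights: Sequence[Sequence[float]]) -> str:
    """Step S340: pick the sentence with the highest TextRank score."""
    scores = textrank(weights)
    return sentences[max(range(len(sentences)), key=scores.__getitem__)]
```

In a small star-shaped graph where sentence 0 is similar to both others, sentence 0 receives the most incoming weight and is selected as the summary.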

In step S350, the e-book service providing apparatus stores the summary data in the page list of the page information database.

FIG. 5 is a flowchart illustrating a process in which the e-book service providing apparatus according to an embodiment of the present invention performs image captioning, and FIG. 6 is a diagram conceptually illustrating that process. The process of FIG. 5 corresponds to step S280 described above with reference to FIG. 2.

Referring to FIG. 5, in step S510, the e-book service providing apparatus asks the user to confirm whether to use the image caption function and receives a user voice signal.

In step S520, the e-book service providing apparatus determines whether the user voice signal is a voice command instructing it to use the image caption function.

If, in step S520, the user voice signal is not a voice command instructing use of the image caption function, the image captioning process ends.
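The S510/S520 gate can be written as a small function. The prompt text, the `ask` and `make_caption` callables, and the function name are hypothetical; the accepted answers are the "yes" words of Table 1.

```python
from typing import Callable, Optional

YES_WORDS = {"예", "네", "응"}  # Table 1: execute the image caption function


def confirm_and_caption(ask: Callable[[str], str],
                        make_caption: Callable[[], str]) -> Optional[str]:
    """S510: ask the user; S520: continue to captioning only on a 'yes' word."""
    reply = ask("This page contains an image. Would you like a description?")
    if reply.strip() not in YES_WORDS:
        return None  # not a 'yes' command: end the image captioning process
    return make_caption()  # steps S530 to S550 would run here
```

Treating every non-"yes" reply as a refusal mirrors the flowchart, where only an explicit caption command proceeds to S530.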

If, in step S520, the user voice signal is a voice command instructing use of the image caption function, then in step S530 the e-book service providing apparatus inputs the image into a convolutional neural network (CNN) to generate a feature vector. A CNN is a model that can learn while preserving the spatial information of an image.
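To show at the smallest scale how a convolution turns spatially arranged pixels into a feature vector, here is a single convolution layer with ReLU and global average pooling in pure Python. This is only an illustration: a practical system would use a trained multi-layer CNN, and the kernels in the usage example are hand-picked edge detectors, not learned weights. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2-D 'valid' convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def feature_vector(image: Matrix, kernels: List[Matrix]) -> List[float]:
    """One conv layer + ReLU + global average pooling: one feature per kernel."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)  # each value depends on a local pixel patch
        relu = [max(0.0, v) for row in fmap for v in row]
        features.append(sum(relu) / len(relu))
    return features
```

For a 3x3 image whose right column is bright, a vertical-edge kernel `[[-1, 1], [-1, 1]]` responds and a horizontal-edge kernel `[[-1, -1], [1, 1]]` does not, giving the feature vector `[1.0, 0.0]`.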

In step S540, the e-book service providing apparatus applies a linear transformation to the feature vector so that its dimension matches the input dimension of the recurrent neural network (RNN)/long short-term memory (LSTM) network.
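Step S540 can be read as an affine map y = Wx + b from the CNN feature size to the LSTM input size. In this sketch `W` and `b` are assumed to be learned jointly with the decoder; `project` is a hypothetical name.

```python
from typing import List, Sequence


def project(feature: Sequence[float], weight: Sequence[Sequence[float]],
            bias: Sequence[float]) -> List[float]:
    """S540: y = W x + b, mapping len(feature) inputs to len(weight) outputs."""
    if any(len(row) != len(feature) for row in weight):
        raise ValueError("each weight row must match the feature dimension")
    if len(bias) != len(weight):
        raise ValueError("bias must match the output dimension")
    return [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weight, bias)]
```

For example, projecting a 3-dimensional feature `[1.0, 2.0, 3.0]` with a 2x3 matrix `[[1, 0, 0], [0, 1, 1]]` and bias `[0.5, -1]` yields the 2-dimensional input `[1.5, 4.0]`.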

In step S550, the e-book service providing apparatus inputs the feature vector into the RNN/LSTM network to generate the image caption. The RNN/LSTM network is a neural network designed to retain both long-term and short-term memory, and it can capture meaning by combining the elements that precede and follow within a sentence. For example, as shown in FIG. 6, the RNN/LSTM network may generate the image caption through a language-modeling process that predicts a sequence one element at a time from the feature vector and converts it into text.
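The decoding loop of S550 can be sketched with an untrained, pure-Python LSTM. Everything here is an assumption for illustration: the class name, the random weights, the `<start>`/`<end>` tokens, feeding the projected image feature as the first input (in the style of common CNN-LSTM captioners), and greedy argmax decoding. With random weights the output words are meaningless; the sketch only shows the data flow.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class TinyLSTMCaptioner:
    """Untrained toy decoder: illustrates the S550 data flow, not real captions."""

    def __init__(self, vocab: List[str], dim: int, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.vocab, self.dim = vocab, dim
        self.embed = rand(len(vocab), dim)  # word embeddings, same size as the projected image feature
        self.w_gates = rand(4 * dim, 2 * dim)  # input, forget, output, candidate gates over [x; h]
        self.w_out = rand(len(vocab), dim)  # hidden state -> vocabulary scores

    def step(self, x: List[float], h: List[float], c: List[float]):
        """One LSTM step (biases omitted for brevity)."""
        z = matvec(self.w_gates, x + h)  # list concatenation gives [x; h]
        d = self.dim
        i = [sigmoid(v) for v in z[:d]]
        f = [sigmoid(v) for v in z[d:2 * d]]
        o = [sigmoid(v) for v in z[2 * d:3 * d]]
        g = [math.tanh(v) for v in z[3 * d:]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def greedy_caption(self, image_feature: List[float], max_len: int = 10) -> List[str]:
        if len(image_feature) != self.dim:
            raise ValueError("project the CNN feature to the LSTM input size first (S540)")
        zeros = [0.0] * self.dim
        h, c = self.step(image_feature, zeros, zeros)  # the image is the first input
        token = self.vocab.index("<start>")
        words: List[str] = []
        for _ in range(max_len):
            h, c = self.step(self.embed[token], h, c)
            scores = matvec(self.w_out, h)
            token = max(range(len(scores)), key=scores.__getitem__)  # greedy choice
            if self.vocab[token] == "<end>":
                break
            words.append(self.vocab[token])
        return words
```

In a trained model the weights would come from fitting image-caption pairs, and beam search is often used in place of the greedy choice.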

Accordingly, the e-book service providing apparatus according to an embodiment of the present invention automatically generates summary data and image captions for each page of an e-book and outputs them as speech, allowing the user to grasp the content of each page quickly by ear.

The above-described embodiments of the present invention can be implemented through various means. Embodiments of the present invention may be implemented by hardware, firmware, software, or a combination thereof.
For implementation by hardware, methods according to embodiments of the present invention may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like. For implementation by firmware or software, the method according to embodiments of the present invention may be implemented in the form of a module, procedure, or function that performs the functions or operations described above. A computer program in which software code or the like is recorded may be stored in a computer-readable recording medium or a memory unit and executed by a processor. The memory unit may be located inside or outside the processor and may exchange data with the processor by various known means. In addition, combinations of the blocks of the block diagrams and the steps of the flowcharts attached to the present invention may be performed by computer program instructions. Since these computer program instructions may be loaded onto the encoding processor of a general-purpose computer, special-purpose computer, or other programmable data processing equipment, the instructions executed by that encoding processor create means for performing the functions described in each block of the block diagram or each step of the flowchart.
These computer program instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to implement functions in a particular manner, so that the instructions stored in that memory can produce an article of manufacture including instruction means that perform the functions described in each block of the block diagram or each step of the flowchart. The computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on it to produce a computer-implemented process, and the instructions executed on the computer or other programmable data processing equipment can provide steps for performing the functions described in each block of the block diagram and each step of the flowchart. In addition, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for performing the specified logical function. It should also be noted that, in some alternative embodiments, the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be executed substantially simultaneously, or they may sometimes be executed in reverse order, depending on the function involved.

As such, those skilled in the art to which the present invention pertains will understand that the present invention may be implemented in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the following claims rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as falling within the scope of the present invention.

Claims

An e-book service providing apparatus, comprising:
an input/output unit that receives a user voice signal;
a memory including a voice recognition database; and
a control unit that accesses a page according to the user voice signal by using the voice recognition database and generates at least one of summary data and an image caption corresponding to the page,
wherein the control unit outputs at least one of the summary data and the image caption as speech through the input/output unit.

According to claim 1,
The voice recognition database includes voice commands,
When the user voice signal corresponds to a voice command, the control unit accesses the page corresponding to that voice command.

According to claim 2,
When the user voice signal does not correspond to a voice command, the control unit searches for and accesses a page containing a search term corresponding to the user voice signal.

According to claim 1,
When the page includes an image, the control unit generates the image caption.

According to claim 4,
The control unit generates a feature vector by inputting the image into a convolutional neural network (CNN), and generates the image caption by inputting the feature vector into a recurrent neural network (RNN)/long short-term memory (LSTM) network.

According to claim 1,
The control unit calculates a TextRank score for each sentence in the text of the page and selects the sentence with the highest TextRank score as the summary data.

A method for providing an e-book service by an e-book service providing apparatus, the method comprising:
receiving a user voice signal;
accessing a page according to the user voice signal by using a voice recognition database;
generating at least one of summary data and an image caption corresponding to the page; and
outputting at least one of the summary data and the image caption as speech.

The method of claim 7,
The voice recognition database includes voice commands,
Accessing the page according to the user voice signal by using the voice recognition database comprises, when the user voice signal corresponds to a voice command, accessing the page corresponding to that voice command.

The method of claim 8,
Accessing the page according to the user voice signal by using the voice recognition database comprises, when the user voice signal does not correspond to a voice command, searching for and accessing a page containing a search term corresponding to the user voice signal.

The method of claim 7,
Generating at least one of the summary data and the image caption corresponding to the page comprises generating the image caption when the page includes an image.

The method of claim 10,
Generating at least one of the summary data and the image caption corresponding to the page comprises generating a feature vector by inputting the image into a convolutional neural network (CNN), and generating the image caption by inputting the feature vector into a recurrent neural network (RNN)/long short-term memory (LSTM) network.

The method of claim 7,
Generating at least one of the summary data and the image caption corresponding to the page comprises calculating a TextRank score for each sentence in the text of the page and selecting the sentence with the highest TextRank score as the summary data.

A computer-readable recording medium storing a computer program that executes the method of any one of claims 7 to 12.