KR102201153B1

KR102201153B1 - Apparatus and method for providing e-book service

Info

Publication number: KR102201153B1
Application number: KR1020180150863A
Authority: KR
Inventors: 박미화; 정달영; 두일철; 정의정; 장유진
Original assignee: 동국대학교 산학협력단
Priority date: 2018-11-29
Filing date: 2018-11-29
Publication date: 2021-01-12
Also published as: KR20200072575A

Abstract

The present invention relates to e-book service technology, and more particularly to an e-book service that provides image captioning and page summarization. According to an embodiment of the present invention, images that are difficult for a visually impaired person to understand from alternative text alone can be conveyed to that person effectively.

Description

Device and method for providing e-book service {APPARATUS AND METHOD FOR PROVIDING E-BOOK SERVICE}

The present invention relates to e-book service technology, and more particularly to an e-book service that provides image captioning and page summarization.

With the advent of various smart devices and the digitization of documents, the use of e-books is increasing rapidly. In particular, the text-to-speech (TTS) read-aloud function of e-books lets users read a book without looking at it. As a result, visually impaired people, who previously had limited access to information and materials, can now listen to e-books freely. Many e-book technologies aimed at the convenience of visually impaired users have also been studied.

The read-aloud function outputs the text of a page as speech, from the first character to the last. Because the output is speech, however, content cannot be heard again once it has been played. To hear it again, the user must replay the page from the beginning. As a result, a visually impaired user who wants to revisit a particular page has to listen through the content of many pages until reaching the desired one, which takes a long time.

Korean Registered Patent No. 10-1789057 is prior art related to the present invention.

The present invention provides an apparatus and method for providing an e-book service that effectively conveys to visually impaired users images that are difficult to understand from alternative text alone.

The present invention also provides an apparatus and method for providing an e-book service that improves access to each page by automatically summarizing the page and presenting the summary as speech.

According to one aspect of the present invention, an apparatus for providing an e-book service is provided.

An apparatus for providing an e-book service according to an embodiment of the present invention includes the following:
an input/output unit that receives a user voice signal;
a memory that includes a speech recognition database;
a control unit that uses the speech recognition database to access a page according to the user voice signal, and generates at least one of summary data and an image caption corresponding to the page.
The control unit may output at least one of the summary data and the image caption as speech through the input/output unit.

The speech recognition database includes voice commands. When the user voice signal corresponds to a voice command, the control unit may access the page corresponding to that voice command.

When the user voice signal does not correspond to a voice command, the control unit may search for and access a page containing a search term corresponding to the user voice signal.

When the page contains an image, the control unit may generate the image caption.

To do so, the control unit may input the image to a convolutional neural network (CNN) to generate a feature vector. It may then input the feature vector to a recurrent neural network (RNN)/long short-term memory (LSTM) network to generate the image caption.

The control unit may calculate a TextRank score for each sentence in the text of the page and use the sentence with the highest TextRank score as the summary data.

According to another aspect of the present invention, a method for providing an e-book service in an e-book service providing apparatus is provided. A computer-readable recording medium storing a computer program that executes the method is also provided.

The method according to an embodiment of the present invention may include the following steps:
receiving a user voice signal;
accessing a page according to the user voice signal using a speech recognition database;
generating at least one of summary data and an image caption corresponding to the page;
outputting at least one of the summary data and the image caption as speech.

The speech recognition database includes voice commands. When the user voice signal corresponds to a voice command, accessing a page according to the user voice signal may comprise accessing the page corresponding to that voice command.

When the user voice signal does not correspond to a voice command, accessing a page according to the user voice signal may comprise searching for and accessing a page containing a search term corresponding to the user voice signal.

When the page contains an image, generating at least one of the summary data and the image caption may comprise generating the image caption.

Generating at least one of the summary data and the image caption may comprise inputting the image to a CNN to generate a feature vector. The feature vector is then input to an RNN/LSTM network to generate the image caption.

Generating at least one of the summary data and the image caption may comprise calculating a TextRank score for each sentence in the text of the page. The sentence with the highest TextRank score is used as the summary data.

As described above, according to an embodiment of the present invention, images that are difficult for a visually impaired person to understand from alternative text alone can be conveyed to that person effectively.

In addition, according to an embodiment of the present invention, each page is automatically summarized and presented as speech, which improves the user's access to each page.

FIG. 1 is a block diagram illustrating an e-book service providing apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart briefly illustrating how the e-book service providing apparatus provides an e-book service according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating how the e-book service providing apparatus generates summary data according to an embodiment of the present invention.
FIG. 4 is a diagram conceptually illustrating how the e-book service providing apparatus generates summary data according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating how the e-book service providing apparatus performs image captioning according to an embodiment of the present invention.
FIG. 6 is a diagram conceptually illustrating how the e-book service providing apparatus performs image captioning according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings, so that those of ordinary skill in the art can readily implement them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. In addition, unless stated otherwise, when a part is said to include a component, it may further include other components; other components are not excluded.

FIG. 1 is a block diagram illustrating an e-book service providing apparatus according to an embodiment of the present invention. FIG. 2 is a flowchart briefly illustrating how that apparatus provides an e-book service.

Referring to FIG. 1, the e-book service providing apparatus according to an embodiment of the present invention includes an input/output unit 110, a control unit 120, and a memory 130.

The input/output unit 110 receives a voice signal from the user (hereinafter, a user voice signal). It outputs text either as synthesized speech through a speaker or as text on a display. To do so, the input/output unit 110 may be electrically connected to the following devices:
a voice input device such as a microphone;
a voice output device such as a speaker;
a display.

When the control unit 120 receives a user voice signal from the input/output unit 110, it searches the speech recognition database to determine whether the user voice signal is a voice command or a search term. The speech recognition database stores the voice commands. When the user voice signal matches a voice command listed in Table 1 below, the control unit 120 performs the system operation associated with that command.

[Table 1]

| Voice command | System operation | Remark |
|---|---|---|
| 시작, 처음 (start, beginning) | Go to the first page | |
| 끝, 마지막 (end, last) | Go to the last page | |
| 이전, 앞 (previous, before) | Go to the previous page | |
| 다음, 뒤 (next, after) | Go to the next page | |
| 예, 네, 응 (yes) | Execute the image caption function | After announcing the image information, ask the user whether to proceed; interpret the image only if the answer is "yes" |
| 아니야, 아니 (no) | Do not execute the image caption function | |
| Other (search term) | Pass to the search processing unit and search | |

The control unit 120 accesses a specific page according to the voice command or search term and generates summary data and an image caption for that page. It then outputs at least one of the two through the input/output unit 110. For example, when the voice command is "처음" (beginning), the control unit 120 may access the first page of the e-book and output its summary data. When it receives a search term rather than a command, the control unit 120 searches for and accesses a page containing the search term.
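The command handling in Table 1 and the paragraph above can be sketched as a lookup from recognized text to an action. Anything not recognized is treated as a search term. This is a minimal illustration that assumes the STT output is an exact single word. The action names (`first`, `next`, `caption_yes`, and so on) and the functions `classify` and `navigate` are hypothetical and are not taken from the patent.

```python
# Hypothetical command table following Table 1; action names are illustrative.
COMMANDS: dict[str, str] = {
    "시작": "first", "처음": "first",
    "끝": "last", "마지막": "last",
    "이전": "prev", "앞": "prev",
    "다음": "next", "뒤": "next",
    "예": "caption_yes", "네": "caption_yes", "응": "caption_yes",
    "아니야": "caption_no", "아니": "caption_no",
}


def classify(utterance: str) -> tuple[str, str]:
    """Return ("command", action) for a Table 1 word, else ("search", text)."""
    text = utterance.strip()
    if text in COMMANDS:
        return ("command", COMMANDS[text])
    return ("search", text)


def navigate(action: str, current: int, num_pages: int) -> int:
    """Map a navigation action to a 0-based page index, clamped to the book."""
    if action == "first":
        return 0
    if action == "last":
        return num_pages - 1
    if action == "prev":
        return max(current - 1, 0)
    if action == "next":
        return min(current + 1, num_pages - 1)
    return current  # caption yes/no leave the current page unchanged


print(classify("다음"))        # ('command', 'next')
print(classify("고양이"))      # ('search', '고양이')
print(navigate("next", 4, 5))  # 4: already on the last page
```

Exact matching keeps the sketch simple. A deployed system would likely normalize the STT output (spacing, punctuation, sentence endings) before the lookup.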

The memory 130 includes a speech recognition database and a page information database. The page information database stores the text and images of each page. It also stores the summary data and image captions generated by the control unit 120.

Hereinafter, the process by which the e-book service providing apparatus provides an e-book service is described in detail with reference to FIG. 2. Each step below is performed by the functional units that make up the apparatus. For a concise and clear description, however, the steps are attributed to the e-book service providing apparatus as a whole.

In step S210, the e-book service providing apparatus receives a user voice signal through the input/output unit 110. The user voice signal may contain a predefined voice command or a search term.

In step S220, the apparatus determines whether the user voice signal is a voice command. For example, the apparatus may convert the user voice signal into text using a speech-to-text (STT) function. If the converted text matches one of the voice commands in the voice command list of the speech recognition database, the apparatus recognizes the user voice signal as a voice command. Otherwise, it recognizes the user voice signal as a search term.

If the user voice signal is a search term in step S220, the apparatus searches for and accesses a page containing the search term in step S230. If no page containing the search term is found, the apparatus may return to step S210. If previously generated summary data contains the search term, the apparatus may also retrieve the page corresponding to that summary data.
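A minimal sketch of the search in step S230 follows. It assumes pages are held as plain strings and cached summaries as a dict keyed by page index. The function name `find_page` is hypothetical. Consulting summaries before the full text is also an assumption made for illustration: the patent only says that summaries containing the search term may also be used.

```python
from typing import Optional


def find_page(query: str, pages: list[str], summaries: dict[int, str]) -> Optional[int]:
    """Return the index of a page matching `query`, or None (go back to S210)."""
    # Assumed ordering: pages whose cached summary mentions the query win first.
    for idx in sorted(summaries):
        if query in summaries[idx]:
            return idx
    # Otherwise scan the full text of each page in reading order.
    for idx, text in enumerate(pages):
        if query in text:
            return idx
    return None


pages = ["the cat sat", "a dog ran", "the dog and the cat"]
print(find_page("dog", pages, {}))                          # 1
print(find_page("dog", pages, {2: "the dog and the cat"}))  # 2
print(find_page("bird", pages, {}))                         # None
```

Each summary is a sentence taken from its own page, so a summary hit is always also a full-text hit. Checking summaries first simply prefers pages where the term appears in the key sentence.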

If the user voice signal is a voice command in step S220, the apparatus accesses the page indicated by that voice command in step S240.

In step S250, the apparatus determines whether summary data already exists for the accessed page. For example, it may check whether the page information database in the memory 130 contains summary data for that page.

If no summary data exists in step S250, the apparatus generates summary data for the full text of the accessed page in step S260. This process is described in detail later with reference to FIGS. 3 and 4.

If summary data exists in step S250, the apparatus determines in step S270 whether the page contains an image.

If the page contains an image in step S270, the apparatus performs image captioning for the user in step S280. The image captioning process is described in detail later with reference to FIG. 5.

In step S290, the apparatus outputs at least one of the summary data and the image caption for the page. For example, if the user requests image captioning, the apparatus outputs both the summary data and the image caption.
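The S250 to S290 branch can be sketched as a function over a hypothetical in-memory stand-in for the page information database. Summary generation, captioning, and the yes/no prompt are passed in as callables. `PageInfoDB`, `describe_page`, and the callable parameters are illustrative assumptions. Caching the caption alongside the summary follows the statement that the page information database stores both.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PageInfoDB:
    """Hypothetical in-memory stand-in for the page information database."""
    texts: list[str]
    images: dict[int, bytes] = field(default_factory=dict)   # page index -> image data
    summaries: dict[int, str] = field(default_factory=dict)  # cache filled by S260
    captions: dict[int, str] = field(default_factory=dict)   # cache filled by S280


def describe_page(
    db: PageInfoDB,
    idx: int,
    summarize: Callable[[str], str],
    caption: Callable[[bytes], str],
    wants_caption: Callable[[], bool],
) -> list[str]:
    """Return the strings to speak for page `idx` (S250 to S290)."""
    # S250/S260: reuse the stored summary, generating and storing it if missing.
    if idx not in db.summaries:
        db.summaries[idx] = summarize(db.texts[idx])
    speech = [db.summaries[idx]]
    # S270/S280: caption only pages with an image, and only if the user says yes.
    if idx in db.images and wants_caption():
        if idx not in db.captions:
            db.captions[idx] = caption(db.images[idx])
        speech.append(db.captions[idx])
    # S290: the caller would pass these strings to the TTS engine.
    return speech


db = PageInfoDB(texts=["First sentence. Second.", "No image here."], images={0: b"..."})
first = lambda text: text.split(".")[0]
print(describe_page(db, 0, first, lambda img: "a cat on a mat", lambda: True))
# ['First sentence', 'a cat on a mat']
```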

FIG. 3 is a flowchart illustrating how the e-book service providing apparatus generates summary data according to an embodiment of the present invention. FIG. 4 is a diagram conceptually illustrating the same process. Each step described below corresponds to step S260 of FIG. 2.

Referring to FIG. 3, in step S310, the apparatus extracts the text of the page.

In step S320, the apparatus splits the extracted text into sentences, as shown in FIG. 4. It then applies natural language processing to build a TF-IDF model.
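A rough sketch of step S320 follows. It uses a regex sentence splitter and `\w+` tokens in place of real NLP preprocessing; Korean text would typically need a morphological analyzer. Each sentence is treated as one document, with TF = count / sentence length and IDF = log(N / df). This particular TF-IDF weighting is an assumption, since the patent does not specify one.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def tfidf_vectors(sentences: list[str]) -> list[dict[str, float]]:
    """One sparse TF-IDF vector per sentence, each sentence treated as a document."""
    docs = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # sentence frequency
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


sentences = split_sentences("The cat sat. The dog ran! Did the cat run?")
print(sentences)                           # ['The cat sat.', 'The dog ran!', 'Did the cat run?']
print(tfidf_vectors(sentences)[0]["the"])  # 0.0, since "the" is in every sentence
```

A term that appears in every sentence gets an IDF of 0 and therefore no weight. Such a term cannot make two sentences look similar in the next step.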

In step S330, the apparatus builds a correlation matrix from the TF-IDF model and computes a weighted graph over the rows and columns of that matrix. It then calculates a TextRank score for each sentence from the weighted graph (see FIG. 4). The TextRank score can be calculated according to Equation 1 below.

[Equation 1]

$$TR(s_i) = (1 - d) + d \sum_{j \neq i} \frac{r_{ji}}{\sum_{k \neq j} r_{jk}} \, TR(s_j)$$

TR(s_i): TextRank score of sentence s_i

r_ij: weight between sentences (or words) i and j

d: damping factor. In PageRank, d is the probability that a web surfer who is not satisfied with the current page moves on to another page; it is commonly set to 0.85.
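A sketch of steps S330 and S340 follows. It assumes r_ij is the cosine similarity between the TF-IDF vectors of sentences i and j; the patent does not say how the correlation matrix is computed. `textrank` repeatedly applies the weighted TextRank update until the scores stop changing. `top_sentence` then picks the highest-scoring sentence as the page summary. The function names and default parameters are hypothetical.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def textrank(vectors: list[dict[str, float]], d: float = 0.85,
             iters: int = 100, tol: float = 1e-6) -> list[float]:
    """Iterate TR(s_i) = (1 - d) + d * sum_j (r_ji / sum_k r_jk) * TR(s_j)."""
    n = len(vectors)
    # Correlation matrix: r[i][j] = similarity of sentences i and j, no self-loops.
    r = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in r]  # sum over k of r_jk, for each sentence j
    scores = [1.0] * n
    for _ in range(iters):
        new = [
            (1 - d) + d * sum(r[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
        done = all(abs(a - b) < tol for a, b in zip(new, scores))
        scores = new
        if done:
            break
    return scores


def top_sentence(sentences: list[str], scores: list[float]) -> str:
    """Step S340: the highest-scoring sentence becomes the page summary."""
    return sentences[max(range(len(scores)), key=scores.__getitem__)]


vecs = [{"a": 1.0, "b": 1.0}, {"a": 1.0}, {"b": 1.0}, {"c": 1.0}]
scores = textrank(vecs)
print(top_sentence(["s0", "s1", "s2", "s3"], scores))  # s0, linked to both s1 and s2
```

A sentence that shares no terms with any other sentence keeps only the (1 - d) baseline score.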

In step S340, the apparatus selects the sentence with the highest TextRank score as the summary data of the page.

In step S350, the apparatus stores the summary data in the page list of the page information database.

FIG. 5 is a flowchart illustrating how the e-book service providing apparatus performs image captioning according to an embodiment of the present invention. FIG. 6 is a diagram conceptually illustrating the same process. The process of FIG. 5 corresponds to step S280 of FIG. 2.

Referring to FIG. 5, in step S510, the apparatus asks the user to confirm whether to use the image caption and receives a user voice signal.

In step S520, the apparatus determines whether the user voice signal is a voice command instructing it to use the image caption.

If, in step S520, the user voice signal is not a voice command instructing use of the image caption, the image captioning process ends.

If, in step S520, the user voice signal is a voice command instructing use of the image caption, the apparatus proceeds to step S530. There it inputs the image to a convolutional neural network (CNN) to generate a feature vector. A CNN is a model that can learn while preserving the spatial information of an image.

In step S540, the apparatus linearly transforms the feature vector so that its dimension matches the input dimension of the recurrent neural network (RNN)/long short-term memory (LSTM) network.
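Step S540 is an affine map y = Wx + b from the CNN feature size to the LSTM input size. The sketch below assumes a 2048-dimensional feature, which is typical of a pooled ResNet-50 output but is an assumption here, projected to 512 dimensions. The random initialization only stands in for weights that would be learned together with the LSTM.

```python
import random


def make_projection(in_dim: int, out_dim: int,
                    seed: int = 0) -> tuple[list[list[float]], list[float]]:
    """Hypothetical weights W (out_dim x in_dim) and bias b; learned in practice."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return W, b


def project(feature: list[float], W: list[list[float]], b: list[float]) -> list[float]:
    """Affine map y = W x + b from the CNN feature size to the LSTM input size."""
    if any(len(row) != len(feature) for row in W):
        raise ValueError("feature length does not match the projection input size")
    return [sum(w * x for w, x in zip(row, feature)) + bias for row, bias in zip(W, b)]


W, b = make_projection(2048, 512)
y = project([0.5] * 2048, W, b)
print(len(y))  # 512
```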

In step S550, the apparatus generates the image caption by inputting the feature vector to the RNN/LSTM network. The RNN/LSTM network is a neural network designed to retain both long-term and short-term memory. It can therefore capture meaning by combining elements that appear earlier and later in a sequence. For example, as shown in FIG. 6, the network may generate the image caption through a language-modeling process: it predicts the caption sequence one token at a time from the feature vector and converts it into text.
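The sequential prediction in step S550 can be sketched as a greedy decoding loop. The patent does not detail this step, so the sketch assumes a setup in the style of the Show and Tell captioning model. The projected image vector is fed first to initialize the decoder state. Tokens are then predicted one at a time until an end token or a length limit is reached. `greedy_decode`, the `step` callable, and the toy bigram table standing in for a trained LSTM are all hypothetical.

```python
from typing import Callable, Optional

# step(input, state) -> (next-token probabilities, new state); hypothetical interface
Step = Callable[[object, Optional[object]], tuple[dict[str, float], object]]


def greedy_decode(step: Step, image_vec: list[float], start: str = "<start>",
                  end: str = "<end>", max_len: int = 20) -> str:
    """Predict caption tokens one at a time, always taking the most likely one."""
    _, state = step(image_vec, None)  # the image vector initializes the decoder state
    token, words = start, []
    for _ in range(max_len):
        probs, state = step(token, state)
        token = max(probs, key=probs.get)  # greedy choice
        if token == end:
            break
        words.append(token)
    return " ".join(words)


# Toy bigram table standing in for a trained LSTM (hypothetical).
TABLE = {
    "<start>": {"a": 0.9, "<end>": 0.1},
    "a": {"cat": 0.6, "dog": 0.4},
    "cat": {"sleeps": 0.7, "<end>": 0.3},
    "sleeps": {"<end>": 1.0},
}


def toy_step(inp, state):
    if state is None:  # the first call receives the image vector
        return {}, "ready"
    return TABLE[inp], state


print(greedy_decode(toy_step, [0.0] * 512))  # a cat sleeps
```

Greedy decoding picks the single most likely word at each step. Beam search, which keeps several candidate captions in parallel, is a common alternative that often produces more fluent captions.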

Accordingly, the e-book service providing apparatus according to an embodiment of the present invention automatically generates summary data and image captions for each page of an e-book and outputs them as speech. This lets users grasp the content of a page quickly by ear.

The embodiments of the present invention described above can be implemented by various means, for example by hardware, firmware, software, or a combination thereof.

In a hardware implementation, the method according to the embodiments may be implemented by one or more of the following: application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

In a firmware or software implementation, the method according to the embodiments may be implemented as a module, procedure, or function that performs the functions or operations described above. A computer program containing the software code may be stored in a computer-readable recording medium or a memory unit and executed by a processor. The memory unit may be located inside or outside the processor and may exchange data with the processor by various known means.

In addition, combinations of the blocks in the accompanying block diagrams and the steps in the flowcharts may be performed by computer program instructions.

These instructions can be loaded onto the encoding processor of a general-purpose computer, special-purpose computer, or other programmable data processing equipment. The instructions executed by that processor then create means for performing the functions described in each block of the block diagrams or each step of the flowcharts.

The instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to function in a particular way. In that case, the stored instructions can produce an article of manufacture that includes instruction means for performing the functions described in each block or step.

The instructions may also be loaded onto a computer or other programmable data processing equipment so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on the equipment then provide steps for performing the functions described in each block and step.

Furthermore, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for performing a specified logical function. Note that in some alternative embodiments, the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially simultaneously, or sometimes in reverse order, depending on the functionality involved.

Those skilled in the art to which the present invention pertains will therefore understand that the present invention can be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should thus be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims that follow rather than by the detailed description. All changes and modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

An e-book service providing apparatus, comprising:
an input/output unit that receives a user voice signal and outputs text converted into speech through a speaker or outputs the text through a display;
a memory including a voice recognition database that contains a list of voice commands, and a page information database that contains page summary data; and
a control unit that, upon receiving a user voice signal, searches the voice recognition database to determine whether the user voice signal corresponds to a voice command or to a search word; performs a system operation according to the voice command if the user voice signal corresponds to a voice command; accesses a page according to the user voice signal using the voice recognition database; generates at least one of summary data and an image caption corresponding to the page; and outputs at least one of the summary data and the image caption as voice through the input/output unit,
wherein the control unit:
when the user voice signal corresponds to the voice command list, accesses the page corresponding to the voice command;
when the user voice signal does not correspond to the voice command list, searches for and accesses a page containing a search word corresponding to the user voice signal;
when the page contains an image, generates a feature vector by inputting the image to a convolutional neural network (CNN), and generates the image caption by inputting the feature vector to a recurrent neural network (RNN)/long short-term memory (LSTM) network;
determines whether summary data corresponding to the page exists in the page information database and, if it does not, extracts the text of the page;
separates the extracted text into sentences, performs natural language processing to generate a TF-IDF model, generates a correlation matrix from the TF-IDF model, calculates a weight graph over its rows and columns, and calculates the text rank of each sentence from the weight graph according to Equation 1 below,
[Equation 1]
TR(V_i): text rank of sentence V_i,
r_ij: weight between sentences or words i and j,
d: in PageRank, the probability that a person surfing the web is not satisfied with the current page and navigates to another page; and
calculates the text rank of each sentence included in the text of the page, generates the sentence having the highest text rank as the summary data, and stores it as the page summary data in the page information database.
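The formula itself for Equation 1 does not survive in this copy of the claims. Given its definitions (TR(V_i), r_ij, d) and the PageRank analogy used for d, it most likely takes the form of the weighted TextRank score of Mihalcea and Tarau (2004). The expression below is that standard formula, offered as a hedged reconstruction rather than a transcription of the patent; In(V_i) (sentences linking to V_i) and Out(V_j) (sentences V_j links to) are notation assumed here, not taken from the source.

```latex
TR(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)}
          \frac{r_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} r_{jk}} \, TR(V_j)
```

If the correlation matrix is symmetric, the sentence graph is undirected and In and Out coincide: each neighbour V_j passes on a share of its own rank proportional to r_ji over its total edge weight, and the (1 - d) term gives every sentence a baseline score. As in PageRank, d is commonly set to 0.85.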

delete

A method of providing an e-book service by an e-book service providing device, the method comprising:
receiving a user voice signal, and outputting text converted into speech through a speaker or outputting the text through a display;
upon receiving a user voice signal, searching a voice recognition database and determining whether the user voice signal corresponds to a voice command or a search word;
accessing a page corresponding to the voice command when the user voice signal corresponds to the voice command list;
when the user voice signal does not correspond to the voice command list, searching for and accessing a page containing a search word corresponding to the user voice signal;
when the page contains an image, generating a feature vector by inputting the image to a convolutional neural network (CNN), and generating an image caption by inputting the feature vector to a recurrent neural network (RNN)/long short-term memory (LSTM) network;
determining whether summary data corresponding to the page exists in a page information database, and extracting the text of the page if the summary data does not exist;
separating the extracted text into sentences, performing natural language processing to generate a TF-IDF model, generating a correlation matrix from the TF-IDF model, calculating a weight graph over its rows and columns, and calculating the text rank of each sentence from the weight graph according to Equation 1 below,
[Equation 1]
TR(V_i): text rank of sentence V_i,
r_ij: weight between sentences or words i and j,
d: in PageRank, the probability that a person surfing the web is not satisfied with the current page and navigates to another page;
calculating the text rank of each sentence included in the text of the page, and generating the sentence having the highest text rank as the summary data; and
storing the summary data as page summary data in the page information database.
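The summarization steps of the method (sentence splitting, TF-IDF, a correlation matrix, text rank, and picking the top sentence) can be sketched with the Python standard library alone. This is a minimal sketch, not the patentee's implementation: the regex sentence splitter, the tokenizer, the smoothed IDF formula, the use of cosine similarity as the correlation-matrix entries, d = 0.85, and the convergence settings are all assumptions; the claims only name the stages.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ? or ! followed by whitespace.
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def tfidf_vectors(sentences: list[str]) -> list[dict[str, float]]:
    # Each sentence is treated as one "document" when computing IDF.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(tokenized)
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (assumption): log((1 + n) / (1 + df)) + 1 is always positive.
        vectors.append({w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1)
                        for w, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def correlation_matrix(vectors: list[dict[str, float]]) -> list[list[float]]:
    # r[i][j]: weight between sentences i and j; the diagonal is zeroed so a
    # sentence does not vote for itself.
    n = len(vectors)
    return [[0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def text_rank(r: list[list[float]], d: float = 0.85,
              tol: float = 1e-8, max_iter: int = 500) -> list[float]:
    # Iterates TR(V_i) = (1 - d) + d * sum_j [r_ji / sum_k r_jk] * TR(V_j).
    n = len(r)
    out = [sum(row) for row in r]  # total outgoing weight of each sentence
    tr = [1.0] * n
    for _ in range(max_iter):
        new = [(1 - d) + d * sum(r[j][i] / out[j] * tr[j]
                                 for j in range(n) if out[j] > 0)
               for i in range(n)]
        if max(abs(x - y) for x, y in zip(new, tr)) < tol:
            return new
        tr = new
    return tr


def summarize(text: str, d: float = 0.85) -> str:
    # The highest-ranked sentence becomes the page's summary data.
    sentences = split_sentences(text)
    if not sentences:
        return ""
    ranks = text_rank(correlation_matrix(tfidf_vectors(sentences)), d=d)
    return sentences[max(range(len(sentences)), key=ranks.__getitem__)]
```

For example, in "Cats purr loudly. Cats and dogs play. Dogs bark often." the middle sentence shares a word with each of the other two, so it collects rank from both neighbours and `summarize` returns it. Sentences with no shared words keep only the (1 - d) baseline.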

delete

A computer-readable recording medium storing a computer program that causes a computer to execute the method of providing an e-book service of claim 7.
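The top-level control flow that such a stored program would carry out (dispatching a voice command versus a search word, captioning images, and reusing a stored page summary before computing a new one) can be sketched as below. Everything here is hypothetical: speech-to-text is assumed to have already turned the voice signal into a string, voice commands are modelled as a phrase-to-page mapping, and `summarize`/`caption` are injected callables standing in for the TextRank summarizer and the CNN-RNN/LSTM captioner.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Page:
    number: int
    text: str
    images: list = field(default_factory=list)


@dataclass
class EBookService:
    pages: list[Page]
    voice_commands: dict[str, int]       # hypothetical: command phrase -> page number
    summarize: Callable[[str], str]      # stand-in for the TextRank summarizer
    caption: Callable[[object], str]     # stand-in for the CNN -> RNN/LSTM captioner
    summary_db: dict[int, str] = field(default_factory=dict)  # page summary store

    def find_page(self, utterance: str) -> Optional[Page]:
        key = utterance.strip().lower()
        if key in self.voice_commands:
            # Voice command: go to the page the command points at.
            target = self.voice_commands[key]
            return next((p for p in self.pages if p.number == target), None)
        # Not a command: treat the utterance as a search word.
        return next((p for p in self.pages if key in p.text.lower()), None)

    def describe(self, page: Page) -> list[str]:
        outputs = [self.caption(img) for img in page.images]
        if page.number not in self.summary_db:
            # No stored summary yet: compute it once and keep it.
            self.summary_db[page.number] = self.summarize(page.text)
        outputs.append(self.summary_db[page.number])
        return outputs  # strings to be spoken through the speaker

    def handle(self, utterance: str) -> list[str]:
        page = self.find_page(utterance)
        return self.describe(page) if page else []
```

Injecting the summarizer and captioner keeps the dispatch and caching logic testable without loading a neural network; a stub summarizer that records its calls shows that a second request for the same page is served from `summary_db`.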