KR20000049925A

KR20000049925A - Method and system for providing book contents as PDF files on the Internet

Info

Publication number: KR20000049925A
Application number: KR1020000024264A
Authority: KR
Inventors: 조규철
Original assignee: 조규철
Priority date: 2000-05-06
Filing date: 2000-05-06
Publication date: 2000-08-05

Abstract

PURPOSE: A system and method for providing book contents as portable document format (PDF) files on the Internet are provided. All of an author's works are converted into PDF, abstracts of all works are extracted with an indexing engine, a meta database is built from the extracted abstracts, and books can be read directly on the Internet. CONSTITUTION: To search and process a vast body of literature, the system extracts metadata about books with an indexing engine that automatically indexes book information, and it stores the processed PDF data in one directory per book according to a classification scheme. Because a book's many pages would slow the service when read on the Internet, the electronic text data are stored and provided page by page to increase loading speed. When keywords or natural language are entered, the system extracts keywords from the entered search string and displays the desired resources by those keywords.

Description

Method and system for providing book contents as PDF files on the Internet

The present invention relates to a method and system for providing book contents as PDF files on the Internet. More specifically, it relates to a system covering: a method of converting existing printed matter into PDF files and providing them on the Internet; a method of providing a space in which authors can write and publish through the Internet rather than through print media; a method of providing services to readers; a method for reader participation; and a space in which readers and authors can interact.

Conventionally, written works have been distributed as paper books. The life of such a work is shortened or ends when its publisher goes out of business or the books are damaged. In addition, the distribution limits of printed books restrict the number of readers who can reach them. Systems currently exist for selling these works on the Internet, but this approach does not escape the limitations of books recorded and distributed on paper. There is also CD-based delivery as a form of electronic publishing. It improves storage, but it remains limited in terms of interaction between authors and readers and in the number of works it can provide. The most advanced approach today is the electronic library, which is a step forward in storing and providing material. However, only some specialized works, such as theses, are digitized, and they are available only to users who meet specific qualification requirements. A new method and system were therefore needed to provide the works of diverse authors in many fields, without restriction, to readers from all walks of life.

In view of these problems of the prior art, an object of the present invention is to provide a method and system that convert all of an author's works into PDF, a de facto standard document format on the Internet, extract abstracts of all the works with an indexing engine, build a meta database from this information, and offer a service for reading books directly on the Internet.

Another object of the present invention is to provide a space in which readers and authors can interact on the Internet: a space where authors can carry out their writing, a space where aspiring authors can make their debut, and a space where readers can appreciate and comment on authors' works and talk with the authors directly.

FIG. 1 is a system configuration diagram of a system for providing book contents as PDF files according to an embodiment of the present invention.

FIG. 2 is a flowchart of providing e-book data.

FIG. 3 is a flowchart of producing e-book data by processing books.

[Description of Reference Numerals for Main Parts of the Drawings]

112: web server

212: web browser

252: system

254: user computer

According to the present invention, a book is converted into PDF and placed on a PDF file server, and the contents of the book are delivered directly to the readers who want them over the Internet.

＜ Hardware (H/W) Configuration of the System ＞

The system comprises: a data production PC (104) that scans a book (102) into images and converts them into PDF files; an indexing engine (106) that extracts the book's metadata from the PDF files; a database host (110) that holds the meta database; a file server (108) for storing and managing the PDF data; a web server (112) equipped with a search engine, which provides author rooms for authors' writing activities on the Internet (116) and makes the information available to customers over the Internet (118); a firewall (114) that controls external access; and a book information extraction PC (106) equipped with the indexing engine.

＜ Collection of Book Materials ＞

In principle, the system seeks an agreement on the use of copyright for Internet publication from the authors and copyright holders of both published and forthcoming books. Works by aspiring authors who wish to make their debut are accepted through the same procedure. Regardless of type, including theses from universities and research institutes and publications of media outlets and publishers, all materials that can be served over the Internet are collected, and valuable materials are gathered without duplication according to book classification criteria.
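The rule of collecting materials "without duplication according to book classification criteria" can be sketched as keying each candidate item on its classification code plus a normalized title and author. The record fields, the placeholder class codes, and the normalization rule are assumptions for illustration; the source does not say how duplicates are detected.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical fields for an item offered for digitization.
    title: str
    author: str
    classification: str  # class number; the codes used below are placeholders


def normalize(text: str) -> str:
    """Apply NFKC, case-fold, and collapse whitespace so trivial variants compare equal."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def dedupe(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the first item seen for each (classification, title, author) key."""
    seen: set[tuple[str, str, str]] = set()
    kept: list[Candidate] = []
    for c in candidates:
        key = (c.classification, normalize(c.title), normalize(c.author))
        if key not in seen:
            seen.add(key)
            kept.append(c)
    return kept


items = [
    Candidate("The Old Man and the Sea", "Ernest Hemingway", "800"),
    Candidate("the old  man and the sea", "ERNEST HEMINGWAY", "800"),  # same work, different spelling
    Candidate("Walden", "Henry David Thoreau", "810"),
]
print([c.title for c in dedupe(items)])
```

A real collection process would likely also compare ISBNs or editions; the sketch only shows where such a key-based check fits.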

＜ PDF File Production ＞

A PDF file has a three-part structure: the first part holds ASCII text, the second holds the document content (text, images, graphics) as binary values, and the third holds font information and hypertext information that help users search and view the document. Because the files are compressed, they are easy to transmit; they are system-independent and can be used freely on different systems; and when existing literature is digitized, scanned page images can be converted automatically into text. Files can also be password-protected, and printing can be restricted.
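For comparison, the PDF specification itself organizes a file into four parts: a header (`%PDF-1.x`), a body of numbered objects, a cross-reference table giving each object's byte offset, and a trailer that points to the document catalog. The hand-built one-page file below is only a sketch that makes those parts visible; the function name is hypothetical, and in the described system the data production PC (104) would rely on PDF conversion software rather than code like this.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Assemble a one-page PDF by hand: header, body objects, xref table, trailer."""
    # Escape characters that are special inside a PDF literal string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    stream = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length " + str(len(stream)).encode() + b" >>\nstream\n" + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")                 # 1. header
    offsets = []
    for num, body in enumerate(objects, start=1):  # 2. body: numbered indirect objects
        offsets.append(len(out))
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"

    xref_pos = len(out)                            # 3. cross-reference table (20-byte entries)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()

    out += (                                       # 4. trailer
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode()
    return bytes(out)


pdf = build_minimal_pdf("Page 1")
print(pdf.splitlines()[0])
```

Writing `pdf` to a file with a `.pdf` extension should produce a page that a standard viewer can open; real files add compression, embedded fonts, and, as described here, encryption.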

To take advantage of these properties of PDF, the system scans (304) each published book (302) and encrypts the result. Although compression makes PDF files smaller, a book has so many pages that loading it all at once is impractical. Each page is therefore produced as a separate file, which increases loading speed. Each page document produced this way undergoes optical character recognition (OCR) (312) and is then stored on the file server (314), in a directory created for each book according to the book classification system defined on the server.
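The per-page storage scheme can be sketched as one PDF file per page inside a single directory for each book, nested under the book's classification code. The root path, the digit-by-digit nesting of class codes, the book ID format, and the zero-padded page names are all hypothetical; the source specifies only "one directory per book according to the classification system" and "one file per page".

```python
from pathlib import PurePosixPath


def page_paths(root: str, class_code: str, book_id: str, page_count: int) -> list[str]:
    """Return one PDF path per page, all inside a single per-book directory.

    Hypothetical layout: <root>/<class code prefixes>/<book_id>/p0001.pdf
    """
    if page_count < 1:
        raise ValueError("a book must have at least one page")
    # Nest by successive prefixes of the class code, e.g. "813" -> 8/81/813 (assumed scheme).
    class_dirs = [class_code[: i + 1] for i in range(len(class_code))]
    book_dir = PurePosixPath(root, *class_dirs, book_id)
    # Zero-pad page numbers so that files sort in reading order.
    width = max(4, len(str(page_count)))
    return [str(book_dir / f"p{n:0{width}d}.pdf") for n in range(1, page_count + 1)]


paths = page_paths("/srv/pdf", "813", "B000123", 3)
print(paths[0])
```

Serving `p0001.pdf` alone means the reader fetches one page's worth of data instead of the whole book, which is the loading-speed benefit the paragraph describes.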

＜ Indexing Engine ＞

Paper media are scanned, converted into electronic image files, and made into searchable PDF files (306); the indexing engine (308) then automatically builds index information for each book.

The indexing engine extends the search functions of existing search engines. It takes in the document content of a PDF file and breaks it down into words. The words in the document are compared against a stopword list, that is, a set of words that need not be indexed. Words that are not on the stopword list are then reduced to their stems. Because word frequencies, both within a document and across the whole database, are often used to rank retrieved documents, the occurrences of each word are counted. Finally, the words and related information, such as the document, the items within the document, and the total word count, are entered into the database. From the book contents, the database then extracts the author, publishing house, publisher, year, table of contents, search keywords, and other information needed for retrieval, in a form that can be entered into the meta database (310). Because this data is used for book searches, it is verified by people to ensure accurate classification and refined into more accurate data.
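The indexing steps above (split the text into words, drop stopwords, reduce words to stems, count frequencies, record per-document totals) can be sketched as follows. The stopword list and the crude suffix-stripping stemmer are illustrative assumptions, not the engine the patent uses; a production system would use a real stemming algorithm for English (such as Porter's) and a morphological analyzer for Korean text.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption); a real list would be much longer.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "was", "for", "on"}
# Crude suffixes tried in this order (assumption, not a real stemming algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_document(doc_id: str, text: str) -> dict:
    """Tokenize, filter stopwords, stem, and count term frequencies for one document."""
    words = re.findall(r"\w+", text.lower())
    terms = [stem(w) for w in words if w not in STOPWORDS]
    return {
        "doc_id": doc_id,             # which page/document the counts belong to
        "total_words": len(words),    # total word count, before stopword removal
        "term_freq": Counter(terms),  # stem -> number of occurrences
    }


record = index_document("B000123/p0001", "The sailors were sailing and the sail is torn.")
print(record["total_words"], record["term_freq"].most_common(2))
```

Per-document records like this would then be merged into the database so that frequencies can also be compared across all books.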

＜ Storing Metadata and PDF Files ＞

The metadata (310) used to retrieve information is data that provides information about other data. A familiar example is the catalog card a library uses for book information and searching; it contains the title, author, subject, classification, shelf mark, and so on. In this system, the metadata provides information for accessing and searching the PDF files, summarizes the contents of the book each PDF represents, and provides semantic interoperability between PDFs. It is also valuable book information in its own right. The metadata is extracted from the books by the indexing engine (308) described above.
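A meta database holding the catalog-card fields listed above could look like the SQLite sketch below. The table name, column set, and sample row are hypothetical; the source names the fields (title, author, subject, classification, shelf mark, publisher, year, table of contents, keywords) but gives no schema. The `verified` column reflects the human verification step, and `pdf_dir` links each record to the book's directory on the file server.

```python
import sqlite3

SCHEMA = """
CREATE TABLE book_meta (
    book_id    TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    author     TEXT NOT NULL,
    publisher  TEXT,
    pub_year   INTEGER,
    subject    TEXT,
    class_code TEXT,                      -- classification number
    shelf_mark TEXT,                      -- call number, as on a library card
    toc        TEXT,                      -- table of contents from the indexing engine
    keywords   TEXT,                      -- comma-separated search keywords
    pdf_dir    TEXT NOT NULL,             -- per-book directory on the file server
    verified   INTEGER NOT NULL DEFAULT 0 -- set to 1 after human review
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO book_meta (book_id, title, author, publisher, pub_year, class_code, keywords, pdf_dir)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("B000123", "Sample Title", "Sample Author", "Sample Press", 1999, "813",
     "sea,fishing", "/srv/pdf/8/81/813/B000123"),
)
# Look up books by classification, as a catalog search would.
row = conn.execute("SELECT title, pdf_dir FROM book_meta WHERE class_code = ?", ("813",)).fetchone()
print(row)
```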

PDF files are stored in directories organized systematically according to the book classification system, with all the files for one book placed in the same directory. This makes the stored files easy to classify and retrieve.

＜ Book Search and Viewing ＞

Data on the service server (252) is protected by a firewall (210). A customer uses a computer (254) on the Internet, passes member authentication (206), and searches for books through the web server (208). There are two broad search methods: one searches by title, author, subject, classification, and so on using the metadata (204), and the other searches the full text of the PDF files (202).

Because full-text search places a heavy load on the system, searches are performed mainly on the metadata, and a full-text search service is offered only when needed.
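The policy of searching metadata first and falling back to full-text search only on request can be sketched like this. The in-memory `Book` records, the `full_text` flag, and plain substring matching are simplifying assumptions; the source does not specify how either search is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    book_id: str
    title: str
    author: str
    subject: str
    pages: list[str] = field(default_factory=list)  # OCR text of each page file


def search(books: list[Book], term: str, full_text: bool = False) -> list[str]:
    """Cheap metadata match first; scan page text only when explicitly requested."""
    term = term.casefold()
    hits = [
        b.book_id for b in books
        if term in b.title.casefold() or term in b.author.casefold() or term in b.subject.casefold()
    ]
    if hits or not full_text:
        return hits
    # The full-text scan touches every page of every book, which is why it is the costly path.
    return [b.book_id for b in books if any(term in p.casefold() for p in b.pages)]


library = [
    Book("B1", "Walden", "Thoreau", "nature", ["I went to the woods"]),
    Book("B2", "Moby-Dick", "Melville", "whaling", ["Call me Ishmael"]),
]
print(search(library, "whaling"))                  # found through the subject field
print(search(library, "ishmael"))                  # not in metadata; full text not requested
print(search(library, "ishmael", full_text=True))  # falls back to scanning page text
```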

Both methods support natural-language search: keywords are extracted from the entered search string, and the matching materials are displayed based on those keywords.
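Extracting keywords from a natural-language query before running the search can be sketched with a simple filler-word filter. The word list is a hypothetical one for English queries; for Korean queries, particles and verb endings would have to be separated by a morphological analyzer, which this sketch does not attempt.

```python
import re

# Hypothetical list of request words and filler terms to discard from queries.
QUERY_STOPWORDS = {
    "i", "me", "my", "want", "would", "like", "to", "read", "find", "show",
    "a", "an", "the", "about", "books", "book", "by", "on", "of", "some", "please",
}


def extract_keywords(query: str) -> list[str]:
    """Lowercase, tokenize, and keep content words in their original order without repeats."""
    seen: set[str] = set()
    keywords = []
    for word in re.findall(r"\w+", query.lower()):
        if word not in QUERY_STOPWORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


print(extract_keywords("Please show me some books about whaling by Melville"))
```

The resulting keywords would then be matched against the title, author, subject, and classification fields, or against the full text when that service is used.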

Once the search is complete, the user can read the book directly online (Internet/intranet) in a web browser (Internet Explorer, Netscape) (212) with the Acrobat Reader (214) plug-in installed.

＜ Local Viewer for the Book Download Service ＞

In addition to online reading, the system provides a local viewer for PDF files and sells a download service to users who want to read books on their own computers.

The proceeds from sales of the processed book files are paid to the author as the copyright holder, which encourages trade in electronic documents rather than paper print media.

＜ Conversion to Other Formats ＞

Multimedia elements such as sound can easily be added to PDF files, so the data can also be migrated to other forms in the future.

＜ Security ＞

Converting print media to PDF makes the material easy to process and distribute, but it also raises the risk of illegal distribution. Every file is therefore password-protected and has printing restrictions set to limit unauthorized distribution.
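Under the PDF standard security handler, the printing restriction is expressed as bit flags in the `/P` entry of the encryption dictionary. According to the specification's table of user access permissions, bit 3 allows printing and bit 12 allows high-quality printing; bits 1 and 2 must be 0, and the reserved bits must be 1. The sketch below only computes that signed 32-bit value for a "no printing" policy; the password encryption itself would be performed by PDF software, and the constant names are hypothetical.

```python
# Permission bit positions (1-based, as numbered in the PDF specification).
PRINT = 3                   # print the document
MODIFY = 4                  # modify contents
COPY = 5                    # copy or extract text and graphics
ANNOTATE = 6                # add or modify annotations
FILL_FORMS = 9              # fill in existing form fields
EXTRACT_ACCESSIBILITY = 10  # extract text for accessibility
ASSEMBLE = 11               # insert, rotate, or delete pages
PRINT_HIGH_QUALITY = 12     # print at full quality


def permissions_value(denied: set[int]) -> int:
    """Return the signed 32-bit /P value with the given permission bits cleared."""
    flags = 0xFFFFFFFF & ~0b11  # every bit set except reserved bits 1 and 2, which must be 0
    for bit in denied:
        flags &= ~(1 << (bit - 1))
    # /P is written as a signed integer, so convert from the unsigned 32-bit pattern.
    return flags - (1 << 32) if flags & 0x80000000 else flags


no_print = permissions_value({PRINT, PRINT_HIGH_QUALITY})
print(no_print)
```

The same helper could deny copying as well (`COPY`) if text extraction should also be blocked; the source mentions only passwords and printing limits.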

Electronic documents containing digital and multimedia information are now commonly encountered on the Internet. However good an information system and network environment may be, they are useless without data and information to fill them. Producing and accumulating data also takes far more time and labor than building the system, and data that departs from standards or lacks general applicability cannot be reused.

For future digital libraries, one of the key factors in meeting the basic goal of distributing and sharing information is the format of the information materials. Many formats are currently in use, and the functions and features they offer vary with the format. The most widely used today include PDF, SGML, HTML, and XML; this system adopts the PDF file format.

PDF is a document file format that Adobe proposed as a common standard for documents in general use and on the Internet. Even before it spread on the Internet, PDF was designed as a document format that could be read and printed anywhere, regardless of computer model, operating system, printer type, or resolution. PDF can be regarded as a solution for electronic document publishing that plays the role of printed paper on the Internet. PDF documents are used when documents in various formats must be provided and shared across different system environments, when documents are to be transmitted and displayed in compressed form, and when a document is sent to a remote site where high-quality printing is required.

In building an electronic library, this system uses these characteristics of PDF files to produce its data. Authors can use the system as a repository that preserves their works safely and permanently instead of letting them fall out of circulation, and readers can view the materials they want over the Internet whenever they need them. Aspiring authors can also make their debut easily through the system without going through particular interest groups or publishers, and both established and aspiring authors can readily gauge readers' reactions to their new works over the Internet. In addition, the database built by the system can largely solve the collection-holding problems of existing small libraries.

Claims

A method and system for providing book contents as PDF files on the Internet, characterized by a processing method that extracts metadata about books using an indexing engine that automatically indexes book information, for accurate retrieval and rapid processing of large volumes of printed material.

A method and system for providing book contents as PDF files on the Internet, which store the processed data by organizing directories in units of one book according to a book classification system, so that stored files can be classified and retrieved easily and readily processed into multimedia.

A method and system for providing book contents as PDF files on the Internet, in which, because the service becomes slow when a book with a huge number of pages is provided as a single unit for online reading, the electronic document data are stored and provided page by page rather than per book in order to increase loading speed.

A method and system for providing book contents as PDF files on the Internet, comprising a natural-language book search method that, when the user enters natural language as well as keywords such as title, author, subject, or classification, extracts keywords from the entered search string and displays the desired material using those keywords.