KR101096431B1

KR101096431B1 - Method and system of classifying books

Info

Publication number: KR101096431B1
Application number: KR1020110063005A
Authority: KR
Inventors: 김성수
Original assignee: (주)비알네트콤
Priority date: 2011-06-28
Filing date: 2011-06-28
Publication date: 2011-12-20

Abstract

PURPOSE: A book classification method and system are provided that increase the efficiency of category classification by automatically classifying the category of a book based on a category learning database. CONSTITUTION: A category learning database (150) is built from book data received from a book information database (200), the book data including the category and title of each pre-classified book. The category learning database includes distribution information between the words that make up the titles of the pre-classified books and the categories. An extracting unit (130) extracts, from the title of a book to be classified, the set of words contained in that title. A category estimator (140) estimates the category of the book to be classified using the category learning database and the extracted set of words.

Description

Book classification method and book classification system {Method and System of Classifying Books}

The present invention relates to a book classification method and a book classification system, and more particularly to a method and system capable of automatically classifying the category of a book from the book's title.

Recently, libraries that hold large numbers of books, such as university libraries and public libraries, purchase books either on the recommendation of their users or based on information about new publications obtained from book distributors or publishers.

When purchasing books in this way, each library needs to buy books that suit its own circumstances. The books already held by a library are classified by category, and the library may treat the share of each category among future purchases as an important factor, in light of the current share of each category. A process for properly classifying the category of each book considered for purchase is therefore needed.

However, manually classifying each candidate book one by one is very cumbersome. Moreover, just as the same book can in practice be classified into different categories, a classification made by a third party who is not the library's own classifier, for example a book supplier, may not match the classification the library itself would make.

In addition, when the purchasing library wants to do the classification itself, the candidate books inevitably include many that will never actually be purchased, and classifying the category of each book that is not even purchased tends to be regarded as a highly burdensome task.

Furthermore, when a purchased physical book arrives at the library, it is classified precisely according to the library's classification criteria or classification practice as it is received. This work takes a great deal of time and effort, yet no method is currently available to reduce it.

Meanwhile, recognition of the above problems and challenges of the prior art is not obvious to a person of ordinary skill in the technical field of the present invention, and the inventive step of the present invention over the prior art should therefore not be judged on the basis of such recognition.

The present invention was devised to solve the problems described above, and an object of the present invention is to provide a method and system capable of automatically classifying the category of a book.

Another object of the present invention is to provide a method and system capable of automatically classifying the category of a book in accordance with the classification criteria or classification practice of a particular library.

Yet another object of the present invention is to provide a method and system capable of reducing the workload of classifying books by category during the purchasing and receiving processes.

A book classification method according to one aspect of the present invention is performed by a computer or server in association with a book information database (200) that stores book information, and comprises: a first step of receiving, from the book information database (200), book data including at least the category and title of each pre-classified book, that is, a book whose category has already been assigned, and building a category learning database from the book data, where the category learning database includes at least distribution information between each word making up the titles of the pre-classified books and the categories; a step 2-1 of receiving the title of a book to be classified, that is, a book whose category is to be determined, and extracting a set of words from that title; and a step 2-2 of estimating the category of the book to be classified using the set of words extracted in step 2-1 and the category learning database.

A book classification system according to one aspect of the present invention automatically classifies books to be classified in association with a book information database (200) that stores book information, and comprises: a category learning database (150) that is built from book data received from the book information database (200), the book data including at least the category and title of each pre-classified book, that is, a book whose category has already been assigned, and that includes at least distribution information between each word making up the titles of the pre-classified books and the categories; extraction means (130) for extracting, from the title of a book to be classified, that is, a book whose category is to be determined, the set of words contained in that title; and category estimation means (140) for estimating the category of the book to be classified using the set of words extracted by the extraction means (130) and the category learning database.

According to one aspect of the present invention, the category of a book can be classified automatically, which greatly increases the efficiency of category classification.

Also according to one aspect of the present invention, the category of a book can be classified automatically in accordance with the classification criteria or classification practice of a particular library, enabling a category classification tailored to that library's circumstances.

Also according to one aspect of the present invention, the workload of classifying books by category during the purchasing and receiving processes can be reduced.

FIG. 1 is a block diagram illustrating a book classification system (100) and a book information database (200) according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a book classification method according to an embodiment of the present invention; FIG. 2(A) shows the building of the category learning database, and FIG. 2(B) shows the detailed process of classifying the category of a book to be classified.
FIG. 3 is a flowchart showing in detail the process of building the category learning database (150).
FIG. 4 is a diagram showing an example of the appearance count of each word in each category.
FIG. 5 is a diagram showing an example of the appearance frequency rates calculated from the appearance counts of FIG. 4.
FIG. 6 is a flowchart showing the specific process of estimating the category of a book to be classified using the extracted set of words and the category learning database.

Hereinafter, a preferred embodiment of the present invention is described with reference to the drawings. The embodiment described below does not unduly limit the content of the invention as set out in the claims, and not all of the configuration described in this embodiment is necessarily essential as a solution of the invention.

1. Book Classification System

FIG. 1 is a block diagram illustrating a book classification system (100) and a book information database (200) according to an embodiment of the present invention.

The book information database (200) stores book information and is, for example, a database that a university library or public library maintains to manage its holdings. The book information may include the title, category, purchase date, publisher, author, summary, reviews, and so on of each book; in particular, according to an embodiment of the present invention, the book information includes at least the title and category of each book. The 'category' of a book may follow any of various classification schemes, such as the Korean Decimal Classification (KDC), the Dewey Decimal Classification (DDC), or the Universal Decimal Classification (UDC).

The book classification system (100) according to an embodiment of the present invention automatically classifies books to be classified in association with the book information database (200), and may be implemented on a general-purpose computer or server.

The input/output means (110) handles data exchange with the book information database (200), input and output with an administrator or user, and input and output with other computers or servers (not shown). In particular, the input/output means (110) receives book data from the book information database (200), and the book data includes at least the title and category of each book.

Hereinafter, a book whose category has already been assigned, such as a book registered in the book information database (200), is referred to as a 'pre-classified book', and a book whose category has not yet been assigned is referred to as a 'book to be classified'. A book to be classified is a book whose category is to be determined using the book classification method or book classification system of the present invention. In particular, it may be a book directly connected with an institution, such as a university library or public library, that uses the book information database (200): for example, a book the institution may purchase, a book scheduled to be received by the institution, or a book the institution has purchased.

The learning means (120) builds the category learning database (150) using the book data received from the book information database (200). Specifically, from book data that includes at least the category and title of each pre-classified book, it builds distribution information between each word making up the titles of the pre-classified books and the categories.

The category learning database (150) stores the distribution information between words and categories built by the learning means (120). Specifically, it is built from book data that includes at least the category and title of each pre-classified book, and stores distribution information between each word making up the titles of the pre-classified books and the categories.

The extraction means (130) extracts, from the title of a book to be classified, the set of words contained in that title, and likewise extracts, from the title of a pre-classified book, the set of words contained in that title.

The category estimation means (140) estimates the category of the book to be classified using the set of words extracted by the extraction means (130) and the category learning database (150).

The storage means (160) stores the classification results produced by the category estimation means (140), and also stores data and temporary data needed for the operation of the book classification system (100).

The specific and additional functions of the book classification system (100) will be better understood from the description of the book classification method below; note that the description above does not cover every function of the book classification system (100).

2. Book Classification Method

FIG. 2 is a flowchart illustrating a book classification method according to an embodiment of the present invention; FIG. 2(A) shows the building of the category learning database, and FIG. 2(B) shows the detailed process of classifying the category of a book to be classified.

First, as shown in FIG. 2(A), the category learning database must be built beforehand (S100).

The book classification method according to an embodiment of the present invention is carried out in association with the book information database (200), which stores book information and is, for example, a database that a university library or public library maintains to manage its books. The book information may include the title, category, purchase date, publisher, author, summary, reviews, and so on of each book; in particular, according to an embodiment of the present invention, the book information includes at least the title and category of each book.

The category learning database (150) is built by receiving, from the book information database (200), book data including at least the category and title of each pre-classified book and processing that data; the category learning database (150) includes at least distribution information between each word making up the titles of the pre-classified books and the categories. 'Building' here does not refer to something done by a person, but to an operation performed by the book classification system (100) according to an embodiment of the present invention, specifically for example by the learning means (120).

FIG. 3 is a flowchart showing in detail the process of building the category learning database (150).

First, the book classification system (100), specifically for example the input/output means (110), receives book data of pre-classified books from the book information database (200) (S110).

Then the book classification system (100), specifically for example the extraction means (130), extracts a set of words from the title of a pre-classified book (S120). The set of words extracted in step S120 may be a set of free morphemes, full morphemes, or lexical morphemes.

A morpheme is the smallest meaning-bearing unit of a language: a unit that loses its meaning if analyzed any further. Like a phoneme, a morpheme is an abstract entity and can be realized in various forms in an utterance.

Morphemes are divided into free morphemes and bound morphemes according to whether they can stand alone, and into full morphemes and empty morphemes according to whether they carry substantive meaning. Full morphemes, which express lexical meanings such as concrete objects, actions, or states, are also called lexical morphemes; morphemes that attach to full morphemes and mainly express formal relationships between words are also called grammatical morphemes.

A lexical morpheme is a morpheme with lexical meaning, referring to some object, state, or action. Nouns, verbs, adjectives, and adverbs generally belong to this class. For example, in the sentence "도서관에는 좋은 정보가 많다" ("There is a lot of good information in the library"), "도서관" (library), "좋-" (good), "정보" (information), and "많-" (many) are lexical morphemes.

A free morpheme is a morpheme that can be used on its own without other morphemes. In Korean, nouns generally belong to this class; in the example above, "도서관" (library) and "정보" (information) are free morphemes.

Then the book classification system (100), specifically for example the learning means (120), increments by one, within the category of the pre-classified book, the appearance count of each word in the extracted set (S130).

Steps S120 and S130 are repeated for a plurality of pre-classified books, for example until no pre-classified books remain to be learned (see step S140). If the book data of pre-classified books is received one book at a time in step S110, the process returns to step S110.
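
The counting loop of steps S110 to S140 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `extract_words` and `build_learning_db`, the sample books, and the whitespace tokenizer are all hypothetical. In particular, the text calls for extracting free, full, or lexical morphemes, which for Korean titles would require a morphological analyzer; plain whitespace splitting merely stands in for that step.

```python
from collections import Counter, defaultdict


def extract_words(title):
    """Stand-in for step S120 (assumption): lowercase and split on whitespace.

    A real system would extract free/full/lexical morphemes instead.
    """
    return title.lower().split()


def build_learning_db(books):
    """Build per-category appearance counts from (category, title) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> appearance count
    book_counts = Counter()             # category -> number of books
    for category, title in books:       # S110 / S140: one book per iteration
        book_counts[category] += 1
        for word in extract_words(title):     # S120: words of this title
            word_counts[category][word] += 1  # S130: increment count by one
    return word_counts, book_counts


# Hypothetical pre-classified books, for illustration only.
books = [
    ("history", "Ancient Rome"),
    ("history", "Rome and Greece"),
    ("science", "Modern Physics"),
]
word_counts, book_counts = build_learning_db(books)
print(word_counts["history"]["rome"])  # 2
```

Keeping the number of books per category alongside the word counts is a design choice of this sketch; it is convenient because the category share used later in the estimation depends on it.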

Step S100 may additionally include step S150. In step S150, the book classification system (100) computes the appearance frequency rate of each word contained in the titles of the pre-classified books (S150). Specifically, for each category of pre-classified books, it computes the appearance frequency rate of each word contained in the titles of all pre-classified books in that category. Here, the 'appearance frequency rate of a word' is the rate at which the word appears in the titles of all pre-classified books of a given category, and is expressed as follows.

□ Appearance frequency rate of word A in category C

= (appearance count of word A in the titles of books in category C) / (appearance count of all words in the titles of books in category C)

FIG. 4 is a diagram showing an example of the appearance count of each word in each category.

For example, category 1 contains 10 books, and in the titles of these books word A appears 5 times, word B 2 times, word C 2 times, word D 2 times, word E once, and word F once, so the appearance counts are 5, 2, 2, 2, 1, and 1, respectively.

FIG. 5 is a diagram showing an example of the appearance frequency rates calculated from the appearance counts of FIG. 4.

For example, in category 1 the total appearance count is 5+2+2+2+1+1 = 13, so the appearance frequency rate is 5/13 for word A, 2/13 for each of words B, C, and D, and 1/13 for each of words E and F.
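
As a quick check of the category 1 example, the appearance frequency rates can be reproduced with exact fractions. The snippet below is only a sketch: the dictionary layout and the function name `appearance_rates` are assumptions made for illustration, while the counts are those given in the text.

```python
from fractions import Fraction

# Appearance counts of category 1 as given in the example.
category1_counts = {"A": 5, "B": 2, "C": 2, "D": 2, "E": 1, "F": 1}


def appearance_rates(counts):
    """Divide each word's count by the total count of all words in the category."""
    total = sum(counts.values())  # 13 for category 1
    return {word: Fraction(n, total) for word, n in counts.items()}


rates = appearance_rates(category1_counts)
print(rates["A"], rates["B"], rates["F"])  # 5/13 2/13 1/13
```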

When the appearance frequency rate is computed, a rate of zero can make the category estimation described below difficult, because a single zero factor makes the entire product zero. To avoid this, a slight correction may be applied to the appearance frequency rate obtained above.

In the formula for the appearance frequency rate, 1 is added to the numerator and (the appearance count of the word + 1) is added to the denominator. The formula may thus be modified as follows.

□ Appearance frequency rate of word A in category C (modified)

= (appearance count of word A in the titles of books in category C + 1) / (appearance count of all words in the titles of books in category C + appearance count of word A in the titles of books in category C + 1)
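
The modified formula can be written as a one-line function. This is a minimal sketch, with the function name `corrected_rate` chosen here for illustration. It follows the correction exactly as stated in the text, which differs from the more common add-one (Laplace) smoothing, where the size of the vocabulary would be added to the denominator instead.

```python
from fractions import Fraction


def corrected_rate(word_count, total_count):
    """(count + 1) / (total + count + 1), as in the modified formula."""
    return Fraction(word_count + 1, total_count + word_count + 1)


# Word A in category 1: count 5 out of a total of 13.
print(corrected_rate(5, 13))  # 6/19
# A word never seen in category 1 no longer gets a rate of zero.
print(corrected_rate(0, 13))  # 1/14
```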

The category learning database (150) stores distribution information between each word making up the titles of the pre-classified books and the categories; this distribution information may be the appearance count information or the appearance frequency rate information described above.

Returning to FIG. 2, the detailed process of classifying the category of a book to be classified, shown in FIG. 2(B), is now described.

First, the book classification system (100) receives the title of a book to be classified (S200). For example, the title may be received from another computer or server (not shown) through the input/output means (110) of the book classification system (100), or entered directly by an administrator or user of the book classification system (100).

Then the book classification system (100), specifically for example the extraction means (130), extracts a set of words from the received title of the book to be classified (S300). This extraction is similar to the extraction of a set of words from the title of a pre-classified book in step S120.

As in step S120, the set of words extracted from the title of the book to be classified may be a set of free morphemes, full morphemes, or lexical morphemes.

Then the book classification system (100), specifically for example the category estimation means (140), estimates the category of the book to be classified using the extracted set of words and the category learning database (150) (S400).

FIG. 6 shows the specific process of estimating the category of a book to be classified using the extracted set of words and the category learning database (150).

First, for each of the classifiable categories, the book classification system (100) computes a per-category proportional probability, a value proportional to the probability that the book to be classified belongs to that category (S410). The proportional probability of a category can be obtained by multiplying together the 'appearance frequency rates' of all words in the extracted set and the 'category share'. The appearance frequency rate here is the one obtained in step S150, and the values already built in the category learning database (150) are used. Alternatively, the category learning database (150) may store only the appearance count of each word in each category, and the appearance frequency rate may be computed on the fly later, when a category is actually estimated from a title.

The 'category share' is the proportion of all pre-classified books that belong to the category in question. For example, in the example of FIG. 4, the category share of category 1 is 10/(10+20+25+30+15) = 0.1.

For example, assuming that the title of a book to be classified contains the words A, B, A, F, and E, the proportional probability of each category can be calculated as follows.

□ Proportional probability of category 1

= 10/100 * (5+1)/(13+6) * (2+1)/(13+3) * (5+1)/(13+6) * (1+1)/(13+2) * (1+1)/(13+2) = 0.000033241

□ Proportional probability of category 2

= 20/100 * (2+1)/(16+3) * (3+1)/(16+4) * (2+1)/(16+3) * (2+1)/(16+3) * (1+1)/(16+2) = 0.000017495

□ Proportional probability of category 3

= 25/100 * (1+1)/(21+2) * (2+1)/(21+3) * (1+1)/(21+2) * (6+1)/(21+7) * (5+1)/(21+6) = 0.000013127

□ Proportional probability of category 4

= 30/100 * (2+1)/(13+3) * (2+1)/(13+3) * (2+1)/(13+3) * (3+1)/(13+4) * (1+1)/(13+2) = 0.000062040

□ Proportional probability of category 5

= 15/100 * (4+1)/(18+5) * (1+1)/(18+2) * (4+1)/(18+5) * (1+1)/(18+2) * (6+1)/(18+7) = 0.000019849

Next, the book classification system 100 takes the category with the largest proportional probability among those computed above as the estimated category of the book to be classified (S420). In the calculation above, category 4 has the largest proportional probability, so a book whose title contains the words A, B, A, F, and E is classified into category 4. In another embodiment, categories may instead be recommended in descending order of proportional probability; in the example above, category 4 would be recommended first and category 1 second.
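
The worked example can be reproduced with a short Python sketch. The book counts, word totals, and word counts are read off the figures in the calculation above; the function names, the data layout, and the smoothing expression (count + 1) / (category total + count + 1) are assumptions made for illustration and are inferred from those figures.

```python
# Per-category data read off the worked example: number of already-classified
# books, total word occurrences, and occurrence counts of the title words.
CATEGORIES = {
    1: {"books": 10, "total": 13, "counts": {"A": 5, "B": 2, "F": 1, "E": 1}},
    2: {"books": 20, "total": 16, "counts": {"A": 2, "B": 3, "F": 2, "E": 1}},
    3: {"books": 25, "total": 21, "counts": {"A": 1, "B": 2, "F": 6, "E": 5}},
    4: {"books": 30, "total": 13, "counts": {"A": 2, "B": 2, "F": 3, "E": 1}},
    5: {"books": 15, "total": 18, "counts": {"A": 4, "B": 1, "F": 1, "E": 6}},
}


def smoothed_frequency(count, total):
    """Occurrence frequency rate with the smoothing seen in the example."""
    return (count + 1) / (total + count + 1)


def proportional_probability(words, category, all_books):
    """Category share multiplied by the frequency rate of every title word."""
    score = category["books"] / all_books
    for word in words:
        count = category["counts"].get(word, 0)
        score *= smoothed_frequency(count, category["total"])
    return score


def rank_categories(words, categories=CATEGORIES):
    """Return the scores and the categories sorted by descending score."""
    all_books = sum(c["books"] for c in categories.values())
    scores = {
        name: proportional_probability(words, c, all_books)
        for name, c in categories.items()
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return scores, ranking


scores, ranking = rank_categories(["A", "B", "A", "F", "E"])
print(ranking)     # [4, 1, 5, 2, 3]
print(ranking[0])  # 4, the estimated category (S420)
```

Taking `ranking[0]` corresponds to step S420, while the first few entries of `ranking` correspond to the embodiment that recommends categories by rank.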

The estimated category is first stored in the storage means 160, and may then be transmitted to another computer or server through the input/output means 110 or used by the book classification system 100 itself.

3. Mathematical theory of the present invention

The mathematical theory underlying the book classification method according to an embodiment of the present invention is described below. The starting point is Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Here, $P(A \mid B)$ is the conditional probability that event A occurs given that event B has occurred, and $P(B \mid A)$ is the conditional probability that event B occurs given that event A has occurred. $P(A)$ is the probability that event A occurs, and $P(B)$ is the probability that event B occurs.

Applying this to the probability that a book whose title is $D = (t_1, \ldots, t_n)$ belongs to category $C$ gives Equation 1.

[Equation 1]
$$P(C \mid D) = \frac{P(t_1, \ldots, t_n \mid C)\,P(C)}{P(t_1, \ldots, t_n)}$$

Here, $(t_1, \ldots, t_n)$ is the set of words contained in the title, $P(C \mid D)$ is the conditional probability of category C given title D, $P(t_1, \ldots, t_n \mid C)$ is the conditional probability that title D occurs given category C, $P(C)$ is the probability that category C occurs, and $P(t_1, \ldots, t_n)$ is the probability that title D occurs.

Meanwhile, the following Equation 2 can be obtained from Equation 1 above.

[Equation 2]
$$\underset{C}{\arg\max}\; P(C \mid D) = \underset{C}{\arg\max}\; P(C) \prod_{i=1}^{n} P(t_i \mid C)$$

Here, $\arg\max_C$ denotes the category, among the possible categories, for which the expression that follows it takes its largest value. Equation 2 therefore states that the category maximizing $P(C \mid D)$ is the same as the category maximizing $P(C) \prod_{i=1}^{n} P(t_i \mid C)$. Replacing $P(t_1, \ldots, t_n \mid C)$ with the product of the individual $P(t_i \mid C)$ treats the words of a title as independent of one another given the category. Here, $P(C \mid D)$ is the probability of category C given title D, and $P(t_i \mid C)$ is the probability of word $t_i$ given category C, which in the actual calculation is represented by the 'occurrence frequency rate' of the word described above.

$P(C)$ is the probability that category C occurs, and in the actual calculation it is represented by the 'category share' described above. The term $P(t_1, \ldots, t_n)$ on the right-hand side of Equation 1 does not depend on the category and is therefore dropped in Equation 2.

Therefore, according to Equation 2, finding the category that maximizes $P(C) \prod_{i=1}^{n} P(t_i \mid C)$ yields the category to which title D is most likely to belong.

Meanwhile, $P(C) \prod_{i=1}^{n} P(t_i \mid C)$ is represented in the actual calculation by the per-category 'proportional probability' described above, and the category with the largest proportional probability is the category to which title D is most likely to belong.
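
Because the proportional probability is a product of many factors smaller than 1, it can become very small for long titles. A common practical variant, which the source does not describe, compares sums of logarithms instead; since the logarithm is monotonic, the maximizing category is unchanged. The sketch below assumes hypothetical function names and uses the category 1 and category 4 factors from the worked example.

```python
import math


def log_score(prior, word_probs):
    """log P(C) + sum of log P(t_i | C); same argmax as the plain product."""
    return math.log(prior) + sum(math.log(p) for p in word_probs)


def best_category(models):
    """models maps a category to (P(C), [P(t_i | C), ...])."""
    return max(models, key=lambda c: log_score(*models[c]))


# Category share and smoothed word frequencies taken from the worked example.
models = {
    1: (10 / 100, [6 / 19, 3 / 16, 6 / 19, 2 / 15, 2 / 15]),
    4: (30 / 100, [3 / 16, 3 / 16, 3 / 16, 4 / 17, 2 / 15]),
}
print(best_category(models))  # 4
```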

The effects of the invention according to various aspects of the present invention are described in detail below.

Owing to the nature of classification work, even libraries that use the same classification scheme often assign the same book to different categories in practice. To keep its category classification consistent, a library may establish its own classification criteria or classify books according to its own classification practice.

According to one aspect of the present invention, a category learning database is built by learning from the book information database of the books of a specific library, and classification is performed using it. Book categories can therefore be classified automatically in line with that library's classification criteria or classification practice, which makes possible a category classification tailored to the circumstances of the specific library.

In addition, according to one aspect of the present invention, the system can automatically recommend a category for a book, thereby automatically providing information that the person in charge of classification can consult before performing an accurate manual classification. According to one aspect of the present invention, the workload of category classification during the purchase and receipt of books can be reduced.

100: book classification system 110: input/output means
120: learning means 130: extraction means
140: category estimation means 150: category learning database
160: storage means 200: book information database

Claims

A book classification method performed on a computer or server in association with a book information database (10) storing book information, the method comprising:
a first step in which the computer or the server receives, from the book information database (10), book data containing at least the category and the title of each already-classified book, that is, a book whose category has already been classified, and uses the book data to build a category learning database, the category learning database containing at least distribution information between the categories and the words constituting the titles of the already-classified books;
a step 2-1 in which the computer or the server receives the title of a book to be classified, that is, a book whose category is to be classified, and extracts a set of words from the title of the book to be classified; and
a step 2-2 in which the computer or the server estimates the category of the book to be classified using the set of words extracted in step 2-1 and the category learning database,
wherein the first step includes
a step 1-3 of obtaining, for each category of the already-classified books, the occurrence frequency rate of each word contained in the titles of all already-classified books of that category, the occurrence frequency rate of a word being the rate at which that word occurs in the titles of all already-classified books of a particular category,
wherein step 2-2 includes
a step 2-2-1 of calculating, for each classifiable category, a per-category proportional probability, which is a value proportional to the probability that the book to be classified belongs to that category; and
a step 2-2-2 of estimating, as the category of the book to be classified, the category whose proportional probability is the largest among the per-category proportional probabilities obtained in step 2-2-1,
and wherein, in calculating the per-category proportional probability in step 2-2-1,
the per-category proportional probability is obtained by multiplying together the occurrence frequency rates of all words belonging to the set of words extracted in step 2-1 and the category share, the category share being the proportion of all already-classified books that belong to the category.

The method according to claim 1,
wherein the first step includes:
a step 1-1 of extracting a set of words from the title of an already-classified book; and
a step 1-2 of incrementing, in the category of that already-classified book, the occurrence count of each word belonging to the set of words extracted in step 1-1,
and wherein steps 1-1 and 1-2 are repeated for a plurality of already-classified books.

delete

The method according to claim 1,
wherein, in order to avoid the problem of the occurrence frequency rate becoming 0 when the occurrence frequency rate of each word is calculated in step 1-3,
the occurrence frequency rate is calculated by a formula whose numerator is (the occurrence count of the word in the titles of the books included in a particular category) and whose denominator is (the sum of the occurrence counts of the words occurring in the titles of the books included in the particular category),
with 1 additionally added to the numerator and (the occurrence count of the word in the titles of the books included in the particular category + 1) additionally added to the denominator.

The method according to claim 2,
wherein, when the set of words is extracted from the title of the already-classified book in step 1-1, the extracted set of words is a set of free morphemes, content morphemes, or lexical morphemes.

The method according to claim 1,
wherein, when the set of words is extracted in step 2-1 from the title of the book to be classified, that is, the book whose category is to be classified, the extracted set of words is a set of free morphemes, content morphemes, or lexical morphemes.

delete

A book classification system for automatically classifying a book to be classified in association with a book information database (200) storing book information, the system comprising:
a category learning database (150) built by receiving, from the book information database (10), book data containing at least the category and the title of each already-classified book, that is, a book whose category has already been classified, and containing at least distribution information between the categories and the words constituting the titles of the already-classified books;
extraction means (130) for extracting, from the title of the book to be classified, that is, the book whose category is to be classified, a set of words contained in that title; and
category estimation means (140) for estimating the category of the book to be classified using the set of words extracted by the extraction means (130) and the category learning database,
wherein the distribution information of the category learning database (150) includes,
for each category of the already-classified books, the occurrence frequency rate of each word contained in the titles of all already-classified books of that category, the occurrence frequency rate of a word being the rate at which that word occurs in the titles of all already-classified books of a particular category,
and wherein a per-category proportional probability is obtained by multiplying together the occurrence frequency rates of all words belonging to the set of words extracted by the extraction means (130) and the category share, the category share being the proportion of all already-classified books that belong to the category, and the category whose proportional probability is the largest among the obtained per-category proportional probabilities is estimated as the category of the book to be classified.

delete

The method according to claim 1,
wherein the book information stored in the book information database (10) relates to books already held by a particular library, and
the book to be classified is a book that the particular library may purchase, a book scheduled to be received by the particular library, or a book purchased by the particular library.

A book classification method performed on a computer or server in association with a book information database (10) storing book information, the method comprising:
a first step in which the computer or the server receives, from the book information database (10), book data containing at least the category and the title of each already-classified book, that is, a book whose category has already been classified, and uses the book data to build a category learning database, the category learning database containing at least distribution information between the categories and the words constituting the titles of the already-classified books;
a step 2-1 in which the computer or the server receives the title of a book to be classified, that is, a book whose category is to be classified, and extracts a set of words from the title of the book to be classified; and
a step 2-2 in which the computer or the server estimates the category of the book to be classified using the set of words extracted in step 2-1 and the category learning database,
wherein the first step includes
a step 1-3 of obtaining, for each category of the already-classified books, the occurrence frequency rate of each word contained in the titles of all already-classified books of that category, the occurrence frequency rate of a word being the rate at which that word occurs in the titles of all already-classified books of a particular category,
wherein step 2-2 includes
a step 2-2-1 of calculating, for each classifiable category, a per-category proportional probability, which is a value proportional to the probability that the book to be classified belongs to that category; and
a step 2-2-2 of estimating two or more categories as categories of the book to be classified, in order starting from the category whose proportional probability is the largest among the per-category proportional probabilities obtained in step 2-2-1, so that the two or more categories can be recommended by rank,
and wherein, in calculating the per-category proportional probability in step 2-2-1,
the per-category proportional probability is obtained by multiplying together the occurrence frequency rates of all words belonging to the set of words extracted in step 2-1 and the category share, the category share being the proportion of all already-classified books that belong to the category.