KR100921892B1

KR100921892B1 - Method and system for generating rank learning model using weight regularization

Info

Publication number: KR100921892B1
Application number: KR1020080038805A
Authority: KR
Inventors: 이상호
Original assignee: 엔에이치엔(주)
Priority date: 2008-04-25
Filing date: 2008-04-25
Publication date: 2009-10-13

Abstract

PURPOSE: A method and system for generating a rank learning model using weight regularization are provided to produce an optimized rank learning model by minimizing the error function of the rank learning model. CONSTITUTION: A method for generating a rank learning model using weight regularization (800) includes the steps of: defining a rank learning model that determines the feature weights of a document by learning from the rank of a document set; determining a learning model parameter of the rank learning model by repeatedly using a plurality of subsets of the document set; and producing an updated rank learning model by relearning the entire document set according to the learning model parameter.

Description

Method and system for generating rank learning model using weight regularization {METHOD AND SYSTEM FOR GENERATING RANK LEARNING MODEL USING WEIGHT REGULARIZATION}

The present invention relates to a method and system for generating a rank learning model using weight regularization. More specifically, it relates to a method and system for generating a rank learning model whose variance is reduced by adding a regularization function of the weights to the error function of the rank learning model, which is learned from the rank of a document set.

Ordering a large number of objects to suit a specific purpose is an important problem. In search engines in particular, it is important to order the many documents in a search DB for a given search query.

Conventionally, when training data of the form <query, document, goodness of fit> is available, in which the goodness of fit of each document to a search query is recorded, the search model is generally trained to estimate the goodness-of-fit value itself, and the documents are ordered using that estimate. However, when only the order of the documents is given and their goodness of fit is not, a new method is required.

In addition, for the case where only the goodness-of-fit order of documents for a search query is provided, conventional methods exist for learning to determine the goodness of fit of the documents, but no method exists for determining the weights of the features that are attributes of a document. Moreover, no specific method has been presented for handling documents that occupy the same position in the order.

Accordingly, a learning model is required that orders documents even when their goodness of fit is not given and that determines the weights of the features that are attributes of the documents. A method of learning is also required for the case where documents share the same rank. A method is further required that makes full use of the given training data to reduce the variance of the learning model caused by particular parameters, as is a method of optimizing the learning model by minimizing its error.

The present invention provides a method and system for generating a rank learning model that, given the rank of a document set including at least one document, determines a weight for each feature of each document by calculating the probability that the rank itself occurs.

The present invention provides a method and system for generating a rank learning model that learns the probability that the rank of a document set itself occurs, even when the given rank of the document set includes documents that share the same rank.

The present invention provides a method and system for generating a rank learning model that prevents the rank learning model from overtraining through weight regularization and also increases the stability of the learned model.

The present invention provides a method and system for generating an optimized rank learning model by minimizing the error function of the rank learning model with respect to the weight of each feature.

A method for generating a rank learning model according to an embodiment of the present invention may include: defining a rank learning model that determines a weight for each feature of a document by learning from the rank of a document set, the document set including at least one document ordered by rank; determining a learning model parameter of the rank learning model by repeatedly using a plurality of subsets of the document set; and generating an updated rank learning model by relearning the entire document set according to the learning model parameter.

According to an aspect of the present invention, the step of defining the rank learning model may include: obtaining the rank of the document set; determining the goodness of fit of each document and calculating the probability that the obtained rank of the document set occurs; and performing rank learning that extracts a weight for each feature of each document using the calculated probability.

A system for generating a rank learning model according to an embodiment of the present invention may include: a rank learning model definition unit that defines a rank learning model determining a weight for each feature of a document by learning from the rank of a document set, the document set including at least one document ordered by rank; a learning parameter determination unit that determines a learning model parameter of the rank learning model by repeatedly using a plurality of subsets of the document set; and a rank learning model update unit that generates an updated rank learning model by relearning the entire document set according to the learning model parameter.

According to the present invention, a method and system are provided for generating a rank learning model that, given the rank of a document set including at least one document, determines a weight for each feature of each document by calculating the probability that the rank itself occurs.

According to the present invention, a method and system are provided for generating a rank learning model that learns the probability that the rank of a document set itself occurs, even when the given rank of the document set includes documents that share the same rank.

According to the present invention, a method and system are provided for generating a rank learning model that prevents the rank learning model from overtraining through weight regularization and also increases the stability of the learned model.

According to the present invention, a method and system are provided for generating an optimized rank learning model by minimizing the error function of the rank learning model with respect to the weight of each feature.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The present invention is not, however, restricted or limited by these embodiments. The same reference numerals in the drawings denote the same elements. The rank learning model generation method according to an embodiment of the present invention may be performed by a rank learning model generation system.

FIG. 1 is a flowchart illustrating a rank learning model generation method according to an embodiment of the present invention.

The rank learning model generation method according to an embodiment of the present invention may define a rank learning model that determines a weight for each feature of a document by learning from the rank of a document set (S101). The document set may include at least one document ordered by rank.

The at least one document may be at least one search result document for each of a plurality of search queries. For example, a search result document includes a web page retrieved by the search query and may be expressed in the form of a URL.

In one example, the step of defining the rank learning model (S101) may include obtaining the rank of the document set. It may also include determining the goodness of fit of each document and calculating the probability that the obtained rank of the document set occurs. It may further include performing rank learning that extracts a weight for each feature of each document using the calculated probability.

In one example, the step of obtaining the rank of the document set may obtain the rank of a document set that includes documents sharing the same rank. For example, a document set may include one or more documents ranked first. The more documents sharing the same rank a document set contains, the more complex the resulting rank learning model may become.

In one example, the step of calculating the probability that the rank of the document set occurs may calculate the probability for each document using the goodness of fit of the remaining documents. Here, the goodness of fit may be determined as the exponential function of the weighted sum obtained by adding up the products of the features that are attributes of each document and the weights of those features, that is, $\exp\left(\sum_i w_i x_i\right)$, where $x_i$ denotes a feature of the document and $w_i$ the weight of that feature.

The present invention can determine the goodness of fit of a document for a search query through the weighted sum of its features. This relies on the property that, when an inequality such as "y(x) > g(x)" holds between positive quantities, taking the log of each side does not change the direction of the inequality. The original goodness of fit $\exp\left(\sum_i w_i x_i\right)$ (features $x_i$, weights $w_i$) can therefore be expressed simply as $\sum_i w_i x_i$.
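The log-monotonicity argument above can be checked with a small sketch. The document names, feature values, and weights below are hypothetical and chosen only for illustration; the point is that ordering documents by the exponential of the weighted sum gives the same order as ordering by the weighted sum itself.

```python
import math

def weighted_sum(features, weights):
    """Weighted sum of a document's feature values."""
    return sum(w * x for w, x in zip(weights, features))

def goodness_of_fit(features, weights):
    """Goodness of fit as described: exponential of the weighted sum."""
    return math.exp(weighted_sum(features, weights))

# Hypothetical feature vectors (e.g. freshness, rating, review count), scaled to [0, 1].
docs = {
    "doc1": [0.2, 0.9, 0.4],
    "doc2": [0.8, 0.7, 0.9],
    "doc3": [0.1, 0.3, 0.2],
}
weights = [0.5, 1.0, 0.3]  # illustrative values, not taken from the patent

by_fit = sorted(docs, key=lambda d: goodness_of_fit(docs[d], weights), reverse=True)
by_sum = sorted(docs, key=lambda d: weighted_sum(docs[d], weights), reverse=True)
```

Both orderings coincide because the exponential is strictly increasing, so ranking can work directly with the weighted sum.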

According to an embodiment of the present invention, even when the goodness of fit of each document is not provided and only the rank of the document set is provided, the rank learning model generation method can define, based on Maximum Likelihood Estimation theory, a rank learning model that extracts a weight for each feature by maximizing the probability that the rank itself occurs.

For example, in a movie search, the features of a search result document may be freshness, rating, number of reviews, audience size, and so on. When the rank of the document set has been determined, the rank learning model according to an embodiment of the present invention can determine feature weights that indicate which of the given features the rank emphasized. Once the feature weights have been determined through the rank learning model, they can be used to determine a rank in which the documents suited to a particular query are properly ordered.

In one example, the step of performing rank learning may learn to extract the feature weights that maximize the probability calculated for the rank of the document set.

The detailed process of calculating the probability that the rank of the document set occurs and the detailed process of performing rank learning are described with reference to FIGS. 2 to 5.

The rank learning model generation method according to an embodiment of the present invention may determine a learning model parameter of the rank learning model by repeatedly using a plurality of subsets of the document set (S102).

In one example, the step of determining the learning model parameter (S102) may include dividing the document set into a plurality of subsets. It may also include performing cross-validation on each of the subsets to generate a regularization parameter for each subset. It may further include estimating the optimal regularization parameter from the generated regularization parameters, applying the estimated regularization parameter, and then calculating the learning model parameter over the entire data.
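The cross-validation procedure of step S102 can be sketched as follows. Several pieces are assumptions made only for illustration: the model is a Plackett-Luce style ranking likelihood, the regularization function is a sum of squared weights, the per-subset regularization parameters are combined by taking their median, and the data are synthetic. The names `neg_log_lik`, `train`, and `select_lambda` are hypothetical.

```python
import math
import random
from statistics import median

def neg_log_lik(w, queries):
    """Negative log-probability of the given rankings (assumed Plackett-Luce form)."""
    total = 0.0
    for ranked in queries:  # each query: feature vectors listed best-first
        s = [sum(a * b for a, b in zip(w, x)) for x in ranked]
        for k in range(len(s)):
            total -= s[k] - math.log(sum(math.exp(v) for v in s[k:]))
    return total

def train(queries, lam, steps=100, lr=0.05):
    """Gradient descent on neg_log_lik plus lam * sum(w_i^2) (assumed L2 penalty)."""
    dim = len(queries[0][0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [2.0 * lam * wi for wi in w]
        for ranked in queries:
            s = [sum(a * b for a, b in zip(w, x)) for x in ranked]
            for k in range(len(s)):
                e = [math.exp(v) for v in s[k:]]
                z = sum(e)
                for i in range(dim):
                    expected = sum(ej * x[i] for ej, x in zip(e, ranked[k:])) / z
                    grad[i] -= ranked[k][i] - expected
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def select_lambda(queries, candidates, n_folds=3):
    """Pick the best candidate per held-out subset, then combine by median (assumption)."""
    folds = [queries[i::n_folds] for i in range(n_folds)]
    best_per_fold = []
    for i, held_out in enumerate(folds):
        rest = [q for j, f in enumerate(folds) if j != i for q in f]
        scores = {lam: neg_log_lik(train(rest, lam), held_out) for lam in candidates}
        best_per_fold.append(min(scores, key=scores.get))
    return median(best_per_fold), best_per_fold

# Synthetic data: 9 queries, 4 documents each, 2 features, ranked by a noisy score.
rng = random.Random(0)
true_w = [2.0, -1.0]  # hypothetical generating weights

def noisy_score(x):
    return sum(a * b for a, b in zip(true_w, x)) + rng.gauss(0, 0.3)

queries = []
for _ in range(9):
    docs = [[rng.random(), rng.random()] for _ in range(4)]
    queries.append(sorted(docs, key=noisy_score, reverse=True))

candidates = [0.0, 0.1, 1.0]
lam, per_fold = select_lambda(queries, candidates)
final_w = train(queries, lam)  # relearn on the entire document set with the chosen value
```

With an odd number of subsets the median is always one of the candidate values; other ways of combining the per-subset results would fit the description equally well.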

Here, the learning model parameter may be the optimal parameter that controls the regularization function of the feature weights of the defined rank learning model. The regularization function may be determined by the feature weights extracted through the rank learning model.

When the feature weights extracted through the defined rank learning model are large, the model complexity of the rank learning model increases. The stability of the rank learning model then decreases, with the undesirable result that a small change in the input data has a large effect on the output of the rank learning model.

Therefore, to reduce the model complexity of the rank learning model, the step of determining the learning model parameter (S102) may design a regularization function of the feature weights that affect the model complexity, and may use it to determine a learning model parameter for the regularization function that improves the stability of the rank learning model.

The detailed process of determining the learning model parameter is described with reference to FIGS. 6 and 7.

The rank learning model generation method according to an embodiment of the present invention may generate an updated rank learning model by relearning the entire document set according to the learning model parameter (S103).

In one example, the step of generating the updated rank learning model (S103) may include modifying the error function of the rank learning model by reflecting the learning model parameter in the regularization function of the feature weights. It may also include applying the modified error function to the rank learning model.

Here, the step of modifying the error function of the rank learning model may determine, from the rank learning model, an error function of the feature weights and add to it the regularization function in which the learning model parameter is reflected.

As a result, because the regularization function of the feature weights is reflected in the error function of the rank learning model, the value of the modified error function may increase. Because of the regularization function, however, the variance of the rank learning model with respect to the feature weights decreases, and a more stable rank learning model can ultimately be generated.
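The modification described above can be written compactly. The symbols are assumptions introduced here for illustration: $E(w)$ is the error function of the rank learning model in the feature weights $w$, $R(w)$ is the regularization function, and $\lambda \ge 0$ is the learning model parameter that controls it. The squared-weight form of $R(w)$ is one common choice, not one stated in the text.

```latex
E_{\text{mod}}(w) \;=\; E(w) \;+\; \lambda \, R(w),
\qquad
R(w) \;=\; \sum_i w_i^{2} \quad \text{(assumed form)}
```

Since $R(w) \ge 0$, the modified error is never smaller than $E(w)$ at the same weights, which matches the remark that the error value may increase while large weights are discouraged.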

The rank learning model generation method according to an embodiment of the present invention may optimize the rank learning model by minimizing the error function of the updated rank learning model (S104). The error function to be minimized is the error function modified in step S103 to reflect the regularization function of the weights.

In one example, the step of optimizing the rank learning model (S104) may determine the optimized weights that minimize the modified error function. Various methods may be applied to determine these optimized weights.
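Since the text leaves the minimization method open, the following is only one possible sketch: plain gradient descent with numerically estimated gradients. It assumes a Plackett-Luce style error for a single ranking and a squared-weight penalty; the data and the names `modified_error`, `minimize`, and `norm` are hypothetical.

```python
import math

def modified_error(w, ranked, lam):
    """Negative log-likelihood of one ranking plus lam * sum of squared weights."""
    s = [sum(a * b for a, b in zip(w, x)) for x in ranked]
    nll = -sum(s[k] - math.log(sum(math.exp(v) for v in s[k:]))
               for k in range(len(s)))
    return nll + lam * sum(wi * wi for wi in w)

def minimize(ranked, lam, steps=300, lr=0.1, eps=1e-6):
    """Gradient descent with central-difference gradients (simple, not fast)."""
    w = [0.0] * len(ranked[0])
    for _ in range(steps):
        grad = []
        for i in range(len(w)):
            up = w[:i] + [w[i] + eps] + w[i + 1:]
            down = w[:i] + [w[i] - eps] + w[i + 1:]
            diff = modified_error(up, ranked, lam) - modified_error(down, ranked, lam)
            grad.append(diff / (2 * eps))
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def norm(w):
    """Euclidean length of a weight vector."""
    return math.sqrt(sum(wi * wi for wi in w))

# Hypothetical (rating, freshness) features, listed in rank order.
ranked = [[0.9, 0.2], [0.7, 0.8], [0.4, 0.5], [0.1, 0.6]]
w_plain = minimize(ranked, lam=0.0)  # no regularization
w_reg = minimize(ranked, lam=1.0)    # with regularization
```

On this toy ranking the regularized weights stay much smaller than the unregularized ones, which is the stabilizing effect the description attributes to the regularization function.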

Accordingly, the present invention can define a rank learning model that uses the probability that the rank of a document set occurs as the learning signal and extracts a weight for each feature of each document in the document set. For example, the present invention may define a rank learning model that determines the feature weights of each document using a multi-class Bradley-Terry method.

In addition, the present invention can update the rank learning model by cross-validating the given training data under the defined rank learning model and determining a learning model parameter that prevents the rank learning model from overtraining.

Here, the feature weights may be regularized (weight regularization), and a learning model parameter that controls the regularized weights may be determined. By updating the rank learning model using the learning model parameter that controls the regularized feature weights, the stability of the rank learning model can be secured.

Furthermore, the present invention can optimize the rank learning model by determining the optimized weights that minimize the error function of the rank learning model.

FIG. 2 is a diagram illustrating a process of obtaining the rank of a document set for a specific query according to an embodiment of the present invention.

Referring to FIG. 2, the document set (202) for query (201) A may include Document 1, Document 2, Document 3, and Document 4. The document set (202) may include at least one document, and the number of documents is not limited. Here, a document includes a web page retrieved by the query (201) and may be expressed in the form of a URL.

In one example, the step of defining the rank learning model (S101) may include obtaining the rank of the document set. This step may obtain the rank of a document set composed of at least one document. As shown in FIG. 2, assume that the obtained rank (203) of the document set places Document 2 first, Document 1 second, Document 4 third, and Document 3 fourth.

Here, the present invention may set up a provisional rank learning model using the obtained rank of the document set. That is, using the obtained rank, the present invention may set up a rank learning model that determines a weight for each feature of each document by maximizing the probability that the rank of the document set itself occurs.

The rank may be determined according to the goodness of fit of each document, but it does not necessarily correspond to the order of the documents by magnitude of goodness of fit. Here, the goodness of fit of a document may mean the degree of relevance of the document to the query (201).

In other words, the rank of the document set is not determined solely by the magnitude of the goodness of fit: there is some probability that a document with a lower goodness of fit is ranked above a document with a higher goodness of fit. When the rank does follow the magnitude of the goodness of fit, however, the probability that this rank occurs is higher.

In one example, the goodness of fit may be determined as the exponential function of the weighted sum obtained by adding up the products of the features that are attributes of each document and the weights of those features, that is, $\exp\left(\sum_i w_i x_i\right)$ for features $x_i$ and weights $w_i$. According to an embodiment of the present invention, however, the goodness of fit of each document is not provided, and only the rank of the document set is obtained.

FIG. 3 is a diagram illustrating an example of the process of calculating the probability that the rank of a document set occurs according to an embodiment of the present invention.

The step of obtaining the rank of the document set may obtain the rank of a document set composed of at least one document. As shown in FIG. 3, assume that this step has obtained the rank (203) of the document set in which Document 2 is first, Document 1 second, Document 4 third, and Document 3 fourth.

In one example, the step of defining the rank learning model (S101) may include determining the goodness of fit of each document and calculating the probability that the obtained rank of the document set occurs. The probability may be calculated by determining the goodness of fit of each of the at least one document.

In one example, the step of calculating the probability that the rank of the document set occurs may calculate the probability according to Equation 1.

[Equation 1: formula not recoverable from the source text]

The symbols of Equation 1 denote, in order: the goodness of fit of each document; the document at the k-th rank; the probability that the rank of the document set occurs; a feature of the document; and the weight of each feature. As Equation 1 shows, the goodness of fit of each document may be determined as the exponential function of the weighted sum of the features that are attributes of that document.

Here, the step of calculating the probability that the rank of the document set occurs may calculate the probability for each document using the goodness of fit of the remaining documents. Specifically, according to an embodiment of the present invention, the probability that the rank of the document set occurs may be calculated using the goodness of fit of the document at each position and of the remaining documents.

As shown in FIG. 3, applying the rank of the document set shown next to each document to Equation 1 yields the probability (301) that the rank of the document set occurs, that is, the probability of the ordering Document 2, Document 1, Document 4, Document 3. When the rank is determined according to the magnitude of the goodness of fit, the probability that the rank occurs takes its maximum value compared with any other ordering.
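The calculation can be sketched under an assumption: the description (each ranked document compared against the documents not yet placed, in a multi-class Bradley-Terry setting) matches the Plackett-Luce form, so the sketch uses that form. It is not taken from the lost formula, and the goodness-of-fit values are hypothetical.

```python
import itertools

def rank_probability(ordering, fit):
    """Probability of an ordering under the assumed Plackett-Luce form:
    at each position, the goodness of fit of the document placed there divided
    by the total goodness of fit of that document and all documents after it."""
    prob = 1.0
    for k, doc in enumerate(ordering):
        remaining = sum(fit[d] for d in ordering[k:])
        prob *= fit[doc] / remaining
    return prob

# Hypothetical goodness-of-fit values for the four documents of FIG. 3.
fit = {"doc1": 3.0, "doc2": 4.0, "doc3": 1.0, "doc4": 2.0}

# Probability of the ordering Document 2, Document 1, Document 4, Document 3.
p_given = rank_probability(["doc2", "doc1", "doc4", "doc3"], fit)

# Probabilities of all 24 orderings, and the most probable one.
probs = {perm: rank_probability(perm, fit) for perm in itertools.permutations(fit)}
best = max(probs, key=probs.get)
```

With these values the probabilities over all orderings sum to one, and the most probable ordering is the one sorted by decreasing goodness of fit, in line with the statement above.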

In one example, the step of defining the rank learning model (S101) may include performing rank learning that extracts a weight for each feature of each document using the probability that the rank of the document set occurs.

Here, the step of performing rank learning may set up a learning model that determines the feature weights for which the probability that the rank of the document set occurs is at its maximum. For example, the step of performing rank learning may perform rank learning by applying Equation 2.
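Because Equation 2 is not available in this text, the following is only a sketch of maximum-likelihood rank learning under the assumed Plackett-Luce form, using gradient ascent on the log-probability of one observed ranking. The feature values and the names `log_likelihood`, `gradient`, and `fit_weights` are hypothetical.

```python
import math

def log_likelihood(w, ranked):
    """Log-probability of one ranking; `ranked` lists feature vectors best-first."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in ranked]
    total = 0.0
    for k in range(len(ranked)):
        total += scores[k] - math.log(sum(math.exp(s) for s in scores[k:]))
    return total

def gradient(w, ranked):
    """Gradient of log_likelihood with respect to the feature weights."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in ranked]
    grad = [0.0] * len(w)
    for k in range(len(ranked)):
        exps = [math.exp(s) for s in scores[k:]]
        z = sum(exps)
        for i in range(len(w)):
            # Expected feature value among the documents not yet placed.
            expected = sum(e * x[i] for e, x in zip(exps, ranked[k:])) / z
            grad[i] += ranked[k][i] - expected
    return grad

def fit_weights(ranked, steps=200, lr=0.1):
    """Plain gradient ascent on the log-likelihood, starting from zero weights."""
    w = [0.0] * len(ranked[0])
    for _ in range(steps):
        g = gradient(w, ranked)
        w = [wi + lr * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical (rating, freshness) features for documents listed in rank order.
ranked = [[0.9, 0.2], [0.7, 0.8], [0.4, 0.5], [0.1, 0.6]]
w = fit_weights(ranked)
```

In this toy ranking the first feature decreases monotonically down the list, so its weight keeps growing as more steps are run; unbounded growth of this kind is what the regularization of the weights described in steps S102 and S103 is meant to curb.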

FIG. 4 is a diagram illustrating a process of obtaining, for a specific query, the rank of a document set that includes documents sharing the same rank, according to an embodiment of the present invention.

In one example, the step of obtaining the rank of the document set may obtain the rank of a document set that includes documents sharing the same rank. In other words, there may be at least one document that has the same rank as another document.

As shown in FIG. 4, assume a document set (402) for the query (401) that includes Document 1, Document 2, Document 3, Document 4, and Document 5. The step of obtaining the rank of the document set may then obtain the rank (403) of the document set (402) in which Document 2 and Document 3 are first, Document 1 and Document 4 are second, and Document 5 is third.

In the case of FIG. 4, when the rank 403 of the document set 402 is obtained with document 2 and document 3 in first place, both document 2 and document 3 may be ranked first, and the order between document 2 and document 3 may be swapped. Likewise, both document 1 and document 4 may be ranked second, and the order between document 1 and document 4 may be swapped. As a result, combining the documents included in the document set 402 yields four possible ranks of the document set.
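The four cases can be listed mechanically. The following is a minimal sketch, assuming the rank 403 is represented as a list of tie groups ordered from best to worst; the names `rank_groups` and `consistent_orderings` are illustrative and do not come from the patent.

```python
from itertools import permutations, product

# Rank 403 of FIG. 4 written as tie groups, best rank first.
# Document numbers follow the text: documents 2 and 3 tie for first,
# documents 1 and 4 tie for second, document 5 is third.
rank_groups = [(2, 3), (1, 4), (5,)]


def consistent_orderings(groups):
    """Yield every full ordering obtained by permuting documents inside each tie group."""
    per_group = [list(permutations(group)) for group in groups]
    for choice in product(*per_group):
        # Keep the group order fixed and concatenate the chosen within-group orders.
        yield tuple(doc for block in choice for doc in block)


orderings = list(consistent_orderings(rank_groups))
for ordering in orderings:
    print(ordering)
print(len(orderings), "orderings")
```

The count is the product of the factorials of the group sizes, here 2! × 2! × 1! = 4, which is why the number of orderings grows quickly as tie groups get larger.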

Therefore, when the rank of a document set that includes documents having the same rank is obtained, the present invention may generate the ranks of all document sets that can arise from combining those documents. The present invention may then calculate the probability that the rank of each generated case occurs.

FIG. 5 is a diagram illustrating an example of a process of extracting all document sets that can be generated by combining documents having the same rank, according to an embodiment of the present invention.

As shown in FIG. 5, in the rank 403 of the document set, document 2 and document 3 may be first, document 1 and document 4 second, and document 5 third. For example, when the rank 403 is given as in FIG. 5, the following document groups may be generated according to Equation 3 below.

Here, the first document group consists of document 2 and document 3, whose order may be changed. The second document group consists of document 1 and document 4, whose order may also be changed. The third document group consists of document 5. Documents belonging to the same document group may be reordered within that group without affecting the rank.

For example, the step of calculating the probability that the rank of the document set occurs may extract all document sets that can be generated by combining documents having the same rank. In this case, the step may form a document group from the documents that have the same rank.

For example, once the document groups are generated, the ranks of the document sets to be extracted are those obtained by reordering the documents within each document group. Applying this example to FIG. 4 and combining the document groups formed from documents having the same rank, the document sets 501, 502, 503, and 504 can be extracted.

Then, applying Equation 1 and summing the probabilities of occurrence of the ranks of all generated document sets gives Equation 4 below.
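The summation can be sketched in code. Equation 1 is not reproduced in this section, so the sketch below assumes a Plackett-Luce style form in which each document's goodness of fit is the exponential of its per-feature weighted sum and is divided by the total goodness of fit of the documents not yet placed. The weights, feature values, and function names are hypothetical and serve only as an illustration.

```python
import math
from itertools import permutations, product


def goodness_of_fit(weights, features):
    """Exponential of the per-feature weighted sum of one document."""
    return math.exp(sum(w * x for w, x in zip(weights, features)))


def ordering_probability(scores_in_rank_order):
    """Assumed form of Equation 1: product over positions of score / remaining scores."""
    p = 1.0
    for i, score in enumerate(scores_in_rank_order):
        p *= score / sum(scores_in_rank_order[i:])
    return p


def tied_rank_probability(groups, score_of):
    """Sum the ordering probability over every ordering consistent with the tie groups."""
    per_group = [list(permutations(group)) for group in groups]
    total = 0.0
    for choice in product(*per_group):
        order = [doc for block in choice for doc in block]
        total += ordering_probability([score_of[doc] for doc in order])
    return total


# Hypothetical weights and two-feature documents, numbered as in FIG. 4.
weights = [0.8, 0.3]
features = {1: [0.5, 1.0], 2: [1.5, 0.2], 3: [1.4, 0.4], 4: [0.6, 0.7], 5: [0.1, 0.1]}
score_of = {doc: goodness_of_fit(weights, x) for doc, x in features.items()}

rank_groups = [(2, 3), (1, 4), (5,)]
total_probability = tied_rank_probability(rank_groups, score_of)
print(total_probability)
```

Under this assumed form, the probabilities of all 120 orderings of the five documents sum to one, so the value for the tied rank is the share of that total covered by the four consistent orderings.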

For example, when several documents have the same rank, the number of document sets that can be extracted by combining those documents may increase exponentially. Therefore, according to an embodiment of the present invention, when the probability calculation is simplified by treating the part of each term whose denominator differs as a constant, the sum of the probabilities of occurrence of the ranks of the extracted document sets can be calculated as in Equation 5 below.

Generalizing Equations 3 to 5 yields Equation 6 below.

Here, a document group is a group formed from documents that have the same rank. Each document belongs to a document group. The goodness of fit is then determined for each document, and also for each document according to the document group to which it belongs.

That is, the step of performing rank learning that extracts the weight for each feature of each document may set up a rank learning model that extracts the per-feature weights for which the sum of the calculated probabilities of occurrence of the ranks of the document combinations is maximal. For example, this step may perform rank learning according to Equation 7 below, taking Equation 6 into account.

FIG. 6 is a diagram illustrating an example of a process of determining a learning model parameter, according to an embodiment of the present invention.

The rank learning model generation method according to an embodiment of the present invention may determine the learning model parameter of the rank learning model by repeatedly using a plurality of subsets of the document set (S102).

For example, the step of determining the learning model parameter of the rank learning model (S102) may include dividing the document set into a plurality of subsets. The number of subsets is not limited. Referring to FIG. 6, the document set 600 is divided into three subsets (L₁, L₂, L₃).

For example, the step of determining the learning model parameter of the rank learning model (S102) may include performing cross validation on each of the plurality of subsets to generate a regularization parameter for each of the subsets.

In this case, the step of generating the regularization parameters may perform cross validation by repeating a process of training on some of the plurality of subsets according to the defined rank learning model and testing on the remaining subsets.

Referring to FIG. 6, the step of determining the learning model parameter (S102) may train on some of the divided subsets (L₁+L₂) according to the defined rank learning model, and may test on the remaining subset (L₃). Through this process, the step (S102) may generate the regularization parameter for the subset combination (L₁+L₂).

In addition to the subset combination (L₁+L₂), the step of determining the learning model parameter (S102) may perform training and testing through the same process for the subset combinations (L₂+L₃) and (L₃+L₁), generating a regularization parameter for each.
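The rotation of training and test subsets described above can be sketched as follows. This is only an illustration: the subset contents and the helper name `rotating_splits` are hypothetical, and the training and testing themselves are left out.

```python
def rotating_splits(subsets):
    """Yield (train_names, train_docs, test_name, test_docs), holding out one subset at a time."""
    names = list(subsets)
    for held_out in names:
        train_names = [name for name in names if name != held_out]
        train_docs = [doc for name in train_names for doc in subsets[name]]
        yield train_names, train_docs, held_out, subsets[held_out]


# Hypothetical document set divided into three subsets, as in FIG. 6.
subsets = {"L1": ["d1", "d2"], "L2": ["d3", "d4"], "L3": ["d5", "d6"]}

splits = list(rotating_splits(subsets))
for train_names, train_docs, test_name, test_docs in splits:
    print("train on", "+".join(train_names), "test on", test_name)
```

Each of the three splits would produce one regularization parameter, so the number of parameters equals the number of subsets.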

For example, the step of determining the learning model parameter of the rank learning model (S102) may include calculating the learning model parameter using the generated regularization parameters.

In this case, the step of calculating the learning model parameter may calculate the optimized learning model parameter as the geometric mean of the generated regularization parameters. The number of regularization parameters may be determined by the number of subsets into which the document set is divided.
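A geometric mean of the per-split parameters can be computed as in the sketch below. The parameter values are hypothetical, and the function name is illustrative.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed through logarithms for stability."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical regularization parameters obtained from three train/test splits.
regularization_parameters = [100.0, 200.0, 400.0]
learning_model_parameter = geometric_mean(regularization_parameters)
print(learning_model_parameter)
```

Compared with an arithmetic mean, the geometric mean is less dominated by a single unusually large parameter, which may be one reason for preferring it when the per-split values differ by large factors.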

The rank learning model generation method according to an embodiment of the present invention may generate an updated rank learning model by re-learning the entire document set according to the learning model parameter (S103).

In this case, the step of generating the updated rank learning model (S103) may include modifying the error function of the rank learning model by reflecting the learning model parameter in the regularization function according to the per-feature weights.

For example, the step of modifying the error function may modify the error function according to Equation 8 below.

Here, Equation 8 involves the modified error function and the regularization function according to the per-feature weights, together with a function weight that controls the regularization function. This function weight is the learning model parameter. The step of generating the updated rank learning model (S103) may include applying the modified error function to the rank learning model.

That is, the step of modifying the error function of the rank learning model may determine an error function according to the per-feature weights from the rank learning model, and may add to the determined error function the regularization function in which the learning model parameter is reflected.
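The description above corresponds to an additive penalty. Equation 8 is not reproduced in this section, so the following is only a sketch in assumed notation: $\mathbf{w}$ for the per-feature weight vector, $E$ for the error function, $R$ for the regularization function, $\lambda$ for the learning model parameter, and $\tilde{E}$ for the modified error function.

```latex
\tilde{E}(\mathbf{w}) = E(\mathbf{w}) + \lambda \, R(\mathbf{w})
```

With $\lambda = 0$ the modified error function reduces to the original one, and larger values of $\lambda$ give the regularization function more influence on the weights that minimize $\tilde{E}$.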

The rank learning model generation method according to an embodiment of the present invention may optimize the rank learning model by minimizing the error function of the updated rank learning model (S104). In this case, the error function to be minimized may be the error function modified in step S103 to reflect the regularization function according to the weights.

For example, the step of optimizing the rank learning model (S104) may determine the optimized weights at which the modified error function is minimized.

For example, when learning is performed with the updated rank learning model as in Equation 7, the relationship between the actually desired weights and the weights derived through learning may be derived from Equation 9 below.

Here, Equation 9 relates the per-feature weight vector derived at the n-th learning step to the per-feature weight vector derived at the (n+1)-th learning step. It uses the learning rate applied when the rank learning model is trained and the derivative vector of the error function of the updated rank learning model (the modified error function).

According to Equation 9, the step of optimizing the rank learning model (S104) calculates the derivative vector of the modified error function and then adds a term based on this derivative vector to the model variable vector obtained at the n-th learning step, so that the modified error function decreases at the next learning step.

The step of optimizing the rank learning model (S104) may finally extract the optimal weight set, that is, the model variable vector at which the modified error function no longer decreases. In short, the step (S104) can be viewed as searching for the optimal weights by adjusting the model variable vector in the direction opposite to the derivative vector of the modified error function.
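The update described here matches ordinary gradient descent with a stop condition on the error. The sketch below illustrates it on a toy problem: the quadratic data term, the L2 penalty, and all names and constants are assumptions chosen for illustration, not the error function of the patent.

```python
def gradient_descent(error, gradient, w, learning_rate=0.1, max_steps=1000, tol=1e-12):
    """Move w against the gradient until the error no longer decreases by more than tol."""
    current = error(w)
    for _ in range(max_steps):
        g = gradient(w)
        # w_{n+1} = w_n - learning_rate * gradient(w_n)
        candidate = [wi - learning_rate * gi for wi, gi in zip(w, g)]
        candidate_error = error(candidate)
        if current - candidate_error <= tol:
            break  # the error stopped decreasing, keep the current weights
        w, current = candidate, candidate_error
    return w, current


# Toy modified error: squared distance to a target plus an L2 penalty (both assumed).
TARGET = [1.0, -2.0]
LAM = 0.5  # stands in for the learning model parameter


def modified_error(w):
    data_term = sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))
    penalty = LAM * sum(wi ** 2 for wi in w)
    return data_term + penalty


def modified_gradient(w):
    return [2 * (wi - ti) + 2 * LAM * wi for wi, ti in zip(w, TARGET)]


w_opt, e_opt = gradient_descent(modified_error, modified_gradient, [0.0, 0.0])
print(w_opt, e_opt)
```

For this toy problem the minimizer is known in closed form, each target divided by (1 + LAM), which makes it easy to check that the loop stops near the right weights.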

As a result, through the step of determining the learning model parameter of the rank learning model (S102), the learning model parameter can be determined while making full use of the entire given document set.

FIG. 7 is a graph illustrating a process of generating a regularization parameter from an error function.

FIG. 7 shows the process of generating a regularization parameter for each of the subsets L₁, L₂, and L₃. Each error function shown in FIG. 7 is the error on the test data observed during training when the rank learning model defined for the subset combinations L₁+L₂, L₂+L₃, and L₃+L₁ of the document set is trained. The regularization parameter is generated from this error function.

Specifically, as training is repeated, the error given by the error function generally decreases and then, from some point, increases because of overfitting. The present invention may take the number of weight updates performed by training at the point where the error begins to increase as the regularization parameter for the training data.
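Locating that point from a recorded test-error curve can be sketched as follows. The sketch assumes that `test_errors[k]` holds the test error after k weight updates and returns the update count just before the first increase. Whether the patent counts the update before or after the increase is not stated, so that choice, the function name, and the sample curve are all assumptions.

```python
def updates_at_minimum_test_error(test_errors):
    """Return the number of weight updates reached just before the test error first rises.

    test_errors[k] is the test error after k weight updates (k = 0 is the initial model).
    """
    for k in range(1, len(test_errors)):
        if test_errors[k] > test_errors[k - 1]:
            return k - 1
    # The error never rose within the recorded curve.
    return len(test_errors) - 1


# Hypothetical test-error curve: falls, then rises as overfitting sets in.
curve = [0.90, 0.60, 0.45, 0.40, 0.43, 0.50]
stopping_updates = updates_at_minimum_test_error(curve)
print(stopping_updates)
```

A real error curve is usually noisy, so a practical version would likely wait for several consecutive increases before declaring that the error has started to rise.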

That is, the present invention may divide the document set, which is the given training data, into a plurality of subsets and repeatedly perform cross validation on test data according to the rank learning model defined for each of the divided subsets. The value at which the decrease of the error function slows and the error function begins to increase again may then be taken as the regularization parameter for the training data of each subset.

As a result, by dividing the document set and determining a regularization parameter for each subset, the present invention can make use of the document set as a whole.

Referring to FIG. 6, when training is performed on L₁+L₂, L₂+L₃, and L₃+L₁, a regularization parameter is generated for each combination at the point where the error increases. For example, the process of determining the regularization parameter of each subset may be performed according to Equation 10 below.

Then, the finally determined learning model parameter may be determined from the generated regularization parameters. For example, the learning model parameter may be set to the geometric mean of the generated regularization parameters.

Then, the rank learning model generation method according to an embodiment of the present invention may generate the updated rank learning model by re-learning the entire document set according to the learning model parameter. In this case, the step of generating the updated rank learning model may re-learn the entire document set by applying the error function in which the determined learning model parameter is reflected.

FIG. 8 is a block diagram showing the overall configuration of a rank learning model generation system according to an embodiment of the present invention.

Referring to FIG. 8, the rank learning model generation system 800 according to an embodiment of the present invention may include a rank learning model definition unit 801, a learning model parameter determination unit 802, a rank learning model update unit 803, and a rank learning model optimization unit 804.

The rank learning model definition unit 801 may define a rank learning model that learns according to the rank of a document set to determine a weight for each feature of a document. In this case, the document set may include at least one document arranged according to rank. The at least one document may be at least one search result document for each of a plurality of search queries.

For example, the rank learning model definition unit 801 may include a rank acquisition unit that obtains the rank of the document set, a probability calculation unit that determines the goodness of fit of each document and calculates the probability that the obtained rank of the document set occurs, and a rank learning execution unit that performs rank learning to extract the weight for each feature of each document using the calculated probability (the detailed configuration of the rank learning model definition unit 801 is not shown).

In this case, the probability calculation unit may calculate the probability for each document using the goodness of fit of the remaining documents. Here, the goodness of fit may be determined by an exponential function of the per-feature weighted sum of each document. For an example of calculating the probability that the rank of the document set occurs, refer to Equation 1 above.

The rank learning execution unit may learn a process of extracting the per-feature weights that maximize the probability calculated according to the rank of the document set.

For example, the rank acquisition unit may obtain the rank of a document set that includes documents having the same rank among the at least one document. In this case, the probability calculation unit may calculate the probability of occurrence of the rank of each of all document sets that can be generated by combining the documents having the same rank. For an example of generating all such document sets, refer to Equation 3 above.

When all document sets that can be generated are produced according to an embodiment of the present invention, the number of document sets that can be extracted may increase exponentially. Therefore, the probability calculation unit may calculate the sum of the probabilities of the document sets in simplified form by treating the part of each term whose denominator differs as a constant.

Then, the rank learning execution unit may define a rank learning model that extracts the weight for each feature of each document such that the sum of the probabilities of occurrence of the ranks of all document sets that can be generated is maximized. For an example of calculating the probability when the document set includes documents having the same rank, refer to Equations 4 to 6 above.

The learning model parameter determination unit 802 may determine the learning model parameter of the rank learning model by repeatedly using a plurality of subsets of the document set.

For example, the learning model parameter determination unit 802 may include a document set division unit that divides the document set into a plurality of subsets, a regularization parameter generation unit that performs cross validation on each of the plurality of subsets to generate a regularization parameter for each of the subsets, and a learning parameter calculation unit that calculates the learning model parameter using the generated regularization parameters (the detailed configuration of the learning model parameter determination unit 802 is not shown). For an example of generating a regularization parameter, refer to Equation 10 above.

In this case, the learning model parameter may be a value that adjusts the variation of the regularization function according to the per-feature weights of the defined rank learning model.

In this case, the regularization parameter generation unit may perform cross validation by repeating a process of training on some of the plurality of subsets according to the defined rank learning model and testing on the remaining subsets.

In this case, the learning model parameter calculation unit may calculate the optimized learning model parameter by averaging the generated regularization parameters.

The rank learning model update unit 803 may generate the updated rank learning model by re-learning the entire document set according to the learning model parameter. For example, the rank learning model update unit 803 may include an error function modification unit that modifies the error function of the rank learning model by reflecting the learning model parameter in the regularization function according to the per-feature weights, and an error function application unit that applies the modified error function to the rank learning model. For an example of modifying the error function, refer to Equation 8 above.

In this case, the error function modification unit may determine an error function according to the per-feature weights from the rank learning model, and may add to the determined error function the regularization function in which the learning model parameter is reflected.

The rank learning model optimization unit 804 may optimize the rank learning model by minimizing the error function of the updated rank learning model. For example, the rank learning model optimization unit 804 may determine the optimized weights at which the modified error function is minimized.

For example, the rank learning model optimization unit 804 may finally extract the optimal weight set, that is, the model variable vector at which the modified error function no longer decreases. For the process of optimizing the rank learning model, refer to Equation 9 above.

For parts not described with reference to FIG. 8, refer to the descriptions of FIGS. 1 to 7.

In addition, the method of generating a rank learning model using weight regularization according to an embodiment of the present invention includes a computer-readable medium containing program instructions for performing various computer-implemented operations. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the present invention has been described above with reference to limited embodiments and drawings, the present invention is not limited to these embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should be understood only from the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the present invention.

<Explanation of symbols for the main parts of the drawings>

800: rank learning model generation system

801: rank learning model definition unit

802: learning model parameter determination unit

803: rank learning model update unit

804: rank learning model optimization unit

Claims

A method of generating a rank learning model, the method comprising:

defining a rank learning model that learns according to a rank of a document set to determine a weight for each feature of a document, wherein the document set includes at least one document arranged according to rank;

determining a learning model parameter of the rank learning model by repeatedly using a plurality of subsets of the document set; and

re-learning the entire document set according to the learning model parameter to generate an updated rank learning model.

The method of claim 1, wherein the step of defining the rank learning model comprises:

obtaining a rank of the document set;

calculating a probability of occurrence of the obtained rank of the document set by determining a goodness of fit of each document; and

performing rank learning that extracts a weight for each feature of each document using the calculated probability.

The method of claim 2, wherein the step of calculating the probability that the rank of the document set occurs calculates the probability for each document using the goodness of fit of the remaining documents.

The method of claim 2, wherein the goodness of fit is determined by an exponential function of the per-feature weighted sum of each document.

The method of claim 2, wherein the step of performing rank learning learns a process of extracting the per-feature weights that maximize the probability calculated according to the rank of the document set.

The method of claim 2, wherein the step of obtaining the rank of the document set obtains the rank of a document set that includes documents having the same rank among the at least one document.

The method of claim 6, wherein the probability of occurrence of the rank is calculated for each of all document sets that can be generated by combining the documents having the same rank.

The method of claim 7, wherein the step of performing rank learning defines a rank learning model that extracts the weight for each feature of each document such that the sum of the probabilities of occurrence of the ranks of all the generated document sets is maximized.

The method of claim 1, wherein the step of determining the learning model parameter comprises:

dividing the document set into a plurality of subsets;

performing cross validation on each of the plurality of subsets to generate a regularization parameter for each of the subsets; and

calculating the learning model parameter using the generated regularization parameters.

The method of claim 9, wherein the learning model parameter controls the regularization function according to the per-feature weights of the defined rank learning model.

The method of claim 9,

wherein the step of generating the regularization parameters performs the cross-validation by repeatedly training on a subset of the plurality of subsets according to the defined rank learning model and testing on the remaining subsets.

The method of claim 9,

wherein the step of calculating the learning model parameter calculates an optimized learning model parameter by averaging the generated regularization parameters.
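
The subset-based parameter selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: one subset is used for training and the remaining subsets for testing (swapping the two roles gives conventional k-fold cross-validation), the candidate values are hypothetical, and the toy `train` and `evaluate` callables stand in for the rank learning model, which is not reproduced here.

```python
def split_into_subsets(items, k):
    """Divide items into k interleaved subsets."""
    return [items[i::k] for i in range(k)]


def select_learning_parameter(items, k, candidates, train, evaluate):
    """Pick the best candidate per subset, then average the picks."""
    subsets = split_into_subsets(items, k)
    best_per_subset = []
    for i, train_part in enumerate(subsets):
        # All subsets other than the i-th form the test data.
        test_part = [x for j, s in enumerate(subsets) if j != i for x in s]
        best = min(
            candidates,
            key=lambda lam: evaluate(train(train_part, lam), test_part),
        )
        best_per_subset.append(best)
    return sum(best_per_subset) / len(best_per_subset)


# Toy stand-ins (hypothetical): a shrunken mean as the "model" and
# mean squared error as the test error.
def train(part, lam):
    return sum(part) / (len(part) + lam)


def evaluate(model, test_part):
    return sum((x - model) ** 2 for x in test_part) / len(test_part)


items = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.05]
candidates = [0.0, 0.1, 1.0]
learning_parameter = select_learning_parameter(items, 3, candidates, train, evaluate)
```

Averaging the per-subset picks yields a single value that can lie between the candidates, which is then used when re-learning on the entire document set.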

The method of claim 1,

wherein the step of generating the updated rank learning model comprises:

modifying an error function of the rank learning model by reflecting the learning model parameter in a regularization function according to the weight for each feature; and

applying the modified error function to the rank learning model.

The method of claim 13,

wherein the step of modifying the error function of the rank learning model determines an error function according to the weight for each feature from the rank learning model and adds, to the determined error function, the regularization function reflecting the learning model parameter.
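
One way to write the modified error function, offered only as an illustration, is shown below. The symbols are assumptions: $E(\mathbf{w})$ is the error function determined from the rank learning model (for example, the negative log-probability of the observed rank), $\lambda$ is the learning model parameter, and $R(\mathbf{w})$ is the regularization function, taken here to be a sum of squared feature weights. The claims do not fix the specific form of $R$.

```latex
E_{\lambda}(\mathbf{w}) = E(\mathbf{w}) + \lambda \, R(\mathbf{w}),
\qquad
R(\mathbf{w}) = \sum_{k} w_k^{2}
```

Under this assumed form, a larger $\lambda$ penalizes large feature weights more strongly, and $\lambda = 0$ recovers the unmodified error function.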

The method of claim 1, further comprising:

optimizing the rank learning model by minimizing the error function of the updated rank learning model.

The method of claim 15,

wherein the step of optimizing the rank learning model determines an optimal weight at which the modified error function is minimized.
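
The minimization step can be illustrated with the following sketch, which is not the patented procedure itself. It assumes the error function is the negative log-probability of the observed rank, assumes a squared-weight regularization term scaled by a hypothetical parameter `lam`, and uses plain gradient descent with numerically estimated gradients; all function names and data are illustrative.

```python
import math


def rank_error(weights, ranked_docs):
    """Negative log-probability of the observed rank (assumed error function)."""
    scores = [sum(w * x for w, x in zip(weights, doc)) for doc in ranked_docs]
    return -sum(
        scores[i] - math.log(sum(math.exp(s) for s in scores[i:]))
        for i in range(len(scores))
    )


def modified_error(weights, ranked_docs, lam):
    """Error function plus an assumed squared-weight term scaled by lam."""
    return rank_error(weights, ranked_docs) + lam * sum(w * w for w in weights)


def minimize(error, n_weights, steps=500, lr=0.05, eps=1e-6):
    """Gradient descent using central-difference gradient estimates."""
    weights = [0.0] * n_weights
    for _ in range(steps):
        grad = []
        for k in range(n_weights):
            up = weights[:k] + [weights[k] + eps] + weights[k + 1:]
            down = weights[:k] + [weights[k] - eps] + weights[k + 1:]
            grad.append((error(up) - error(down)) / (2 * eps))
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical data: documents listed best first, two features each.
docs = [[2.0, 1.0], [1.0, 0.5], [0.0, 0.0]]
weak = minimize(lambda w: modified_error(w, docs, 0.01), 2)
strong = minimize(lambda w: modified_error(w, docs, 1.0), 2)
```

Comparing `weak` and `strong` shows the intended effect of the regularization term: the stronger penalty keeps the learned weights smaller, which limits how far the model can drift toward any single training set.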

A computer-readable recording medium in which a program for executing the method of any one of claims 1 to 16 is recorded.

A rank learning model generation system comprising:

a rank learning model definition unit that defines a rank learning model which determines a weight for each feature of a document by learning according to a rank of a document set, wherein the document set includes at least one document arranged according to rank;

a learning model parameter determiner that determines a learning model parameter of the rank learning model by repeatedly using a plurality of subsets of the document set; and

a rank learning model update unit that generates an updated rank learning model by re-learning the entire document set according to the learning model parameter.

The system of claim 18,

wherein the rank learning model definition unit comprises:

a rank acquisition unit that obtains a rank of the document set;

a probability calculation unit that determines a goodness of fit of each document and calculates a probability that the obtained rank of the document set occurs; and

a rank learning performing unit that performs rank learning to extract a weight for each feature of each document using the calculated probability.

The system of claim 19,

wherein the probability calculation unit calculates, for each document, a probability using the goodness of fit of the remaining documents.

The system of claim 19,

wherein the goodness of fit is determined by an exponential function of the weighted sum of the features of the document.

The system of claim 19,

wherein the rank learning performing unit learns to extract the weight for each feature that maximizes the probability calculated according to the rank of the document set.

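The system of claim 19,

wherein the rank acquisition unit obtains a rank of a document set that includes documents having the same rank among the at least one document.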

The system of claim 23,

wherein the probability calculation unit calculates a probability of occurrence for the rank of each of the document sets that can be generated by combining the documents having the same rank.

The system of claim 24,

wherein the rank learning performing unit learns to extract the weight for each feature of each document that maximizes the sum of the probabilities of occurrence of the ranks of all of the document sets that can be generated.

The system of claim 18,

wherein the learning model parameter determiner comprises:

a document set dividing unit that divides the document set into a plurality of subsets;

a regularization parameter generation unit that performs cross-validation on each of the plurality of subsets to generate a regularization parameter for each of the subsets; and

a learning model parameter calculation unit that calculates the learning model parameter using the generated regularization parameters.

The system of claim 26,

wherein the learning model parameter controls a regularization function according to the weight for each feature of the defined rank learning model.

The system of claim 26,

wherein the regularization parameter generation unit performs the cross-validation by repeatedly training on a subset of the plurality of subsets according to the defined rank learning model and testing on the remaining subsets.

The system of claim 26,

wherein the learning model parameter calculation unit calculates an optimized learning model parameter by averaging the generated regularization parameters.

The system of claim 18,

wherein the rank learning model update unit comprises:

an error function correction unit that modifies an error function of the rank learning model by reflecting the learning model parameter in a regularization function according to the weight for each feature; and

an error function application unit that applies the modified error function to the rank learning model.

The system of claim 30,

wherein the error function correction unit determines an error function according to the weight for each feature from the rank learning model and adds, to the determined error function, the regularization function reflecting the learning model parameter.

The system of claim 18, further comprising:

a rank learning model optimization unit that optimizes the rank learning model by minimizing the error function of the updated rank learning model.

The system of claim 32,

wherein the rank learning model optimization unit determines an optimal weight at which the modified error function is minimized.