KR100918361B1

KR100918361B1 - System and method for high-speed search modeling

Info

Publication number: KR100918361B1
Application number: KR1020080017243A
Authority: KR
Inventors: 최지훈; 김광현; 이상호
Original assignee: NHN Corp. (엔에이치엔(주))
Priority date: 2008-02-26
Filing date: 2008-02-26
Publication date: 2009-09-22
Also published as: JP5171686B2; KR20090091990A; JP2009205678A

Abstract

A high-speed search modeling system and method are disclosed. The high-speed search modeling system includes a test collection generator that generates a test collection using the search results for a query, a search model generator that generates, from the test collection, a search model for determining the correct ranking for the query, and a search model evaluator that evaluates the performance of the generated search model.

High-speed search modeling, test collections, correlation, features

Description

High-Speed Search Modeling System and Method {SYSTEM AND METHOD FOR HIGH-SPEED SEARCH MODELING}

The present invention relates to a high-speed search modeling system and method, and more particularly, to a system and method that build high-speed search modeling by generating a test collection from the search results for a query and then generating and evaluating a search model from that test collection.

As people pursue an increasingly wide range of hobbies, demand for searching specialized knowledge has grown. By using a search engine to query a database that stores information on a particular field, people can acquire specialized knowledge in areas such as movies, automobiles, securities, and sports. For example, a person who wants to collect information about wine can collect search results using the query 'wine'.

However, creating a search model for searching a database that stores information on a specific field has conventionally been difficult. Specifically, the conventional process repeats a cycle in which a developer intuitively builds and tunes a search model and a search service planner then reviews it. In other words, the search model is built around the developer: a demo is produced first, and the model is then revised and finalized based on the planner's review.

Developers often lack knowledge of or experience with the specialized data, so incorrect search models are frequently produced, and irrelevant search results may then be shown for the queries users enter. The search planner's input can be incorporated into the search model to prevent this, but communication problems between the developer and the planner still limit efficiency.

Therefore, there is a need for an invention that allows anyone who understands the characteristics of the specialized data to create a search model, even without the skills of a search model developer.

The present invention can provide a high-speed search modeling system and method that provide the correct ranking for specialized knowledge by generating a test collection from the search results for a query.

The present invention can provide a high-speed search modeling system and method that generate a more accurate search model by having experts on the query, or search planners, determine the ranking of the search results for that query.

The present invention can provide a high-speed search modeling system and method that can quickly revise a search model by evaluating the performance of the generated model in real time.

The present invention can provide a high-speed search modeling system and method that produce a search model with more stable and efficient performance by evaluating the generated model and, when its performance falls short of a criterion, reordering the ranking of the search results and regenerating the test collection.

A high-speed search modeling system according to an embodiment of the present invention may include a test collection generator that generates a test collection using the search results for a query, a search model generator that generates, from the test collection, a search model capable of determining the correct ranking for the query, and a search model evaluator that evaluates the performance of the generated search model by analyzing evaluation data for it.

The search model generator may generate the search model using a machine learning method.

The search model evaluator may also analyze the weight of each feature selected for the search results.

The search model evaluator may also check the precision and correlation of the generated search model in real time.

A high-speed search modeling method according to an embodiment of the present invention may include generating a test collection using the search results for a query, generating from the test collection a search model capable of determining the correct ranking for the query, and evaluating the performance of the generated search model by analyzing evaluation data for it.

Here, the step of generating a test collection may order the ranking of the search results to produce a test collection for the query.

According to the present invention, there is provided a high-speed search modeling system and method that provide the correct ranking for specialized knowledge by generating a test collection from the search results for a query.

According to the present invention, there is provided a high-speed search modeling system and method that generate a more accurate search model by having experts on the query, or search planners, determine the ranking of the search results for that query.

According to the present invention, there is provided a high-speed search modeling system and method that can quickly revise a search model by evaluating the performance of the generated model in real time.

According to the present invention, there is provided a high-speed search modeling system and method that produce a search model with more stable and efficient performance by evaluating the generated model and, when its performance falls short of a criterion, reordering the ranking of the search results and regenerating the test collection.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the present invention is not limited or restricted by these embodiments. Like reference numerals in the drawings denote like elements. The high-speed search modeling method according to an embodiment of the present invention may be performed by the high-speed search modeling system.

FIG. 1 is a block diagram showing the configuration of a high-speed search modeling system according to an embodiment of the present invention.

The high-speed search modeling system 100 according to an embodiment of the present invention may include a test collection generator 101, a search model generator 102, and a search model evaluator 103.

The test collection generator 101 may generate a test collection using the search results for a query. In one example, the test collection generator 101 may generate the test collection for the query by ordering the ranking of the search results. For instance, if the query 'wine' returns 10 search results, the test collection generator 101 may order those 10 results by ranking to produce one test collection.

A test collection can thus be described as a specific query together with a ranked ordering of the search results for that query. In other words, a test collection is a set containing a query and the correct ranking of its search results. This correct ranking may be produced in the initial ordering step, or it may be reached through repeated reordering.
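The definition above can be made concrete with a small data structure. The following Python sketch is an illustration only, not the patent's implementation; the class name `TestCollection`, the `tiers` layout (a list of rank levels, each holding one or more result IDs so that tied results can share a rank), and the result IDs used below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TestCollection:
    """A query plus an expert-ordered ranking of its search results.

    Hypothetical layout: tiers[0] holds the result IDs at rank 1,
    tiers[1] those at rank 2, and so on, so tied results share a tier.
    """
    query: str
    tiers: list[list[str]] = field(default_factory=list)

    def rank_of(self) -> dict[str, int]:
        """Map each result ID to its 1-based rank."""
        return {
            doc: rank
            for rank, tier in enumerate(self.tiers, start=1)
            for doc in tier
        }

    def reorder(self, new_tiers: list[list[str]]) -> "TestCollection":
        """Return a new collection for the same query with a revised ranking."""
        return TestCollection(self.query, [list(tier) for tier in new_tiers])
```

Keeping the ranking as tiers makes a reordering cheap: an expert's revision is simply a new list of tiers for the same query, and the previous collection is left untouched.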

The test collection generator 101 may receive the search results for a query from the database 104. For example, the database 104 may store specialized information about specific fields such as 'flowers', 'wine', 'sports', 'personal finance', and 'music'.

For example, the test collection generator 101 may receive, through a user terminal, the opinions or commands of an expert or search planner with knowledge and experience in the field to which the query belongs, and order the ranking of the search results accordingly. By having experts on the query, or search planners, determine the ranking of the search results, the present invention can provide a high-speed search modeling system and method that generate a more accurate search model.

The test collection generator 101 may generate a test collection for each of multiple queries in a specific field, so one or more test collections may be generated.

As a result, according to an embodiment of the present invention, when a user searches with a query in a specialized field, search results ranked according to the intent of an expert or search planner can be shown to that user. That is, accurate search results can be provided for queries that belong to specialized fields.

The process of generating a test collection is described in detail with reference to FIGS. 2 and 3.

The search model generator 102 may generate, from the generated test collection, a search model capable of determining the correct ranking for the query. The search model generator 102 may do so using a machine learning method, for example linear regression, classification and regression trees, logistic regression, ListRank, the Bradley-Terry model, or the multi-class Bradley-Terry model.

The search model generator 102 may also generate the search model by selecting at least one feature of the search results and a normalization method for each selected feature. Here, a feature means data that serves as a criterion when the search results are ranked. That is, the search model generator 102 may generate the search model by learning which features were mainly used to rank the search results when the test collection was generated.
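As one concrete instance of the linear regression option listed above, a pointwise model can be fit so that a weighted sum of the selected, already-normalized feature values predicts a relevance grade derived from each result's rank in the test collection. The sketch below is a minimal, assumed illustration in plain Python (batch gradient descent on mean squared error); the function names, learning rate, epoch count, and the rank-to-grade conversion are hypothetical choices, not taken from the patent.

```python
def grades_from_ranks(ranks: list[int]) -> list[float]:
    """Hypothetical conversion: rank 1 gets the highest grade, the worst rank gets 0."""
    worst = max(ranks)
    return [float(worst - r) for r in ranks]


def fit_linear_ranker(features: list[list[float]], grades: list[float],
                      lr: float = 0.1, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit score = w . x + b to the grades by batch gradient descent on MSE."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, grades):
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - y
            for j in range(d):
                grad_w[j] += 2.0 * err * x[j] / n  # d(MSE)/dw_j
            grad_b += 2.0 * err / n                # d(MSE)/db
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank_results(ids: list[str], features: list[list[float]],
                 w: list[float], b: float) -> list[str]:
    """Order result IDs by descending model score."""
    score = {doc: sum(wj * xj for wj, xj in zip(w, x)) + b
             for doc, x in zip(ids, features)}
    return sorted(ids, key=lambda doc: score[doc], reverse=True)
```

In this sketch the learned weights `w` are the per-feature weights that a search model evaluator could then inspect to see which features drove the expert's ordering.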

The process by which the search model generator 102 selects features to generate a search model is described in detail with reference to FIG. 4.

The search model evaluator 103 may evaluate the performance of the generated search model. This evaluation determines whether the generated model can provide the required search results.

Here, the search model evaluator 103 may analyze the weight of each feature selected for the search results. The analyzed weights indicate which features served as important criteria when the search results were ranked.

The search model evaluator 103 may also check the precision and correlation of the generated search model in real time. According to an embodiment of the present invention, evaluating the search model's performance in real time through the search model evaluator 103 allows problems with the model to be identified quickly.

If the performance of the search model does not meet a preset criterion, the test collection generator 101 may reorder the ranking of the search results and regenerate the test collection. As shown in FIG. 1, repeating test collection generation, search model generation, and search model evaluation can yield a final search model 105 whose performance meets or exceeds a given standard. That is, according to an embodiment of the present invention, evaluating the search model's performance through analysis of evaluation data produces a search model 105 with assured, stable performance. The search model evaluator 103 is described in detail using the example of FIG. 5.
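The generate, train, and evaluate cycle of FIG. 1 can be sketched as a loop that stops once a preset criterion is met. In the Python sketch below, the three callables and the numeric `threshold` are hypothetical stand-ins for the test collection generator 101, the search model generator 102, and the search model evaluator 103; the `max_rounds` cap is an added assumption so that the loop always terminates.

```python
from typing import Any, Callable, Optional, Tuple


def build_search_model(
    make_collection: Callable[[Optional[float]], Any],
    train: Callable[[Any], Any],
    evaluate: Callable[[Any, Any], float],
    threshold: float,
    max_rounds: int = 5,
) -> Tuple[Any, float, int]:
    """Repeat collection generation, training, and evaluation until the score meets threshold.

    make_collection receives the previous round's score (None on the first
    round) so that an expert or planner can reorder the ranking in response.
    Returns (model, score, rounds_used).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    score: Optional[float] = None
    for round_no in range(1, max_rounds + 1):
        collection = make_collection(score)  # initial ordering, or a reordering
        model = train(collection)            # e.g. a learned ranking function
        score = evaluate(model, collection)  # e.g. precision or correlation
        if score >= threshold:
            break
    return model, score, round_no
```

Passing the previous score into `make_collection` is one way to model the feedback edge in FIG. 1; a real system might instead pass richer evaluation data, such as the per-feature weights.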

FIG. 2 is a diagram illustrating an example of a process of generating a test collection according to an embodiment of the present invention.

Specifically, FIG. 2 illustrates the process of ordering the search results for the query 201: a test collection is generated by ordering the search results for the query 'war' in the 'movie' field. In FIG. 2, the test collection can be described as the query 201 together with the search results 202 and 203, ordered by ranking.

As mentioned above, the search results for the query may be provided from the database 104. As shown in FIG. 2, besides 'war', the query 201 may be any of several queries in the movie field, such as 'beauty', 'Pirates of the Caribbean', 'Harry Potter', and 'Superman'.

The test collection generator 101 may generate a test collection using the search results for a query. To do so, the test collection generator 101 may order the ranking of the search results to produce a test collection for the query.

As shown in FIG. 2, the search result 203 for 'War of the Worlds' (우주 전쟁) ranks first, but reordering can move the search result 202 for 'X-Men: The Last Stand' (엑스맨-최후의 전쟁), originally ranked fourth, into first place. The criteria for ordering the search results may vary with the features of the results. For a movie search, for example, the features may include recency, number of images, rating, number of participants, number of famous quotes, and document length. Movie experts or search planners understand these features better than search model developers do.

Therefore, in one example, the test collection generator 101 may receive, through a user terminal, the opinions or commands of an expert or search planner with knowledge and experience in the field to which the query belongs, and order the search results by ranking accordingly.

FIG. 3 is a diagram illustrating another example of a process of generating a test collection according to an embodiment of the present invention.

Specifically, FIG. 3 illustrates the process of ordering the search results for the query 301: a test collection is generated by ordering the search results for the query 'Harry Potter' in the 'movie' field.

As FIG. 3 shows, three search results are placed at rank 1. For example, when the search results 302, 303, and 304 for the query 301 are hard to distinguish by ranking, or their features differ very little, the test collection generator 101 may assign them the same rank. Results that are hard to distinguish include, for example, results with similar search frequencies or results that belong to a series. The criteria for assigning the same rank may vary with the system configuration.
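Tied ranks matter for the pairwise methods listed above, such as the Bradley-Terry model, which learn from statements of the form "result A should rank above result B". A minimal sketch, assuming the ranking is given as a list of tiers (tiers[0] holds the rank-1 results, and tied results share a tier), turns an expert's ordering into such preference pairs while emitting no pair between tied results. The function name is hypothetical.

```python
from itertools import combinations


def preference_pairs(tiers: list[list[str]]) -> list[tuple[str, str]]:
    """Return (preferred, other) pairs implied by an ordered list of tiers.

    Results in the same tier are tied, so no pair is produced between them.
    """
    pairs = []
    # combinations yields index pairs (i, j) with i < j, so tiers[i] ranks higher.
    for better, worse in combinations(range(len(tiers)), 2):
        for a in tiers[better]:
            for b in tiers[worse]:
                pairs.append((a, b))
    return pairs
```

For the FIG. 3 case, the three tied Harry Potter results would each be preferred over every lower-ranked result but never compared with one another.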

FIG. 4 is a diagram illustrating an example of selecting features to generate a search model according to an embodiment of the present invention.

Here, a search model means a model that abstracts the process of retrieving the most relevant information for a specific query. The search model generator 102 may generate, from the test collection, a search model capable of determining the correct ranking for the query; that is, a model for judging whether a given ordering of search results is the correct ranking. To do so, the search model generator 102 may select at least one feature and generate the search model through a machine learning method.

The feature selection table 400 shown in FIG. 4 may list, for each feature, a feature name 401, a feature description 402, and a normalization method 403. The entries in the feature selection table 400 may vary by system. As FIG. 4 shows, recency, number of images, rating, number of rating participants/reviews, and number of famous quotes were selected as features. In one example, the search model generator 102 may additionally select a normalization method for each feature when generating the search model.

The normalization methods may include using the raw value as-is or log normalization. That is, when a feature's values have few digits, they may be used unchanged. Conversely, when a feature's values have many digits, they may be used after log normalization. The criteria for choosing a normalization method may vary with the system configuration.
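The selection rule above can be sketched as follows. This is a minimal Python illustration under stated assumptions: "log normalization" is taken to mean log(1 + x), which maps 0 to 0 and assumes non-negative counts, and the digit-count threshold `max_digits=2` that separates "few" from "many" digits is a hypothetical parameter, since the patent leaves the criterion to the system configuration.

```python
import math


def normalize(value: float, method: str) -> float:
    """Apply a per-feature normalization method."""
    if method == "raw":
        return float(value)       # small-magnitude feature: use as-is
    if method == "log":
        return math.log1p(value)  # large-magnitude count: log(1 + x), assumes x >= 0
    raise ValueError(f"unknown normalization method: {method}")


def choose_method(values: list[float], max_digits: int = 2) -> str:
    """Pick 'raw' if the largest value has at most max_digits integer digits, else 'log'."""
    largest = max(abs(v) for v in values)
    digits = len(str(int(largest)))
    return "raw" if digits <= max_digits else "log"
```

Under these assumptions, a 0 to 10 rating would stay raw, while a review count in the thousands would be log-normalized so that it does not dominate a linear model's weights.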

FIG. 5 is a diagram illustrating an example of the results of evaluating the performance of a search model according to an embodiment of the present invention.

Specifically, FIG. 5 shows a learning result table 500, evaluation data 505, and an analysis graph 508. The learning result table 500 may include a feature name 501, a description 502 of each feature, a normalization method 503, and an importance 504. The search model evaluator 103 may analyze the weight of each feature selected for the search results; in the learning result table 500 of FIG. 5, the importance 504 column corresponds to these analyzed weights.

That is, using the importance column, the search model evaluator 103 can assess which features the search results were mainly ranked by when the test collection was generated. Referring to FIG. 5, the search model evaluator 103 can conclude that the test collection was generated by ranking the search results mainly on similarity, recency, and reliable ratings.
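If the search model is linear, one simple way to express the importance column is each feature's share of the total absolute weight. The sketch below assumes that interpretation, which is only meaningful when features have been normalized to comparable scales; the function name and percentage scale are hypothetical, and FIG. 5 does not state how importance 504 is actually computed.

```python
def feature_importance(names: list[str], weights: list[float]) -> list[tuple[str, float]]:
    """Each feature's share of total absolute weight, in percent, sorted descending."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        return [(name, 0.0) for name in names]
    shares = [(name, 100.0 * abs(w) / total) for name, w in zip(names, weights)]
    # sorted() is stable, so features with equal shares keep their input order.
    return sorted(shares, key=lambda pair: pair[1], reverse=True)
```

Taking the absolute value treats a strongly negative weight as just as influential as a strongly positive one; the sign itself would still need to be checked to see in which direction a feature pushes the ranking.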

The search model evaluator 103 may also use the evaluation data 505 to check the precision and correlation of the generated search model in real time. Here, precision may refer to the precision of the generated search model with respect to the queries, and correlation may refer to the correlation between the queries and the search model.
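The patent does not define these precision and correlation measures precisely. As one common choice, offered only as an assumption, precision can be computed as precision at k (the fraction of the model's top k results that the test collection marks as relevant), and correlation as Kendall's tau-b between the model's ranks and the test collection's ranks, which tolerates the tied ranks described for FIG. 3. The function names are hypothetical.

```python
import math
from itertools import combinations


def precision_at_k(model_ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the model's top-k results that are in the relevant set."""
    return sum(1 for doc in model_ranking[:k] if doc in relevant) / k


def kendall_tau_b(reference_rank: dict[str, int], model_rank: dict[str, int]) -> float:
    """Kendall's tau-b between two rank maps (doc -> rank, 1 = best); ties allowed."""
    docs = list(reference_rank)
    concordant = discordant = ties_ref = ties_model = 0
    for a, b in combinations(docs, 2):
        d_ref = reference_rank[a] - reference_rank[b]
        d_mod = model_rank[a] - model_rank[b]
        if d_ref == 0:
            ties_ref += 1
        if d_mod == 0:
            ties_model += 1
        if d_ref * d_mod > 0:
            concordant += 1   # both maps order the pair the same way
        elif d_ref * d_mod < 0:
            discordant += 1   # the maps disagree on the pair
    n0 = len(docs) * (len(docs) - 1) // 2
    denom = math.sqrt((n0 - ties_ref) * (n0 - ties_model))
    return (concordant - discordant) / denom if denom else 0.0
```

A value of 1.0 means the model reproduces the expert ordering exactly, and -1.0 means it reverses it; averaging this value over many test collections would give one way to plot correlation against the number of collections.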

The analysis graph 508 shows the relationship between the number of test collections for the queries and the correlation. Referring to FIG. 5, the correlation increases as the number of test collections increases; that is, the more test collections are generated, the higher the correlation between the queries and the search model can become.

FIG. 6 is a flowchart illustrating a high-speed search modeling method according to an embodiment of the present invention.

The high-speed search modeling method according to an embodiment of the present invention may generate a test collection using the search results for a query (S601). The step of generating a test collection (S601) may order the ranking of the search results to produce a test collection for the query. As mentioned above, a test collection can be described as a specific query together with a ranked ordering of the search results for that query.

In other words, a test collection is a set containing a query and the correct ranking of its search results. This correct ranking may be produced in the initial ordering step, or it may be reached through repeated reordering.

When the search results cannot be distinguished by ranking, that is, when ranking them against each other is ambiguous, the step of generating a test collection (S601) may assign them the same rank. In addition, a test collection may be generated for each of multiple queries in a specific field, so one or more test collections may be generated.

For example, the step of generating a test collection (S601) may receive, through a user terminal, the opinions or commands of an expert or search planner with knowledge and experience in the field to which the query belongs, and order the ranking of the search results accordingly. By having experts on the query, or search planners, determine the ranking of the search results, the present invention can provide a high-speed search modeling method that generates a more accurate search model.

The high-speed search modeling method according to an embodiment of the present invention may generate, from the test collection, a search model capable of determining the correct ranking for the query (S602).

The step of generating a search model (S602) may use a machine learning method. For example, step S602 may generate the search model using linear regression, classification and regression trees, logistic regression, ListRank, the Bradley-Terry model, or the multi-class Bradley-Terry model.

The step of generating a search model (S602) may select at least one feature of the search results and a normalization method for each selected feature. Here, a feature means data that serves as a criterion when the search results are ranked. That is, step S602 may generate the search model through a machine learning method by referring to the features that the expert or search planner used as criteria when ranking the search results.

The high-speed search modeling method according to an embodiment of the present invention may evaluate the performance of the generated search model (S603).

The step of evaluating the performance of the search model (S603) may analyze the weight of each feature selected for the search results. By analyzing these weights, step S603 can determine which features the expert or search planner mainly relied on when ordering the search results to generate the test collection.

In step S603, the accuracy and correlation of the generated search model may also be checked in real time. Because the performance of the search model is evaluated in real time, problems with the search model can be identified quickly.
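The specification does not define the accuracy and correlation measures. As an assumption, the sketch below uses pairwise accuracy (the fraction of result pairs ordered by the correct ranking that the model also orders correctly) and Kendall's tau-b rank correlation, which handles tied ranks. Both are cheap enough to recompute every time a model is regenerated. The function name `evaluate_ranking` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def evaluate_ranking(correct_ranks: Sequence[int],
                     scores: Sequence[float]) -> Tuple[float, float]:
    """Return (pairwise accuracy, Kendall tau-b) of model scores vs. the correct ranking.

    correct_ranks: 1 = best, equal values are ties. scores: higher = ranked higher.
    """
    n = len(correct_ranks)
    concordant = discordant = tied_truth = tied_model = 0
    for i in range(n):
        for j in range(i + 1, n):
            t = _sign(correct_ranks[j] - correct_ranks[i])  # +1 if i should be above j
            m = _sign(scores[i] - scores[j])                # +1 if the model puts i above j
            tied_truth += t == 0
            tied_model += m == 0
            if t * m > 0:
                concordant += 1
            elif t * m < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    comparable = n0 - tied_truth  # pairs the correct ranking actually orders
    accuracy = concordant / comparable if comparable else 0.0
    denom = math.sqrt((n0 - tied_truth) * (n0 - tied_model))
    tau_b = (concordant - discordant) / denom if denom else 0.0
    return accuracy, tau_b


# Hypothetical query: correct ranks 1..4 and the scores a model assigned.
accuracy, tau = evaluate_ranking([1, 2, 3, 4], [0.9, 0.1, 0.5, 0.3])
```

In this example the model orders 4 of the 6 pairs correctly, so accuracy is 4/6 and tau-b is (4 - 2)/6 = 1/3.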

In step S601 of generating a test collection, when the performance of the search model does not satisfy a preset criterion, the generated test collection may be regenerated by re-sorting the ranking of the search results. That is, according to an embodiment of the present invention, the performance of the search model is evaluated and the test collection is generated again in step S601 on the basis of the evaluation data, so that a search model that ensures stable performance can be generated.
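The feedback loop between steps S601, S602, and S603 can be sketched as a bounded retry loop. This is a sketch under stated assumptions: the three callables (`build_collection`, `train`, `evaluate`), the single scalar metric, the `threshold` criterion, and the `max_rounds` cap are all hypothetical stand-ins. In the described method, the re-sorting itself is done when the test collection is regenerated, for example by the expert or search planner.

```python
from typing import Any, Callable, Optional, Tuple


def model_until_acceptable(
    build_collection: Callable[[Optional[float]], Any],
    train: Callable[[Any], Any],
    evaluate: Callable[[Any, Any], float],
    threshold: float,
    max_rounds: int = 5,
) -> Tuple[Any, float, int]:
    """Repeat S601 -> S602 -> S603 until the metric meets the threshold.

    build_collection receives the previous round's metric (None on the first
    round) so that the re-sorted collection can take the evaluation into account.
    Returns (model, metric, rounds used); stops after max_rounds regardless.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    metric: Optional[float] = None
    for round_no in range(1, max_rounds + 1):
        collection = build_collection(metric)  # S601: (re)generate the test collection
        model = train(collection)              # S602: generate the search model
        metric = evaluate(model, collection)   # S603: evaluate its performance
        if metric >= threshold:
            break
    return model, metric, round_no
```

Capping the number of rounds keeps the loop from running forever if the criterion can never be met. The caller can then inspect the returned metric and decide whether to accept the best available model.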

For details not described with reference to FIG. 6, refer to FIGS. 1 to 5.

In addition, the fast search modeling method according to an embodiment of the present invention includes a computer-readable medium containing program instructions for performing various computer-implemented operations. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

As described above, although the present invention has been described with reference to limited embodiments and drawings, the present invention is not limited to these embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description. Therefore, the spirit of the present invention should be determined only by the claims set forth below, and all equivalents or equivalent modifications thereof fall within the scope of the spirit of the present invention.

<Description of reference numerals for the main parts of the drawings>

100: fast search modeling system

101: test collection generator

102: search model generator

103: search model evaluation unit

104: database

105: search model

Claims

A fast search modeling system, comprising:

a test collection generator configured to generate a test collection using search results for a query;

a search model generator configured to generate, through a machine learning method, a search model capable of determining a correct ranking for the query from the test collection; and

a search model evaluation unit configured to evaluate the performance of the generated search model.

The fast search modeling system of claim 1,

wherein the test collection generator

sorts the ranking of the search results to generate a test collection for the query.

The fast search modeling system of claim 1,

wherein the test collection generator,

when the ranking of the search results cannot be distinguished, sorts the indistinguishable search results into the same rank.

The fast search modeling system of claim 1,

wherein the test collection

comprises the query and the correct ranking of the search results for the query.

delete

The fast search modeling system of claim 1,

wherein the search model generator

generates the search model by selecting at least one feature of the search results and a normalization method for the feature.

The fast search modeling system of claim 6,

wherein the search model evaluation unit

analyzes the weight of each feature selected for the search results.

The fast search modeling system of claim 1,

wherein the search model evaluation unit

checks the accuracy and correlation of the generated search model in real time.

The fast search modeling system of claim 1,

wherein the test collection generator,

when the performance of the search model does not satisfy a predetermined criterion, regenerates the generated test collection by re-sorting the ranking of the search results.

A fast search modeling method, comprising:

generating a test collection using search results for a query;

generating, through a machine learning method, a search model capable of determining a correct ranking for the query from the test collection; and

evaluating the performance of the generated search model.

The fast search modeling method of claim 10,

wherein the generating of the test collection

comprises sorting the ranking of the search results to generate a test collection for the query.

The fast search modeling method of claim 10,

wherein the generating of the test collection,

when the ranking of the search results cannot be distinguished, comprises sorting the indistinguishable search results into the same rank.

The fast search modeling method of claim 10,

wherein the test collection

comprises the query and the correct ranking of the search results for the query.

delete

The fast search modeling method of claim 10,

wherein the generating of the search model

comprises generating the search model by selecting at least one feature of the search results and a normalization method for the feature.

The fast search modeling method of claim 15,

wherein the evaluating of the performance of the search model

comprises analyzing the weight of each feature selected for the search results.

The fast search modeling method of claim 10,

wherein the evaluating of the performance of the search model

comprises checking the accuracy and correlation of the generated search model in real time.

The fast search modeling method of claim 10,

wherein the generating of the test collection

comprises, when the performance of the search model does not satisfy a predetermined criterion, regenerating the generated test collection by re-sorting the ranking of the search results.

19. A computer readable recording medium having recorded thereon a program for executing the method of any one of claims 10 to 13 or 15 to 18.