KR100525616B1

KR100525616B1 - Method and system for identifying related search terms in the internet search system

Info

Publication number: KR100525616B1
Application number: KR10-2004-0026074A
Authority: KR
Inventors: 최재걸; 문상준; 최병엽; 이준호
Original assignee: 엔에이치엔(주)
Priority date: 2004-04-16
Filing date: 2004-04-16
Publication date: 2005-11-03
Also published as: KR20050100873A

Abstract

The present invention relates to a method and system for extracting mutually related search queries. More particularly, it relates to a related search query extraction method and system that measures the number of search sessions in which each search query was entered and the number of search sessions in which a pair of search queries including that search query was entered, and uses these counts to determine whether the search queries are related.

A related search query extraction method according to the present invention comprises: maintaining a database including records of search sessions and of the search queries received from user terminals during those search sessions (the records being generated at predetermined time intervals and written to the database); generating total search session count information by counting, with reference to the database, the total number of search sessions established per time interval; generating first search session count information by counting, with reference to the database, the number of search sessions per time interval in which a first search query was received; generating second search session count information by counting, with reference to the database, the number of search sessions per time interval in which a second search query was received; generating third search session count information by counting, with reference to the database, the number of search sessions per time interval in which both the first search query and the second search query were received; generating conditional probability information using the first search session count information and the third search session count information; generating correlation information using the total search session count information, the first search session count information, the second search session count information, and the third search session count information; and determining whether the first search query and the second search query are related based on the conditional probability information and the correlation information.

According to the present invention, there are provided a related search query extraction method and system that effectively collect and analyze data on search queries entered by users and automatically determine whether search queries are related, so that accurate related search queries can be extracted quickly and a higher-quality service can be provided to users.

Description

{METHOD AND SYSTEM FOR IDENTIFYING RELATED SEARCH TERMS IN THE INTERNET SEARCH SYSTEM}

In general, when a user enters a search query, a search service system that provides a search service returns to the user the search results corresponding to that search query (for example, web sites containing the search query, articles containing the search query, or images whose file names contain the search query).

Recent search service systems also offer a related search query service, which extracts search queries related to the search query entered by the user and presents them to the user so that the user can find the desired information more quickly and accurately. A search service system usually returns very different search results depending on the search query entered. For example, the search results obtained by entering "automobile" (자동차) differ from those obtained by entering "passenger car" (승용차). A searcher therefore tries to enter a search query that is more closely related to the information sought, but it is sometimes difficult for the searcher to think of such a search query unaided. Recent search service systems accordingly use the search query entered by the user together with statistical information to present search queries related to the entered search query, so that the user can search again with a different search query.

Here, a related search query may be a semantically related search query, for example: a search query that is a broader or narrower concept of the search query entered by the user ("foreign language" when "Japanese" is entered, or conversely "Japanese" when "foreign language" is entered); a search query that is a synonym of the entered search query ("bookstore" (서점) when "bookshop" (책방) is entered); a search query that is a near-synonym of the entered search query ("tail feathers" (꽁지) when "tail" (꼬리) is entered); or a search query that is a related word form of the entered search query ("saw, seen, seeing" when "see" is entered). However, related search queries are not limited to semantically related ones. For example, when "Park Chan-ho" (박찬호) is entered, related search queries may reflect many different viewpoints: his occupation, "baseball"; the league he plays in, "Major League"; the university he attended, "Hanyang University"; his team, "Texas Rangers"; and "Kim Byung-hyun", another Korean baseball player in the same Major League.

With the related search query service of the prior art, however, the service operator had to classify and store, one by one, the other search queries related to each search query, and therefore had to bear the resulting loss of time and money.

To extract related search queries with less time and cost, several methods have appeared: a co-occurrence classification method, which defines the relatedness of terms as the probability that they occur together; a document classification method, which classifies documents and then defines the terms that appear mainly in a single group as related terms; and a grammar classification method, which identifies relationships between terms using linguistic knowledge and co-occurrence characteristics in documents. These methods, however, mostly consider only statistical relationships and ignore the semantic relationships between terms, so the user sometimes cannot understand why the extracted related search queries are related.

To solve this problem, Korean Patent Registration No. 10-0372078 discloses a related word search method that automatically generates a related word collection from a log of query statements (질의어) entered by users and uses the generated collection to retrieve semantically related words.

That related word search method generates the related word collection by registering, as related words, some of the query statements that users entered within a given session. When a query statement is entered into the related word search apparatus, the apparatus identifies the related words of that query statement, sorts them, and presents them to the user.

However, although that related word search method can determine whether query statements are related by using the query log collected within a given session, it still suffers from the following problems of the prior art.

That is, even when that related word search method is used, the following problems remain: 1) every term that appears at least once in the same session is registered as a related word, so the user sometimes cannot understand why the extracted related words are related; 2) related words are not registered on the basis of a systematic analysis of data accumulated over a period of time, so all terms that merely happen to appear in the same session are registered as related words; 3) although what a user normally enters is a search query as such rather than a query statement, the method extracts terms from query statements, so unnecessary time is spent identifying the relationships between search queries; and 4) because every term that appears at least once in the same session is registered as a related word, there may be far too many related words.

Consequently, even with that related word search method, it remains highly likely that search queries unrelated to the search query entered by the user will be extracted, and the method cannot satisfy users who want a high-quality service that lets them find the desired information more quickly and accurately.

There has therefore been a need for a new technique that provides users with a higher-quality service by effectively collecting data on the search queries entered by users, systematically analyzing the collected data, and using the results to determine accurately whether search queries are related.

The present invention has been devised to solve the problems of the prior art described above. One object is to provide a related search query extraction method and system that build a system capable of effectively collecting and analyzing data on search queries entered by users and automatically determining whether search queries are related, thereby reducing the loss of time and money that a service operator incurs by classifying and storing, one by one, the other search queries related to each search query.

Another object of the related search query extraction method and system according to the present invention is to maintain a database that systematically records association index information between search queries so that, when a user enters a search query, the search queries with a higher degree of association are presented to the user first on the basis of the association index information.

Another object of the related search query extraction method and system according to the present invention is to extract, from the search data entered by users, only the useful data that has passed a systematic preprocessing stage and to maintain an appropriate number of related search queries, thereby extracting genuinely meaningful related search queries and providing users with a higher-quality related search query service.

Another object of the related search query extraction method and system according to the present invention is to determine whether search queries are related by aggregating data accumulated over a period of time, thereby presenting search queries that have maintained a stable association over a long period and satisfying users who want a more accurate related search query service.

To achieve the above objects and to solve the problems of the prior art described above, a related search query extraction method according to the present invention comprises: maintaining a database including records of search sessions and of the search queries received from user terminals during those search sessions (the records being generated at predetermined time intervals and written to the database); generating total search session count information by counting, with reference to the database, the total number of search sessions established per time interval; generating first search session count information by counting, with reference to the database, the number of search sessions per time interval in which a first search query was received; generating second search session count information by counting, with reference to the database, the number of search sessions per time interval in which a second search query was received; generating third search session count information by counting, with reference to the database, the number of search sessions per time interval in which both the first search query and the second search query were received; generating conditional probability information using the first search session count information and the third search session count information; generating correlation information using the total search session count information, the first search session count information, the second search session count information, and the third search session count information; and determining whether the first search query and the second search query are related based on the conditional probability information and the correlation information.
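The two statistics named above can be sketched from the four session counts. This section does not give the formulas, so both are assumptions for illustration: the conditional probability is taken as the third count divided by the first count, and the correlation is taken as the phi coefficient of the two binary events "the session contains the first query" and "the session contains the second query". The function and parameter names are hypothetical.

```python
from math import sqrt


def conditional_probability(n1: int, n12: int) -> float:
    """Assumed form: share of sessions containing query 1 that also contain query 2."""
    return n12 / n1 if n1 else 0.0


def phi_correlation(n: int, n1: int, n2: int, n12: int) -> float:
    """Assumed form: phi coefficient of the two 'session contains query' events.

    n   -- total number of search sessions in the interval
    n1  -- sessions containing the first query
    n2  -- sessions containing the second query
    n12 -- sessions containing both queries
    """
    denom = sqrt(n1 * (n - n1) * n2 * (n - n2))
    return (n * n12 - n1 * n2) / denom if denom else 0.0


# Example counts for one time interval (made-up numbers).
p = conditional_probability(n1=100, n12=40)
r = phi_correlation(n=1000, n1=100, n2=50, n12=40)
```

With these definitions, two queries that co-occur exactly as often as chance would predict get a correlation of zero, whatever their individual frequencies.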

According to one aspect of the present invention, there is provided a related search query extraction method in which the step of determining, based on the conditional probability information, whether the first search query and the second search query are related makes that determination only when the conditional probability information is equal to or greater than a predetermined value, and in which that value varies according to a predetermined function of the first search session count information.
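A minimal sketch of such a gate follows. The text states only that the minimum value is a function of the first search session count; the step function and its cut-off values below are hypothetical and chosen purely for illustration, as are the function names.

```python
def min_conditional_probability(n1: int) -> float:
    """Hypothetical threshold function of the first search session count.

    Rare queries need a high conditional probability (few sessions are weak
    evidence); frequent queries may pass with a lower one. The breakpoints
    are illustrative assumptions, not values from the source.
    """
    if n1 < 100:
        return 0.5
    if n1 < 10_000:
        return 0.2
    return 0.05


def passes_gate(n1: int, n12: int) -> bool:
    """Evaluate the pair further only if the conditional probability
    n12 / n1 reaches the threshold for this n1."""
    if n1 <= 0:
        return False
    return n12 / n1 >= min_conditional_probability(n1)
```

For instance, `passes_gate(50, 10)` is rejected (0.2 is below the 0.5 required for a rare query), while `passes_gate(20000, 1500)` is accepted (0.075 meets the 0.05 required for a frequent one).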

According to another aspect of the present invention, there is provided a related search query extraction method in which the step of determining, based on the conditional probability information and the correlation information, whether the first search query and the second search query are related comprises generating association index information between the first search query and the second search query using the conditional probability information and the correlation information, and determining whether the queries are related based on the association index information, and which further comprises: when the queries are determined to be related, recording the association index information in a second database in association with the first search query and the second search query; receiving a third search query from a user terminal; extracting, with reference to the second database, one or more fourth search queries associated with the third search query; sorting the extracted fourth search queries according to the association index information to generate a related search query list; and providing the generated related search query list to the user terminal.
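The store-then-rank flow of this aspect can be sketched as follows. How the association index is computed from the two statistics is not stated in this section, so the product used here is an assumption, as are the class name `RelatedQueryStore`, the in-memory dictionary standing in for the second database, and the `min_index` cut-off.

```python
from collections import defaultdict


class RelatedQueryStore:
    """In-memory stand-in for the 'second database' of association indices."""

    def __init__(self, min_index: float = 0.1):
        self._index = defaultdict(dict)   # query -> {related query: index}
        self._min_index = min_index       # hypothetical relatedness cut-off

    def record(self, q1: str, q2: str, cond_prob: float, correlation: float) -> bool:
        """Compute the index and store the pair only if it is judged related."""
        index = cond_prob * correlation   # assumed combination, for illustration
        if index < self._min_index:
            return False
        self._index[q1][q2] = index
        return True

    def related(self, query: str, limit: int = 10) -> list:
        """Return related queries sorted by descending association index."""
        ranked = sorted(self._index.get(query, {}).items(),
                        key=lambda item: item[1], reverse=True)
        return [q for q, _ in ranked[:limit]]


store = RelatedQueryStore()
store.record("Park Chan-ho", "baseball", cond_prob=0.4, correlation=0.5)
store.record("Park Chan-ho", "Texas Rangers", cond_prob=0.6, correlation=0.7)
store.record("Park Chan-ho", "weather", cond_prob=0.05, correlation=0.01)
suggestions = store.related("Park Chan-ho")
```

Here "weather" is never stored because its index falls below the cut-off, and the remaining suggestions come back with the higher-index query first.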

A related search query extraction system according to the present invention comprises: a database including records of search sessions and of the search queries received from user terminals during those search sessions; database management means for generating the records at predetermined time intervals and writing them to the database; counter means for, with reference to the database, generating total search session count information by counting the total number of search sessions established per time interval, generating first search session count information by counting the number of search sessions per time interval in which a first search query was received, generating second search session count information by counting the number of search sessions per time interval in which a second search query was received, and generating third search session count information by counting the number of search sessions per time interval in which both the first search query and the second search query were received; conditional probability information generating means for generating conditional probability information using the first search session count information and the third search session count information; correlation information generating means for generating correlation information using the total search session count information, the first search session count information, the second search session count information, and the third search session count information; and association determining means for determining whether the first search query and the second search query are related based on the conditional probability information and the correlation information.

Hereinafter, the related search query extraction method and system according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the network connections of a related search query extraction system according to an embodiment of the present invention. Users connect to the related search query extraction system 100 over a wired or wireless communication network using a user terminal 110a or 110b and enter search queries. The related search query extraction system 100 transmits the related search queries corresponding to an entered search query to the user terminal 110a or 110b. A related search query extraction system 100 according to another embodiment of the present invention may additionally transmit to the user terminal 110a or 110b ranking information based on the association index information of the related search queries. The related search query extraction system of the present invention may be integrated into and operated as part of an Internet search service system. In that case, when a user connects to the Internet search service system and enters a search query, the related search queries of that search query may be provided to the user together with the search results for it.

FIG. 2 is a flowchart illustrating a related search query extraction method according to an embodiment of the present invention. The related search query extraction method according to this embodiment is performed by a related search query extraction system.

In step 201, the related search query extraction system according to the present invention maintains a database including records of search sessions and of the search queries received from user terminals during those search sessions.

According to one embodiment of the present invention, there is provided a related search query extraction method in which the search session is established when a search box is first provided to the user terminal and is terminated when no data is transmitted from the user terminal for a predetermined time, and in which a new search session is started when a new search query is received from the user terminal after the search session has terminated.

In this embodiment, the search session is a new kind of search session, established differently from a conventional search session. A conventional search session simply denotes the search activity that a given user performs during a predetermined time; the time from the establishment of the search session to its termination is fixed in advance. For example, if that time is 10 minutes, the conventional search session ends once the user has been searching for 10 minutes, and the search activity during the following 10 minutes is treated as having been performed in a new search session.
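The conventional fixed-duration scheme described above can be sketched as follows, assuming one user's activity timestamps are given in seconds and in ascending order. The function name and the 600-second default (the 10-minute example) are illustrative.

```python
def fixed_window_session_ids(timestamps, window_seconds=600):
    """Assign a session number to each activity under the conventional scheme:
    a session lasts a fixed time from its first activity, however active the
    user is, and the next activity after that opens a new session."""
    ids = []
    session = 0
    session_start = None
    for t in timestamps:
        if session_start is None:
            session_start = t
        elif t - session_start >= window_seconds:
            session += 1          # the fixed window has elapsed
            session_start = t
        ids.append(session)
    return ids


# A user searching continuously is still split at the 10-minute mark.
ids = fixed_window_session_ids([0, 300, 599, 600, 900, 1300])
```

In this example the activities at 599 s and 600 s land in different sessions even though they are one second apart, which illustrates why a fixed window can separate queries on the same topic.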

By contrast, the search session in this embodiment is established when the search box is first provided to the user terminal and is terminated when no data is transmitted from the user terminal for a predetermined time. For example, if that time is 5 minutes, the search session starts when the user connects to the search service web page through the user terminal and the search box is first opened, and it ends when the user performs no search activity at all for 5 minutes after the "last time" at which the user performed a search activity such as entering a search query or selecting a search result. In other words, the search session ends when the user remains idle for 5 minutes after the time of the last search activity. When a new search query is received from the user terminal after the search session has ended, a new search session is started.
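The inactivity-based rule can be sketched as follows, assuming one user's activity timestamps are given in seconds and in ascending order. The function name is hypothetical; the 300-second default corresponds to the 5-minute example, and a gap exactly equal to the timeout is treated as the same session.

```python
def inactivity_session_ids(timestamps, timeout_seconds=300):
    """Assign a session number to each activity: a new session starts only
    when the gap since the previous activity exceeds the timeout."""
    ids = []
    session = 0
    last_activity = None
    for t in timestamps:
        if last_activity is not None and t - last_activity > timeout_seconds:
            session += 1          # the user was idle for longer than the timeout
        ids.append(session)
        last_activity = t
    return ids


# Continuous activity stays in one session; a 400 s pause opens a new one.
ids = inactivity_session_ids([0, 200, 450, 700, 1100, 1150])
```

The first four activities span 700 seconds yet remain one session because no single gap exceeds 300 seconds; only the 400-second pause before 1100 s starts a new session.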

When the results of a search do not satisfy a user, the user generally continues searching by entering another search query related to the first. The related search query extraction system according to this embodiment therefore treats a period in which the user remains idle for the predetermined time after the last search activity as indicating that the user has finished searching on a particular topic. Accordingly, when a new search query is entered after that time has elapsed, the system regards it as a search query on a new topic and starts a new session.

Conversely, however much time has elapsed, as long as the user keeps performing search activities, the related search query extraction system according to this embodiment judges that the user is entering various search queries on a particular topic with a single intention. In that case the system does not start a new search session but maintains the same search session.

Compared with the conventional approach, which fixes the time from the start to the end of a search session and registers every search query appearing in that session as a related search query, defining the search session in this new way makes the association between search queries received in the same search session considerably more reliable.

Furthermore, according to the present invention, the search queries received in the same search session are not all registered as related search queries. Instead, as described later, the number of search sessions in which a pair of search queries appears is counted and used as one factor in determining whether the search queries are related, which provides a related search query extraction method and system that can determine more accurately whether search queries are related.
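Counting the sessions in which each query and each pair of queries appears can be sketched as follows. The record layout (a session identifier paired with a query string) and the function name are assumptions. Note that this sketch counts only sessions with at least one recorded query as the total, which may differ from the number of sessions established.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_sessions(records):
    """records: iterable of (session_id, query) pairs for one time interval.

    Returns (total sessions, sessions per query, sessions per query pair).
    A query repeated within a session is counted once for that session.
    """
    queries_by_session = defaultdict(set)
    for session_id, query in records:
        queries_by_session[session_id].add(query)

    per_query = Counter()
    per_pair = Counter()
    for queries in queries_by_session.values():
        per_query.update(queries)
        # Sort so that each unordered pair always gets the same key.
        per_pair.update(combinations(sorted(queries), 2))
    return len(queries_by_session), per_query, per_pair


records = [
    ("s1", "car"), ("s1", "sedan"), ("s1", "car"),
    ("s2", "car"),
    ("s3", "sedan"), ("s3", "car"),
]
total, per_query, per_pair = count_sessions(records)
```

In this toy log, "car" appears in three sessions, "sedan" in two, and the pair in two, so a single chance co-occurrence would carry much less weight than a pair seen across many sessions.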

An embodiment of step 201, in which the database including records of search sessions and search queries is maintained using this new definition of the search session, is described in detail below. FIG. 3 is a flowchart illustrating the process of maintaining the database according to this embodiment.

In step 301, the related search query extraction system according to this embodiment generates a first search session identifier associated with a first search session and records it in the database. Each time the user performs a search activity, the system transmits the first search session identifier and time information on that search to the user terminal, and the user terminal may store the received first search session identifier and time information in the form of a cookie at a predetermined location in the user terminal.

When the user performs the last search activity, in step 302 the related search query extraction system transmits the first search session identifier and first time information on the last search time to the user terminal, and the user terminal may likewise store the first search session identifier and the first time information at a predetermined location in the user terminal in the form of a cookie.

In step 303, the related search query extraction system receives a search query from the user terminal, and in step 304 it compares second time information, which indicates when the search query was received, with the first time information.

If it is determined in step 305 that the gap between the two pieces of time information exceeds a predetermined time, the related search query extraction system generates a second search session identifier associated with a second search session in step 306, and in step 307 records the second search session identifier and a record of the received search query in the database.

On the other hand, if it is determined in step 305 that the gap between the two pieces of time information is less than or equal to the predetermined time, the related search query extraction system records, in step 308, a record of the received search query in the database in association with the first search session identifier.
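The branch in steps 303 to 308 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `SessionTracker`, the cookie represented as a `(session_id, last_search_time)` tuple, and the 30-minute gap are all assumptions, since the text only speaks of "a predetermined time".

```python
import itertools

# Assumed value; the specification only says "a predetermined time".
SESSION_GAP_SECONDS = 30 * 60


class SessionTracker:
    """Hypothetical helper that assigns queries to search sessions."""

    def __init__(self, gap_seconds=SESSION_GAP_SECONDS):
        self.gap_seconds = gap_seconds
        self.records = {}  # session identifier -> list of queries
        self._counter = itertools.count(1)

    def _new_session(self):
        session_id = f"sessionId{next(self._counter)}"
        self.records[session_id] = []
        return session_id

    def handle_query(self, cookie, query, now):
        """cookie is (session_id, last_search_time) or None.

        Returns the updated cookie that would be sent back to the terminal.
        """
        # Steps 304-305: compare the receipt time with the last search time.
        if cookie is None or now - cookie[1] > self.gap_seconds:
            session_id = self._new_session()  # steps 306-307
        else:
            session_id = cookie[0]            # step 308
        self.records[session_id].append(query)
        return (session_id, now)
```

A gap exactly equal to the threshold keeps the query in the current session, matching the "less than or equal to" wording of step 308.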

According to the related search query extraction method of this embodiment, search sessions can be managed efficiently by maintaining the database through the systematic process described above. When search sessions managed in this way are used, the relevance between search queries received in the same search session can be trusted to a very high degree.

FIG. 4 is a diagram illustrating an example of a record included in the database according to an embodiment of the present invention. As shown in FIG. 4, the record may include a search session identifier 401 and information on the search queries received from the user terminal during the search session associated with the search session identifier 401. Referring to reference numeral 402 in FIG. 4, the record contains the search session identifier "sessionId1" together with the search queries "Park Chan-ho", "Major League", "baseball", and so on, which were received from the user terminal during the search session to which "sessionId1" was assigned.

According to an embodiment of the present invention, the related search query extraction system generates the records at predetermined time intervals and records them in the database. The time interval may be set in advance by the service operator, for example to "one day", "two days", or "one week", and the service operator may change the existing time interval to another one. According to this embodiment, data collected at regular time intervals can be used to determine whether search queries are related, and relationships between search queries, which may change over time, can be checked continuously. For example, suppose the records are generated daily. The record from two days ago may contain many search sessions in which both "Park Chan-ho" and "Major League" were received, so that yesterday the related search query extraction system judged "Park Chan-ho" and "Major League" to be related search queries. If yesterday's record contains almost no search sessions in which both queries were received, today the system can judge, on that basis, that "Park Chan-ho" and "Major League" are not related search queries. The related search queries provided to the user are therefore extracted from the most recent data.

In step 201, the related search query extraction system according to an embodiment of the present invention may map the search session or the received search query to a number and generate the record using the mapped number.

FIG. 5 is a diagram illustrating an example of a record in which a search session and search queries are mapped to numbers in this embodiment. Comparing reference numeral 501 in FIG. 5 with reference numeral 402 in FIG. 4, the search session identifier "sessionId1" is mapped to the number "56", and the search queries "Park Chan-ho", "Major League", and "baseball" are mapped to the numbers "18759", "18760", and "18761".

Compared with extracting related search queries from string data, performing each step of the present invention on data mapped to numbers according to this embodiment uses less memory when the data is recorded in the database, and numbers are much simpler to process than strings, so processing speed also improves.
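The mapping of identifiers and queries to numbers can be sketched with a small dictionary-backed mapper. The names `IdMapper` and `encode_record` are hypothetical, and the starting numbers in the usage example are chosen only to mirror the values shown for FIG. 5; the specification does not say how the numbers are assigned.

```python
class IdMapper:
    """Hypothetical mapper that assigns consecutive integers to strings."""

    def __init__(self, start=0):
        self._ids = {}
        self._next = start

    def to_id(self, text):
        # Assign the next free integer the first time a string is seen.
        if text not in self._ids:
            self._ids[text] = self._next
            self._next += 1
        return self._ids[text]


def encode_record(session_id, queries, session_ids, query_ids):
    """Return the record with the session and each query replaced by a number."""
    return session_ids.to_id(session_id), [query_ids.to_id(q) for q in queries]


# Usage: start values are assumptions picked to match the FIG. 5 example.
session_ids = IdMapper(start=56)
query_ids = IdMapper(start=18759)
record = encode_record(
    "sessionId1", ["Park Chan-ho", "Major League", "baseball"],
    session_ids, query_ids,
)
```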

According to an embodiment of the present invention, a related search query extraction method is provided in which, when more than a predetermined number of search queries are received during a specific search session, the related search query extraction system in step 201 does not include in the database the record of that search session and of the search queries received during it. In this embodiment, the related search query extraction system counts the number of search queries issued in one search session, and when that number exceeds the predetermined number, it may treat the data received in that search session as having arrived in an unexpected rather than a normal manner. When too many search queries are received in one search session, it is very unlikely that all of them are related to one another, so such data is not recorded in the database, which allows more accurate related search queries to be extracted.
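As a minimal sketch of this filtering, assuming records are held as a dictionary from session identifier to query list, the function below drops sessions whose query count exceeds a limit. The limit of 50 and the name `drop_oversized_sessions` are assumptions; the specification only refers to "a predetermined number".

```python
# Assumed limit; the specification only says "a predetermined number".
MAX_QUERIES_PER_SESSION = 50


def drop_oversized_sessions(records, limit=MAX_QUERIES_PER_SESSION):
    """Keep only sessions whose number of queries does not exceed the limit."""
    return {
        session_id: queries
        for session_id, queries in records.items()
        if len(queries) <= limit
    }
```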

In step 202, the related search query extraction system refers to the database and counts the total number of search sessions established per time interval to generate total search session count information. For example, when the time interval is one day, the system may count the total number of search sessions established during that day.

In step 203, the related search query extraction system refers to the database and counts, per time interval, the number of search sessions in which a first search query was received to generate first search session count information. In step 204, it refers to the database and counts, per time interval, the number of search sessions in which a second search query was received to generate second search session count information. For example, when the time interval is one day, the system may count the number of search sessions in which the search query "Park Chan-ho" was received during that day and the number of search sessions in which the search query "Major League" was received.

In step 205, the related search query extraction system refers to the database and counts, per time interval, the number of search sessions in which both the first search query and the second search query were received to generate third search session count information. For example, when the time interval is one day, the system may count the number of search sessions in which both "Park Chan-ho" and "Major League" were received during that day.

According to an embodiment of the present invention, a related search query extraction method is provided in which the related search query extraction system generates the third search session count information in step 205 only when the first search session count information and the second search session count information are both greater than or equal to a predetermined number. That is, in this embodiment, the system may skip generating the third search session count information when either the first or the second search session count information falls short of the predetermined number. When the number of search sessions in which each search query was received is too small, the queries are very unlikely to be related search queries. By not generating pair data for search queries that appear fewer than the predetermined number of times, the execution speed of the related search query extraction system is improved considerably.
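The counting in steps 202 to 205, together with the minimum-count condition above, can be sketched as below. This is an illustration under assumptions: records for one time interval are a dictionary from session identifier to query list, `count_sessions` and `min_single` are hypothetical names, and Python's `Counter` is used for the tallies rather than any structure named in the specification.

```python
from collections import Counter
from itertools import combinations


def count_sessions(records, min_single=1):
    """Return (total sessions, per-query session counts, per-pair session counts)."""
    total = len(records)                      # step 202

    single = Counter()                        # steps 203-204
    for queries in records.values():
        single.update(set(queries))           # count each query once per session

    pair = Counter()                          # step 205
    for queries in records.values():
        # Only queries seen in at least min_single sessions form pairs.
        frequent = sorted(q for q in set(queries) if single[q] >= min_single)
        pair.update(combinations(frequent, 2))
    return total, single, pair
```

Sorting the queries within a session gives each unordered pair a single canonical key, so ("a", "b") and ("b", "a") are tallied together.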

According to an embodiment of the present invention, a related search query extraction method is provided that uses a hash-tree data structure to count the number of search sessions in step 205.

A hash-tree data structure is a kind of data structure used to store and look up data. It is known as a method of locating data by using the value obtained by processing the string to be found with a specific function (a hash function). Because the number of data items has almost no effect on lookup speed in a hash-tree data structure, using it makes it possible to locate data quickly and efficiently and also to save a great deal of system memory.

FIG. 6 is a diagram illustrating an example of the hash-tree data structure used to count the number of search sessions in this embodiment. FIG. 6 shows an example in which, when the database contains search sessions in which the pair of search queries "Dae Jang Geum" and "Lee Young-ae" was received, the number of such search sessions is counted using the hash-tree data structure.

In step 206, the related search query extraction system generates conditional probability information using the first search session count information and the third search session count information.

The conditional probability information can be used as one factor in evaluating whether search queries are related. For example, whether the search queries "Dae Jang Geum" and "Lee Young-ae" are related can be judged using, as one factor, probability information on how many of the search sessions in which "Dae Jang Geum" appeared also contained "Lee Young-ae". That is, if "Lee Young-ae" appeared in many of the search sessions in which "Dae Jang Geum" appeared, this can serve as a factor with a strong influence on the determination of whether the two search queries are related.

The following is an example of a formula that can be used to generate the conditional probability information.

P(A | B) = P(A ∩ B) / P(B)

<Equation 1. Conditional probability information>

As shown in Equation 1, dividing the probability that both search query "A" and search query "B" appear in the same search session by the probability that search query "B" appears in a search session yields the probability that search query "A" also appears in a session in which search query "B" appeared.
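Because both probabilities in this ratio share the total number of sessions as a denominator, the total cancels and the value can be computed directly from two session counts. The sketch below assumes that simplification; the function name `conditional_probability` and its parameter names are hypothetical.

```python
def conditional_probability(pair_sessions, conditioning_sessions):
    """Share of the sessions containing the conditioning query that also
    contain the other query, i.e. P(A and B) / P(B) expressed with counts."""
    if conditioning_sessions <= 0:
        raise ValueError("the conditioning query must appear in at least one session")
    return pair_sessions / conditioning_sessions


# Usage: 30 sessions contain both queries, 40 contain the conditioning query.
share = conditional_probability(30, 40)
```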

In step 207, the related search query extraction system generates correlation information using the total search session count information, the first search session count information, the second search session count information, and the third search session count information.

The correlation information can be used as another factor in evaluating whether search queries are related. For a search query that users enter very frequently, the conditional probability information may be high even when there is no real relationship, so using the correlation information as an additional factor contributes greatly to determining relatedness more accurately.

Probability theory includes the notion of testing for independence, and the correlation information is used for this test. That is, when the correlation information has a value close to 1, this can be used as a strong factor for judging that the two search queries are independent rather than related, and when the correlation information has a value considerably greater than 1, this can be used as a strong factor for judging that the two search queries are related to each other.

The following is an example of a formula that can be used to generate the correlation information.

Correlation(A, B) = P(A ∩ B) / (P(A) * P(B))

<Equation 2. Correlation information>

As shown in Equation 2, the correlation information can be generated by dividing the probability that both search query "A" and search query "B" appear in the same search session by the product of the probability that search query "A" appears in a search session and the probability that search query "B" appears in a search session. Expanding the formula, the correlation information equals the number of search sessions in which both "A" and "B" appeared, multiplied by the total number of search sessions, and divided by the product of the number of search sessions in which "A" appeared and the number of search sessions in which "B" appeared. The related search query extraction system can therefore generate the correlation information using the total search session count information, the first search session count information, the second search session count information, and the third search session count information.
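The expanded, count-based form described above can be sketched as a single function. The name `correlation` and the parameter names are hypothetical; the arithmetic follows the expansion stated in the text.

```python
def correlation(total_sessions, sessions_a, sessions_b, sessions_ab):
    """Count-based form of P(A and B) / (P(A) * P(B)).

    Values near 1 suggest the two queries occur independently;
    values well above 1 suggest they are related.
    """
    if sessions_a <= 0 or sessions_b <= 0:
        raise ValueError("each query must appear in at least one session")
    return sessions_ab * total_sessions / (sessions_a * sessions_b)
```

For instance, with 1000 sessions in total, two queries that each appear in 100 sessions and co-occur in 10 give a value of 1.0, which is what independence would predict.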

In step 208, the related search query extraction system determines whether the first search query and the second search query are related, based on the conditional probability information and the correlation information. For example, each of the two pieces of information may be multiplied by a predetermined value and the results added to compute an index, which is then used to determine relatedness. Alternatively, each of the two pieces of information may be multiplied by a predetermined value and the results multiplied together to compute the index. It is apparent to those skilled in the art that, besides these methods, various other embodiments exist for determining relatedness using the conditional probability information and the correlation information.

According to an embodiment of the present invention, a related search query extraction method is provided in which the related search query extraction system determines relatedness in step 208 only when the conditional probability information is greater than or equal to a predetermined value.

In this embodiment, when the conditional probability information is so low that it falls below the predetermined value, the related search query extraction system concludes that the relevance between the search queries is already very low and does not perform the relatedness determination, which reduces unnecessary memory use and improves the execution speed of the system.

According to an embodiment of the present invention, a related search query extraction method is provided in which the predetermined value of the above embodiment varies according to a predetermined function that takes the first search session count information as its variable.

This is because, when the first search session count information is very low, the conditional probability information is not obtained reliably and may far exceed a fixed predetermined value. For example, if the number of search sessions in which both search query "A" and search query "B" appeared is 1 and the number of search sessions in which search query "A" appeared is 5, the conditional probability information is calculated as 1/5, which is a very high value. In this case, even though the actual degree of association between "A" and "B" is low, the two search queries may be recognized as related, and an inaccurate related search query may be provided to the user. The predetermined value therefore needs to vary according to the first search session count information, as in this embodiment, so that more accurate related search queries can be provided to the user.

According to an embodiment of the present invention, a related search query extraction method is provided in which the predetermined function of the above embodiment is a function whose value is the percentage obtained by dividing 100 percent ("1" as a value) by the square root of the first search session count information. According to this embodiment, the function can be expressed as "100 / √(first search session count information)".

According to another embodiment, the predetermined function may also be expressed by the following general formula.

y(%) = a * x^-b

<Equation 3. Function>

In Equation 3, y is the predetermined value and x is the first search session count information. The constant a and the exponent b in Equation 3 can be obtained through experiments that derive the best formula for computing the value. For example, a may be 80 and b may be 1.
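Both threshold functions can be sketched as below, with the threshold expressed in percent as in the text. The function names are hypothetical, and the choice to compare the conditional probability (as a fraction) against the square-root threshold in `passes_threshold` is an assumption made for illustration; the defaults a = 80 and b = 1 are the example values given above.

```python
import math


def sqrt_threshold_percent(first_sessions):
    """Threshold in percent: 100 / sqrt(first search session count)."""
    return 100 / math.sqrt(first_sessions)


def power_threshold_percent(first_sessions, a=80.0, b=1.0):
    """General form y(%) = a * x^-b with the example constants as defaults."""
    return a * first_sessions ** -b


def passes_threshold(cond_prob, first_sessions):
    """True if the conditional probability (a fraction) reaches the threshold."""
    return cond_prob * 100 >= sqrt_threshold_percent(first_sessions)
```

With 5 sessions for the first query, the square-root threshold is about 44.7 percent, so a conditional probability of 1/5 is rejected; with 100 sessions the threshold drops to 10 percent and the same probability is accepted.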

According to these embodiments, in the related search query extraction method in which relatedness is determined only when the conditional probability information is greater than or equal to the predetermined value, the predetermined value becomes higher as the first search session count information becomes lower. The related search query extraction system therefore skips the relatedness determination in more cases, which reduces unnecessary memory use and improves the execution speed of the system.

According to an embodiment of the present invention, a related search query extraction method is provided in which the related search query extraction system determines relatedness in step 208 only when the correlation information is greater than or equal to a predetermined value.

In this embodiment, when the correlation information is so low that it falls below the predetermined value (for example, when it is very close to "1"), the related search query extraction system concludes that the relevance between the search queries is already very low and does not perform the relatedness determination, which reduces unnecessary memory use and improves the execution speed of the system.

According to the related search query extraction method of an embodiment of the present invention, the related search query extraction system records the result when two search queries are determined to be related and then performs a toggle error check, which allows more accurate related search queries to be extracted. This embodiment is described below.

On a keyboard, a toggle key generally means a single key that can perform two or more functions. Typical examples of toggle keys are "Insert", "Korean/English", "Caps Lock", "Num Lock", and "Scroll Lock".

The "toggle error check" used in this specification concerns the "Korean/English" conversion key among the toggle keys. For example, a user who intends to enter "다음" on the Korean keyboard layout may, depending on the state of the "Korean/English" conversion key, instead enter "ekdma" on the English layout. As in this example, it frequently happens that, depending on the state of the "Korean/English" conversion key, a user types a Korean search query on the English layout or an English search query on the Korean layout. The user then re-enters the correct search query, so the original search query and the search query caused by the toggle error appear in the same search session, and the two search queries may be recognized as related. In the example above, "ekdma", the toggle error for "다음", may be designated as a related search query of "다음".

When a search query that has no real relationship is designated as a related search query because of a toggle error, unnecessary memory is consumed to record it, which can slow down the system, and inaccurate related search queries are provided to users, which can lower the reliability of the service. The related search query extraction system according to this embodiment solves these problems by performing the toggle error check described above.

When the relatedness determination in step 208 finds that the queries are related, the related search query extraction system according to this embodiment may designate the first search query and the second search query as related search queries and record them in a second database.

The related search query extraction system may also refer to the second database and perform a toggle error check on the first search query and the second search query. According to an embodiment of the present invention, the system may perform the toggle error check using morphological analysis.

If the check shows that the first search query and the second search query are in a toggle error relationship, the related search query extraction system may delete the record of the related search query designation from the second database.
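One way to detect such a pair is sketched below. Note that this is a substitute technique, not the morphological analysis mentioned in the text: it decomposes precomposed Hangul syllables by Unicode arithmetic and maps each jamo to the key it occupies on the standard two-set (Dubeolsik) layout, which is assumed here. The function names are hypothetical, and only the Korean-typed-in-English-mode direction is covered; standalone jamo produced by typing English in Korean mode are passed through unchanged.

```python
# Key sequences per jamo on the assumed two-set (Dubeolsik) layout,
# in Unicode order of initial, medial, and final jamo.
CHO = ["r", "R", "s", "e", "E", "f", "a", "q", "Q", "t",
       "T", "d", "w", "W", "c", "z", "x", "v", "g"]
JUNG = ["k", "o", "i", "O", "j", "p", "u", "P", "h", "hk", "ho",
        "hl", "y", "n", "nj", "np", "nl", "b", "m", "ml", "l"]
JONG = ["", "r", "R", "rt", "s", "sw", "sg", "e", "f", "fr", "fa",
        "fq", "ft", "fx", "fv", "fg", "a", "q", "qt", "t", "T",
        "d", "w", "c", "z", "x", "v", "g"]


def hangul_to_keystrokes(text):
    """Replace each precomposed Hangul syllable with its QWERTY key sequence."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:                 # precomposed syllable block
            cho, rest = divmod(code, 21 * 28)
            jung, jong = divmod(rest, 28)
            out.append(CHO[cho] + JUNG[jung] + JONG[jong])
        else:
            out.append(ch)                    # non-Hangul passes through
    return "".join(out)


def is_toggle_error_pair(query_a, query_b):
    """True if the two distinct queries correspond to the same key presses."""
    return (query_a != query_b
            and hangul_to_keystrokes(query_a) == hangul_to_keystrokes(query_b))
```

Applied to the example from the text, "다음" maps to the key sequence "ekdma", so the pair ("다음", "ekdma") would be flagged and its related-query record could be deleted.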

According to a related search query extraction method of an embodiment of the present invention, the related search query extraction system may generate association index information between search queries, determine on that basis whether the queries are related, and provide a related search query list to the user.

FIG. 7 is a flowchart illustrating a process for providing a related search query list using association index information in the present embodiment. In the related search query extraction method according to the present embodiment, step 208 may include step 701 and step 702.

In step 701, the related search query extraction system generates association index information between the first search query and the second search query using the conditional probability information and the correlation information. For example, the association index information may be generated by multiplying each of the two pieces of information by a predetermined value and adding the results, or by multiplying each by a predetermined value and then multiplying the results together. It will be apparent to those skilled in the art that various other embodiments exist for generating the association index information from the conditional probability information and the correlation information.

According to an embodiment of the present invention, a related search query extraction method is provided in which the related search query extraction system generates first weight information associated with the conditional probability information and second weight information associated with the correlation information, and generates the association index information using the first weight information and the second weight information. In this case, the first weight information or the second weight information may vary according to a predetermined criterion.

According to the present embodiment, the influence of the conditional probability information or of the correlation information on the association index information can be adjusted to suit the situation, so that more accurate association index information can be generated. The related search query extraction system may also vary the weight information. For example, when the first search session number information or the second search session number information is too low, the conditional probability information or the correlation information can become very large. In that case the system can generate more accurate association index information by adjusting the influence of the conditional probability information or the correlation information.

In step 702, the related search query extraction system determines whether the queries are related based on the association index information. For example, the two search queries may be deemed related when the association index information is equal to or greater than a predetermined value.
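As an illustrative sketch only, steps 701 and 702 might be expressed in Python as follows. The function names, the default weights, and the threshold value are hypothetical and are not taken from the specification; the two combination modes correspond to the "multiply and add" and "multiply and multiply" examples given for step 701.

```python
def association_index(cond_prob, correlation, w_cond=0.5, w_corr=0.5, mode="sum"):
    """Combine conditional probability and correlation into one index.

    mode="sum":     w_cond * cond_prob + w_corr * correlation
    mode="product": (w_cond * cond_prob) * (w_corr * correlation)
    The weights are assumed values; the text only says they are predetermined
    and may vary according to some criterion.
    """
    if mode == "sum":
        return w_cond * cond_prob + w_corr * correlation
    if mode == "product":
        return (w_cond * cond_prob) * (w_corr * correlation)
    raise ValueError(f"unknown mode: {mode}")


def is_related(index, threshold=0.2):
    """Step 702: deem the pair related when the index reaches the threshold."""
    return index >= threshold
```

Lowering `w_cond` for query pairs with very few sessions is one way the weight adjustment described above could be realized.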

According to a related search query extraction method of an embodiment of the present invention, the related search query extraction system may further perform steps 703 to 707.

In step 703, when the determination finds the queries to be related, the related search query extraction system records the association index information in the second database in association with the first search query and the second search query.

FIG. 8 illustrates an example of the second database in which association index information is recorded in the present embodiment. As shown in FIG. 8, association index information is recorded in the second database in association with each pair of search queries.

In step 704, the related search query extraction system receives a third search query from a user terminal, and in step 705 it refers to the second database and extracts one or more fourth search queries related to the third search query.

In step 706, the related search query extraction system sorts the extracted fourth search queries according to the association index information to generate a related search query list. For example, the list may be generated by sorting only those extracted search queries whose association index information is equal to or greater than a predetermined value, or by sorting the extracted search queries in ascending or descending order of association index information. It will be apparent to those skilled in the art that various other embodiments exist for generating a related search query list by sorting the extracted search queries according to association index information.

In step 707, the related search query extraction system provides the generated related search query list to the user terminal.
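Steps 704 to 707 can be sketched as follows, under stated assumptions: the second database is modeled as an in-memory dictionary keyed by an ordered (first query, second query) pair, and the function name and parameters are hypothetical. An actual system would query a persistent store.

```python
def related_query_list(second_db, query, min_index=0.0, descending=True):
    """Return queries related to `query`, filtered and sorted by association index.

    second_db: dict mapping (first_query, second_query) -> association index.
    min_index: keep only pairs whose index is at least this value.
    descending: sort from most to least related when True.
    """
    matches = [
        (other, index)
        for (first, other), index in second_db.items()
        if first == query and index >= min_index
    ]
    matches.sort(key=lambda item: item[1], reverse=descending)
    return [other for other, _ in matches]
```

For example, with `{("car", "tire"): 0.8, ("car", "engine"): 0.5, ("car", "boat"): 0.1}` and `min_index=0.3`, the list for "car" is `["tire", "engine"]`.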

According to the present embodiment, the user can be given only highly related search queries, or can be given search queries ordered from most to least related or the reverse, and can therefore use the provided related search queries to find the desired information more quickly.

According to an embodiment of the present invention, a related search query extraction method is provided in which the related search query extraction system generates association index information between search queries at regular time intervals, cumulatively records it in a predetermined database, and determines whether search queries are related using the accumulated association index information.

FIG. 9 is a flowchart illustrating a process of determining relatedness using accumulated association index information in the present embodiment. According to the present embodiment, step 208 may include steps 901 to 905.

In step 901, the related search query extraction system generates association index information between the first search query and the second search query at every second time interval, using the conditional probability information and the correlation information. The second time interval may be the same as or different from the predetermined time interval at which records about search sessions and search queries are generated.

In step 902, the related search query extraction system cumulatively records the generated association index information in the second database at every second time interval, in association with the first search query and the second search query.

FIG. 10 illustrates an example of the second database in which cumulative association index information and its recording times are recorded in the present embodiment. As shown in FIG. 10, the second database records each pair of search queries, the cumulative association index information for that pair, and the time at which each item of cumulative association index information was recorded. Although the second time interval in FIG. 10 is "1 day", this is only an example, and the association index information may be generated and recorded at various time intervals.

In step 903, the related search query extraction system refers to the second database and extracts first cumulative association index information and second cumulative association index information, and in step 904 it generates second association index information using the first cumulative association index information and the second cumulative association index information. For example, the second association index information may be generated by multiplying each of the two pieces of information by a predetermined value and adding the results, or by multiplying each by a predetermined value and then multiplying the results together. It will be apparent to those skilled in the art that various other embodiments exist for generating the second association index information from the first cumulative association index information and the second cumulative association index information.

According to an embodiment of the present invention, a related search query extraction method is provided in which the related search query extraction system generates first weight information associated with the first cumulative association index information and second weight information associated with the second cumulative association index information, and generates the second association index information using the first weight information and the second weight information. For example, when the second time interval is "1 day", the first cumulative association index information and the second cumulative association index information are results produced each day, and each represents the state of a single day only, so the daily results are combined. For example, the results from seven days ago through today may be combined to produce association index information as the final result.

According to the present embodiment, search queries that have maintained a consistent relationship over a long period can be provided to the user, which satisfies users who want a more accurate related search query service.

In the present embodiment, the first weight information may be generated based on the time at which the first cumulative association index information was recorded in the second database, and the second weight information may be generated based on the time at which the second cumulative association index information was recorded in the second database. The weight information may be generated using either a linear method or a non-linear method.

According to the present embodiment, association index information is generated with greater weight given to the most recently generated and recorded cumulative association index information. Related search queries that reflect more recent relationships can thus be provided to the user, which satisfies users who want a more accurate related search query service.
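The recency-weighted combination of daily results might look like the following sketch. It is an assumption-laden illustration: the history is modeled as (age in days, index) pairs, the result is normalized by the sum of weights (the specification does not say whether normalization is applied), and the function names, the 8-day window (seven days ago through today), and the decay factor are hypothetical.

```python
def linear_weight(age_days, window=8):
    """Linear scheme: today (age 0) gets weight `window`, older days get less."""
    return max(0, window - age_days)


def exponential_weight(age_days, decay=0.5):
    """Non-linear scheme: the weight halves with each day of age."""
    return decay ** age_days


def combine_daily_indexes(history, weight_fn):
    """Weighted average of daily association indexes.

    history: list of (age_days, index) pairs, age 0 being the newest record.
    weight_fn: maps a record's age to its weight (recording-time-based weight).
    """
    total_weight = sum(weight_fn(age) for age, _ in history)
    if total_weight == 0:
        return 0.0
    return sum(weight_fn(age) * index for age, index in history) / total_weight
```

The result of `combine_daily_indexes` plays the role of the second association index information, which step 905 then compares against a predetermined value.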

In step 905, the related search query extraction system determines whether the queries are related based on the second association index information. For example, the two search queries may be deemed related when the second association index information is equal to or greater than a predetermined value.

According to an embodiment of the present invention, the related search query extraction system may determine and record whether search queries are related, and use that record to provide related search queries to the user. The related search query extraction method according to the present embodiment may further include steps 209 to 212.

In step 209, when the determination in step 208 finds the queries to be related, the related search query extraction system designates the first search query and the second search query as related search queries and records them in the second database.

In step 210, the related search query extraction system receives a third search query from a user terminal, and in step 211 it refers to the second database and extracts one or more fourth search queries related to the third search query.

In step 212, the related search query extraction system provides the extracted fourth search queries to the user terminal.

According to the related search query extraction method of the present invention, the series of steps described above makes it possible to extract genuinely meaningful related search queries and to provide users with a higher-quality related search query service.

In addition, embodiments of the present invention include a computer-readable medium containing program instructions for performing various computer-implemented operations. The program recorded on the computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may be specially designed and constructed for the present invention, or may be of a kind well known and available to those skilled in the computer software arts.

Hereinafter, a related search query extraction system according to another embodiment of the present invention is described. FIG. 11 is a block diagram illustrating the related search query extraction system according to the present embodiment.

The related search query extraction system 1100 according to the present embodiment includes a database 1101, database management means 1102, counter means 1103, conditional probability information generating means 1104, correlation information generating means 1105, and association determining means 1106.

The database 1101 contains records about search sessions and the search queries received from user terminals during those search sessions.

A search session may be established when a search window is first provided to the user terminal, and may end when no data is transmitted from the user terminal for a predetermined time. When a new search query is received from the user terminal after the search session has ended, a new search session is started.
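The session rule above can be approximated by the following sketch, which assumes that only query events are observed (so a session starts at a user's first query rather than when the search window is displayed). The function name, the event tuple layout, and the 30-minute timeout are hypothetical.

```python
def assign_sessions(events, timeout_seconds=1800):
    """Assign a session id to each query event using an inactivity timeout.

    events: iterable of (user_id, timestamp_seconds, query), sorted by timestamp.
    Returns a list of (session_id, query) records.
    """
    last_seen = {}  # user_id -> (session_id, timestamp of last event)
    next_session_id = 0
    records = []
    for user_id, timestamp, query in events:
        state = last_seen.get(user_id)
        if state is None or timestamp - state[1] > timeout_seconds:
            # First event for this user, or the previous session timed out.
            session_id = next_session_id
            next_session_id += 1
        else:
            session_id = state[0]
        last_seen[user_id] = (session_id, timestamp)
        records.append((session_id, query))
    return records
```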

FIG. 3 is a flowchart illustrating an embodiment of a process of maintaining the database 1101, and FIG. 4 illustrates an example of the records contained in the database 1101. Since FIG. 3 and FIG. 4 have already been described in this specification, a detailed description is omitted.

According to an embodiment of the present invention, the related search query extraction system 1100 may map the search sessions or the received search queries to numbers and maintain the database 1101 containing records generated using the mapped numbers. FIG. 5 illustrates an example of records in which search sessions and search queries are mapped to numbers. Since FIG. 5 has already been described in this specification, a detailed description is omitted.
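One simple way to realize such a mapping is to assign consecutive integers in order of first appearance. The class below is a hypothetical sketch; the specification does not state how the numbers are chosen.

```python
class IdMapper:
    """Map arbitrary values (session keys or query strings) to small integers."""

    def __init__(self):
        self._ids = {}

    def get(self, value):
        # A value seen for the first time receives the next unused integer.
        return self._ids.setdefault(value, len(self._ids))
```

Storing integers instead of raw strings keeps the records compact and makes comparisons cheap during counting.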

The database management means 1102 generates the records at predetermined time intervals and records them in the database 1101.

The counter means 1103 refers to the database 1101 and, for each time interval, counts the total number of search sessions established to generate total search session number information, counts the number of search sessions in which the first search query was received to generate first search session number information, counts the number of search sessions in which the second search query was received to generate second search session number information, and counts the number of search sessions in which both the first search query and the second search query were received to generate third search session number information.
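The four counts can be sketched as follows for the records of one time interval. This illustration uses Python dictionaries and `collections.Counter` (ordinary hash tables) rather than the hash-tree structure of the embodiment, and the function name and record layout are assumptions.

```python
from collections import Counter
from itertools import combinations


def count_sessions(records):
    """Count sessions overall, per query, and per unordered query pair.

    records: iterable of (session_id, query) for one time interval.
    Returns (total_sessions, per_query_counts, per_pair_counts).
    """
    sessions = {}
    for session_id, query in records:
        # A set ensures a query repeated within a session is counted once.
        sessions.setdefault(session_id, set()).add(query)

    per_query = Counter()
    per_pair = Counter()
    for queries in sessions.values():
        per_query.update(queries)
        # Sorting gives each unordered pair a single canonical key.
        per_pair.update(combinations(sorted(queries), 2))
    return len(sessions), per_query, per_pair
```

Here `per_query["A"]` and `per_query["B"]` correspond to the first and second search session numbers, and `per_pair[("A", "B")]` to the third.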

According to an embodiment of the present invention, the counter means 1103 may use a hash-tree data structure when counting the number of search sessions.

A hash-tree data structure is a kind of data structure used to store and look up data. It is known as a method of locating data using the value obtained by applying a particular function (a hash function) to the string being sought. FIG. 6 illustrates an example of the hash-tree data structure used to count the number of search sessions in the present embodiment. Since FIG. 6 has already been described in this specification, a detailed description is omitted.

The conditional probability information generating means 1104 generates conditional probability information using the first search session number information and the third search session number information.

The correlation information generating means 1105 generates correlation information using the total search session number information, the first search session number information, the second search session number information, and the third search session number information.
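The two measures might be computed from the four counts as in the sketch below. The conditional probability is taken as the third count divided by the first, which matches the "1/5" worked example given for the threshold embodiment. This passage does not restate the correlation formula, so the phi coefficient used here is an assumed stand-in that happens to use the same four inputs; the function names are hypothetical.

```python
import math


def conditional_probability(n_first, n_both):
    """Estimate P(second | first) as sessions with both / sessions with first."""
    return n_both / n_first if n_first else 0.0


def phi_correlation(n_total, n_first, n_second, n_both):
    """Phi coefficient of two binary events (assumed correlation measure)."""
    denominator = math.sqrt(
        n_first * n_second * (n_total - n_first) * (n_total - n_second)
    )
    if denominator == 0:
        # Undefined when a query appears in no session or in every session.
        return 0.0
    return (n_total * n_both - n_first * n_second) / denominator
```

With 100 sessions in total, 20 containing the first query, 10 containing the second, and 5 containing both, both functions return 0.25.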

The association determining means 1106 determines whether the first search query and the second search query are related based on the conditional probability information and the correlation information.

A related search query extraction system 1120 according to another embodiment of the present invention may further include predetermined recording means 1107. The recording means 1107 may record a predetermined function that takes the first search session number information as its variable.

In the present embodiment, the association determining means 1106 performs the association determination only when the conditional probability information is equal to or greater than a predetermined value, and that value may vary based on the function recorded in the recording means 1107.

The present embodiment addresses the case in which the first search session number information is very low, so that the conditional probability information is not obtained properly and far exceeds a fixed predetermined value. For example, if the number of search sessions in which both the "A" search query and the "B" search query appeared is "1" and the number of search sessions in which the "A" search query appeared is "5", the conditional probability information is calculated as "1/5", which can register as a very high value. In this case, even though the actual degree of relatedness between the "A" search query and the "B" search query is low, the two may be deemed related and an inaccurate related search query may be provided to the user. It is therefore necessary, as in the present embodiment, to vary the predetermined value according to the first search session number information, which makes it possible to provide more accurate related search queries to the user.
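A count-dependent threshold of this kind could take many forms. The sketch below uses one hypothetical choice, a base value plus a penalty inversely proportional to the first search session number; the function names, the form of the function, and the constants are assumptions, not the function recorded in the recording means 1107.

```python
def min_conditional_probability(n_first, base=0.1, penalty=1.0):
    """Threshold that is stricter when the first query appears in few sessions."""
    if n_first <= 0:
        return 1.0
    return min(1.0, base + penalty / n_first)


def passes_gate(n_both, n_first, base=0.1, penalty=1.0):
    """Proceed to the association determination only if P(B|A) meets the threshold."""
    if n_first <= 0:
        return False
    threshold = min_conditional_probability(n_first, base, penalty)
    return n_both / n_first >= threshold
```

Under these assumed constants, the "1 of 5" case from the text is rejected (threshold about 0.3 against a probability of 0.2), while 200 of 1000 sessions passes (threshold about 0.101).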

According to still another embodiment of the present invention, a related search query extraction system is provided that can determine and record whether search queries are related and use that record to provide related search queries to the user. In addition to the components of the related search query extraction system 1100 according to the embodiment described above, the related search query extraction system according to the present embodiment may further include the devices indicated by reference numeral 1130.

Reference numeral 1130 denotes a second database 1108, second database management means 1109, search query receiving means 1110, search query extracting means 1111, and search query providing means 1112.

The second database 1108 contains records about related search queries. FIG. 8 illustrates an example of the second database 1108. Since FIG. 8 has already been described in this specification, a detailed description is omitted.

When the association determining means 1106 determines that the queries are related, the second database management means 1109 designates the first search query and the second search query as related search queries and records them in the second database 1108.

The search query receiving means 1110 receives a third search query from a user terminal, and the search query extracting means 1111 refers to the second database 1108 and extracts one or more fourth search queries related to the third search query.

The search query providing means 1112 provides the extracted fourth search queries to the user terminal.

According to the related search query extraction system of the present embodiment, the series of operations described above makes it possible to extract genuinely meaningful related search queries and to provide users with a higher-quality related search query service.

FIG. 12 is an internal block diagram of a general-purpose computer device that may be employed to perform the related search query extraction method according to the present invention.

The computer device 1200 includes one or more processors 1210 connected to main memory comprising random access memory (RAM) 1220 and read only memory (ROM) 1230. The processor 1210 is also called a central processing unit (CPU). As is well known in the art, the ROM 1230 transfers data and instructions to the CPU unidirectionally, while the RAM 1220 is typically used to transfer data and instructions bidirectionally. The RAM 1220 and the ROM 1230 may include any suitable form of computer-readable medium. Mass storage 1240 is connected bidirectionally to the processor 1210 to provide additional data storage capacity, and may be any of the computer-readable recording media described above. The mass storage 1240 is used to store programs, data, and the like, and is typically an auxiliary storage device, such as a hard disk, that is slower than main memory. A specific mass storage device such as a CD-ROM 1260 may also be used. The processor 1210 is connected to one or more input/output interfaces 1250, such as a video monitor, trackball, mouse, keyboard, microphone, touchscreen display, card reader, magnetic or paper tape reader, voice or handwriting recognizer, joystick, or other known computer input/output device. Finally, the processor 1210 may be connected to a wired or wireless communication network through a network interface 1270. The steps of the method described above may be performed through this network connection. The devices and tools described above are well known to those skilled in the computer hardware and software arts.

The hardware device described above may be configured to operate as one or more software modules in order to perform the operations of the present invention.

Although the present invention has been described above with reference to a limited number of embodiments and drawings, it is not limited to those embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations based on this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims below and by their equivalents.

According to the related search query extraction method and system of the present invention, data on search queries input by users is effectively collected and analyzed to build a system that automatically determines whether search queries are related. This reduces the time and cost that a service operator would otherwise incur by manually classifying and storing, for each search query, the other search queries related to it.

In addition, according to the related search query extraction method and system of the present invention, a database that systematically records association index information between search queries is maintained, so that when a user inputs a search query, search queries with a higher degree of association can be provided to that user first by using the association index information.

In addition, according to the related search query extraction method and system of the present invention, only useful data that has undergone systematic preprocessing is extracted from the search data input by users, and an appropriate number of related search queries is maintained, so that truly meaningful related search queries are extracted and a higher-quality related search query service can be provided to users.

In addition, according to the related search query extraction method and system of the present invention, whether search queries are related is determined by aggregating data accumulated over a given period, so that search queries that maintain a stable association over a long period are provided to users, satisfying users who want a more accurate related search query service.

FIG. 1 is a diagram illustrating the network connection of a related search query extraction system according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating a related search query extraction method according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating a process of maintaining a database according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating an example of a record included in the database according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating an example of a record in which search sessions and search queries are mapped to numbers according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating an example of a hash tree data structure used to count the number of search sessions according to an embodiment of the present invention.

FIG. 7 is a flowchart illustrating a process of providing a related search query list using association index information according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating an example of a second database in which association index information is recorded according to an embodiment of the present invention.

FIG. 9 is a flowchart illustrating a process of determining association using accumulated association index information according to an embodiment of the present invention.

FIG. 10 is a diagram illustrating an example of a second database in which cumulative association index information and recording times are recorded according to an embodiment of the present invention.

FIG. 11 is a block diagram illustrating a related search query extraction system according to another embodiment of the present invention.

<Explanation of reference numerals for the main parts of the drawings>

1100: related search query extraction system

1101: database 1102: database management means

1103: counter means 1104: conditional probability information generating means

1105: correlation information generating means 1106: association determination means

Claims

(Deleted)

A method of extracting related search queries, the method comprising:

Maintaining a database comprising a search session and a record relating to a search query received from a user terminal during the search session, wherein the record is generated and recorded in the database at predetermined time intervals;

Generating total search session number information by counting the total number of search sessions set per time interval with reference to the database;

Generating first search session number information by counting the number of search sessions in which the first search query has been received per time interval with reference to the database;

Generating second search session number information by counting the number of search sessions for which a second search query was received per time interval, with reference to the database;

Generating third search session number information by counting the number of search sessions in which the first search query and the second search query have been received per time interval with reference to the database;

Generating conditional probability information using the first search session number information and the third search session number information;

Generating correlation information using the total search session number information, the first search session number information, the second search session number information, and the third search session number information; And

Determining an association between the first search query and the second search query based on the conditional probability information and the correlation information,

wherein the determining of the association between the first search query and the second search query is performed when the conditional probability information is greater than or equal to a predetermined value,

the predetermined value varies according to a predetermined function that decreases as the first search session number information increases, and

the search session is established when a search window is provided to the user terminal and terminates when no data is transmitted from the user terminal for a predetermined time, and a new search session is started when a new search query is received from the user terminal after the search session has terminated.
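As a non-limiting illustration, the counting and determination steps of this claim can be sketched in Python as follows. The record layout, the function names, the use of the phi coefficient as the correlation measure, and the particular decreasing threshold function are all assumptions made for this sketch; the claim itself does not fix any of them.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def count_sessions(records):
    """Count sessions in total, per query, and per query pair.

    records: iterable of (session_id, query) tuples for one time interval
    (a hypothetical layout chosen for this sketch).
    """
    queries_by_session = defaultdict(set)
    for session_id, query in records:
        queries_by_session[session_id].add(query)

    total = len(queries_by_session)   # total search session number
    single = defaultdict(int)         # sessions in which each query appears
    pair = defaultdict(int)           # sessions in which both queries appear
    for queries in queries_by_session.values():
        for query in queries:
            single[query] += 1
        for a, b in combinations(sorted(queries), 2):
            pair[(a, b)] += 1
    return total, single, pair


def conditional_probability(n_first, n_both):
    """P(second | first), estimated from session counts."""
    return n_both / n_first if n_first else 0.0


def correlation(total, n_first, n_second, n_both):
    """Phi coefficient of the 2x2 session table (assumed measure)."""
    denom = sqrt(n_first * (total - n_first) * n_second * (total - n_second))
    return (total * n_both - n_first * n_second) / denom if denom else 0.0


def threshold(n_first, base=0.5, floor=0.1):
    """Hypothetical threshold that decreases as n_first increases."""
    return floor + (base - floor) / sqrt(max(n_first, 1))


def is_related(total, n_first, n_second, n_both, min_correlation=0.0):
    """Related if both measures clear their (assumed) thresholds."""
    cp = conditional_probability(n_first, n_both)
    corr = correlation(total, n_first, n_second, n_both)
    return cp >= threshold(n_first) and corr >= min_correlation
```

With counts taken from `count_sessions`, a call such as `is_related(total, single["a"], single["b"], pair[("a", "b")])` evaluates one ordered query pair. Lowering the threshold for frequently entered queries reflects the claim's requirement that the predetermined value decrease as the first search session number grows.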

The method of claim 3,

The step of maintaining a database comprising a search session and a record relating to a search query received from a user terminal during the search session, includes:

Generating and recording a first search session identifier associated with a first search session in the database;

Transmitting the first search session identifier and first time information regarding the last search time to a user terminal;

Receiving a search query from the user terminal;

Comparing the first time information with second time information indicating when the search query was received; and

Generating, when the comparison shows that the gap between the first time information and the second time information exceeds a predetermined time, a second search session identifier associated with a second search session and recording in the database a record relating to the second search session identifier and the received search query, and recording in the database, when the gap is less than the predetermined time, a record relating to the received search query in association with the first search session identifier.
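As a non-limiting illustration, the session identifier handling of this claim can be sketched as follows. The class name, the token returned to the user terminal, the timeout value, and the treatment of a gap exactly equal to the timeout (kept in the same session here) are assumptions of this sketch.

```python
import itertools

SESSION_TIMEOUT_SECONDS = 600  # hypothetical "predetermined time"


class SessionTracker:
    """Assigns search session identifiers based on the gap between searches."""

    def __init__(self, timeout=SESSION_TIMEOUT_SECONDS):
        self.timeout = timeout
        self._next_id = itertools.count(1)
        self.records = []  # (session_id, query) records written to the database

    def open_session(self, now):
        """Create a session identifier and return the token sent to the terminal:
        (session identifier, last search time)."""
        return next(self._next_id), now

    def receive_query(self, token, query, now):
        """Record the query under the old or a new session and return a new token."""
        session_id, last_time = token
        if now - last_time > self.timeout:
            # Gap exceeds the predetermined time: start a second search session.
            session_id = next(self._next_id)
        self.records.append((session_id, query))
        return session_id, now
```

Carrying the identifier and the last search time in a token held by the terminal means the server only needs to compare two timestamps when a query arrives, which is the comparison the claim recites.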

The method of claim 3,

The step of maintaining a database comprising a search session and a record relating to a search query received from a user terminal during the search session includes:

Mapping the search session or the received search query to a number; and

Generating the record using the mapped number.
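As a non-limiting illustration, mapping search sessions and search queries to numbers before generating records could look like the following sketch. The `NumberMapper` class and the choice of consecutive integers starting at zero are assumptions; the claim only requires that some number be assigned.

```python
class NumberMapper:
    """Assigns a stable integer to each distinct value the first time it is seen."""

    def __init__(self):
        self._ids = {}

    def map(self, value):
        # len(self._ids) is evaluated before insertion, so ids run 0, 1, 2, ...
        return self._ids.setdefault(value, len(self._ids))


def encode_records(raw_records):
    """Convert (session, query) string pairs into (int, int) records."""
    session_mapper = NumberMapper()
    query_mapper = NumberMapper()
    return [
        (session_mapper.map(session), query_mapper.map(query))
        for session, query in raw_records
    ]
```

Integer records are cheaper to compare and hash than raw strings, which matters when the same records are scanned repeatedly to count sessions.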

The method of claim 3,

When more than a predetermined number of search queries are received during a specific search session, records relating to that search session and to the search queries received during it are not included in the database.

The method of claim 3,

The step of generating third search session number information by counting the number of search sessions in which the first search query and the second search query have been received per time interval with reference to the database includes

generating the third search session number information only when the first search session number information and the second search session number information are more than a predetermined number.
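As a non-limiting illustration, restricting pair counting to sufficiently frequent queries can be sketched as follows. The function name and the inclusive boundary (`>=`) are assumptions; the translated claim wording leaves the exact boundary open.

```python
from collections import Counter
from itertools import combinations


def count_pairs_with_min_support(sessions, min_sessions):
    """Count query pairs only when both queries appear in enough sessions.

    sessions: list of query lists, one list per search session (assumed layout).
    """
    # Sessions in which each query appears (duplicates within a session ignored).
    single = Counter(q for queries in sessions for q in set(queries))
    frequent = {q for q, n in single.items() if n >= min_sessions}

    pair = Counter()
    for queries in sessions:
        kept = sorted(set(queries) & frequent)
        pair.update(combinations(kept, 2))
    return single, pair
```

Skipping infrequent queries keeps the number of candidate pairs small, since a pair cannot be frequent unless each of its members is.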

(Deleted)

The method of claim 3,

The determining of the association between the first search query and the second search query based on the conditional probability information and the correlation information includes determining the association only when the correlation information is equal to or greater than a predetermined value.

The method of claim 3, further comprising:

Designating, when the first search query and the second search query are determined to be related, the first search query and the second search query as related search queries and recording the designation in a second database;

Performing a toggle error check on the first search query and the second search query by referring to the second database; and

Deleting the record of the related search query designation from the second database if the check shows that the first search query and the second search query are in a toggle error relationship.

The method of claim 3,

Generating association index information between the first search query and the second search query using the conditional probability information and the correlation information; and

Determining the association based on the association index information.

The method of claim 12,

Recording, when the first search query and the second search query are determined to be related, the association index information in a second database in association with the first search query and the second search query;

Receiving a third search query from a user terminal;

Extracting one or more fourth search queries associated with the third search query with reference to the second database;

Generating a related search query list by sorting the extracted fourth search queries according to the association index information; and

Providing the generated related search query list to the user terminal.
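As a non-limiting illustration, extracting and sorting related search queries from the second database can be sketched as follows. Modeling the second database as a nested dictionary, the descending sort order, and the `limit` parameter are assumptions of this sketch.

```python
def related_query_list(second_db, query, limit=10):
    """Return queries related to `query`, highest association index first.

    second_db: dict mapping a query to {related_query: association_index}
    (a hypothetical in-memory stand-in for the second database).
    """
    candidates = second_db.get(query, {})
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return [related for related, _ in ranked[:limit]]
```

For example, with `second_db = {"jeju": {"jeju hotel": 0.8, "jeju flight": 0.9}}`, the call `related_query_list(second_db, "jeju")` lists "jeju flight" before "jeju hotel".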

The method of claim 12,

The generating of the association index information between the first search query and the second search query based on the conditional probability information and the correlation information includes:

Generating first weight information associated with the conditional probability information;

Generating second weight information associated with the correlation information; And

Generating the association index information using the first weight information and the second weight information.
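As a non-limiting illustration, one way to combine the two weighted measures into association index information is a weighted sum, sketched below. The linear form and the default weights are assumptions; the claim requires only that the index be generated using the first and second weight information.

```python
def association_index(cond_prob, corr, w_cond_prob=0.7, w_corr=0.3):
    """Weighted sum of conditional probability and correlation (assumed form).

    w_cond_prob and w_corr are hypothetical weights; they could be adjusted
    according to whatever criterion the operator chooses.
    """
    return w_cond_prob * cond_prob + w_corr * corr
```

Keeping the weights as parameters makes it straightforward to change them later without touching the counting logic.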

The method of claim 14,

wherein the first weight information or the second weight information is changed according to a predetermined criterion.

The method of claim 3,

Generating association index information between the first search query and the second search query at every second time interval using the conditional probability information and the correlation information;

Accumulating and recording the generated association index information in a second database at each second time interval in association with the first search query and the second search query;

Extracting first cumulative association index information and second cumulative association index information with reference to the second database;

Generating second association index information by using the first cumulative association index information and the second cumulative association index information; and

Determining the association based on the second association index information.

The method of claim 16,

The generating of the second association index information by using the first cumulative association index information and the second cumulative association index information may include:

Generating first weight information associated with the first cumulative association index information;

Generating second weight information associated with the second cumulative association index information; and

Generating the second association index information using the first weight information and the second weight information,

wherein the first weight information is generated based on the time at which the first cumulative association index information was recorded in the second database, and the second weight information is generated based on the time at which the second cumulative association index information was recorded in the second database.
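As a non-limiting illustration, weighting accumulated association index values by their recording time can be sketched as follows. The exponential decay with a half-life, the normalization by the sum of weights, and the day-based time unit are assumptions; the claim states only that each weight is generated based on the recording time.

```python
def combined_association_index(entries, now, half_life_days=30.0):
    """Time-weighted average of accumulated association index values.

    entries: list of (recorded_day, index_value) pairs read from the second
    database (hypothetical layout); more recent entries get larger weights.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for recorded_day, value in entries:
        age = now - recorded_day
        weight = 0.5 ** (age / half_life_days)  # halves every half_life_days
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0
```

A decaying weight lets recent intervals dominate while older intervals still contribute, so a pair must stay associated over time to keep a high combined index.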

A method of extracting related search queries, the method comprising:

Generating second search session number information by counting the number of search sessions in which the first search query and the second search query have been received per time interval with reference to the database;

Generating conditional probability information using the first search session number information and the second search session number information; And

Determining whether the first search query is related to the second search query based on the conditional probability information,

wherein the determining of the association between the first search query and the second search query is performed when the conditional probability information is equal to or greater than a predetermined value.

(Deleted)

The method according to claim 3 or 18, further comprising:

Receiving a third search query from a user terminal;

Extracting one or more fourth search queries associated with the third search query with reference to the second database; and

Providing the extracted fourth search query to the user terminal.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 3-7 and 10-18.

A system for extracting related search queries, the system comprising:

A database comprising a search session and a record relating to a search query received from a user terminal during the search session;

Database management means for generating the record at predetermined time intervals and recording the record in the database;

Counter means for, with reference to the database, generating total search session number information by counting the total number of search sessions established per time interval, generating first search session number information by counting the number of search sessions in which a first search query was received per time interval, generating second search session number information by counting the number of search sessions in which a second search query was received per time interval, and generating third search session number information by counting the number of search sessions in which both the first search query and the second search query were received per time interval;

Conditional probability information generating means for generating conditional probability information using the first search session number information and the third search session number information;

Correlation information generating means for generating correlation information using the total search session number information, the first search session number information, the second search session number information, and the third search session number information; And

Association determination means for determining the association between the first search query and the second search query based on the conditional probability information and the correlation information,

wherein the association determination means determines the association when the conditional probability information is equal to or greater than a predetermined value.

A system for extracting related search queries, the system comprising:

Counter means for, with reference to the database, generating first search session number information by counting the number of search sessions in which a first search query was received per time interval, and generating second search session number information by counting the number of search sessions in which both the first search query and a second search query were received per time interval;

Conditional probability information generating means for generating conditional probability information using the first search session number information and the second search session number information; And

Association determination means for determining the association between the first search query and the second search query based on the conditional probability information.

(Deleted)

The system of claim 23 or 24, further comprising:

A second database containing records relating to the associated search query;

Second database management means for designating the first search query and the second search query as an associated search query and recording the result in the second database when it is determined that the association is related;

Search query receiving means for receiving a third search query from a user terminal;

Search query extracting means for extracting one or more fourth search queries associated with the third search query with reference to the second database; And

Search query providing means for providing the extracted fourth search query to the user terminal.