KR102080362B1

KR102080362B1 - Query expansion

Info

Publication number: KR102080362B1
Application number: KR1020157001356A
Authority: KR
Inventors: 리 주; 징 동; 윤핑 후앙
Original assignee: Alibaba Group Holding Limited
Priority date: 2012-07-20
Filing date: 2013-07-18
Publication date: 2020-02-21
Also published as: WO2014015176A1; TWI544351B; JP2015526809A; KR20150036117A; JP6247292B2; US20140025701A1; CN103577416A; CN103577416B; US9317550B2; TW201405342A

Abstract

The present disclosure provides an example query expansion method and system. A query entered by a user is received. A normalized query of the query is determined according to the query. The normalized query is used as an expansion term of the query to perform query expansion. For example, session information is obtained from a user's search log. All queries appearing in a single session are obtained, and the number of votes for each query is calculated or counted. A degree of vote similarity between a single query and a target query is determined, and a degree of correlation between the single query and the target query is determined according to the vote similarity. The normalized query of the target query is determined according to the degree of correlation. The present techniques expand queries accurately and reduce query time, thereby improving system response speed and processing efficiency.

Description

Query expansion {QUERY EXPANSION}

Cross Reference to Related Patent Application

This application claims foreign priority to Chinese Patent Application No. 201210254810.0, filed on July 20, 2012, entitled "Query Expansion Method and System," which is hereby incorporated by reference in its entirety.

Technical Field

The present disclosure relates to the field of computer data processing and, more particularly, to a query expansion method and system.

With the development of network technology, search engines have been continuously improved, and all kinds of information can be obtained from the Internet through them. Search engines have become one of the main ways to help users quickly find information on the Internet. A user submits a query term (query) to a search engine, and the search engine returns search results corresponding to the query to the user.

On an e-commerce website, and especially on a large one, a user usually has to use a query to search for and find the product he or she wants. Because a query entered by a user is usually composed according to the user's own wishes, it may return too many or too few results, which leads to low search accuracy and frequent repeated searches. A search engine therefore usually expands or rewrites the query to enrich the query information and intelligently optimize the query entered by the user, which improves the accuracy of the search results and, at the same time, reduces the pressure that frequent user searches place on the server.

Conventional query expansion methods include expansion at a query end and expansion at an index end. Expansion at the query end mainly includes addition, replacement, and deletion applied to the query. That is, a particular character or part is added to, replaced in, or deleted from the query entered by the user. For example, if the query entered by the user is "Nokia™ mobile phone", an addition operation may change the query to "Nokia™ N95 mobile phone", a deletion operation may change the query to "Nokia™" or "mobile phone", and a replacement operation may change the query to "Samsung™ mobile phone", "Apple™ mobile phone", and so on. Expansion at the index end mainly refers to synonym expansion of the query at the index end. A synonym set is usually obtained through conventional data mining. When a term appears, its synonyms are extracted from the synonym set for expansion. To ensure the accuracy of the query and the search results, simultaneous expansion at the query end and the index end may be adopted. In other words, the query is expanded at the query end and at the index end separately, and the results corresponding to the same expansion term are selected as the expansion result.

During actual processing, a search engine usually selects expansion terms at the query end one by one, in a particular sequence, to search; matches each of them against the expansion terms at the index end; and, if there is a match, returns the search results for that expansion term. During this process, it is possible that there are multiple expansion terms at the query end but only one at the index end, and that, in the given sequence, only the last expansion term at the query end matches the term at the index end. The search engine then has to search multiple times until the last expansion term at the query end matches the expansion term at the index end. As a result, both the time the search engine spends on invalid searches and the time the system takes to return search results increase, the system response speed decreases, and the system resource occupancy increases.
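The cost described above can be illustrated with a minimal Python sketch. The function name `sequential_match` and the sample terms are hypothetical and not taken from the disclosure; the sketch only counts how many searches are issued before a query-end expansion term matches an index-end term.

```python
def sequential_match(query_end_terms, index_end_terms):
    """Try query-end expansion terms in order; return (matched term, searches used)."""
    searches = 0
    for term in query_end_terms:
        searches += 1  # each candidate costs one search against the index end
        if term in index_end_terms:
            return term, searches
    return None, searches


# Worst case: only the last query-end term exists at the index end.
query_end = ["nokia", "mobile phone", "nokia n95 mobile phone"]
index_end = {"nokia n95 mobile phone"}
matched, searches = sequential_match(query_end, index_end)
```

Here every query-end term has to be tried before a match is found, which is the repeated, largely wasted searching that the disclosure aims to avoid.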

This Summary is provided to introduce, in a simplified form, a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify all key features or essential features of the claimed subject matter, nor is it intended to be used alone as an aid in determining the scope of the claimed subject matter. The term "techniques," for example, may refer to device(s), system(s), method(s), and/or computer-readable instructions as permitted by the context above and throughout the present disclosure.

The present disclosure provides a query expansion method and system. For example, the present techniques can solve the problem of excessive search time and system resource occupancy, which also affects system response speed and efficiency, caused by a search engine repeatedly searching for and matching queries.

The present disclosure describes an example query expansion method. A query entered by a user is received. A normalized query of the query is determined according to the query. The normalized query is used as an expansion term of the query to perform query expansion.
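As a rough illustration of this top-level flow, the following Python sketch assumes that normalized queries have already been determined and stored in a lookup table. The table contents and the function name `expand_query` are illustrative assumptions, not part of the disclosure.

```python
def expand_query(query, normalized_table):
    """Return the search terms to use: the query plus its normalized query, if any."""
    normalized = normalized_table.get(query)
    if normalized is None or normalized == query:
        return [query]  # nothing to expand with
    return [query, normalized]


# Hypothetical table built in advance from search logs.
normalized_table = {"nokia cellphone": "nokia mobile phone"}
terms = expand_query("nokia cellphone", normalized_table)
```

Because the normalized query is looked up directly, a single expansion term is produced instead of a sequence of candidates that must be tried one by one.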

The normalized query of a query may be determined as follows. Session information in a user's search log is obtained. All queries appearing in a single session are obtained, and the number of votes for each query is calculated or counted. Within a single session, according to the order in which the queries appear, any query that appears before a particular query counts as one vote for that particular query.
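The voting rule for a single session can be sketched in Python as follows. The function name `session_votes` and the sample session are hypothetical, and the handling of repeated queries (counted only at their first appearance) is an assumption that the text does not specify.

```python
from collections import defaultdict


def session_votes(session):
    """Map each query in one session to the set of earlier queries that vote for it."""
    votes = defaultdict(set)
    seen = []  # distinct queries in order of first appearance
    for query in session:
        if query in seen:
            continue  # assumption: a repeated query is only counted at its first appearance
        votes[query].update(seen)  # every earlier query casts one vote
        seen.append(query)
    return dict(votes)


session = ["phone", "nokia phone", "nokia n95"]
votes = session_votes(session)
```

In this sample session the last query collects a vote from each of the two queries entered before it, while the first query collects none.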

The degree of vote similarity between a single query and a target query is determined according to the total number of votes for the target query in all sessions and the number of votes that the single query casts for the target query. The degree of correlation between the single query and the target query is determined according to the degree of vote similarity. The normalized query of the target query is determined according to the degree of correlation between the single query and the target query.

For example, the total number of votes for the target query in all sessions is calculated or counted as follows. One or more sessions that contain the target query are obtained. The number of votes for the target query in each session is counted. The vote counts from the sessions are accumulated to obtain the total number of votes for the target query.
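A minimal Python sketch of this accumulation is shown below. The function name `total_votes` and the sample sessions are hypothetical; the sketch assumes each session is an ordered list of query strings and that votes are counted relative to the first occurrence of the target query.

```python
def total_votes(sessions, target):
    """Accumulate, over all sessions containing the target, the votes it receives."""
    total = 0
    for session in sessions:
        if target in session:
            # Distinct queries before the target's first occurrence each cast one vote.
            total += len(set(session[:session.index(target)]))
    return total


sessions = [
    ["phone", "nokia phone", "nokia n95"],
    ["nokia", "nokia n95"],
    ["samsung phone"],
]
total = total_votes(sessions, "nokia n95")  # 2 votes + 1 vote
```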

For example, the number of votes of the single query for the target query is calculated or counted as follows. One or more sessions that contain both the single query and the target query are obtained. For each such session, it is determined whether the single query casts a vote for the target query. If so, the session is selected. The number of selected sessions is counted to obtain the number of votes of the single query for the target query.
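The session selection described above might look like the following Python sketch. The function name `votes_from` and the sample data are hypothetical; "casts a vote" is taken to mean that the single query first appears before the target query in the session.

```python
def votes_from(sessions, single, target):
    """Count the sessions in which `single` appears before `target`."""
    count = 0
    for session in sessions:
        if single in session and target in session:
            # The single query votes for the target only if it comes first.
            if session.index(single) < session.index(target):
                count += 1
    return count


sessions = [
    ["phone", "nokia phone", "nokia n95"],
    ["nokia", "nokia n95"],
    ["nokia n95", "phone"],
]
n = votes_from(sessions, "phone", "nokia n95")
```

Only the first session is selected here: the second does not contain "phone", and in the third "phone" appears after the target.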

For example, the degree of vote similarity between the single query and the target query may be determined as follows. The ratio of the total number of votes of the single query for the target query to the total vote score of all queries for the target query (for a single product) is used as the degree of vote similarity between the single query and the target query.
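Under the simplifying assumption that every vote counts as 1, the ratio can be sketched in Python as follows. The function name `vote_similarity` and the sample sessions are hypothetical.

```python
from collections import Counter


def vote_similarity(sessions, target):
    """Return {query: share of all votes for `target` that were cast by that query}."""
    votes = Counter()
    for session in sessions:
        if target not in session:
            continue
        # Each distinct query before the target casts one vote in this session.
        for query in set(session[:session.index(target)]):
            votes[query] += 1
    total = sum(votes.values())
    if total == 0:
        return {}
    return {query: n / total for query, n in votes.items()}


sessions = [["a", "t"], ["a", "b", "t"], ["b", "x"]]
sim = vote_similarity(sessions, "t")
```

In this sample, "a" casts two of the three votes for "t" and "b" casts one, so their vote similarities are 2/3 and 1/3.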

As another example, the degree of vote similarity between the single query and the target query may also be determined as follows. A weight and a base number of each vote for the target query are determined. A score of each vote is calculated according to the weight and the base number. The ratio of the total vote score of the single query for the target query to the total vote score of all queries for the target query is used as the degree of vote similarity between the single query and the target query.
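The weighted variant can be sketched as below. This passage does not say how the weight and base number are chosen, so the sketch makes two loud assumptions: the score of a vote is `base * weight`, and the weight decays with the distance between the voting query and the target in the session. Both choices, and the function name `weighted_vote_similarity`, are illustrative only.

```python
from collections import defaultdict


def weighted_vote_similarity(sessions, target, base=1.0, weight=lambda gap: 1.0 / gap):
    """Return {query: share of the total vote score for `target`}.

    Assumption: score = base * weight(gap), where gap is how many positions
    before the target the voting query appears (closest occurrence).
    """
    scores = defaultdict(float)
    for session in sessions:
        if target not in session:
            continue
        t_pos = session.index(target)
        counted = set()
        for pos in range(t_pos - 1, -1, -1):  # walk back from the target
            voter = session[pos]
            if voter in counted:
                continue  # one vote per distinct query per session
            counted.add(voter)
            scores[voter] += base * weight(t_pos - pos)
    total = sum(scores.values())
    if total == 0:
        return {}
    return {query: s / total for query, s in scores.items()}


sim = weighted_vote_similarity([["a", "b", "t"]], "t")
```

With the assumed decay, "b" (immediately before the target) scores 1.0 and "a" scores 0.5, giving similarities of 2/3 and 1/3.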

For example, the normalized query of the target query may be determined according to the degree of correlation between the single query and the target query as follows. A threshold for normalized queries is set. If the value of the degree of correlation between the single query and the target query exceeds the threshold, the single query is determined to be the normalized query of the target query.

As another example, the normalized query of the target query may also be determined according to the degree of correlation between the single query and the target query as follows. Normalized queries are divided into three categories: synonym normalized queries, correlated normalized queries, and extended normalized queries. Value ranges for the three categories are set in descending order of the degree of correlation. The category whose value range contains the degree of correlation between the single query and the target query is used as the refined category of the single query and the target query.
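The two variants above (a single threshold, or three descending value ranges) can be sketched in Python as follows. All numeric boundaries and the function names are hypothetical; the text does not give concrete values here.

```python
def is_normalized(correlation, threshold=0.5):
    """Threshold variant: accept the single query if its correlation exceeds the threshold."""
    return correlation > threshold


def classify(correlation, synonym_min=0.8, correlated_min=0.5, extended_min=0.2):
    """Range variant: map a correlation value to one of three descending categories."""
    if correlation >= synonym_min:
        return "synonym"
    if correlation >= correlated_min:
        return "correlated"
    if correlation >= extended_min:
        return "extended"
    return None  # too weakly correlated to be a normalized query
```

Checking the ranges from the highest downward keeps the categories mutually exclusive without having to state upper bounds explicitly.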

For example, before the degree of correlation between the single query and the target query is determined according to the degree of vote similarity, the method may further include the following operations. Click information for search results in the user's search log is obtained. Search results that include the target query are extracted from the click information. The degree of click similarity between the single query and the target query is determined according to the total number of clicks on the search results that include the target query and the total number of clicks on those search results that include the target query and correspond to the single query.
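One possible reading of this step is sketched below in Python. The text names the two click totals but not how they are combined, so taking their ratio is an assumption, as are the log layout `(query, result_text, clicks)`, the substring test for "includes the target query", and the function name `click_similarity`.

```python
def click_similarity(click_log, single, target):
    """Share of clicks on results containing `target` that came from searches for `single`."""
    total = 0
    from_single = 0
    for query, result_text, clicks in click_log:
        if target in result_text:  # assumption: "includes" means substring match
            total += clicks
            if query == single:
                from_single += clicks
    return from_single / total if total else 0.0


click_log = [
    ("n95", "nokia n95 phone", 3),
    ("nokia phone", "nokia n95 phone", 1),
    ("n95", "samsung galaxy", 5),
]
sim = click_similarity(click_log, "n95", "nokia n95")
```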

For example, the degree of correlation between the single query and the target query is determined according to the degree of vote similarity and the degree of click similarity.

For example, the degree of correlation between the single query and the target query may be determined according to the degree of vote similarity and the degree of click similarity as follows. The larger of the degree of vote similarity and the degree of click similarity is used as the degree of correlation between the single query and the target query.

Alternatively, weights for the degree of vote similarity and the degree of click similarity are determined. The degree of correlation between the single query and the target query is calculated according to one or more predetermined rules based on the degree of vote similarity, the degree of click similarity, and their respective weights.
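Both ways of combining the two similarity degrees can be sketched in Python. The maximum follows the text directly; for the weighted variant the text only speaks of "predetermined rules", so the weighted sum and the default weights below are assumptions, as are the function names.

```python
def correlation_max(vote_sim, click_sim):
    """Use the larger of the two similarity degrees as the degree of correlation."""
    return max(vote_sim, click_sim)


def correlation_weighted(vote_sim, click_sim, w_vote=0.6, w_click=0.4):
    """Assumed rule: a weighted sum of the two similarity degrees."""
    return w_vote * vote_sim + w_click * click_sim
```

For example, with a vote similarity of 0.5 and a click similarity of 1.0, the first rule yields 1.0, while the assumed weighted sum yields 0.7.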

For example, before the degree of correlation between the single query and the target query is determined according to the degree of vote similarity, the method further includes the following operations. Seller data stored on a server is obtained. The seller data includes product description information that a seller specifies when describing a product. The seller data is analyzed, and queries and characteristic terms of the queries are extracted from the seller data. The degree of characteristic similarity is determined according to the characteristic terms of the single query and the target query.

For example, the degree of correlation between the single query and the target query may be determined according to the degree of vote similarity and the degree of characteristic similarity.

For example, the degree of correlation between the single query and the target query may be determined according to the degree of vote similarity and the degree of characteristic similarity as follows. A characteristic value of each characteristic term is calculated. The characteristic value is calculated according to the characteristic term and the click information of its corresponding query. The degree of characteristic similarity between the single query and the target query is calculated according to the characteristic values.

For example, before the normalized query of the target query is determined according to the degree of correlation between the single query and the target query, the method may further include the following operation. A degree of semantic similarity and/or a degree of category similarity between the single query and the target query is determined.

The normalized query of the target query may then be determined as follows. It may be determined according to the degree of correlation and the semantic similarity between the single query and the target query. Alternatively, it may be determined according to the degree of correlation and the category similarity between the single query and the target query. Alternatively, it may be determined according to the degree of correlation, the semantic similarity, and the category similarity between the single query and the target query.

For example, determining the semantic similarity between the single query and the target query may include the following operations. An edit distance between the single query and the target query is determined. The edit distance is the minimum number of edit operations needed to convert one term into another. The edit distance is normalized to obtain a degree of semantic similarity on the same scale as the degree of correlation.
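A minimal Python sketch of this step is given below. It uses the standard Levenshtein distance (insertions, deletions, substitutions) as the edit distance. The normalization, one minus the distance divided by the longer string's length, is an assumption chosen to map the result into the range 0 to 1; the disclosure does not state the formula in this passage.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def semantic_similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

Normalizing by the longer length keeps the value comparable with correlation degrees that also lie between 0 and 1.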

The present disclosure also provides an example query expansion system. The system may include a query input module, a normalized query determination module, and a query expansion module.

The query input module obtains a query entered by a user. The normalized query determination module determines a normalized query of the query according to the query. The query expansion module uses the normalized query as an expansion term of the query to perform query expansion.

The normalized query determination module may include a session information acquisition module, a query vote calculation module, a vote similarity degree determination module, a correlation degree determination module, and a normalized query determination module.

The session information acquisition module obtains session information from the user's search log.

The query vote calculation module obtains all queries appearing in a single session and counts the votes for each query. Within a single session, according to the order in which the queries appear, any query that appears before a particular query counts as one vote for that particular query.

The vote similarity degree determination module determines the degree of vote similarity between the single query and the target query according to the total number of votes for the target query in all sessions and the number of votes of the single query for the target query.

The correlation degree determination module determines the degree of correlation between the single query and the target query according to the degree of vote similarity.

The normalized query determination module determines the normalized query of the target query according to the degree of correlation between the single query and the target query.

For example, the vote similarity degree determination module may include a base number and weight determination unit, a score calculation unit, and a ratio calculation unit. The base number and weight determination unit determines the weight and the base number of each vote for the target query. The score calculation unit calculates the score of each vote according to the weight and the base number. The ratio calculation unit uses the ratio of the total vote score of the single query for the target query to the total vote score of all queries for the target query as the degree of vote similarity between the single query and the target query.

For example, the normalized query determination module may include a normalized query threshold setting unit that sets a threshold for normalized queries and, if the value of the degree of correlation between the single query and the target query exceeds the threshold, determines the single query to be the normalized query of the target query.

As another example, the normalized query determination module may include a normalized query category classification unit, a value range setting unit, and a category determination unit. The normalized query category classification unit divides normalized queries into three categories: synonym normalized queries, correlated normalized queries, and extended normalized queries.

The value range setting unit sets the value ranges of the three categories in descending order of the degree of correlation.

The category determination unit determines the category whose value range contains the degree of correlation between the single query and the target query as the refined category of the single query and the target query.

For example, the normalized query determination module may include a click information acquisition module, a search result extraction module, and a click similarity degree determination module. The click information acquisition module obtains click information for search results from the user's search log. The search result extraction module extracts, from the click information, search results that include the target query. The click similarity degree determination module determines the degree of click similarity between the single query and the target query according to the total number of clicks on the search results that include the target query and the total number of clicks on those search results that include the target query and correspond to the single query. The correlation degree determination module determines the degree of correlation between the single query and the target query according to the degree of vote similarity and the degree of click similarity.

For example, the normalized query determination module may further include a seller data acquisition module, a data analysis module, and a characteristic similarity degree determination module.

The seller data acquisition module obtains seller data stored on a server. The seller data includes product description information that a seller specifies when describing a product.

The data analysis module analyzes the seller data and extracts queries and the characteristic terms of the queries from the seller data. The characteristic similarity degree determination module determines the characteristic similarity according to the characteristic terms of the single query and the target query. The correlation degree determination module determines the degree of correlation between the single query and the target query according to the degree of vote similarity and the degree of characteristic similarity.

For example, the characteristic similarity degree determination module may include a characteristic value calculation unit that calculates a characteristic value of each characteristic term and, according to the characteristic values, the characteristic similarity between the single query and the target query. The characteristic value is calculated according to the characteristic term and the click information of its corresponding query.

As another example, the normalized query determination module may also include a semantic similarity degree determination module and/or a category similarity degree determination module, which respectively determine the degree of semantic similarity and/or the degree of category similarity between the single query and the target query.

The normalized query determination module determines the normalized query of the target query according to the degree of correlation and the degree of semantic similarity between the single query and the target query. Alternatively, it determines the normalized query of the target query according to the degree of correlation and the degree of category similarity between the single query and the target query. Alternatively, it determines the normalized query of the target query according to the degree of correlation, the degree of semantic similarity, and the degree of category similarity between the single query and the target query.

For example, the semantic similarity degree determination module may include an edit distance determination unit and a normalization processing unit. The edit distance determination unit determines the edit distance between the single query and the target query. The edit distance is the minimum number of edit operations needed to convert one term into another. The normalization processing unit normalizes the edit distance to obtain a degree of semantic similarity on the same scale as the degree of correlation.

The query expansion method and system of the present disclosure normalize queries by using session information in users' search logs, and treat queries with the same or similar meanings as the same or similar queries. After obtaining a query, the present techniques can automatically normalize the query and expand the search at the same time, which widens the range of search results while preserving their accuracy. The present techniques perform normalization using the session information in a user's search log: they extract the queries contained in each session and analyze user behavior according to the sequence of the queries, thereby capturing how each user changes his or her query during a search. Because a single session records a user's search information within a short, continuous period of time, the queries within a single session tend to be highly correlated. Query expansion is therefore based on this property of session information, which improves the effect of normalization and ensures a sufficiently high degree of correlation between two queries after normalization, thus ensuring the accuracy of the final search results and reducing search time. Consequently, system resource occupancy is reduced, and system response speed and query expansion efficiency are improved.

게다가, 세션 정보의 특성에 기초하여, 작은 상관 정도를 가진 쿼리들이 동일 세션에서 출현하는 확률이 줄어든다. 그래서, 정규화 처리할 데이터 범위가 줄어든다. 처리 속도가 개선되며 처리 시간이 절감된다.In addition, based on the nature of the session information, the probability that queries with a small degree of correlation appear in the same session is reduced. Thus, the data range to be normalized is reduced. Processing speed is improved and processing time is saved.

또한, 처리를 위해 세션 정보를 고려하는 것 이외에도, 본 기술은 사용자 클릭 정보 및 판매자 데이터와 같은 다른 차원을 추가로 고려하여 정규화 처리의 정확성을 더욱 개선할 수 있다. Furthermore, in addition to considering session information for processing, the present technology can further improve the accuracy of normalization processing by further considering other dimensions such as user click information and seller data.

Of course, a product of the present disclosure need not have all of the foregoing features.

To better describe the embodiments of the present disclosure, the drawings used in the description of the embodiments are briefly introduced below. The following figures relate only to some embodiments of the present disclosure. Those of ordinary skill in the art may obtain other drawings from the drawings in the present disclosure without creative effort.
FIG. 1 illustrates a flowchart of an example query expansion method according to a first example embodiment of the present disclosure.
FIG. 2 illustrates a flowchart of an example method of determining a normalized query of a query according to the first example embodiment of the present disclosure.
FIG. 3 illustrates a flowchart of another example method of determining a normalized query of a query according to a second example embodiment of the present disclosure.
FIG. 4 illustrates a flowchart of another example method of determining a normalized query of a query according to a third example embodiment of the present disclosure.
FIG. 5 illustrates a diagram of an example query expansion system according to the present disclosure.
FIG. 6 illustrates a diagram of a first example normalized query determination module according to the present disclosure.
FIG. 7 illustrates a diagram of a second example normalized query determination module according to the present disclosure.
FIG. 8 illustrates a diagram of a third example normalized query determination module according to the present disclosure.

To make the objectives, features, and advantages of the present disclosure clear and easy to understand, the following description refers to the drawings and some example embodiments.

In the present disclosure, a query may be a key term entered by a user to search for and obtain an expected result. For example, the query may include a product name, a product brand, a product model, or another term. In a particular field, the query may be a term of a particular category. On an e-commerce website, for example, the query may be a product term that expresses a product name or category, such as mobile phone or dress. Product terms are commonly used queries because they are relatively effective at improving the degree of matching between search results and user expectations.

For example, analysis of users' search logs shows that about 57 percent of queries search by using product terms and nearly 88 percent of queries contain product terms. In addition, on an e-commerce website, a product supplier may describe a product and store the description on a server. The description may include a product name and a detailed description of the product. In a general search method, the search engine of the e-commerce website matches the product term entered by the user against the product names on the server and obtains search results according to the matching result. Correlating the product terms entered by users with the product names stored on the server by product suppliers is therefore an important prerequisite for improving the accuracy of search results.

Also, the volume of information data at some large websites is huge, while the queries contained in that data are far fewer than the total information data. If queries are normalized, so that queries expressing the same or similar meaning are correlated and treated as the same or similar during a search, data redundancy can be further reduced and the response speed of the search engine can be improved. On a large e-commerce website, for example, the product terms may be far fewer than the total information data.

Accordingly, the present disclosure provides an example query expansion method and system for normalizing queries.

FIG. 1 and FIG. 2 illustrate an example query expansion method according to a first example embodiment of the present disclosure.

At 102, a query entered by a user is obtained.

At 104, a normalized query of the query is determined according to the query.

At 106, the normalized query is used as an expansion term of the query to perform query expansion.
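
The three operations above can be sketched as follows. This is only an illustration: it assumes that the normalized queries determined at 104 are already available as a lookup table, and the names `expand_query` and `NORMALIZED` as well as the sample terms are hypothetical.

```python
from typing import Dict, List


def expand_query(query: str, normalized_queries: Dict[str, List[str]]) -> List[str]:
    """Return the entered query followed by its normalized queries as expansion terms."""
    terms = [query]                                 # 102: the query entered by the user
    for term in normalized_queries.get(query, []):  # 104: look up its normalized queries
        if term not in terms:                       # skip duplicate expansion terms
            terms.append(term)
    return terms                                    # 106: terms used for the expanded search


# Hypothetical table; in the method it would be derived from session information.
NORMALIZED = {"cell phone": ["mobile phone", "handset"]}
print(expand_query("cell phone", NORMALIZED))
```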

For example, the operation at 104 may include the following operations.

At 1020, session information in users' search logs is obtained.

Session information is information that describes a series of user actions on a website over a continuous period (usually between several minutes and several hours). Throughout the process from when a user starts browsing pages of the website until the user stops browsing, the website server may automatically assign a session ID to the user and record the user's behavior during that period. When the user browses the website again after a long interval, the website server may assign a different session ID to the user and record the user's behavior. In general, user actions within one continuous browsing period are correlated to some degree; that is, user actions within one session are considered correlated. The queries that the user used for searching and that are recorded in the session may therefore also be correlated to some degree. Accordingly, the present disclosure may, for example, normalize queries based on session information.

The website server may include a dedicated database that stores users' search logs, which include the session information. To reduce the data volume, the search logs within a particular period may be obtained. Alternatively, the search logs within different periods may be obtained, which improves the objectivity of the data.

At 1022, all queries that appear in a single session are obtained, and the votes for each query are counted. Within a single session, according to the order in which the queries appear, any query that appears before a particular query counts as one vote for that particular query.

In one session, a user may search multiple times, so the session information may include multiple queries. The session information may record the user's search sequence, which is the order of appearance of the queries. The order of appearance may be determined from the time of each query recorded in the session information.

An example detailed process for counting the votes for each query is as follows.

The queries are ordered by their order of appearance. The number of votes for each query is the total number of queries that precede it.

For example, five queries a, b, c, d, and e are included in one session and appear in the order a, b, c, d, e. Under the definition above, any query that precedes a given query counts as one vote for that query. Query b therefore has 1 vote, from a to b. Query c has 2 votes, from a to c and from b to c. Query e has 4 votes, one each from a, b, c, and d. In other words, the number of votes for each query is the total number of queries that appear before it.
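
The counting rule in this example can be sketched in a few lines. The sketch assumes that a session is represented as a list of distinct queries already ordered by query time; the function name `votes_in_session` is hypothetical.

```python
from typing import Dict, List


def votes_in_session(session: List[str]) -> Dict[str, int]:
    """Map each query to its vote count, i.e. the number of queries preceding it.

    Assumes the session lists distinct queries in order of appearance.
    """
    return {query: position for position, query in enumerate(session)}


print(votes_in_session(["a", "b", "c", "d", "e"]))
```

With distinct queries, the vote count of a query is simply its zero-based position in the session, which matches the counts given for b, c, and e above.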

At 1024, a vote similarity degree between a single query and the target query is determined according to the total number of votes for the target query in all sessions and the number of votes from the single query for the target query.

When multiple users visit the website within the same period, there may be multiple sessions. The votes for each query in the other sessions may be counted in the same way as described at 1022.

The total number of votes for the target query in all sessions may be counted as follows.

At A1, the sessions that include the target query are obtained.

At A2, the number of votes for the target query in each session is obtained.

At A3, the numbers of votes in the sessions are summed to obtain the total number of votes for the target query.
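
Steps A1 to A3 can be sketched as follows, again assuming that each session is a list of distinct queries in order of appearance. The function name `total_votes` and the sample sessions are hypothetical.

```python
from typing import List


def total_votes(sessions: List[List[str]], target: str) -> int:
    """Sum, over all sessions, the number of queries that precede the target query."""
    total = 0
    for session in sessions:
        if target in session:               # A1: keep sessions that contain the target
            total += session.index(target)  # A2: votes for the target in this session
    return total                            # A3: accumulated total


sessions = [["a", "b", "c"], ["a", "c"], ["c", "d"], ["x", "y"]]
print(total_votes(sessions, "c"))
```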

Under the definition of a vote above, any query that appears before the target query in a session gives the target query one vote. That is, in each session, each query counts as at most one vote for the target query. If a query appears before the target query, it gives the target query one vote; otherwise, it gives none. The number of votes from the single query for the target query can therefore be determined by counting the sessions that include both the single query and the target query and in which the single query appears before the target query. Example detailed operations are as follows.

At B1, the sessions that include both the single query and the target query are obtained.

At B2, for each such session, it is determined whether the single query gives a vote to the target query. If so, the session is selected.

At B3, the selected sessions are counted to obtain the number of votes from the single query for the target query.
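
Steps B1 to B3 can be sketched in the same style. The sketch assumes sessions are lists of distinct queries in order of appearance; the function name `votes_from` is hypothetical.

```python
from typing import List


def votes_from(sessions: List[List[str]], single: str, target: str) -> int:
    """Count the sessions in which the single query appears before the target query."""
    selected = 0
    for session in sessions:
        if single in session and target in session:            # B1: both queries present
            if session.index(single) < session.index(target):  # B2: single votes for target
                selected += 1
    return selected                                            # B3: number of selected sessions


sessions = [["a", "b", "c"], ["a", "c"], ["c", "a"], ["c", "d"]]
print(votes_from(sessions, "a", "c"))
```

The session `["c", "a"]` contains both queries but is not selected, because a appears after c and so gives c no vote.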

The sessions that include the target query, or that include both the single query and the target query, may be obtained by matching. That is, the target query and/or the single query is determined first, and the determined term is matched against all queries in a session. If the term matches, the session is determined to include the target query, or both the single query and the target query.

Other methods of counting the total number of votes and the number of votes from the single query for the target query may be used, provided that the specific numbers of votes can be determined.

For example, each vote for the target query from a query in a session may be represented as a route. The total number of votes can then be calculated by counting all routes. A vote from the single query for the target query can be represented as a route in the same way, and that route is matched against all previously counted routes. Each complete match counts as one vote, and the number of completely matched routes is the number of votes from the single query for the target query.

The vote similarity degree between the single query and the target query may be determined directly from the ratio of the number of votes from the single query for the target query to the total number of votes for the target query. That is, the value of the ratio is the value of the vote similarity degree. The higher this ratio, the more often users change route from the single query to the target query, and the higher the vote similarity degree between the single query and the target query.
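
The ratio described above can be sketched as follows. The sketch assumes sessions are lists of distinct queries in order of appearance and returns 0.0 when the target query has no votes at all; that fallback, the function name `vote_similarity`, and the sample data are assumptions made for this example.

```python
from typing import List


def vote_similarity(sessions: List[List[str]], single: str, target: str) -> float:
    """Votes from `single` for `target`, divided by all votes for `target`."""
    total = 0        # total votes for the target across all sessions
    from_single = 0  # votes contributed by the single query
    for session in sessions:
        if target not in session:
            continue
        target_pos = session.index(target)
        total += target_pos                    # every preceding query gives one vote
        if single in session[:target_pos]:     # the single query precedes the target
            from_single += 1
    return from_single / total if total else 0.0


sessions = [["a", "b", "c"], ["a", "c"], ["d", "c"]]
print(vote_similarity(sessions, "a", "c"))
```

In the sample data, c receives four votes in total, two of them from a, so the vote similarity degree between a and c is 2/4.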

Votes from the single query for the target query may differ between sessions. For example, the single query may change to the target query directly, or only after several intermediate changes. For example, the queries in one session may be a, b, and c, while the queries in another session are a and c. Both sessions contain a vote from a to c, but in one of them a and c are separated by b and in the other they are not. The two votes from a to c are therefore not equivalent. To calculate the correlation degree between the single query and the target query more objectively, the following example operations may be performed.

A weight and a base value are determined for each vote for the target query. A score for each vote is calculated from its weight and base value. The ratio of the total vote score of the single query for the target query to the total vote score of all queries for the target query is used as the vote similarity degree between the single query and the target query.

In a specific calculation, the base value of each vote is multiplied by its weight to obtain the score of the vote, and the scores are then summed to obtain the total score. For example, assume that the base value of each vote is 1. If a query changes directly to the target query in one session, its weight may be set to 1, so the score of that vote is 1. If, in another session, a query changes to the target query only after passing over another term, its weight may be set to 0.9, so the score of that vote is 0.9. As another example, if each change on the way from a query to the target query is recorded as a step length, the weight may be determined as the reciprocal of the step length.
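
The reciprocal-of-step-length variant can be sketched as follows. The sketch assumes that the step length is the number of positions between a query and the target query within a session (1 for a direct change) and that every vote has the same base value. The function name `weighted_vote_similarity` and the parameter `base` are hypothetical.

```python
from typing import List


def weighted_vote_similarity(sessions: List[List[str]], single: str,
                             target: str, base: float = 1.0) -> float:
    """Ratio of the single query's vote score to the score of all votes for the target."""
    total_score = 0.0
    single_score = 0.0
    for session in sessions:
        if target not in session:
            continue
        target_pos = session.index(target)
        for pos, query in enumerate(session[:target_pos]):
            step_length = target_pos - pos      # 1 means a direct change to the target
            score = base * (1.0 / step_length)  # base value multiplied by the weight
            total_score += score
            if query == single:
                single_score += score
    return single_score / total_score if total_score else 0.0


sessions = [["a", "b", "c"], ["a", "c"]]
print(weighted_vote_similarity(sessions, "a", "c"))
```

For the two sessions in the example above, the vote from a to c scores 0.5 in the first session (step length 2) and 1.0 in the second (step length 1), while the vote from b to c scores 1.0, so a accounts for 1.5 of a total score of 2.5.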

Any other method of determining the weight may be used, provided that it reflects the differences between votes.

At 1026, a correlation degree between the single query and the target query is determined according to the vote similarity degree.

In one example embodiment, the vote similarity degree between the single query and the target query is used as the correlation degree between the two.

This example embodiment considers only one dimension, the vote similarity degree, when determining the correlation degree. In some example embodiments, other dimensions may also need to be considered. The value of the vote similarity degree and the values of the other dimensions may be normalized to the same quantitative scale so that a relatively accurate correlation degree can be determined.

At 1028, the normalized query of the target query is determined according to the correlation degree between the single query and the target query.

A threshold for normalized queries may be preset. If the value of the correlation degree between the single query and the target query exceeds the threshold, the single query is determined to be a normalized query of the target query.

In addition, after the single query is determined to be a normalized query of the target query, it may be further classified according to the specific value of the correlation degree. For example, normalized queries may be classified as synonym normalized queries, correlated normalized queries, extended normalized queries, and so on. A value range may be set for each category. If the value of the correlation degree falls within a particular range, the category corresponding to that range is determined to be the category of the normalized query. For example, if the correlation degree between the single query and the target query is within the range for synonym normalized queries, the single query is determined to be a synonym normalized query of the target query; if the correlation degree is within the range for correlated normalized queries, the single query is determined to be a correlated normalized query of the target query; and if the correlation degree is within the range for extended normalized queries, the single query is determined to be an extended normalized query of the target query.
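
The threshold test and the classification by value range can be sketched as follows. The disclosure does not give concrete numbers, so the threshold of 0.3 and the category bounds below are purely illustrative assumptions, as are the names `CATEGORY_LOWER_BOUNDS` and `classify`.

```python
from typing import Optional

# Hypothetical lower bounds for each category, checked from highest to lowest.
CATEGORY_LOWER_BOUNDS = [
    ("synonym", 0.8),
    ("correlated", 0.5),
    ("extended", 0.3),
]


def classify(correlation: float, threshold: float = 0.3) -> Optional[str]:
    """Return the category of a normalized query, or None if it is not one."""
    if correlation <= threshold:  # must exceed the preset threshold
        return None
    for name, lower_bound in CATEGORY_LOWER_BOUNDS:
        if correlation > lower_bound:
            return name
    return None


print(classify(0.9), classify(0.6), classify(0.35), classify(0.2))
```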

The method above normalizes queries based on the information in sessions. Sessions record how the queries of different users change route within a single search process. Objective analysis of user behavior yields objective and accurate normalization results.

To normalize queries more effectively, analysis along other dimensions may be performed in addition to the information in sessions. The normalization result may be obtained comprehensively by weighing the results obtained from the session information and from the other dimensions, which improves the objectivity of the result. For example, users' click behavior may be analyzed, or the description information of queries stored in the system may be analyzed.

FIG. 3 illustrates another example query expansion method according to a second example embodiment of the present disclosure. Relative to the operation at 104 in the first example embodiment, the following operations precede the operation at 1026 (which corresponds to 308 in FIG. 3).

At 302, click information for search results is obtained from users' search logs.

After searching with a particular query and obtaining a set of search results, a user typically clicks on particular search results in the set. The click information may include the clicked search results, their titles and description information, and so on.

At 304, the search results that include the target query are extracted from the click information.

A search result may include a query that indicates the product sold in that result. The query of each search result can therefore be determined by analyzing the title and description information of each search result in the click information. The target query can then be extracted according to actual needs. For example, a user searches with the query "mobile phone" and obtains a series of search results, and the queries of the individual search results may be "iPhone™", "Samsung™ mobile phone", "Nokia™ mobile phone", and so on. If "iPhone™" is used as the target query, all search results that include "iPhone™" can be extracted.

At 306, a click similarity degree between the single query and the target query is determined according to the total number of clicks on all search results that include the target query and the number of clicks on the search results that include the target query and correspond to the single query.

The number of clicks on the search results that include the target query and correspond to the single query is the number of clicks on the search results that include the target query within the set of search results obtained by searching with the single query.

The total number of clicks on all search results that include the target query is the total number of clicks, across all queries, on the search results that include the target query.

For example, suppose that "iPhone™" is the target query and that "mobile phone" and "smart phone" are each used as a query for searching. Searching with the query "mobile phone" returns one set of search results, in which the search results that include "iPhone™" receive 5 clicks. The number of clicks on search results that include the target query "iPhone™" and correspond to the single query "mobile phone" is therefore 5. Searching with the query "smart phone" returns another set of search results, in which the search results that include "iPhone™" receive 20 clicks. The number of clicks on search results that include the target query "iPhone™" and correspond to the single query "smart phone" is therefore 20. The total number of clicks on all search results that include the target query "iPhone™" is then 25.

The two counts may be processed as follows: the ratio of the number of clicks on the search results that include the target query and correspond to the single query to the total number of clicks on all search results that include the target query is calculated. This ratio may be used as the click similarity degree.
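
Using the numbers from the "iPhone™" example, the ratio can be sketched as follows. The sketch assumes the click log has already been reduced to a mapping from each search query to the number of clicks on results that include the target query; the names `click_similarity` and `clicks_on_iphone_results` are hypothetical.

```python
from typing import Dict


def click_similarity(clicks_by_query: Dict[str, int], single: str) -> float:
    """Clicks attributed to the single query, divided by all clicks on target results."""
    total_clicks = sum(clicks_by_query.values())
    if total_clicks == 0:
        return 0.0  # assumed fallback when the target results were never clicked
    return clicks_by_query.get(single, 0) / total_clicks


# Clicks on results that include the target query, keyed by the query that was searched.
clicks_on_iphone_results = {"mobile phone": 5, "smart phone": 20}
print(click_similarity(clicks_on_iphone_results, "mobile phone"))  # 5 / 25
print(click_similarity(clicks_on_iphone_results, "smart phone"))   # 20 / 25
```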

This example embodiment considers two dimensions, so the similarity values under both dimensions need to be considered when the correlation degree between the single query and the target query is determined. Accordingly, the operation at 1026 in FIG. 2, which corresponds to 308 in FIG. 3, is modified as follows: the correlation degree between the single query and the target query is determined according to the vote similarity degree and the click similarity degree.

For example, the larger of the vote similarity degree and the click similarity degree may be used as the correlation degree between the single query and the target query.

As another example, a weight is determined for the vote similarity degree and for the click similarity degree. The correlation degree between the single query and the target query is then calculated from the vote similarity degree, the click similarity degree, and their respective weights according to one or more preset rules.
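
Both ways of combining the two dimensions can be sketched briefly. The disclosure leaves the "preset rules" open, so the weighted sum and the weights 0.6 and 0.4 below are assumptions chosen only for illustration; the function names are hypothetical.

```python
def correlation_max(vote_sim: float, click_sim: float) -> float:
    """First option: take the larger of the two similarity degrees."""
    return max(vote_sim, click_sim)


def correlation_weighted(vote_sim: float, click_sim: float,
                         vote_weight: float = 0.6, click_weight: float = 0.4) -> float:
    """Second option, assumed here to be a weighted sum of the two similarity degrees."""
    return vote_weight * vote_sim + click_weight * click_sim


print(correlation_max(0.5, 0.2))
print(correlation_weighted(0.5, 0.2))
```

If the weights sum to 1 and both inputs lie between 0 and 1, the weighted result stays in the same range, which keeps it comparable with a single preset threshold.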

FIG. 4 illustrates another example query expansion method according to a third example embodiment of the present disclosure. Relative to the operation at 104 in the first example embodiment and the corresponding operations in the second example embodiment, the following operations precede the operation at 1026 in FIG. 2 (which corresponds to 408 in FIG. 4).

At 402, seller data stored on the server is obtained. Seller data is the product description information that a seller provides when describing a product.

At 404, the seller data is analyzed, and queries and the characteristic terms of the queries are extracted. A characteristic term of a query is a term that describes a characteristic of the query.

At 406, a characteristic similarity degree is determined according to the characteristic terms of the single query and the characteristic terms of the target query.

For example, the characteristic similarity degree may be determined as follows.

A characteristic value is determined for each characteristic term. The characteristic similarity degree between the single query and the target query is then determined according to the characteristic values.

The characteristic value of each characteristic term may be obtained by calculating the mutual information between the characteristic term and the corresponding query. An example calculation formula is as follows.

CP denotes the query, and Word denotes the description term. P(CP) and P(Word) denote the probabilities that each of the two terms appears on its own in the data set. P(CP&Word) denotes the probability that the two terms appear together in the data set. C(CP) and C(Word) denote the number of pieces of information in the data set in which each of the two terms appears on its own. C(CP&Word) denotes the number of pieces of information in which the two terms appear together. N denotes the total number of pieces of information in the data set.
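
The formula itself does not appear in this text. One common form that is consistent with the symbols defined above is pointwise mutual information, MI(CP, Word) = log( P(CP&Word) / (P(CP) · P(Word)) ), with each probability estimated as the corresponding count divided by N. The sketch below uses that form as an assumption; it may differ from the formula in the original drawings. The function name `characteristic_value` and the zero fallback are hypothetical.

```python
import math


def characteristic_value(c_cp: int, c_word: int, c_both: int, n: int) -> float:
    """Assumed pointwise mutual information between a query (CP) and a term (Word).

    c_cp, c_word: counts of items containing CP and Word respectively
    c_both:       count of items containing both
    n:            total number of items in the data set
    """
    if n <= 0 or min(c_cp, c_word, c_both) <= 0:
        return 0.0  # assumed fallback when a count is zero and the log is undefined
    p_cp = c_cp / n
    p_word = c_word / n
    p_both = c_both / n
    return math.log(p_both / (p_cp * p_word))


print(characteristic_value(c_cp=10, c_word=10, c_both=10, n=100))
```

Under this form, a term that co-occurs with the query more often than chance would predict gets a positive value, and a term that is statistically independent of the query gets a value near zero.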

Calculating the degree of characteristic similarity between the single query and the target query according to the characteristic values may be regarded as calculating the degree of similarity between the characteristic terms of the single query and those of the target query in each dimension. For example, cosine similarity may be used to calculate the degree of characteristic similarity. The higher the resulting value, the higher the degree of similarity between the two.
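For illustration only, the cosine similarity calculation mentioned above may be sketched as follows. The sketch assumes that each query is represented as a mapping from characteristic term to characteristic value; the function name, the dictionary representation, and the sample terms and values are hypothetical and do not appear in the disclosure.

```python
import math


def cosine_similarity(features_a, features_b):
    """Cosine similarity between two sparse vectors (term -> characteristic value)."""
    # Only terms present in both queries contribute to the dot product.
    shared_terms = set(features_a) & set(features_b)
    dot = sum(features_a[t] * features_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in features_a.values()))
    norm_b = math.sqrt(sum(v * v for v in features_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # A query with no characteristic terms is treated as dissimilar (assumption).
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical characteristic values for a single query and a target query.
single_query_features = {"red": 1.0, "cotton": 2.0}
target_query_features = {"red": 2.0, "cotton": 4.0, "sleeveless": 0.5}
similarity = cosine_similarity(single_query_features, target_query_features)
```

Queries that share no characteristic terms yield a value of 0 under this sketch.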

As described above, because the seller data is added as a new dimension, the added dimension should be taken into account when determining the degree of correlation between the single query and the target query.

It can be appreciated that the new dimension may be combined with the first embodiment or with the second embodiment. That is, either two dimensions or all three dimensions may be considered. When the new dimension is combined with the first embodiment, the operation at 1026 in the first embodiment, which corresponds to the operation at 408, may be modified accordingly as follows: the degree of correlation between the single query and the target query is determined according to the degree of vote similarity and the degree of characteristic similarity.

When the three dimensions are combined, the operation at 1026 in the first embodiment, which corresponds to the operation at 408, may be modified accordingly as follows: the degree of correlation between the single query and the target query is determined according to the degree of vote similarity, the degree of click similarity, and the degree of characteristic similarity.

A method similar to that of the second embodiment may be referenced to determine the degree of correlation between the single query and the target query. For example, the highest similarity value may be selected as the degree of correlation. Alternatively, a weight may be determined for each degree of similarity, and a predetermined method, such as linear fitting, may then be used to perform a calculation based on the weights and the values. For brevity, the details are not discussed herein.
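For illustration only, the two alternatives described above may be sketched as follows. The disclosure does not specify how the weights are obtained or how the fitted combination is formed; the weighted sum, the function names, and the sample values and weights below are assumptions.

```python
def correlation_by_max(similarities):
    """Use the highest similarity value as the degree of correlation."""
    return max(similarities.values())


def correlation_by_weights(similarities, weights):
    """Combine the similarity values linearly using per-dimension weights.

    A plain weighted sum is assumed here; the weights would be determined
    separately, for example by fitting against labeled query pairs.
    """
    return sum(weights[name] * value for name, value in similarities.items())


# Hypothetical similarity values and weights for one (single query, target query) pair.
similarities = {"vote": 0.4, "click": 0.7, "characteristic": 0.2}
weights = {"vote": 0.5, "click": 0.3, "characteristic": 0.2}

highest = correlation_by_max(similarities)
combined = correlation_by_weights(similarities, weights)
```

The maximum favors whichever dimension gives the strongest signal, whereas the weighted combination lets every dimension contribute in proportion to its weight.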

As another example, factors other than the degree of correlation may also be considered when the normalization processing is performed, so that the processing result is more accurate. For example, a degree of semantic similarity between the single query and the target query, or a degree of category similarity between the single query and the target query, may additionally be considered.

Accordingly, the following operations may be performed before operation 1028 in the first example embodiment, the second example embodiment, and/or the third example embodiment.

A degree of semantic similarity between the single query and the target query is determined. For example, the degree of semantic similarity between the single query and the target query may be determined according to an edit distance. The edit distance is the minimum number of edit operations required to convert one character string (or query) into another. The Levenshtein distance is one example of an edit distance. In the Levenshtein distance, the edit operations include replacing one character with another character, inserting one character, and deleting one character. The edit distance between two character strings may be calculated by dynamic programming. After the edit distance is calculated, normalization processing is applied to the edit distance to obtain the degree of semantic similarity, so that the degree of semantic similarity is on the same scale as the degree of correlation, which facilitates subsequent processing.
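For illustration only, the edit distance calculation and its normalization may be sketched as follows. The dynamic programming routine computes the standard Levenshtein distance. The normalization shown, which divides by the length of the longer string and subtracts the result from 1, is an assumption, since the disclosure does not specify the normalization; the function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + substitution_cost,   # replace or keep
            ))
        previous = current
    return previous[-1]


def semantic_similarity(a, b):
    """Map the edit distance to [0, 1], where 1 means identical strings (assumed scaling)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Because the edit distance never exceeds the length of the longer string, the resulting value always lies between 0 and 1, which puts it on a scale comparable to a ratio-based degree of correlation.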

Alternatively, the following operations may be performed before operation 1028 in the first example embodiment, the second example embodiment, and/or the third example embodiment.

A degree of category similarity between the single query and the target query is determined.

For example, on an e-commerce website, categories may be predefined to classify and manage products. Each query has a category to which it belongs. In general, categories may be divided into multiple levels. That is, one first-level category may include a plurality of second-level categories, each second-level category may be further divided into a plurality of third-level categories, and so on. The degree of category similarity may be determined by determining whether the queries belong to the same first-level category, second-level category, third-level category, and so on. Similarly, normalization processing is applied during the determination so that the degree of category similarity is on the same scale as the degree of correlation.
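For illustration only, one way to turn the level-by-level comparison into a normalized value is sketched below. The disclosure states only that the levels are compared and that the result is normalized; the scoring rule (the fraction of leading levels shared), the function name, and the sample category paths are assumptions.

```python
def category_similarity(path_a, path_b):
    """Fraction of leading category levels that two category paths have in common.

    Each path lists category names from the first level downward.
    """
    depth = max(len(path_a), len(path_b))
    if depth == 0:
        return 0.0
    shared_levels = 0
    for level_a, level_b in zip(path_a, path_b):
        if level_a != level_b:
            break  # Deeper levels cannot match once a higher level differs.
        shared_levels += 1
    return shared_levels / depth


# Hypothetical category paths for two queries.
dress_path = ["Clothing", "Women", "Dresses"]
skirt_path = ["Clothing", "Women", "Skirts"]
score = category_similarity(dress_path, skirt_path)
```

Under this rule, queries in the same third-level category score 1, and queries that differ at the first level score 0.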

Because one or more new factors are considered, the operation at 1028 in the three embodiments described above may be modified accordingly.

If the degree of correlation and the degree of semantic similarity are considered, the operation at 1028 may be modified to determine the normalized query of the target query according to the degree of correlation and the degree of semantic similarity between the single query and the target query.

If the degree of correlation and the degree of category similarity are considered, the operation at 1028 may be modified to determine the normalized query of the target query according to the degree of correlation and the degree of category similarity between the single query and the target query.

If all three are considered together, the operation at 1028 may be modified to determine the normalized query of the target query according to the degree of correlation, the degree of semantic similarity, and the degree of category similarity between the single query and the target query.

For example, in a detailed implementation, linear fitting may be applied to the two or three factors to obtain a normalization score for the single query and the target query, and the normalized query of the target query is determined according to the normalization score.

For example, with reference to the first example embodiment, a threshold may be used for the implementation. That is, a threshold for normalized queries is preset. When the normalization score exceeds the threshold, the single query is determined to be a normalized query of the target query. In addition, normalized queries may be further classified. That is, different values corresponding to different categories are respectively set. When the normalization score falls within a particular value range, the category corresponding to that value range is determined to be the detailed category of the normalized query. The specific method of the first embodiment may be referenced. For brevity, the details are not discussed herein.
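For illustration only, the threshold test and the classification by value range may be sketched as follows. The category names follow the synonym, correlated, and extended normalized queries described for the system embodiments; the numeric threshold, the lower bounds of the ranges, and the function names are hypothetical.

```python
# Hypothetical threshold and value ranges; the disclosure does not give numbers.
NORMALIZED_QUERY_THRESHOLD = 0.3

# Categories listed in descending order of the score range assigned to each.
CATEGORY_LOWER_BOUNDS = [
    ("synonym", 0.8),
    ("correlated", 0.5),
    ("extended", 0.3),
]


def is_normalized_query(score, threshold=NORMALIZED_QUERY_THRESHOLD):
    """The single query qualifies when its score exceeds the preset threshold."""
    return score > threshold


def classify(score, bounds=CATEGORY_LOWER_BOUNDS):
    """Return the first category whose range contains the score, or None."""
    for category, lower_bound in bounds:
        if score >= lower_bound:
            return category
    return None
```

Listing the ranges in descending order means the first matching range determines the category, so a score is never assigned to more than one category.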

With respect to the operations described above that are added before a particular operation of an example embodiment, it can be appreciated that, because the added operations are independent of the operations that precede the particular operation, the added operations may be performed immediately before the particular operation, concurrently with the operations that precede the particular operation, or before those preceding operations. The present disclosure imposes no limitation in this regard. For example, the operations from 302 to 306 added in the second example embodiment may be performed between the operation at 1024 and the operation at 1028. Alternatively, these operations may be performed concurrently with the operations from 1020 to 1024. Alternatively, these operations may be performed before the operation at 1020. The present disclosure imposes no limitation in this regard. Other example embodiments may be handled similarly and are not described in detail herein.

FIG. 5 illustrates a first example query expansion system 500 in accordance with the present disclosure. The system 500 may include one or more processor(s) 502 and memory 504. The memory 504 is an example of computer-readable media. As used herein, "computer-readable media" includes computer storage media and communication media.

Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-executable instructions, data structures, program modules, or other data. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave. As defined herein, computer storage media does not include communication media. The memory 504 may store therein program units or modules and program data.

In the example of FIG. 5, the memory 504 may store therein a query input module 506, a normalized query determination module 508, and a query expansion module 510. The query input module 506 obtains a query input by a user. The normalized query determination module 508 determines a normalized query of the query according to the query. The query expansion module 510 performs query expansion by using the normalized query as an expansion term of the query.

FIG. 6 illustrates an example normalized query determination module 600. The example normalized query determination module 600 may include a session information acquisition module 602, a query vote calculation module 604, a vote similarity degree determination module 606, a correlation degree determination module 608, and a normalized query determination module 610.

The session information acquisition module 602 obtains session information from the search log of the user.

The query vote calculation module 604 obtains all queries appearing in a single session and counts the votes for each query. Within a single session, according to the order in which the queries appear, any query that appears before a particular query is counted as one vote for that particular query.
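For illustration only, the vote counting within one session may be sketched as follows. The sketch assumes that a session is an ordered list of query strings and that each earlier query casts at most one vote per later query in a session; the function name, the return format, and the sample session are hypothetical.

```python
def session_votes(session):
    """Return the set of (voter, target) pairs produced by one session.

    A query votes for every different query that appears after it in the session.
    """
    votes = set()
    seen = []  # Distinct queries encountered so far, in order of appearance.
    for query in session:
        for earlier in seen:
            if earlier != query:
                votes.add((earlier, query))
        if query not in seen:
            seen.append(query)
    return votes


# Hypothetical session: the user reformulates a query twice.
example_votes = session_votes(["phone", "mobile phone", "cell phone"])
```

In this example session, "phone" votes for both later queries and "mobile phone" votes for "cell phone".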

The vote similarity degree determination module 606 determines the degree of vote similarity between the single query and the target query according to the total number of votes of the target query in all sessions and the number of votes of the single query for the target query. For example, the vote similarity degree determination module 606 may include a base number and weight determination unit, a score calculation unit, and a ratio calculation unit. The base number and weight determination unit determines the weight and the base number of each vote for the target query. The score calculation unit calculates the score of each vote according to the weight and the base number. The ratio calculation unit uses the ratio of the total vote score of the single query for the target query to the total vote score of all queries for the target query as the degree of vote similarity between the single query and the target query.
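For illustration only, the ratio calculation may be sketched as follows. The sketch assumes that the score of each vote has already been computed from its weight and base number and takes those scores as input, since this passage does not state how the two are combined; the function name, the input format, and the sample scores are hypothetical.

```python
def vote_similarity(vote_scores, single_query):
    """Share of the target query's total vote score contributed by single_query.

    vote_scores is a list of (voting_query, score) pairs, one per vote cast
    for the target query across all sessions.
    """
    total_score = sum(score for _, score in vote_scores)
    if total_score == 0:
        return 0.0
    single_score = sum(score for voter, score in vote_scores if voter == single_query)
    return single_score / total_score


# Hypothetical scored votes for one target query.
scored_votes = [("phone", 1.0), ("phone", 1.0), ("handset", 2.0)]
phone_similarity = vote_similarity(scored_votes, "phone")
```

When every vote has the same score, this reduces to the ratio of the number of votes of the single query for the target query to the total number of votes of the target query.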

The correlation degree determination module 608 determines the degree of correlation between the single query and the target query according to the degree of vote similarity.

The normalized query determination module 610 determines the normalized query of the target query according to the degree of correlation between the single query and the target query.

For example, the normalized query determination module 610 may include a normalized query threshold setting unit that sets a threshold for normalized queries and, if the value of the degree of correlation between the single query and the target query exceeds the threshold, determines the single query to be a normalized query of the target query.

As another example, the normalized query determination module 610 may also include a normalized query category classification unit, a value range setting unit, and a category determination unit. The normalized query category classification unit divides the categories of normalized queries into synonym normalized queries, correlated normalized queries, and extended normalized queries. The value range setting unit sets the value ranges of the three categories in descending order of correlation degree value. The category determination unit determines the category corresponding to the value range to which the degree of correlation between the single query and the target query belongs as the detailed category of the single query and the target query.

FIG. 7 illustrates a second example query expansion system that includes a second example normalized query determination module 700. In addition to the session information acquisition module 602, the query vote calculation module 604, the vote similarity degree determination module 606, the correlation degree determination module 608, and the normalized query determination module 610, the normalized query determination module 700 further includes a click information acquisition module 702, a search result extraction module 704, and a click similarity degree determination module 706. The click information acquisition module 702 obtains click information of search results from the search log of the user. The search result extraction module 704 extracts, from the click information, search results that include the target query. The click similarity degree determination module 706 determines the degree of click similarity between the single query and the target query according to the total number of clicks on the search results that include the target query and the total number of clicks on the search results that include the target query and correspond to the single query. The correlation degree determination module 608 determines the degree of correlation between the single query and the target query according to the degree of vote similarity and the degree of click similarity.
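For illustration only, the click similarity determination may be sketched as follows. The sketch assumes that the click log is a list of (issued query, clicked result title) pairs, that a search result "includes" the target query when the target query is a substring of its title, and that the degree of click similarity is the ratio of the two click counts named above. The log format, the substring test, the ratio, and the function name are all assumptions.

```python
def click_similarity(click_log, single_query, target_query):
    """Share of clicks on results containing target_query that came from single_query."""
    total_clicks = 0
    clicks_from_single_query = 0
    for issued_query, result_title in click_log:
        if target_query in result_title:
            total_clicks += 1
            if issued_query == single_query:
                clicks_from_single_query += 1
    if total_clicks == 0:
        return 0.0
    return clicks_from_single_query / total_clicks


# Hypothetical click log entries: (query the user issued, title of the clicked result).
click_log = [
    ("cell phone", "new mobile phone case"),
    ("handset", "mobile phone charger"),
    ("cell phone", "laptop bag"),
    ("cell phone", "mobile phone"),
]
ratio = click_similarity(click_log, "cell phone", "mobile phone")
```

A high ratio indicates that users who issue the single query frequently click on results that mention the target query.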

FIG. 8 illustrates a third example query expansion system that includes a third example normalized query determination module 800. In addition to the session information acquisition module 602, the query vote calculation module 604, the vote similarity degree determination module 606, the correlation degree determination module 608, and the normalized query determination module 610, the normalized query determination module 800 further includes a seller data acquisition module 802, a data analysis module 804, and a characteristic similarity degree determination module 806.

The seller data acquisition module 802 obtains seller data stored in the server. The seller data includes the product description information that a seller provides when describing a product.

The data analysis module 804 analyzes the seller data and extracts, from the seller data, queries and the characteristic terms of each query.

The characteristic similarity degree determination module 806 determines the degree of characteristic similarity according to the characteristic terms of the single query and the target query. For example, the characteristic similarity degree determination module 806 may include a characteristic value calculation unit that calculates the characteristic value of each characteristic term. The characteristic value is calculated according to the mutual information between the characteristic term and its corresponding query.

The correlation degree determination module 608 determines the degree of correlation between the single query and the target query according to the degree of vote similarity and the degree of characteristic similarity.

It can be appreciated that the data described in the second example embodiment and the third example embodiment may be considered together for processing. That is, the correlation degree determination module 608 may further determine the degree of correlation between the single query and the target query according to the degree of vote similarity, the degree of click similarity, and the degree of characteristic similarity. In determining the degree of correlation, the largest of the three may be used as the degree of correlation. Alternatively, linear fitting may be applied to the three degrees of similarity to obtain a final value as the degree of correlation.

It can be appreciated that, when the degree of vote similarity is combined with either the degree of click similarity or the degree of characteristic similarity, the larger of the two may be selected as the degree of correlation. Alternatively, linear fitting may be applied to the two degrees of similarity to obtain a final value as the degree of correlation.

As another example, one or more of the example systems may further include a semantic similarity degree determination module and/or a category similarity degree determination module that respectively determine the degree of semantic similarity and/or the degree of category similarity between the single query and the target query.

Correspondingly, the normalized query determination module 610 may determine the normalized query of the target query according to the degree of correlation and the degree of semantic similarity between the single query and the target query; according to the degree of correlation and the degree of category similarity between the single query and the target query; or according to the degree of correlation, the degree of semantic similarity, and the degree of category similarity between the single query and the target query.

For example, the semantic similarity degree determination module may include an edit distance determination unit and a normalization processing unit. The edit distance determination unit determines the edit distance between the single query and the target query. The edit distance is the minimum number of edit operations required to convert one term into another. The normalization processing unit applies normalization processing to the edit distance to obtain a degree of semantic similarity that is on the same scale as the degree of correlation.

The example embodiments in the present disclosure are described progressively. The description of each example embodiment emphasizes its differences from the other example embodiments. Identical or similar portions of the example embodiments may be cross-referenced. Because the example system embodiments are fundamentally similar to the example method embodiments, the example system embodiments are not described in detail. For the relevant portions, reference may be made to the corresponding portions of the example method embodiments.

Some example query expansion methods and systems of the present disclosure are not described in detail herein. The present disclosure describes the principles and implementations of the present technology using several examples. The example embodiments are intended merely to aid in understanding the methods and core concepts of the present disclosure. Meanwhile, those of ordinary skill in the art may modify or vary the example embodiments or applications in accordance with the concepts of the present disclosure, and such modifications still fall within the scope of protection of the present disclosure. The present disclosure should not be construed as a limitation on the present technology.

Claims

A method comprising:
obtaining a target query; and
determining a normalized query according to the obtained target query,
wherein the determining comprises:
obtaining session information from a search log;
determining a degree of vote similarity between a single query and the target query based on the session information, wherein determining the degree of vote similarity comprises:
obtaining all queries appearing in a single session, including the target query;
calculating a vote count for each target query, wherein the calculating includes counting the single query appearing before the target query in the single session as a vote from the single query for the target query; and
determining the degree of vote similarity between the single query and the target query according to the calculated vote counts;
determining a degree of correlation between the single query and the target query based in part on the degree of vote similarity; and
determining the normalized query based in part on the degree of correlation between the single query and the target query.

The method of claim 1, further comprising:
using the normalized query as an expansion term of the obtained target query.

The method of claim 1,
wherein determining the degree of vote similarity between the single query and the target query according to the calculated vote counts is based on the total number of votes of the target query in all sessions and the number of votes of the single query for the target query.

The method of claim 3, further comprising
computing the total number of votes of the target query in all sessions, wherein computing the total number of votes of the target query comprises:
acquiring one or more sessions that include the target query;
counting the number of votes of the target query in each session; and
accumulating the number of votes of the target query in each session to obtain the total number of votes of the target query.

The method of claim 3, further comprising
computing the number of votes of the single query for the target query, wherein computing the number of votes of the single query for the target query comprises:
acquiring one or more sessions that include the single query and the target query;
determining whether the single query provides a vote for the target query in each session;
in response to determining that the single query provides a vote for the target query in a respective session, selecting the respective session; and
counting the number of selected sessions to obtain the number of votes of the single query for the target query.

The method of claim 1,
wherein determining the degree of vote similarity between the single query and the target query according to the calculated vote counts comprises using a ratio of the number of votes of the single query for the target query to the total number of votes of the target query as the degree of vote similarity between the single query and the target query.

The method of claim 1,
wherein determining the degree of vote similarity between the single query and the target query comprises:
determining a weight and a base number of each vote of the target query;
calculating a score for each vote according to the respective weight and the respective base number; and
using a ratio of the total vote score of the single query for the target query to the total vote score of all queries for the target query as the degree of vote similarity between the single query and the target query.

The method of claim 1,
wherein determining the normalized query based in part on the degree of correlation between the single query and the target query comprises:
setting a threshold for normalized queries; and
in response to determining that a value of the degree of correlation between the single query and the target query exceeds the threshold for normalized queries, determining the single query to be a normalized query of the target query.

The method of claim 1,
wherein determining the normalized query based in part on the degree of correlation between the single query and the target query comprises:
dividing the categories of normalized queries into synonym normalized queries, correlated normalized queries, and extended normalized queries;
setting a value range for each of the synonym normalized queries, the correlated normalized queries, and the extended normalized queries in descending order of correlation degree value; and
using the category corresponding to the value range to which the degree of correlation between the single query and the target query belongs as a detailed category of the single query and the target query.

The method of claim 1, further comprising:
acquiring click information for search results from the search log;
extracting, according to the click information, one or more search results that include the target query; and
determining a degree of click similarity between the single query and the target query according to the total number of clicks on the search results that include the target query and the number of those clicks that correspond to the single query,
wherein the degree of correlation between the single query and the target query is determined based on the degree of vote similarity and the degree of click similarity.
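
One way to read this claim is sketched below. The claim names the two click counts but not how they are combined, so taking their ratio is an assumption. The click-log layout (pairs of issued query and clicked result title) and the substring test for "includes the target query" are likewise hypothetical simplifications.

```python
def click_similarity(click_log, single_query, target_query):
    """Fraction of clicks on results containing target_query that came from single_query.

    click_log: iterable of (issued_query, clicked_result_title) pairs.
    """
    total_clicks = 0
    clicks_from_single = 0
    for issued_query, result_title in click_log:
        # Assumed test for "search result including the target query".
        if target_query in result_title:
            total_clicks += 1
            if issued_query == single_query:
                clicks_from_single += 1
    return clicks_from_single / total_clicks if total_clicks else 0.0


# Hypothetical click log.
click_log = [
    ("cell phone", "Brand X mobile phone 64GB"),
    ("cell phone", "mobile phone case"),
    ("mobile phone", "Brand X mobile phone 64GB"),
    ("handset", "Brand Y mobile phone"),
    ("cell phone", "cell phone charger"),
]

# Four clicked results contain "mobile phone"; two of those clicks followed "cell phone".
similarity = click_similarity(click_log, "cell phone", "mobile phone")
```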

The method of claim 10, wherein determining the degree of correlation between the single query and the target query based on the degree of vote similarity and the degree of click similarity comprises selecting the larger of the degree of vote similarity and the degree of click similarity between the single query and the target query as the degree of correlation.

The method of claim 10, wherein determining the degree of correlation between the single query and the target query based on the degree of vote similarity and the degree of click similarity comprises:
determining a weight for the degree of vote similarity and a weight for the degree of click similarity; and
calculating the degree of correlation between the single query and the target query, according to one or more predetermined rules, from the degree of vote similarity, the degree of click similarity, and their respective weights.
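
The two ways of combining the similarity degrees can be sketched side by side. Taking the larger value follows the preceding claim directly. For the weighted variant, the claim leaves the "predetermined rules" open, so the weighted sum and the default weights below are assumptions.

```python
def correlation_by_max(vote_sim, click_sim):
    """Take the larger of the two similarity degrees as the correlation."""
    return max(vote_sim, click_sim)


def correlation_by_weights(vote_sim, click_sim, vote_weight=0.6, click_weight=0.4):
    """Assumed rule: weighted sum of the two similarity degrees.

    The default weights are hypothetical; the claim does not fix their values.
    """
    return vote_weight * vote_sim + click_weight * click_sim
```

Choosing weights that sum to 1 keeps the result on the same 0 to 1 scale as its inputs, which makes it directly comparable with a threshold.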

The method of claim 1, further comprising:
obtaining seller data stored in a server, the seller data including product description information;
analyzing the seller data to extract queries and characteristic terms of the queries; and
determining a degree of characteristic similarity according to the respective characteristic terms of the single query and the target query,
wherein the degree of correlation between the single query and the target query is determined based on the degree of vote similarity and the degree of characteristic similarity.

The method of claim 13, wherein determining the degree of characteristic similarity according to the respective characteristic terms of the single query and the target query comprises:
calculating a characteristic value of each characteristic term based on the mutual information between that characteristic term and the query to which it corresponds; and
calculating the degree of characteristic similarity between the single query and the target query according to the characteristic values.
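
A possible reading of this claim is sketched below, under several assumptions that the claim itself does not state: the characteristic value is taken to be pointwise mutual information (clipped at zero) estimated from co-occurrence counts, and the characteristic similarity is taken to be the cosine of the two queries' characteristic-value vectors. The co-occurrence table is hypothetical sample data.

```python
import math
from collections import Counter


def characteristic_values(cooccurrence):
    """Map each query to {term: clipped pointwise mutual information}.

    cooccurrence: {query: {term: co-occurrence count in seller data}}.
    """
    grand_total = sum(sum(counts.values()) for counts in cooccurrence.values())
    term_totals = Counter()
    for counts in cooccurrence.values():
        term_totals.update(counts)  # Adds the counts term by term.
    values = {}
    for query, counts in cooccurrence.items():
        query_total = sum(counts.values())
        values[query] = {
            # PMI = log(p(term, query) / (p(term) * p(query))), clipped at 0 (assumption).
            term: max(0.0, math.log(n * grand_total / (query_total * term_totals[term])))
            for term, n in counts.items()
            if n > 0
        }
    return values


def characteristic_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse characteristic-value vectors (assumed measure)."""
    dot = sum(value * vec_b.get(term, 0.0) for term, value in vec_a.items())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical term counts extracted from product descriptions.
cooccurrence = {
    "cell phone": {"touchscreen": 4, "battery": 2},
    "mobile phone": {"touchscreen": 3, "battery": 3},
    "sofa": {"fabric": 5, "battery": 1},
}
values = characteristic_values(cooccurrence)
close = characteristic_similarity(values["cell phone"], values["mobile phone"])
far = characteristic_similarity(values["cell phone"], values["sofa"])
```

In this sample, "cell phone" and "mobile phone" share a distinctive term and score higher than "cell phone" and "sofa", which share only a term that is not distinctive for either.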

The method of claim 1, further comprising determining a degree of semantic similarity between the single query and the target query,
wherein the normalized query of the target query is determined based on the degree of correlation and the degree of semantic similarity between the single query and the target query.

The method of claim 1, further comprising determining a degree of category similarity between the single query and the target query,
wherein the normalized query of the target query is determined based on the degree of correlation and the degree of category similarity between the single query and the target query.

The method of claim 1, further comprising:
determining a degree of semantic similarity between the single query and the target query; and
determining a degree of category similarity between the single query and the target query,
wherein the normalized query of the target query is determined based on the degree of correlation, the degree of semantic similarity, and the degree of category similarity between the single query and the target query.

A method comprising:
acquiring session information from a search log;
determining a degree of vote similarity between a single query and a target query based on the session information, wherein determining the degree of vote similarity comprises:
obtaining all queries appearing in a single session, including the target query;
calculating a vote count for each target query, wherein the calculating includes counting the single query appearing before the target query in the single session as a vote from the single query for the target query; and
determining the degree of vote similarity between the single query and the target query according to the calculated vote counts;
determining a degree of correlation between the single query and the target query based on the degree of vote similarity;
determining a degree of semantic similarity between the single query and the target query; and
determining a normalized query based on the degree of correlation and the degree of semantic similarity between the single query and the target query.
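
The vote-counting steps of this claim can be sketched as follows. Several details are assumptions: a query counts as a voter if it appears anywhere earlier in the same session (not only immediately before the target), each voter and target pair is counted at most once per session, and the degree of vote similarity is taken as the voter's share of all votes for the target. The session data is hypothetical.

```python
from collections import Counter


def count_votes(sessions):
    """Return a Counter mapping (voter_query, target_query) to a vote count.

    sessions: list of sessions, each a list of queries in the order issued.
    """
    votes = Counter()
    for queries in sessions:
        pairs = set()
        for position, target in enumerate(queries):
            for voter in queries[:position]:
                if voter != target:
                    pairs.add((voter, target))
        votes.update(pairs)  # At most one vote per pair per session (assumption).
    return votes


def session_vote_similarity(votes, single_query, target_query):
    """Share of all votes for target_query that were cast by single_query."""
    total = sum(n for (_, target), n in votes.items() if target == target_query)
    return votes[(single_query, target_query)] / total if total else 0.0


# Hypothetical sessions extracted from a search log.
sessions = [
    ["cell phone", "mobile phone"],
    ["handset", "cell phone", "mobile phone"],
    ["mobile phone", "phone case"],
]
votes = count_votes(sessions)
similarity = session_vote_similarity(votes, "cell phone", "mobile phone")
```

Here "mobile phone" receives two votes from "cell phone" and one from "handset", so "cell phone" holds two thirds of its votes.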

The method of claim 18, wherein determining the degree of semantic similarity between the single query and the target query comprises:
determining an edit distance between the single query and the target query, the edit distance being the minimum number of editing operations needed to change one term into the other; and
normalizing the edit distance to obtain a degree of semantic similarity on the same quantitative scale as the degree of correlation.
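
A minimal sketch of this claim is given below. The edit distance is computed as the standard Levenshtein distance over characters (insertions, deletions, substitutions). The claim does not specify the normalization, so dividing by the longer string's length and subtracting from 1 is an assumption chosen to give a value between 0 and 1.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def semantic_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # Two empty strings are identical.
    return 1.0 - edit_distance(a, b) / longest
```

For example, "kitten" and "sitting" are three edits apart, giving a similarity of 1 - 3/7 under this normalization.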

A system for expanding queries, comprising:
one or more processors;
a memory device communicatively coupled with the one or more processors;
a query input module having instructions executable by the one or more processors to obtain a target query;
a normalized query determination module having instructions executable by the one or more processors to determine a normalized query in accordance with the obtained target query; and
a query expansion module having instructions executable by the one or more processors to implement query expansion using the normalized query as an expansion term of the obtained target query,
wherein determining the normalized query according to the obtained target query comprises:
obtaining session information from a search log;
determining a degree of vote similarity between a single query and the target query based on the session information, wherein determining the degree of vote similarity comprises:
obtaining all queries appearing in a single session, including the target query;
calculating a number of votes for each target query, the calculating comprising counting the single query appearing before the target query in the single session as a vote from the single query for the target query; and
determining the degree of vote similarity between the single query and the target query according to the calculated number of votes;
determining a degree of correlation between the single query and the target query; and
determining a normalized query based in part on the degree of correlation between the single query and the target query.
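
The three modules of this system claim could be arranged as in the sketch below. It is only an illustration of how the pieces fit together: the class and method names are hypothetical, the correlation table is assumed to have been computed beforehand from the search log, and the input cleanup (trimming and lowercasing) and the threshold test are assumptions.

```python
class QueryExpander:
    """Toy arrangement of the query input, normalization, and expansion modules."""

    def __init__(self, correlations, threshold=0.3):
        # correlations: {target_query: {single_query: degree of correlation}},
        # assumed to be precomputed from session votes in the search log.
        self.correlations = correlations
        self.threshold = threshold

    def get_query(self, raw_input):
        """Query input module: obtain the target query (assumed cleanup)."""
        return raw_input.strip().lower()

    def normalized_queries(self, target_query):
        """Normalized query determination module: keep sufficiently correlated queries."""
        candidates = self.correlations.get(target_query, {})
        return sorted(q for q, degree in candidates.items() if degree > self.threshold)

    def expand(self, raw_input):
        """Query expansion module: add the normalized queries as expansion terms."""
        target_query = self.get_query(raw_input)
        return [target_query] + self.normalized_queries(target_query)


# Hypothetical precomputed correlations.
expander = QueryExpander({"mobile phone": {"cell phone": 0.6, "phone case": 0.1}})
expanded = expander.expand("  Mobile Phone ")
```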