KR101059032B1

KR101059032B1 - Search Schema Setting Device and Method

Info

Publication number: KR101059032B1
Application number: KR1020080127444A
Authority: KR
Inventors: 박경재
Original assignee: NCSOFT Corporation (주식회사 엔씨소프트)
Priority date: 2008-12-15
Filing date: 2008-12-15
Publication date: 2011-08-24
Also published as: KR20100068926A

Abstract

The present invention relates to an apparatus and method for setting a search schema. The method comprises: a client connecting to a server and defining a data schema including a plurality of fields; the server recommending a search schema setting value to the client for each field of the defined data schema; the client transmitting data corresponding to the data schema to the server; the server generating an inverted index using the received data and the search schema; and the server providing a search service for the received data using the inverted index and the search schema.

Data Schema, Search Schema, Inverted Index, Search Engine

Description

Search Schema Setting Device and Method {APPARATUS AND METHOD OF CONFIGURING SEARCH SCHEMA}

The present invention relates to a search schema setting apparatus and method that recommend a search schema for a customized search service.

Broadly, a search engine is a system that collects and finds information on the Internet. Nowadays the term mainly refers to a system that crawls web pages on the Internet and, when it receives a specific query, displays the web pages related to that query as results.

Besides search engines for end users, such as Google (www.google.com), websites that hold large amounts of content need search engines or search services of their own.

Basic search functions, such as finding bulletin-board posts that contain a given search word, are already widely used. However, particularly for large websites with diverse data types and vast amounts of data, there has been a need to build search engines customized to the characteristics of the queries users enter and of the expected search results.

Conventionally, an expert analyzed the website, defined what data it held, and then developed a search engine individually. This is not only highly inefficient but also produces large variations in technical quality, so an automated process for customizing search engines was urgently needed.

However, requiring website operators who lack expertise in search engine structure to set the data schema and the search schema for customization themselves risks reducing user convenience and accessibility.

The present invention was devised to solve these problems. Its object is to provide a search schema setting apparatus and method that improve user convenience by recommending an appropriate search schema after the data schema for a customized search service has been set.

To achieve this object, the search schema setting apparatus of the present invention comprises: an interface unit that receives, from a client over the Internet, a data schema including a plurality of fields, and receives data;

a search schema analyzer that recommends a search schema for each field of the data schema;

an index generator that generates an inverted index from the received data using the search schema; and

a search unit that searches the received data using the inverted index and the search schema.

To achieve the same object, a search schema setting method of the present invention comprises: a client connecting to a server and defining a data schema including a plurality of fields;

the server recommending a search schema setting value to the client for each field of the defined data schema;

the client transmitting data corresponding to the data schema to the server;

the server generating an inverted index using the received data and the search schema; and

the server providing a search service for the received data using the inverted index and the search schema.

To achieve the same object, another search schema setting method of the present invention comprises: recommending a search schema setting value for each field of a data schema input by a client;

generating an inverted index using the search schema when data corresponding to the data schema is received from the client;

searching the received data using the inverted index when a search word is input from the client; and

reconstructing the search result using the search schema and returning it.

According to the present invention, a customized search engine can be obtained with only a few settings describing the characteristics of the data held and of the desired search results, without large development costs or long development periods.

Furthermore, by recommending appropriate search schema settings, the invention greatly improves user convenience and accessibility even for users without expert knowledge of search engine structure.

Hereinafter, the configuration of the search schema setting apparatus according to the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a functional block diagram conceptually showing the configuration of the search schema setting apparatus according to the present invention, and FIG. 2 is a reference diagram conceptually illustrating a data schema and a search schema. FIGS. 3 and 4 are reference diagrams describing the process of analyzing search schema preferences for each field of the data schema.

Referring to FIG. 1, the search schema setting apparatus 100 of the present invention includes an interface unit 110, a search schema analyzer 120, an index generator 130, and a search unit 140.

The interface unit 110 receives a data schema including a plurality of fields from the client 200 over the Internet, and receives data corresponding to that schema. The received data is the raw data that is the target of the search service, as described below.

The search schema analyzer 120 analyzes per-field search schema preferences using a large body of previously collected data, as described below, and recommends search schema values for each field of the received data schema to the client 200.

As shown in FIG. 2, a data schema is defined per client 200 for its customized search service and includes a plurality of fields.

The data schema shown by way of example in FIG. 2 has five fields, "URL", "ID", "Date", "Title", and "Content", each with a "text" or "date" attribute. In FIG. 2, "URL", "Content", and "Title" are defined as "variable-length text" and "ID" as "fixed-length text".

The search schema specifies, for each field of the data schema so defined, whether the field value is a target of search through the search engine, whether search results are sorted by the field value, whether search results are filtered by the field value, and whether the field value is displayed on the client 200's screen as part of the search results.

In the example of FIG. 2, the "Title" field is a search target and a display target but neither a sort target nor a filter target. The "ID" field is likewise a search target and a display target, but neither a sort target nor a filter target.
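The per-field settings described above can be pictured as a small data model. The following Python sketch is illustrative only: the class names `FieldDef` and `SearchSettings` and the English field labels are hypothetical, only the "Title" and "ID" entries follow values stated in the text, and the attribute of the "Date" field is assumed.

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    """One data-schema field: a name and an attribute such as
    "variable-length text", "fixed-length text" or "date"."""
    name: str
    attribute: str


@dataclass
class SearchSettings:
    """The four search-schema items kept for each data-schema field."""
    searchable: bool   # is the field value indexed and searched?
    sortable: bool     # are results sorted by this field?
    filterable: bool   # can results be filtered by this field?
    displayable: bool  # is the field shown in the result list?


# Data schema of the FIG. 2 example (the "Date" attribute is assumed).
data_schema = [
    FieldDef("URL", "variable-length text"),
    FieldDef("ID", "fixed-length text"),
    FieldDef("Date", "date"),
    FieldDef("Title", "variable-length text"),
    FieldDef("Content", "variable-length text"),
]

# Search-schema entries for the two fields whose settings the text states.
search_schema = {
    "Title": SearchSettings(searchable=True, sortable=False,
                            filterable=False, displayable=True),
    "ID": SearchSettings(searchable=True, sortable=False,
                         filterable=False, displayable=True),
}
```

Keeping the data schema (what the fields are) separate from the search schema (how each field is used by the engine) mirrors the two-step configuration the description walks through.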

This means the following. Because the values of the "Title" field in the data transmitted from the client 200 are search targets, an inverted index is generated for them, as described below, and once the customized search engine is built and the client 200 enters a search word, the search is performed using that index. For the "ID" field, the search results are not sorted by the "ID" value and are not filtered (for example, by excluding results that do or do not have a particular "ID" value), but the "ID" value is displayed when the search results are shown on the client 200's screen.

The search schema may further include an analysis method for each search-target field. In the example of FIG. 2, there are three analysis methods: "morphological analysis", which analyzes the field value with a morphological analyzer; "bigram analysis", which splits the field value into two-syllable units; and "no analysis", which uses the field value as-is as the search target.
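Bigram analysis can be sketched as follows. The text only says that the value is split into two-syllable units; this Python sketch assumes overlapping bigrams (the usual choice in bigram indexing), treats each Hangul character as one syllable, and does not let a bigram span whitespace. The function name `bigrams` is hypothetical.

```python
def bigrams(value: str) -> list[str]:
    """Split a field value into overlapping two-character (two-syllable) tokens.

    Each whitespace-separated word is processed on its own; a one-character
    word is kept as a single token.
    """
    tokens: list[str] = []
    for word in value.split():
        if len(word) < 2:
            tokens.append(word)
        else:
            tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
    return tokens


print(bigrams("검색 스키마"))  # ['검색', '스키', '키마']
```

Overlapping tokens let a query match part of a longer word without a dictionary, at the cost of a larger index than morphological analysis would produce.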

As shown in FIG. 2, when the client 200 inputs a data schema with the five fields "URL", "ID", "Date", "Title", and "Content", the search schema analyzer 120 recommends, for each field, values for whether it is searched, sorted, filtered, and displayed.

It may also recommend an analysis method for each search-target field; the analysis method takes one of the values morphological analysis, bigram analysis, or no analysis.

In the example of FIG. 2, the search schema analyzer 120 recommended "TRUE" for search and display and "FALSE" for sort and filter for the "Title" field input by the client 200. (FIG. 2 shows these as "O" and "X", which mean "TRUE" and "FALSE" respectively; the same applies below.)

With reference to FIGS. 3 and 4, the process by which the search schema analyzer 120 recommends search schema values is now described.

FIG. 3 illustrates the process of deriving per-field search schema preferences from many clients. Referring to FIG. 3, the search schema analyzer 120 first analyzes the previously collected data schemas of many other clients, comparing field names and attributes to pick out fields judged identical or highly related.

In the example of FIG. 3, many clients include fields named "제목", "타이틀", and "title" (all meaning "title") in their data schemas, and all of them have the "variable-length text" attribute, so the search schema analyzer 120 concludes that these are identical or related fields.
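The description does not say how names are judged "identical or highly related". The Python sketch below assumes a hand-maintained synonym table (the hypothetical `SYNONYMS`) and requires attributes to match exactly; a real system might use a richer similarity measure.

```python
# Hypothetical synonym table mapping field names to a canonical name.
SYNONYMS = {"제목": "title", "타이틀": "title", "title": "title"}

Field = tuple[str, str]  # (field name, attribute)


def canonical(name: str) -> str:
    """Normalize a field name and map it through the synonym table."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)


def related(a: Field, b: Field) -> bool:
    """Treat two fields as related when canonical names and attributes match."""
    return canonical(a[0]) == canonical(b[0]) and a[1] == b[1]


def collect_related(target: Field, peer_schemas: list[list[Field]]) -> list[Field]:
    """Gather every field from other clients' data schemas related to target."""
    return [f for schema in peer_schemas for f in schema if related(target, f)]


peers = [
    [("제목", "variable-length text"), ("ID", "fixed-length text")],
    [("타이틀", "variable-length text")],
    [("Title", "variable-length text")],
    [("title", "date")],  # same name but different attribute: not related
]
print(len(collect_related(("제목", "variable-length text"), peers)))  # 3
```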

Preferences are then obtained by analyzing which search schema values these identical or related fields have. In the example of FIG. 3, if analysis of all fields named "제목", "타이틀", and "title" in the data schemas of many clients shows that 70% are set as search targets and 30% are not, the search schema analyzer 120 recommends "TRUE" for the search-target item of the "Title" field in the data schema input by the client 200, because the preference for "TRUE" (70%) is higher than that for "FALSE" (30%).

Likewise, it recommends "FALSE" for sorting with a 65% preference, "FALSE" for filtering with a 90% preference, and "FALSE" for display with a 55% preference.
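The FIG. 3 procedure amounts to a per-item majority vote over the related fields of other clients. A minimal Python sketch, assuming each related field's settings are given as a dict with the hypothetical keys `search`, `sort`, `filter`, and `display`; the sample data is constructed only to reproduce the percentages quoted above.

```python
from collections import Counter

ITEMS = ("search", "sort", "filter", "display")


def recommend(peer_settings: list[dict[str, bool]]) -> dict[str, tuple[bool, float]]:
    """For each item, return the value most peers chose and its share."""
    result = {}
    for item in ITEMS:
        counts = Counter(s[item] for s in peer_settings)
        value, n = counts.most_common(1)[0]  # ties go to the first value seen
        result[item] = (value, n / len(peer_settings))
    return result


# 20 hypothetical related "title" fields: 70% searchable, 35% sortable,
# 10% filterable, 45% displayed.
peers = [{"search": i < 14, "sort": i >= 13, "filter": i >= 18, "display": i >= 11}
         for i in range(20)]
print(recommend(peers))
# {'search': (True, 0.7), 'sort': (False, 0.65), 'filter': (False, 0.9), 'display': (False, 0.55)}
```

Returning the share alongside the value lets a user interface show how strongly other clients agree with each recommendation.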

In the example of FIG. 4, preferences are analyzed differently. The first step is the same: the search schema analyzer 120 compares field names and attributes in the previously collected data and picks out fields judged identical or highly related.

However, it then classifies the previously collected clients into groups using their personal information and collects preferences for each search schema item per group. Preferences for each item are collected as described for FIG. 3.

In the example of FIG. 4, the client 200 has entered the personal information "20s, office worker, male", and the search schema analyzer 120 uses the many previously collected data schemas and search schemas to analyze the per-item search schema preferences of the group to which the client 200 belongs, and recommends values accordingly.
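The FIG. 4 variant restricts the vote to clients in the same group. The Python sketch below assumes the group is keyed on the combination of the hypothetical profile items `age`, `job`, and `gender`, and falls back to all peers when the group is empty; neither choice is specified in the text.

```python
from collections import Counter

PROFILE_ITEMS = ("age", "job", "gender")


def group_key(profile: dict[str, str]) -> tuple:
    """Assign a client to a group by its personal-information items."""
    return tuple(profile.get(k) for k in PROFILE_ITEMS)


def recommend_for_group(client_profile, peers, item):
    """Majority value of `item` among peers in the client's group.

    peers: list of (profile, settings) pairs for related fields of other clients.
    """
    key = group_key(client_profile)
    same_group = [s for p, s in peers if group_key(p) == key]
    pool = same_group or [s for _, s in peers]  # assumed fallback
    return Counter(s[item] for s in pool).most_common(1)[0][0]


me = {"age": "20s", "job": "office worker", "gender": "male"}
other = {"age": "40s", "job": "office worker", "gender": "male"}
peers = [(me, {"display": True})] * 2 + [(other, {"display": False})] * 3
print(recommend_for_group(me, peers, "display"))  # True, although most peers chose False
```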

The personal information of the client 200 may be collected through a separate membership registration process or input by the client 200 through various other processes; the specific collection process is outside the scope of this description.

Once the search schema analyzer 120 has recommended the most preferred value for each search schema item through this process, the client 200 may accept it or change it to another value. The recommended value is stored if the client 200 accepts it, and the changed value is stored if the client 200 changes it; the stored search schema is then used for the customized search service for the client 200, as described below.
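Accepting or changing a recommendation reduces to overlaying the client's edits on the recommended values. A minimal Python sketch; the nested-dict layout and the name `finalize` are hypothetical.

```python
from typing import Optional

Settings = dict[str, dict[str, bool]]  # field -> item -> value


def finalize(recommended: Settings, changes: Optional[Settings] = None) -> Settings:
    """Keep accepted recommendations and replace any value the client changed."""
    final = {field: dict(items) for field, items in recommended.items()}
    for field, items in (changes or {}).items():
        final.setdefault(field, {}).update(items)
    return final


rec = {"Title": {"search": True, "sort": False, "filter": False, "display": False}}
print(finalize(rec, {"Title": {"display": True}}))
# {'Title': {'search': True, 'sort': False, 'filter': False, 'display': True}}
```

The recommendation is copied rather than modified in place, so the original suggestion stays available, for example to show the client what was changed.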

The index generator 130 analyzes the data received from the client 200 and generates an inverted index. An inverted index is information recording where each index term occurs. Using it, when a query is entered, the search engine can quickly extract only the entries of the search-target data that contain the corresponding index terms and return them as search results.
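The core data structure is a map from each index term to the records that contain it. A minimal Python sketch; the function names are hypothetical, and requiring every query term to match (AND semantics) is an assumption, since the text does not say how multi-term queries are combined.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, list[str]]) -> dict[str, set[int]]:
    """Map each index term to the set of record IDs in which it occurs."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


def lookup(index: dict[str, set[int]], query_terms: list[str]) -> set[int]:
    """Return the IDs of records containing every query term."""
    postings = [index.get(t, set()) for t in query_terms]
    return set.intersection(*postings) if postings else set()


docs = {1: ["search", "schema"], 2: ["search", "engine"], 3: ["inverted", "index"]}
idx = build_inverted_index(docs)
print(lookup(idx, ["search"]), lookup(idx, ["search", "engine"]))  # {1, 2} {2}
```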

The index generator 130 consults the search schema and generates the inverted index only for the fields of the received data that are search targets.

In the example of FIG. 2, the "URL", "ID", "Title", and "Content" fields are search targets under the search schema defined by the client 200, so the index generator 130 generates the inverted index by analyzing these field values in the data received from the client.

These field values are analyzed with the analysis method defined in the search schema. In the example of FIG. 2, the "Title" and "Content" fields are set for morphological analysis, so their values are analyzed with a morphological analyzer, whereas the "URL" and "ID" fields are not analyzed and their values are used as-is as search targets.
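Combining the search schema with the inverted index, only searchable fields are indexed, each with its own analysis method. In this Python sketch the morphological analyzer is replaced by a lowercase whitespace split purely as a stand-in (a real system would call an actual morphological analyzer), terms are keyed by `(field, term)` so that field-restricted search stays possible, and all names are hypothetical.

```python
from collections import defaultdict


def analyze(value: str, method: str) -> list[str]:
    """Turn a field value into index terms according to the analysis method."""
    if method == "none":
        return [value]                       # the whole value is one term
    if method == "bigram":
        return [w[i:i + 2] for w in value.split()
                for i in range(max(len(w) - 1, 1))]
    if method == "morphological":
        return value.lower().split()         # stand-in for a morphological analyzer
    raise ValueError(f"unknown analysis method: {method}")


def index_records(records: list[dict[str, str]],
                  schema: dict[str, dict]) -> dict[tuple[str, str], set[int]]:
    """Index only the fields the search schema marks as search targets."""
    index = defaultdict(set)
    for rec_id, rec in enumerate(records):
        for field, settings in schema.items():
            if settings["search"] and field in rec:
                for term in analyze(rec[field], settings["analysis"]):
                    index[(field, term)].add(rec_id)
    return dict(index)


schema = {
    "Title": {"search": True, "analysis": "morphological"},
    "ID": {"search": True, "analysis": "none"},
    "Date": {"search": False, "analysis": "none"},
}
records = [{"Title": "Search Schema Setting", "ID": "user01", "Date": "2008-12-15"}]
print(sorted(index_records(records, schema)))
# [('ID', 'user01'), ('Title', 'schema'), ('Title', 'search'), ('Title', 'setting')]
```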

When the search unit 140 receives a search word from the client 200, it searches using the generated inverted index, sorts and filters the search results according to the search schema, determines which fields to display, and returns the results to the client 200.
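The post-processing step (filter, sort, decide what to display) can be sketched as below. This Python illustration rests on assumptions: hits are plain dicts, filter conditions arrive as a hypothetical field-to-value mapping and are honoured only for fields the schema marks filterable, and sortable fields are used as sort keys in schema order.

```python
def reconstruct(hits, schema, filters=None):
    """Filter, sort and trim raw search hits according to the search schema.

    hits: list of dicts, one per matching record.
    schema: field -> {"sort": bool, "filter": bool, "display": bool}.
    filters: optional field -> required value (ignored for non-filterable fields).
    """
    filters = filters or {}
    active = {f: v for f, v in filters.items() if schema.get(f, {}).get("filter")}
    kept = [h for h in hits if all(h.get(f) == v for f, v in active.items())]

    # Stable sorts applied from the last key to the first give a multi-key sort.
    sort_fields = [f for f, s in schema.items() if s["sort"]]
    for field in reversed(sort_fields):
        kept.sort(key=lambda h: h[field])

    # Drop every field the schema does not mark for display.
    shown = [f for f, s in schema.items() if s["display"]]
    return [{f: h[f] for f in shown if f in h} for h in kept]


schema = {
    "ID": {"sort": False, "filter": False, "display": True},
    "Date": {"sort": True, "filter": True, "display": False},
    "Title": {"sort": False, "filter": False, "display": True},
}
hits = [{"ID": "b", "Date": "2008-12-02", "Title": "B"},
        {"ID": "a", "Date": "2008-12-01", "Title": "A"}]
print(reconstruct(hits, schema))  # [{'ID': 'a', 'Title': 'A'}, {'ID': 'b', 'Title': 'B'}]
```

A field can drive sorting or filtering without being displayed, as "Date" does here; the three schema items are independent.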

Hereinafter, a method of recommending and setting a search schema in the search schema setting apparatus configured as above is described with reference to FIG. 5, a flowchart illustrating the search schema setting method according to the present invention in time sequence.

Referring to FIG. 5, the client 200 first connects over the Internet to the search schema setting apparatus 100 according to the present invention and defines a data schema (S110). As shown in FIG. 2, the data schema includes a plurality of fields and preferably also the attribute of each field.

Once the data schema is defined, the schema setting apparatus 100 that received it recommends a search schema for each field (S120). As in the example of FIG. 2, the search schema analyzer 120 analyzes the previously collected data and recommends appropriate values for whether each field is searched, sorted, filtered, and displayed. FIG. 2 shows only two values, O or X, but this is merely exemplary; depending on the implementation, other values are possible. For instance, in FIG. 2 the schema setting apparatus 100 recommended "O" (TRUE) for sorting on the "ID" and "Date" fields. If results are to be sorted first by "ID" and then by "Date", it could instead recommend "1" for the "ID" field and "2" for the "Date" field.
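With ranked values such as "1" and "2", sorting becomes a composite key ordered by rank. A minimal Python sketch; the mapping `sort_rank` and treating a missing or zero rank as "not a sort key" are assumptions.

```python
def sort_by_priority(hits: list[dict], sort_rank: dict[str, int]) -> list[dict]:
    """Sort hits by the fields in sort_rank, rank 1 being the primary key.

    Fields with a rank of 0 (or None) are not used for sorting.
    """
    keys = [field for _, field in sorted((r, f) for f, r in sort_rank.items() if r)]
    return sorted(hits, key=lambda h: tuple(h[f] for f in keys))


hits = [{"ID": "b", "Date": "2008-12-02"},
        {"ID": "a", "Date": "2008-12-03"},
        {"ID": "a", "Date": "2008-12-01"}]
for h in sort_by_priority(hits, {"ID": 1, "Date": 2}):
    print(h["ID"], h["Date"])
```

Because Python's `sorted` is stable, a schema with no ranked fields leaves the search engine's original result order untouched.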

The search schema analyzer 120 investigates preferences by analyzing the data schemas, search schemas, and personal information of many other clients collected in advance, and uses the results to recommend appropriate values to the client 200.

Depending on the implementation, the schema setting apparatus 100 may automatically adopt the recommended values as the search schema of the client 200, or it may adopt them once the client 200 confirms them while allowing the client to change them to other values.

Thereafter, when the client 200 transmits data corresponding to the data schema to the server (S130), the schema setting apparatus 100 generates an inverted index using the received data and the search schema (S140).

The schema setting apparatus 100 then provides the client 200 with a search service for the received data using the generated inverted index and the search schema.

With reference to FIG. 6, the method of recommending and setting a search schema in the search schema setting apparatus configured as above is now described from the server's point of view. FIG. 6 is a flowchart illustrating the search schema setting method according to the present invention in time sequence.

First, search schema setting values are recommended for each field of the data schema input by the client 200 (S210). As shown by way of example in FIG. 2, when a data schema is input, values for whether each field is sorted, searched, filtered, and displayed are preset and recommended to the client 200.

An analysis method may also be recommended for each search-target field; preferably it takes one of the values morphological analysis, bigram analysis, or no analysis.

Depending on the implementation, the recommended values may be adopted directly as the search schema of the client 200. Alternatively, the recommended setting values may be adopted as the search schema if the client 200 confirms them, or the changed values if the client changes them to other settings.

As for how the value to recommend for each field is preset, there are two approaches: recommending by reference only to the data held by the client 200 and the defined data schema, as shown in FIG. 3, and recommending by also referring to the personal information of the client 200, as shown in FIG. 4.

According to FIG. 3, the search schema analyzer 120 first analyzes the previously collected per-client data schemas and search schemas to identify related fields. In FIG. 3, fields named "제목", "타이틀", and "title" are classified as identical or at least highly related.

It then examines how other clients set the search schema for these fields and recommends appropriate values according to those preferences. In FIG. 3, about 70% of clients set the search item of the "Title" field to "TRUE", so the more preferred value, "TRUE", is recommended.

According to FIG. 4, the personal information of the client 200 is used to refine the recommendation further. The personal information of many previously collected clients is classified by item to divide the clients into groups, and preferences are examined for each group as described for FIG. 3.

Next, the group to which the client 200 belongs is determined from its personal information, and appropriate values are recommended using that group's preferences.

Once the search schema is finalized through this recommendation process, the client 200 transmits data corresponding to the data schema to the schema setting apparatus 100. On receiving the data, the schema setting apparatus 100 generates an inverted index using the search schema (S220); specifically, it generates a search index for the fields set as search targets in the search schema.

Thereafter, when a search word is input from the client 200, the received data is searched using the inverted index (S230).

The search results are then reconstructed using the search schema. Reconstruction means filtering and sorting the results and deleting field values that are not to be displayed on the client 200's screen; once reconstruction is complete, the results are returned to the client 200 (S240).

Although the present invention has been described in detail with reference to several embodiments, it is not to be construed as limited to them and may be freely modified and interpreted within the scope of the technical idea set forth in the claims.

FIG. 1 is a functional block diagram conceptually showing the configuration of the search schema setting apparatus according to the present invention;

FIG. 2 is a reference diagram conceptually illustrating a data schema and a search schema;

FIG. 3 is a reference diagram illustrating, by way of example, the process of analyzing search schema preferences for each field;

FIG. 4 is a reference diagram illustrating, by way of example, the process of analyzing search schema preferences by personal information item;

FIG. 5 is a flowchart illustrating the search schema setting method according to the present invention in time sequence; and

FIG. 6 is a flowchart illustrating the search schema setting method according to the present invention in time sequence from the server's point of view.

<Description of Reference Numerals for Main Parts of the Drawings>

110: interface unit    120: search schema analyzer

130: index generator    140: search unit

Claims

1. A search schema setting method, comprising: a client connecting to a server and defining a data schema including a plurality of fields;

the server recommending a search schema setting value to the client for each field of the defined data schema;

the client transmitting data corresponding to the data schema to the server;

the server generating an inverted index using the received data and the search schema; and

the server providing a search service for the received data using the inverted index and the search schema.

The method of claim 1,

wherein the search schema setting value is recommended by recommending, for each field, values indicating whether the field is sorted, searched, filtered, and displayed.

The method of claim 2,

wherein, when the search schema setting value is recommended, an analysis method is further recommended for each field recommended as a search target.

The method of claim 3,

wherein the analysis method is any one of morphological analysis, bigram analysis, and no analysis.
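Claims 2 to 4 describe a per-field search schema setting (sort, search, filter, display) plus an analysis method for searchable fields. The Python sketch below models such a setting and shows what bigram analysis and no analysis could produce. The class and function names are hypothetical, and morphological analysis is left as a stub because it depends on a language-specific analyzer that the claims do not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Analysis(Enum):
    """The three analysis methods named in the claims."""
    MORPHOLOGICAL = "morphological"
    BIGRAM = "bigram"
    NONE = "none"


@dataclass
class FieldSetting:
    """Hypothetical per-field search schema setting."""
    sortable: bool = False
    searchable: bool = False
    filterable: bool = False
    displayable: bool = True
    analysis: Analysis = Analysis.NONE  # only meaningful when searchable


def bigram_tokens(text: str) -> list[str]:
    """Split each whitespace-separated word into overlapping character bigrams."""
    tokens: list[str] = []
    for word in text.split():
        if len(word) < 2:
            tokens.append(word)  # a one-character word is kept as-is
        else:
            tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
    return tokens


def analyze(text: str, method: Analysis) -> list[str]:
    """Turn a field value into index terms according to the analysis method."""
    if method is Analysis.BIGRAM:
        return bigram_tokens(text)
    if method is Analysis.NONE:
        return [text]  # no analysis: the whole value is a single term
    # Morphological analysis needs an external, language-specific analyzer.
    raise NotImplementedError("morphological analyzer not included in this sketch")


# Example: a searchable title field analyzed with bigrams.
title = FieldSetting(searchable=True, analysis=Analysis.BIGRAM)
print(analyze("검색엔진", title.analysis))  # prints ['검색', '색엔', '엔진']
```

Bigram analysis needs no dictionary, which is one reason it is a common fallback for Korean text; no analysis suits identifiers or codes that must match exactly.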

The method of claim 2, further comprising:

analyzing the collected data schemas and search schemas of a plurality of clients to examine preferences for sorting, searching, filtering, and display for each field of the data schema;

wherein values for sorting, searching, filtering, and display are recommended for each field of the data schema using the preferences.
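One way to read claim 5 is as a vote over collected schemas: for each field name, measure the share of clients that enabled each option and recommend the options most of them chose. The sketch below assumes search schemas are plain dictionaries keyed by field name and uses a simple majority threshold; both the data layout and the threshold are illustrative assumptions, not the patented method.

```python
from collections import defaultdict

FLAGS = ("sortable", "searchable", "filterable", "displayable")


def field_preferences(collected):
    """collected: list of search schemas, each {field_name: {flag: bool}}.
    Returns {field_name: {flag: share of schemas with that field that enabled flag}}."""
    enabled = defaultdict(lambda: defaultdict(int))
    seen = defaultdict(int)  # how many schemas contain each field
    for schema in collected:
        for field, settings in schema.items():
            seen[field] += 1
            for flag in FLAGS:
                if settings.get(flag, False):
                    enabled[field][flag] += 1
    return {
        field: {flag: enabled[field][flag] / seen[field] for flag in FLAGS}
        for field in seen
    }


def recommend(fields, prefs, threshold=0.5):
    """Recommend a flag when more than `threshold` of clients enabled it.
    Fields that no other client used get every flag set to False."""
    return {
        field: {flag: prefs.get(field, {}).get(flag, 0.0) > threshold for flag in FLAGS}
        for field in fields
    }


collected = [
    {"title": {"searchable": True, "displayable": True}, "price": {"sortable": True}},
    {"title": {"searchable": True}, "price": {"sortable": True, "filterable": True}},
    {"title": {"searchable": True, "displayable": True}},
]
print(recommend(["title", "price", "memo"], field_preferences(collected)))
```

Here `title` is recommended as searchable and displayable, `price` as sortable only (filtering reached exactly half, not more), and the unseen `memo` field gets no recommendation.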

The method of claim 2, further comprising:

the client connecting to the server and inputting personal information;

wherein the collected data schemas, search schemas, and personal information of a plurality of clients are analyzed, and values for sorting, searching, filtering, and display are recommended for each field of the data schema based on the client's personal information.

A search schema setting method, comprising: recommending a search schema setting value for each field of a data schema input by a client;

generating an inverted index using the search schema when data corresponding to the data schema is received from the client;

performing a search on the received data using the inverted index when a search word is input from the client; and

reconstructing and returning a search result using the search schema.
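The index, search, and reconstruct steps of claim 7 can be sketched as follows, assuming that only fields marked searchable are indexed, that a query matches documents containing every query term, and that "reconstructing" a result means keeping only the fields marked displayable. Lower-cased whitespace tokenization stands in for the per-field analysis method; all names are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    # Stand-in for the field's analysis method: lower-cased whitespace tokens.
    return text.lower().split()


def build_index(docs, schema):
    """docs: {doc_id: {field: value}}; schema: {field: {flag: bool}}.
    Indexes only searchable fields; returns {term: set of doc_ids}."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            if schema.get(field, {}).get("searchable"):
                for term in tokenize(str(value)):
                    index[term].add(doc_id)
    return index


def search(query, docs, index, schema):
    """Return documents containing every query term, each rebuilt
    from its displayable fields only."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    shown = [f for f, s in schema.items() if s.get("displayable")]
    return [{f: docs[d][f] for f in shown if f in docs[d]} for d in sorted(hits)]


schema = {
    "title": {"searchable": True, "displayable": True},
    "body": {"searchable": True},
    "price": {"sortable": True, "displayable": True},
}
docs = {
    1: {"title": "Red Potion", "body": "restores health", "price": 10},
    2: {"title": "Blue Potion", "body": "restores mana", "price": 12},
}
index = build_index(docs, schema)
print(search("restores mana", docs, index, schema))  # [{'title': 'Blue Potion', 'price': 12}]
```

Because `body` is searchable but not displayable, it can match a query without appearing in the returned result, which illustrates why the search schema is consulted both at indexing time and when results are rebuilt.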

The method of claim 7, wherein

the recommending of the search schema setting value is performed by presetting, for each field, values for whether to sort, search, filter, and display the field and recommending those values.

The method of claim 8,

further comprising recommending an analysis method for a field recommended as a search target.

The method of claim 9,

The method of claim 7, further comprising:

storing the recommended search schema setting value if the client confirms it, and storing the changed setting value if the client changes the recommended search schema setting value to another setting.

The method of claim 7, further comprising:

analyzing the collected data schemas of a plurality of other clients to classify related fields; and

analyzing, for the classified fields, preferences for sorting, search target, filtering, and display.

The method of claim 12,

wherein values for sorting, searching, filtering, and display are recommended for each field of the data schema input by the client using the analyzed preferences.
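Claim 12 classifies related fields across other clients' data schemas before measuring preferences, since different clients may name the same field differently. The sketch below assumes a deliberately crude approach: normalize each name and map it through a hypothetical alias table (`ALIASES`), so that preferences can be aggregated per canonical field rather than per raw name. A production system would need a richer matching method, which the claims leave open.

```python
import re
from collections import defaultdict

# Hypothetical alias table mapping normalized names to a canonical field.
ALIASES = {"subject": "title", "cost": "price", "amount": "price"}


def canonical_field(name):
    """Lower-case, strip underscores and non-word characters, then apply aliases.
    Unicode word characters such as Hangul are kept."""
    key = re.sub(r"[\W_]", "", name.lower())
    return ALIASES.get(key, key)


def group_related_fields(schemas):
    """schemas: list of field-name lists, one per client.
    Returns {canonical name: set of raw names that map to it}."""
    groups = defaultdict(set)
    for fields in schemas:
        for name in fields:
            groups[canonical_field(name)].add(name)
    return dict(groups)


print(group_related_fields([["Title", "Price"], ["subject", "Cost"], ["제목"]]))
```

Note that the Korean `제목` ("title") stays in its own group here because the alias table does not cover it; the quality of the classification depends entirely on that mapping.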

The method of claim 12,

further comprising the client connecting to the server and inputting personal information.

The method of claim 14, further comprising:

analyzing the collected personal information of a plurality of other clients to divide the clients into a plurality of groups, and analyzing search schema preferences for each group;

wherein the search schema preferences of the group to which the client belongs are recommended for each field of the data schema input by the client.
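Claim 15 groups other clients by personal information and recommends the preferences of the client's own group. In the sketch below the grouping rule (a single hypothetical `category` item) and the more-than-half threshold are illustrative choices, not details taken from the patent.

```python
from collections import Counter, defaultdict

FLAGS = ("sortable", "searchable", "filterable", "displayable")


def group_key(profile):
    # Hypothetical grouping rule: one personal-information item decides the group.
    return profile.get("category", "unknown")


def group_preferences(clients):
    """clients: list of (profile, {field: set of enabled flags}).
    Returns per-group flag counts {group: {field: Counter}} and group sizes."""
    votes = defaultdict(lambda: defaultdict(Counter))
    sizes = Counter()
    for profile, schema in clients:
        group = group_key(profile)
        sizes[group] += 1
        for field, flags in schema.items():
            votes[group][field].update(flags)
    return votes, sizes


def recommend_for(profile, fields, votes, sizes):
    """Enable a flag when more than half of the client's group enabled it;
    an unseen group yields all-False recommendations."""
    group = group_key(profile)
    n = sizes[group]
    return {
        field: {flag: n > 0 and votes[group][field][flag] * 2 > n for flag in FLAGS}
        for field in fields
    }


clients = [
    ({"category": "game"}, {"level": {"sortable", "filterable"}, "name": {"searchable"}}),
    ({"category": "game"}, {"level": {"sortable"}, "name": {"searchable", "displayable"}}),
    ({"category": "shop"}, {"price": {"sortable", "filterable"}}),
]
votes, sizes = group_preferences(clients)
print(recommend_for({"category": "game"}, ["level", "name"], votes, sizes))
```

Dividing by group size rather than by overall client count keeps a large group from drowning out the habits of a smaller one, which is the point of recommending per group.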

A search schema setting apparatus, comprising: an interface unit for receiving, from a client, a data schema including a plurality of fields, and for receiving data transmitted by the client;

a search schema analysis unit for recommending a search schema for each field of the data schema;

an index generator for generating an inverted index from the received data using the search schema; and

a search unit for searching the received data using the inverted index and the search schema.

The apparatus of claim 16,

wherein the search schema analysis unit recommends, for each field of the data schema, values for whether to sort, search, filter, and display the field.

The apparatus of claim 17,

wherein the search schema analysis unit further recommends an analysis method for a field recommended as a search target.

The apparatus of claim 18,

wherein the analysis method is any one of morphological analysis, bigram analysis, and no analysis.

The apparatus of claim 17,

wherein the search schema analysis unit analyzes the collected data schemas and search schemas of a plurality of clients, analyzes preferences for each field, and recommends sorting, searching, filtering, and display accordingly.

The apparatus of claim 20,

wherein the search schema analysis unit analyzes search schema preferences for each personal information item using the collected personal information of a plurality of other clients, and recommends sorting, searching, filtering, and display for each field according to the client's personal information.