KR20100101464A

KR20100101464A - Searching apparatus and method using tag information

Info

Publication number: KR20100101464A
Application number: KR1020090019966A
Authority: KR
Inventors: 서정우; 임종균; 강윤경; 이호경; 김주익
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2009-03-09
Filing date: 2009-03-09
Publication date: 2010-09-17

Abstract

PURPOSE: A search method and apparatus using tag information are provided. They apply semantics-based search, such as ontology-based search, to obtain more accurate search results. CONSTITUTION: A tag search unit generates a primary content list by extracting content having a tag corresponding to the search word (S210). A weight calculation unit assigns a first weight to the extracted content (S220). The tag search unit extracts related content having tags related to the tags of the extracted content (S230). The weight calculation unit assigns a second weight to the related content (S240). A search result providing unit outputs the ranked search result content (S250).

Description

Searching apparatus and method using tag information

The technical field of the present invention relates to information retrieval and, more particularly, to a search method and apparatus using tag information.

With the development of networks and the growth of many kinds of user-generated information, users are exposed to an enormous amount of information. The more information users are exposed to, however, the harder it becomes to find the information they actually want. How quickly and accurately a search service finds the information a user needs has therefore become one of its core requirements.

A conventional search service works in three stages:

- It analyzes the search query entered by the user.
- It ranks the search results using methods such as analyzing their similarity to the retrieved content itself.
- It shows the ranked results to help the user find the desired information.

The results therefore depend heavily on the algorithm used to rank the content judged similar to the query, and the desired result may be hard to find.

Accordingly, an aspect of the present invention provides a search method and apparatus that calculate search relevance from the tag information registered in content, thereby improving the quality of a search service.

According to an aspect of the present invention, a search method using tag information includes:

- extracting content having a tag corresponding to an input search word, or content having a related tag of that tag as one of its tags;
- calculating a degree of relevance of the extracted content by reflecting the tag or related-tag information; and
- providing a search result in which the extracted content is sorted according to the calculated relevance values.

According to another aspect of the present invention, a search apparatus using tag information includes:

- a tag search unit that extracts content having a tag corresponding to an input search word, or content having a related tag of that tag as one of its tags;
- a weight calculation unit that calculates a degree of relevance of the extracted content by reflecting the tag or related-tag information; and
- a search result providing unit that provides a search result in which the extracted content is sorted according to the calculated relevance values.

According to an embodiment of the present invention, semantics-based search such as ontology-based search can be applied. This yields more accurate results than conventional search methods. In other words, search quality can be improved over results produced by conventional methods or by methods that consider only keyword frequency.

In addition, using tag information to improve search relevance, and further providing related search words, gives users a higher-quality search service while improving information utilization and work productivity.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. Detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. Their definitions should therefore be based on the contents of this entire specification.

FIG. 1 is a diagram illustrating an example of tag information registered in content according to an embodiment of the present invention.

The content 100 may be any of various kinds of information available on a website. For example, consider content 100 about the release of the T Omnia, as shown in FIG. 1. Tags 110 consisting of related keywords such as "SKT, Smartphone, KTF, omnia, mobile phone market" may be attached to that content.

A tag, as used in Web 2.0-based services, is a meaningful label that a user attaches to information created or distributed over the web to describe its content. For example, content on the topic "Internet-based Web 2.0 technology" may be tagged with words such as "Internet, Web 2.0, blog, google, semantic". In other words, a tag is a meaningful keyword that a user assigns to content.

This embodiment uses tags to improve search relevance. The approach may be applied to:

- web portal search;
- news search;
- blog search;
- community search;
- social network search; or
- search over unstructured documents on an enterprise intranet.
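A minimal way to picture the FIG. 1 example is a content record carrying a set of tags, plus an inverted index from each tag to the content that carries it. The Python sketch below is illustrative only. The `Content` class, the `build_tag_index` function and the lower-casing of tags are assumptions, not structures defined in this document, and the title string is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    """A content item (article, post, document) with user-assigned tags (hypothetical model)."""
    content_id: int
    title: str
    tags: set[str] = field(default_factory=set)


def build_tag_index(contents: list[Content]) -> dict[str, set[int]]:
    """Build an inverted index: lower-cased tag -> IDs of content carrying that tag."""
    index: dict[str, set[int]] = {}
    for item in contents:
        for tag in item.tags:
            index.setdefault(tag.lower(), set()).add(item.content_id)
    return index


# Content 100 and tags 110 from the FIG. 1 example (the title is a placeholder).
omnia_article = Content(
    content_id=100,
    title="T Omnia release",
    tags={"SKT", "Smartphone", "KTF", "omnia", "mobile phone market"},
)
tag_index = build_tag_index([omnia_article])
print(sorted(tag_index))  # ['ktf', 'mobile phone market', 'omnia', 'skt', 'smartphone']
```

An inverted index of this kind lets a tag lookup go straight to the matching content instead of scanning every item.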

FIG. 2 is a flowchart of a search method using tag information according to an embodiment of the present invention.

When a search word is input, content having a tag that includes the search word is extracted to generate a primary content list (S210). The primary content list may be stored in a database. For example, when the search word "omnia" is input, content having the word "omnia" as a tag is extracted, so the content 100 shown in FIG. 1 may be extracted.
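Step S210 can be sketched as a filter over the stored tags. This minimal Python sketch rests on two assumptions:

- The database is a plain dictionary from content ID to tag set.
- A tag "includes" the search word when the two are equal, ignoring case and surrounding whitespace.

The source does not fix the matching rule, so substring or synonym matching would be equally valid readings. The function name and the sample data (other than content 100 from FIG. 1) are hypothetical.

```python
def extract_primary_contents(query: str, content_tags: dict[int, set[str]]) -> list[int]:
    """Step S210 (sketch): IDs of content having a tag that matches the search word.

    Matching is case-insensitive equality, an assumption made for this example.
    """
    q = query.strip().lower()
    return sorted(
        cid
        for cid, tags in content_tags.items()
        if any(tag.lower() == q for tag in tags)
    )


# Hypothetical database contents; 100 is the FIG. 1 article.
content_tags = {
    100: {"SKT", "Smartphone", "KTF", "omnia", "mobile phone market"},
    101: {"Smartphone", "iPhone"},
    102: {"Omnia", "review"},
}
primary_list = extract_primary_contents("omnia", content_tags)
print(primary_list)  # [100, 102]
```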

Next, a first weight is applied to the extracted content to calculate its relevance (S220). Each extracted content item already has a relevance value indicating how closely it relates to the search word. Its relevance is recalculated by multiplying that value by the first weight.
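The reweighting in step S220 is a per-item multiplication. The sketch below assumes each extracted item already has a base relevance value in `[0, 1]`; the source says such a value exists but not how it is computed. It uses 0.5, the example first weight given later in this description. The name `apply_weight` and the sample values are hypothetical.

```python
FIRST_WEIGHT = 0.5  # example first weight from this description


def apply_weight(base_relevance: dict[int, float], content_ids: list[int], weight: float) -> dict[int, float]:
    """Recalculate the relevance of each extracted item by multiplying its base value by `weight`."""
    return {cid: base_relevance[cid] * weight for cid in content_ids}


# Hypothetical base relevance values for the primary content list of step S210.
base_relevance = {100: 0.8, 102: 0.6}
weighted = apply_weight(base_relevance, [100, 102], FIRST_WEIGHT)
print({cid: round(score, 2) for cid, score in weighted.items()})  # {100: 0.4, 102: 0.3}
```

The same function would serve step S240 when called with the second weight.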

Then related content is extracted (S230). Related content is content that has a related tag, meaning one of the other tags attached to the content extracted in step S210. In the example above, the tags related to "omnia" are "SKT, Smartphone, KTF, mobile phone market". Other content having any of these words as a tag is therefore extracted as related content.

Related search words may also be extracted in this step, for example by selecting related tags that appear frequently. In the example above, the content tagged "omnia" also carries the tag "smartphone". If "smartphone" also appears frequently in the tags of other content, it may be extracted as a related search word for the query "omnia". The extracted related search words may further be used to help the user find the desired content.
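Step S230 and the extraction of related search words can be sketched together. This Python sketch makes three assumptions:

- Related tags are all the other tags on the primary content.
- Content already in the primary list is not extracted again.
- A related tag counts as "frequent" when it appears on at least `min_count` other content items.

The threshold, the function names and the sample data are hypothetical.

```python
from collections import Counter


def related_tags(query: str, primary_ids: list[int], content_tags: dict[int, set[str]]) -> set[str]:
    """Tags attached to the primary content, excluding the search word itself."""
    tags: set[str] = set()
    for cid in primary_ids:
        tags |= {t.lower() for t in content_tags[cid]}
    tags.discard(query.lower())
    return tags


def extract_related_contents(query: str, primary_ids: list[int], content_tags: dict[int, set[str]]) -> list[int]:
    """Step S230 (sketch): other content carrying at least one related tag."""
    rel = related_tags(query, primary_ids, content_tags)
    primary = set(primary_ids)
    return sorted(
        cid for cid, tags in content_tags.items()
        if cid not in primary and rel & {t.lower() for t in tags}
    )


def related_search_words(query: str, primary_ids: list[int], content_tags: dict[int, set[str]],
                         min_count: int = 2) -> list[str]:
    """Related tags that also appear on at least `min_count` other content items."""
    rel = related_tags(query, primary_ids, content_tags)
    primary = set(primary_ids)
    freq: Counter = Counter()
    for cid, tags in content_tags.items():
        if cid not in primary:
            freq.update({t.lower() for t in tags})  # count each tag once per item
    return sorted(t for t in rel if freq[t] >= min_count)


content_tags = {
    100: {"SKT", "Smartphone", "KTF", "omnia", "mobile phone market"},
    101: {"Smartphone", "iPhone"},
    102: {"KTF", "Smartphone"},
    103: {"travel", "Jeju"},
}
primary_ids = [100]  # result of step S210 for the query "omnia"
print(extract_related_contents("omnia", primary_ids, content_tags))  # [101, 102]
print(related_search_words("omnia", primary_ids, content_tags))      # ['smartphone']
```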

A second weight is applied to the extracted related content to calculate its relevance (S240). Each related content item has a relevance value indicating its relevance to the search word. Its relevance is recalculated by multiplying that value by the second weight.

The second weight may be set smaller than the first weight; for example, the first weight may be 0.5 and the second weight 0.3. Content tagged with the search keyword itself, or a similar word, is more relevant than content that carries only a related tag of the keyword. The first weight may therefore be set larger than the second weight to reflect this.
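Take two items that start from the same base relevance of 0.8. With the example weights, a direct tag match ends up at 0.5 × 0.8 = 0.4, and a related-tag match ends up at 0.3 × 0.8 = 0.24.

The weights scale the two groups rather than strictly ordering them. A related item with a high enough base value can still overtake a weak direct match. The Python sketch below checks both cases; the base relevance values and the function name are hypothetical.

```python
FIRST_WEIGHT = 0.5   # weight for content whose tag matches the search word
SECOND_WEIGHT = 0.3  # weight for content matched only through a related tag


def weighted_relevance(base: float, direct_match: bool) -> float:
    """Apply the first or second weight depending on how the content was found."""
    return base * (FIRST_WEIGHT if direct_match else SECOND_WEIGHT)


# Same base relevance: the direct match scores higher.
print(round(weighted_relevance(0.8, True), 2), round(weighted_relevance(0.8, False), 2))  # 0.4 0.24
# Different base relevance: 0.9 * 0.3 = 0.27 beats 0.5 * 0.5 = 0.25.
print(weighted_relevance(0.9, False) > weighted_relevance(0.5, True))  # True
```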

The extracted search result content is ranked by the calculated relevance and output (S250). Relevance data that an existing search engine's algorithm computes for the extracted content may also be reflected at this point. The search result may also include the related search words extracted in step S230.
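Step S250 can be sketched as a sort on the recalculated relevance. The source says an existing search engine's relevance data "may additionally be reflected" but not how. This Python sketch assumes a simple sum of the two scores and breaks ties by content ID. The name `rank_results`, the combination rule and the sample scores are hypothetical.

```python
from typing import Iterable, Optional


def rank_results(
    tag_scores: dict[int, float],
    engine_scores: Optional[dict[int, float]] = None,
    related_words: Iterable[str] = (),
) -> dict[str, list]:
    """Step S250 (sketch): order content by tag-based relevance plus optional engine relevance."""
    engine_scores = engine_scores or {}
    combined = {cid: s + engine_scores.get(cid, 0.0) for cid, s in tag_scores.items()}
    # Highest combined score first; equal scores fall back to ascending content ID.
    ranked = sorted(combined, key=lambda cid: (-combined[cid], cid))
    return {"results": ranked, "related_search_words": list(related_words)}


# Weighted scores from steps S220 (100, 102) and S240 (101); values are hypothetical.
tag_scores = {100: 0.4, 102: 0.3, 101: 0.24}
print(rank_results(tag_scores, related_words=["smartphone"]))
# {'results': [100, 102, 101], 'related_search_words': ['smartphone']}
print(rank_results(tag_scores, engine_scores={101: 0.2})["results"])  # [101, 100, 102]
```

Under this assumption the engine score can reorder the results, as it does for item 101 here. A weighted sum or a product would fit the text equally well.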

FIG. 3 is a block diagram of a search apparatus using tag information according to an embodiment of the present invention.

Referring to FIG. 3, a search apparatus using tag information according to an embodiment of the present invention includes a tag search unit 310, a weight calculation unit 320, a search result providing unit 340, and a database 350. It may optionally further include a search engine 330.

When a search word is input, the tag search unit 310 searches the database 350 for content having a tag that includes the search word and extracts it to generate a primary content list. This list may be stored back in the database 350. The tag search unit 310 also extracts related content, meaning content having tags related to the search-word tag, that is, the other tags attached to the extracted content.

Related search words may also be extracted in this process, for example by selecting related tags that appear frequently. In the example above, if the word "smartphone" appears frequently in content tagged "omnia", "smartphone" may be extracted as a related search word for the query "omnia". The extracted related search words may further be used to help the user find the desired content.

The weight calculation unit 320 applies a first weight to the extracted content to calculate its relevance. Each extracted content item has a relevance value indicating its relevance to the search word, and this value is multiplied by the first weight.

Likewise, the weight calculation unit 320 applies a second weight to the related content extracted by the tag search unit 310, multiplying each item's relevance value by the second weight.

The second weight may be set smaller than the first weight. Content tagged with the search keyword itself, or a similar word, is more relevant than content that carries only a related tag of the keyword. The first weight may therefore be set larger than the second weight.

The search result providing unit 340 ranks the extracted search result content by the relevance calculated as described above and outputs it. Relevance data that the existing search engine 330's algorithm computes for the extracted content may also be reflected. The related search words extracted by the tag search unit 310 may also be provided.
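The units of FIG. 3 can be wired together as in the Python sketch below. It is one possible decomposition, under these assumptions:

- The database 350 is an in-memory dictionary of tags.
- A missing base relevance defaults to 1.0.
- The optional search engine 330 is any callable returning an extra score, which is added to the tag-based score.
- Ties are broken by content ID.

All class and method names are hypothetical.

```python
from typing import Callable, Optional


class TagSearchUnit:
    """Unit 310: finds primary and related content in the database (unit 350)."""

    def __init__(self, database: dict[int, set[str]]):
        # Database 350 modeled as content ID -> lower-cased tags.
        self.db = {cid: {t.lower() for t in tags} for cid, tags in database.items()}

    def search(self, query: str) -> tuple[list[int], list[int]]:
        q = query.lower()
        primary = sorted(cid for cid, tags in self.db.items() if q in tags)
        # Related tags: every other tag carried by the primary content.
        related_tags = set().union(*(self.db[cid] for cid in primary)) - {q}
        related = sorted(
            cid for cid, tags in self.db.items()
            if cid not in primary and tags & related_tags
        )
        return primary, related


class WeightCalculationUnit:
    """Unit 320: applies the first weight to primary content and the second to related content."""

    def __init__(self, first_weight: float = 0.5, second_weight: float = 0.3):
        self.first_weight = first_weight
        self.second_weight = second_weight

    def score(self, primary: list[int], related: list[int],
              base: Optional[dict[int, float]] = None) -> dict[int, float]:
        base = base or {}
        scores = {cid: base.get(cid, 1.0) * self.first_weight for cid in primary}
        scores.update({cid: base.get(cid, 1.0) * self.second_weight for cid in related})
        return scores


class SearchResultProvidingUnit:
    """Unit 340: ranks content, optionally adding a score from search engine 330."""

    def __init__(self, engine: Optional[Callable[[str, int], float]] = None):
        self.engine = engine

    def rank(self, query: str, scores: dict[int, float]) -> list[int]:
        def final(cid: int) -> float:
            extra = self.engine(query, cid) if self.engine else 0.0
            return scores[cid] + extra

        return sorted(scores, key=lambda cid: (-final(cid), cid))


database = {
    100: {"SKT", "Smartphone", "KTF", "omnia"},
    101: {"Smartphone"},
    102: {"omnia"},
    103: {"travel"},
}
primary, related = TagSearchUnit(database).search("omnia")
scores = WeightCalculationUnit().score(primary, related)
print(SearchResultProvidingUnit().rank("omnia", scores))  # [100, 102, 101]
```

Keeping the engine as an injected callable mirrors the text's statement that the search engine 330 is optional: without it, ranking uses only the tag-based scores.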

The method according to an embodiment of the present invention may also be implemented as computer-readable code on a computer-readable recording medium. A computer-readable recording medium includes any kind of recording device that stores data readable by a computer system.

Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. They also include implementations in the form of carrier waves (for example, transmission over the Internet). The computer-readable recording medium may also be distributed over network-connected computer systems, so that the computer-readable code is stored and executed in a distributed fashion. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers skilled in the art to which the present invention pertains.

The present invention has been described above with reference to its preferred embodiments. Those skilled in the art will appreciate that the present invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in a descriptive rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of their equivalents should be construed as being included in the present invention.

FIG. 1 is a diagram illustrating an example of tag information registered in content according to an embodiment of the present invention;

FIG. 2 is a flowchart of a search method using tag information according to an embodiment of the present invention; and

FIG. 3 is a block diagram of a search apparatus using tag information according to an embodiment of the present invention.

<Description of reference numerals for main parts of the drawings>

310: tag search unit    320: weight calculation unit

330: search engine    340: search result providing unit

350: database

Claims

Extracting content having a tag corresponding to an input search word, or content having a related tag of that tag as one of its tags;

Calculating a degree of relevance of the extracted content by reflecting the tag or related-tag information; and

Providing a search result in which the extracted content is sorted according to the calculated relevance values.

The method of claim 1, wherein the relevance calculating step

assigns a first weight to content having a tag corresponding to the input search word and a second weight to content having a related tag of that tag, to calculate the relevance.

The method of claim 2,

wherein the first weight is set to a value larger than the second weight.

The method of claim 1, wherein the relevance calculating step

further comprises extracting a related search word related to the input search word based on the frequency of appearance of the related tag.

The method of claim 4, wherein the providing of the search result

further provides the extracted related search word.

The method of claim 1, wherein the providing of the search result

calculates the relevance by reflecting, in the calculated relevance value, a relevance value of the extracted content computed by the algorithm of an existing search engine, and provides a search result in which the extracted content is sorted according to that result.

A tag search unit that extracts content having a tag corresponding to an input search word, or content having a related tag of that tag as one of its tags;

A weight calculation unit that calculates a degree of relevance of the extracted content by reflecting the tag or related-tag information; and

A search result providing unit that provides a search result in which the extracted content is sorted according to the calculated relevance values.

The apparatus of claim 7, wherein the tag search unit

extracts both content having a tag corresponding to the input search word and content having a related tag of that tag.

The apparatus of claim 7, wherein the weight calculation unit

assigns a first weight to content having a tag corresponding to the input search word and a second weight to content having a related tag of that tag, to calculate the relevance.

10. The apparatus of claim 9,

The apparatus of claim 7, wherein

the search result providing unit further provides the extracted related search word when the weight calculation unit extracts a related search word related to the input search word based on the frequency of appearance of the related tag.

The apparatus of claim 7, wherein

the search apparatus further comprises a search engine that calculates a relevance value of the extracted content by searching according to an existing search algorithm and delivers it to the search result providing unit.