KR101565339B1

KR101565339B1 - Recommendation system using collective intelligence and method thereof

Info

Publication number: KR101565339B1
Application number: KR1020100108755A
Authority: KR
Inventors: 이채현
Original assignee: 네이버 주식회사
Priority date: 2010-11-03
Filing date: 2010-11-03
Publication date: 2015-11-04
Also published as: JP5809527B2; KR20120047079A; JP2012099115A

Abstract

A recommendation system and method using collective intelligence are disclosed. The recommendation system includes: a rating information collection unit that collects rating information posted on the Internet about a product, in association with the user who posted it and the product; a similar-tendency user cluster generation unit that measures the similarity between users from the rating information and then generates clusters by grouping users according to that similarity; and a service providing unit that provides a product recommendation service to a service target based on the rating information of the users in the cluster to which the service target belongs.

Description

RECOMMENDATION SYSTEM USING COLLECTIVE INTELLIGENCE AND METHOD THEREOF

Embodiments of the present invention relate to a recommendation system and method that use collective intelligence to provide products recommended by people with similar tastes.

When consuming cultural goods such as movies and novels, a relative evaluation based on taste can matter more than an absolute evaluation.

Movie reviews and book reviews left on portal sites or online bookstores make it easy to see the general public's evaluation of a work, but it is hard to know how people with tastes similar to one's own have rated it.

Recommendation systems using collective intelligence are applied in many fields, but first of all it is difficult to build an evaluation database covering many users. In addition, there is usually an entry barrier: to use such a service properly, users must first enter their own information (for example, their favorite products).

Provided are a recommendation system and method using collective intelligence that can build a group product evaluation database by collecting evaluation documents in which individuals record product ratings on microblogs, rating sites of portal sites, and the like.

Provided are a recommendation system and method using collective intelligence that can present product information recommended by users with similar tendencies, using the group product evaluation database.

Provided is a recommendation system using collective intelligence, including: a rating information collection unit that collects rating information posted on the Internet about a product, in association with the user who posted it and the product; a similar-tendency user cluster generation unit that measures the similarity between users from the rating information and then generates clusters by grouping users according to that similarity; and a service providing unit that provides a product recommendation service to a service target based on the rating information of the users in the cluster to which the service target belongs.

According to one aspect, the recommendation system using collective intelligence may further include: a document collection unit that collects posted documents published on the Internet; a document filtering unit that extracts, from the posted documents, those related to a product; and a positivity calculation unit that calculates a word positivity for each posted document. In this case, the rating information collection unit may collect the word positivity as the rating information of the product.

According to another aspect, the recommendation system using collective intelligence may further include a product keyword database that holds at least one keyword related to each product. In this case, the document filtering unit may use the product keyword database to extract posted documents that contain a word matching a keyword.

According to yet another aspect, the recommendation system using collective intelligence may further include a positive/negative keyword database that holds positive keywords together with a positive-word weight assigned to each, and negative keywords together with a negative-word weight assigned to each. In this case, the positivity calculation unit may extract, from a posted document, the words that match a positive or negative keyword in the positive/negative keyword database, and then calculate the word positivity using the positive-word or negative-word weights corresponding to the extracted words.

According to yet another aspect, the rating information collection unit may include: a user database that holds users' site IDs collected from at least one website and a unique key assigned to each site ID; and a rating information database that, based on the user database, manages the rating information posted by users according to the product ID and the user's unique key value. In this case, site IDs in the user database that are identical are still assigned unique keys with different values; if user authentication determines that they belong to the same user, the unique keys may be reassigned to the same value.

According to yet another aspect, the similarity between users is a value obtained by comparing the users' rating information for the same products, and the similar-tendency user cluster generation unit may include: a similarity database that holds, for each user, the similarity to each other user; and a cluster database that manages users whose mutual similarity is at or above a set threshold as the same cluster.

According to yet another aspect, the recommendation system using collective intelligence may further include: a similar-tendency user search unit that, when the service target is not included in any cluster, searches the users who have posted rating information on the Internet for users whose similarity to the service target is at or above the set threshold; and a database update unit that updates the similarity database with the similarity between the service target and the users found, and then adds the service target to the cluster database.

According to yet another aspect, the service providing unit may include: a final rating calculation unit that, for the cluster to which the service target belongs, calculates a final rating for each product by dividing the sum of the weighted values, each obtained by multiplying a user's rating information by the similarity to that user, by the sum of the similarities between users; and a product recommendation unit that recommends products based on the final rating.
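The final rating described in this aspect is a similarity-weighted average. A minimal Python sketch follows; the function names, the dictionary layout, the choice to sum similarities only over users who rated a given product, and the choice to skip users with non-positive similarity are all assumptions made for illustration and are not stated in the text.

```python
from collections import defaultdict


def final_ratings(cluster_ratings, similarity_to_target):
    """Similarity-weighted average rating per product.

    cluster_ratings: {user_key: {product_id: rating in [-1, 1]}}
    similarity_to_target: {user_key: similarity to the service target}
    """
    weighted_sum = defaultdict(float)
    similarity_sum = defaultdict(float)
    for user, ratings in cluster_ratings.items():
        sim = similarity_to_target.get(user)
        # Assumption: ignore users with unknown or non-positive similarity.
        if sim is None or sim <= 0:
            continue
        for product, rating in ratings.items():
            weighted_sum[product] += rating * sim
            similarity_sum[product] += sim
    return {p: weighted_sum[p] / similarity_sum[p] for p in weighted_sum}


def recommend(cluster_ratings, similarity_to_target, top_n=3):
    """Return product IDs ordered by final rating, highest first."""
    scores = final_ratings(cluster_ratings, similarity_to_target)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: two cluster members with similarities 0.8 and 0.9.
example_ratings = {2: {"A": 1.0, "B": -0.4}, 3: {"A": 0.5}}
example_similarity = {2: 0.8, 3: 0.9}
print(recommend(example_ratings, example_similarity))
```

Dividing by the sum of similarities keeps the final rating in the same -1 to 1 range as the individual ratings, so products rated by few users and products rated by many remain comparable.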

Provided is a recommendation method for recommending products using collective intelligence, including: a rating information collection step of collecting rating information posted on the Internet about a product, in association with the user who posted it and the product; a similar-tendency user cluster generation step of measuring the similarity between users from the rating information and then generating clusters by grouping users according to that similarity; and a service providing step of providing a product recommendation service to a service target based on the rating information of the users in the cluster to which the service target belongs.

By collecting data from rating sites of portal sites or from microblogs used by individuals and building a group product evaluation database, a recommendation service based on the ratings of users with similar tendencies can be provided even if the service target does not enter any separate taste information. Thus, using the collective intelligence database, the service target can obtain product information recommended by users with similar tastes.

FIG. 1 is a block diagram illustrating the internal configuration of a recommendation system using collective intelligence that builds a product evaluation database of a similar-tendency group and provides a recommendation service through it, according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining an example of collecting rating information.
FIG. 3 is a block diagram illustrating the configuration of a rating information collection unit that collects rating information posted on the Internet and the users who posted it and stores them in databases, according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating an additional configuration of the recommendation system using collective intelligence that stores rating information and users in databases by filtering posted documents collected from microblogs, according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating the configuration of a similar-tendency user cluster generation unit that clusters users with similar tendencies, according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining an example of clustering users with similar tendencies.
FIG. 7 is a block diagram illustrating an additional configuration of the recommendation system using collective intelligence that searches for users with tendencies similar to a user not included in the database and updates the database, according to an embodiment of the present invention.
FIG. 8 is a block diagram illustrating the configuration of a service providing unit that recommends products using the rating information of users with similar tendencies, according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating a recommendation method using collective intelligence that builds a product evaluation database of a similar-tendency group and provides a recommendation service through it, according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating the internal configuration of a recommendation system using collective intelligence that builds a product evaluation database of a group showing similar tendencies and provides a recommendation service through it, according to an embodiment of the present invention. FIG. 1 shows a recommendation system 100 using collective intelligence that recommends products to a service target based on the public's evaluations of products.

In this specification, 'product' may broadly refer not only to cultural goods such as movies, plays, and novels, but also to goods or services offered for sale.

The recommendation system 100 according to an embodiment may include a rating information collection unit 110, a similar-tendency user cluster generation unit 120, and a service providing unit 130.

The rating information collection unit 110 is a means for collecting and storing rating information posted on the Internet about a product, in association with the user who posted it and the product. The rating information collection unit 110 can build a database of per-user rating information either by collecting rating information from a rating site, that is, a product introduction site (for example, a movie introduction site), or by extracting rating information from posted documents published on microblogs.

The similar-tendency user cluster generation unit 120 generates user clusters by clustering users according to the similarity between users, measured from the per-user rating information. The similar-tendency user cluster generation unit 120 also serves as a means for storing the similarity between users and the user clusters. That is, the similar-tendency user cluster generation unit 120 can maintain and manage, for each user, the similarity to each other user, and can maintain and manage users with similar tendencies as the same cluster.

The service providing unit 130 identifies the cluster to which the service target who wants the recommendation service belongs, and can then provide a product recommendation service to the service target based on the rating information of the users in that cluster. In other words, the service providing unit 130 can use the cluster to obtain the rating information of users with tendencies similar to the service target, and serve the products that those users recommend.

The configuration and operation of the recommendation system using collective intelligence according to an embodiment will now be described in detail with reference to FIGS. 2 to 8.

Rating information collection unit

First, the method of collecting per-user rating information from a rating site is described.

As shown in FIG. 2, the rating information collection unit 110 can collect users' site IDs 201 and the rating information 202 that users leave on a rating site 210, such as a product introduction page provided by a portal site. When the ratings left on the rating site 210 span multiple web pages, as in FIG. 2, the rating information collection unit 110 can move from page to page while collecting the users' site IDs 201 and rating information 202.

The rating information collection unit 110 builds a database from the collected per-user rating information. As shown in FIG. 3, the rating information collection unit 300 may consist of a user database 310 and a rating information database 320.

As shown in Table 1, the user database 310 maintains and manages users' site IDs collected from at least one website, together with a unique key assigned to each site ID. By default, the user database 310 treats accounts on different sites as different users even if their site IDs are identical, and assigns them unique keys with different values. If a user later authenticates their identity, the accounts are regarded as belonging to the same user and the same unique key can be reassigned to them, as in Table 1.

[Table 1]
| id | Site ID (user) | Site name | Unique key |
|---|---|---|---|
| 1 | xlos | naver | 1 |
| 2 | xlos | Yes24 | 1 |
| 3 | chaehyun | interpark | 1 |
| 4 | nomad | buxmovie | 2 |
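The unique-key handling described for the user database can be sketched in Python as below. The class name `UserDatabase`, its method names, and the in-memory dictionary storage are hypothetical; the text does not specify how authentication is performed, so the sketch only shows what happens once two accounts have been confirmed to belong to the same person.

```python
import itertools


class UserDatabase:
    """Maps (site name, site ID) accounts to unique keys (illustrative only)."""

    def __init__(self):
        self._keys = {}  # (site_name, site_id) -> unique key
        self._next_key = itertools.count(1)

    def register(self, site_name, site_id):
        """Assign a fresh unique key to an account seen for the first time."""
        account = (site_name, site_id)
        if account not in self._keys:
            self._keys[account] = next(self._next_key)
        return self._keys[account]

    def merge_authenticated(self, account_a, account_b):
        """After authentication shows both accounts belong to one person,
        reassign every account that shares account_b's key to account_a's key."""
        key_a = self._keys[account_a]
        key_b = self._keys[account_b]
        for account, key in self._keys.items():
            if key == key_b:
                self._keys[account] = key_a
        return key_a


db = UserDatabase()
naver_key = db.register("naver", "xlos")
yes24_key = db.register("Yes24", "xlos")  # same site ID, still a different key
db.merge_authenticated(("naver", "xlos"), ("Yes24", "xlos"))
print(db.register("Yes24", "xlos") == naver_key)
```

Keying ratings by the unique key rather than by the site ID means that ratings collected from several sites are pooled for one person only after the accounts have been linked.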

Based on the user database 310, the rating information database 320 maintains and manages the rating information posted by users in association with the product ID and the user's unique key value, as shown in Table 2. That is, the rating information database 320 records the rating that each user registered in the user database 310 has left for each product. Here, the rating information is a value normalized to the range -1 to 1 before being stored. The closer the rating is to -1, the more it reflects a negative opinion of the product; the closer it is to 1, the more it reflects a positive opinion.

[Table 2]
| id | Product ID | Unique key | Rating (grade) |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 2 | 1 | 2 | 1 |
| 4 | 2 | 2 | -0.4 |

The rating information database 320 may be built by collecting per-user rating information for all products, or by selecting a fixed number (n) of products in descending order of the ratings users left during a recent period (the length of the period is adjustable).

With the above configuration, the rating information collection unit 300 can collect rating information from rating sites and maintain and manage the per-user rating information associated with each product in a database.

Next, the method of collecting per-user rating information from microblogs is described.

For this purpose, as shown in FIG. 4, the recommendation system 400 using collective intelligence according to an embodiment may further include a document collection unit 410, a product keyword database 420, a document filtering unit 430, a positive/negative keyword database 440, and a positivity calculation unit 450 as additional components.

As shown in Table 3, the document collection unit 410 collects users' site IDs and the documents they have posted on microblogging sites. A microblog is a kind of social network service that lets people communicate with many others through short messages of one or two sentences. Representative microblogs include overseas sites such as Twitter and Fanfou and Korean sites such as me2day and yozm.

[Table 3]
| id | Site name | Site ID (user) | Posted document (text) | Posted at |
|---|---|---|---|---|
| 1 | me2day | xlos | 인셉션 정말 최고였음!! (Inception was really the best!!) | 2010.08.02 12:22:40 |
| 2 | twitter | nomad | 이끼.. 정말 최악의 영화 (Moss.. truly the worst movie) | 2010.07.22 22:15:23 |
| 3 | me2day | xlos | 오늘 너무 졸리다 ㅠ.ㅠ (So sleepy today ㅠ.ㅠ) | 2010.08.01 10:12:40 |
| 4 | twitter | chae | 연극 라이어2 추천~~~ (Recommending the play Liar 2~~~) | 2010.05.05 12:15:23 |
| 5 | me2day | xlos | 오늘 미드 데미지스 봤는데, 완전 재밌었음. ㅎㅎ (Watched the US drama Damages today, and it was really fun. haha) | 2010.08.02 10:00:15 |

The product keyword database 420 holds at least one keyword related to each product. As shown in Table 4, the product keyword database 420 can maintain and manage a representative keyword and related keywords for each product. The product keyword database 420 can also organize the per-product keywords by product category.

[Table 4]
| id | Category | Keyword | Related keywords |
|---|---|---|---|
| 1 | movie | 인셉션 (Inception) | inception |
| 2 | movie | 이끼 (Moss) | |
| 3 | play | 라이어2 (Liar 2) | lier2 |
| 4 | drama | damages | 데미지스, 데미지 |

The document filtering unit 430 filters the posted documents related to products using the product keyword database 420. In other words, from the posted documents collected in Table 3, the document filtering unit 430 extracts those that contain a word matching a keyword in the product keyword database 420. In Table 5, the shaded posted documents are the product-related documents extracted through keyword matching.
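The keyword-based filtering step can be illustrated with the data from Tables 3 and 4. The sketch below is a simplification: it uses case-insensitive substring matching in place of the word-level matching the text describes, and the function and variable names are hypothetical.

```python
# Product keywords from Table 4: product id -> representative and related keywords.
PRODUCT_KEYWORDS = {
    1: ["인셉션", "inception"],
    2: ["이끼"],
    3: ["라이어2", "lier2"],
    4: ["damages", "데미지스", "데미지"],
}

# Posted documents from Table 3 (id, site, user, text).
POSTED_DOCUMENTS = [
    {"id": 1, "site": "me2day", "user": "xlos", "text": "인셉션 정말 최고였음!!"},
    {"id": 2, "site": "twitter", "user": "nomad", "text": "이끼.. 정말 최악의 영화"},
    {"id": 3, "site": "me2day", "user": "xlos", "text": "오늘 너무 졸리다 ㅠ.ㅠ"},
    {"id": 4, "site": "twitter", "user": "chae", "text": "연극 라이어2 추천~~~"},
    {"id": 5, "site": "me2day", "user": "xlos",
     "text": "오늘 미드 데미지스 봤는데, 완전 재밌었음. ㅎㅎ"},
]


def match_products(text, product_keywords=PRODUCT_KEYWORDS):
    """Return the ids of products whose keywords occur in the text.

    Assumption: plain substring matching stands in for word matching.
    """
    lowered = text.lower()
    return [
        product_id
        for product_id, keywords in product_keywords.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    ]


def filter_documents(documents, product_keywords=PRODUCT_KEYWORDS):
    """Keep only documents that mention at least one product."""
    kept = []
    for doc in documents:
        products = match_products(doc["text"], product_keywords)
        if products:
            kept.append((doc, products))
    return kept


for doc, products in filter_documents(POSTED_DOCUMENTS):
    print(doc["id"], products)
```

With this data, the document about being sleepy is dropped because it mentions no product keyword, while the other four are kept along with the product each one refers to.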

The positive/negative keyword database 440 classifies positive and negative words and stores them as shown in Table 6. It holds positive keywords with a positive-word weight assigned to each, and negative keywords with a negative-word weight assigned to each. The positive/negative keyword database 440 can be prepared in advance through machine learning.

[Table 6]
| id | Category | Keyword (positive/negative) | Weight |
|---|---|---|---|
| 1 | POSITIVE | 추천 (recommend) | 1 |
| 2 | POSITIVE | 좋다 (good) | 1 |
| 3 | POSITIVE | 재미있다 (fun) | 0.7 |
| 4 | POSITIVE | 최고 (best) | 1 |
| 5 | NEGATIVE | 비추 (not recommended) | -1 |
| 6 | NEGATIVE | 최악 (worst) | -1 |
| 7 | NEGATIVE | 졸리다 (sleepy) | -0.5 |

The positivity calculation unit 450 uses the positive/negative keyword database 440 to calculate the word positivity of the posted documents in Table 5 that were extracted by filtering. The positivity calculation unit 450 can extract the words in a posted document that match a positive or negative keyword, and then calculate the word positivity using the positive-word or negative-word weights corresponding to the extracted words. As an example, the positivity calculation unit 450 can calculate the word positivity of a posted document using Equation 1.

The positivity calculation unit 450 splits the posted document into morpheme-level words and, for each word that matches a positive or negative keyword, takes the positive-word or negative-word weight assigned to it as the keyword weight. The positivity calculation unit 450 can then calculate the sentence distance weight using Equation 2.

In other words, the positivity calculation unit 450 splits the posted document into sentence-level phrases and calculates the sentence distance weight as follows: from the total number of phrases, it subtracts the distance between the phrase containing a word that matches a positive or negative keyword and the phrase containing a word that matches a product keyword, and then divides the result by the total number of phrases.
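The formula images for Equations 1 and 2 are not reproduced in this text. The following LaTeX is a reconstruction that is consistent with the verbal description above and with the worked example in the specification; the symbols $P(d)$, $M(d)$, $k_w$, $s_w$, $N$ and $\mathrm{dist}(w)$ are names chosen here for illustration, not the patent's own notation.

```latex
% Word positivity of posted document d (corresponds to Equation 1):
% sum, over the matched positive/negative words, of keyword weight times
% sentence distance weight.
P(d) = \sum_{w \in M(d)} k_w \, s_w

% Sentence distance weight of matched word w (corresponds to Equation 2):
% N is the total number of sentence-level phrases in d, and dist(w) is the
% distance, in phrases, between the phrase containing w and the phrase
% containing the product keyword.
s_w = \frac{N - \mathrm{dist}(w)}{N}
```

Here $M(d)$ is the set of words in $d$ that match a positive or negative keyword and $k_w$ is the weight assigned to that keyword, so sentiment words that appear closer to the product mention contribute more to the total.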

For example, if the posted document is '인셉션을 봤다. 이거 생각보다 재밌군~ 추천합니다~' ("I watched Inception. It's more fun than I expected~ I recommend it~"), the word positivity is calculated as follows.

(1) Split the posted document into morphemes.

<인셉션을 봤다. 이거 생각보다 재밌군~ 추천합니다~> -> <[인셉션 (Inception)], [보다 (watch)], [이거 (this)], [생각 (thought)], [재미있다 (fun)], [추천하다 (recommend)]>

(2) Calculate the keyword weights.

[재미 (fun)] = 0.7

[추천 (recommend)] = 1.0

(3) Split the posted document into sentences.

<인셉션을 봤다. 이거 생각보다 재밌군~ 추천합니다~> -> <[인셉션을 봤다 (I watched Inception)], [이거 생각보다 재밌군 (It's more fun than I expected)], [추천합니다 (I recommend it)]>

(4) Calculate the sentence distance weights.

[인셉션을 봤다] = (3-0) / 3 = 1

[이거 생각보다 재밌군] = (3-1) / 3 = 0.66

[추천합니다] = (3-2) / 3 = 0.33

(5) Determine the final word positivity from the keyword weights and the sentence distance weights.

Word positivity = [재미](0.7 * 0.66) + [추천](1.0 * 0.33) = 0.79

If the word positivity calculated through the above process is positive, the rating of the product mentioned in the posted document can be judged to be positive; if it is negative, the rating can be judged to be negative.
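The five steps of the worked example can be sketched in Python as follows. Korean morphological analysis is outside the scope of this sketch, so the input is assumed to be already split into sentences and morphemes, with sentiment words normalized to the forms used as keys in the weight table. The function name, the input layout, and the rule of measuring distance to the nearest sentence that mentions the product are assumptions for illustration.

```python
def word_positivity(sentences, product_keywords, sentiment_weights):
    """Sum of keyword weight x sentence distance weight over sentiment words.

    sentences: list of sentences, each a list of morphemes (pre-analyzed).
    product_keywords: set of morphemes that identify the product.
    sentiment_weights: {morpheme: positive or negative keyword weight}.
    """
    total = len(sentences)
    product_positions = [
        index for index, morphemes in enumerate(sentences)
        if any(m in product_keywords for m in morphemes)
    ]
    if not product_positions:
        return 0.0  # no product mentioned, nothing to score

    score = 0.0
    for index, morphemes in enumerate(sentences):
        # Assumption: distance to the nearest sentence mentioning the product.
        distance = min(abs(index - pos) for pos in product_positions)
        distance_weight = (total - distance) / total
        for morpheme in morphemes:
            if morpheme in sentiment_weights:
                score += sentiment_weights[morpheme] * distance_weight
    return score


# The example document, pre-split by hand into sentences and morphemes.
example_sentences = [
    ["인셉션", "보다"],
    ["이거", "생각", "재미있다"],
    ["추천"],
]
example_weights = {"재미있다": 0.7, "추천": 1.0, "최악": -1.0}

score = word_positivity(example_sentences, {"인셉션"}, example_weights)
print(round(score, 2))
```

With exact fractions the score is 0.7 × 2/3 + 1.0 × 1/3 = 0.8; the specification arrives at 0.79 because it truncates the distance weights to 0.66 and 0.33 before multiplying. Either way the result is positive, so the post counts as a positive rating of the product.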

In addition, the positivity calculation unit 450 sums the word positivity per user for each product and then normalizes the sum to the range -1 to 1. The word positivity of a posted document can then be used as the user's rating information for the product that the posted document concerns.

With the above configuration, the rating information collection unit 300 can build a user database 310 like Table 1, in which a unique key is assigned to each site ID of the users collected from microblogs. In addition, by storing the word positivity of each posted document calculated by the positivity calculation unit 450 as the rating information of the corresponding product, the rating information collection unit 300 can build a rating information database 320 like Table 2, holding the per-user rating information associated with each product for the ratings collected from microblogs.

Similar-tendency user cluster generation unit

FIG. 5 shows a similar-tendency user cluster generation unit 500 that clusters users with similar tendencies. As shown, the similar-tendency user cluster generation unit 500 may consist of a similarity database 510 and a cluster database 520.

The similarity database 510 maintains and manages, for each user, the similarity to each other user. The similarity of users' tendencies can be measured in various ways, for example using the Pearson correlation score, Euclidean distance, the Jaccard coefficient, or Manhattan distance. The Pearson correlation score, for instance, measures how well two data sets fit a single straight line, and it can be used to measure the similarity between users. For example, after finding other users who rated the same products as a given user, the ratings the two users left on those products can be compared to calculate their Pearson correlation score. The Pearson correlation score takes a value between -1 and 1, where 1 indicates that the two users gave the same score to every product. In this way, the similarity database 510 can be built as shown in Table 7, holding the similarity between users calculated from the per-product rating information of each user and the other users.

Table 7
User ID_1   User ID_2   Similarity
1           2           0.8
1           3           0.9
1           4           0.1
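A Pearson correlation score between two users can be computed as in the sketch below. It uses only the products both users rated; returning 0.0 when the score is undefined (fewer than two common products, or no variation in one user's ratings) is an assumption, since the source does not say how such cases are handled.

```python
from math import sqrt

def pearson_similarity(ratings_a, ratings_b):
    """ratings_a, ratings_b: {product_id: rating} for two users.

    Returns the Pearson correlation over co-rated products, or 0.0 when
    the correlation is undefined.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[p] for p in common]
    ys = [ratings_b[p] for p in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)
```

Each row of the similarity database then holds one pair of user IDs and the score returned for that pair.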

The cluster database 520 manages users whose mutual similarity is at or above a set value as members of the same cluster. The rating information database 320, which maintains the per-user rating information associated with each product, contains a very large number of users, and calculating the similarity between every pair of users on each request would take a long time. Therefore, the similarity between users can be calculated periodically from the rating information database 320 to update the similarity database 510, and users with similar tendencies can be clustered at the same time to update the cluster database 520. As one example, when a fixed number (n) of products are selected in descending order of the ratings left by users, the similarity between users can be calculated for the users who left ratings on those n products. Referring to FIG. 6, the set value is applied to the similarities, users whose similarity exceeds the set value are connected to one another into graphs 601 and 602, and the users joined in the same graph 601 or 602 can be stored as the same cluster, as shown in Table 8.

Table 8
User ID   Cluster ID
1         1
2         1
3         1
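One way to realize the graph-based grouping of FIG. 6 is to treat every pair at or above the set value as an edge and take the connected components as clusters. The union-find structure and the threshold of 0.5 used in the usage note are assumptions; the source does not name a specific algorithm or value.

```python
def cluster_users(similarities, threshold):
    """similarities: iterable of (user_a, user_b, similarity) rows.

    Users linked by a pair whose similarity is at or above the threshold
    fall into the same connected component. Returns {user_id: cluster_id}.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b, sim in similarities:
        root_a, root_b = find(a), find(b)  # also registers both users
        if sim >= threshold and root_a != root_b:
            parent[root_b] = root_a

    # Number the clusters 1, 2, ... in order of first appearance.
    cluster_ids, result = {}, {}
    for user in parent:
        root = find(user)
        cluster_ids.setdefault(root, len(cluster_ids) + 1)
        result[user] = cluster_ids[root]
    return result
```

With the rows of Table 7 and a threshold of 0.5, users 1, 2, and 3 share cluster 1, matching Table 8, while user 4 is left in a cluster of its own.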

With the above configuration, the similar-tendency user cluster generation unit 500 can build the cluster of each user into the cluster database 520, so that users with similar tendencies can be managed through clusters generated from the similarity between users.

Service providing unit

Before providing the recommendation service, the recommendation system using collective intelligence according to an embodiment can check which cluster the service target belongs to and, if the service target is a user not included in any cluster, add the service target to a cluster.

To this end, as shown in FIG. 7, the recommendation system 700 using collective intelligence according to an embodiment may further include a similar-tendency user search unit 710 and a database update unit 720.

When the service target enters personal information (for example, a site ID), the similar-tendency user search unit 710 uses it to check whether the service target is a user included in a cluster in the cluster database 520. If the service target is not included in a cluster, the similar-tendency user search unit 710 can search the users included in the clusters for a user whose similarity to the service target is at or above the set value. If none of the users included in the clusters has a tendency similar to that of the service target, the similar-tendency user search unit 710 can instead search users who are not in any cluster but for whom per-product rating information has been built, or users who post per-product rating information in real time, for a user whose similarity to the service target is at or above the set value.

The database update unit 720 updates the similarity database 510 and the cluster database 520 for a service target who is a new user. That is, if a user with a tendency similar to that of the service target is found in a cluster in the cluster database 520, the service target is added to the cluster to which that user belongs; if such a user is found among users outside the clusters, a new cluster consisting of that user and the service target is created and the cluster database 520 is updated. In addition, when a user with a tendency similar to that of the service target is found, the database update unit 720 can add the similarity between the service target and that user to the similarity database 510.
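The search and update flow of units 710 and 720 might look like the following sketch. The dictionary layouts and the function name are hypothetical, the similarity function is supplied by the caller, and choosing the single most similar user when several qualify is an assumption not stated in the source.

```python
def add_service_target(target, ratings_db, cluster_db, similarity_db,
                       similarity_fn, threshold):
    """Place a new service target into a cluster, updating both databases.

    ratings_db:    {user: {product_id: rating}}, must contain the target
    cluster_db:    {user: cluster_id}
    similarity_db: {(user_a, user_b): similarity}
    Returns the target's cluster ID, or None if no similar user is found.
    """
    if target in cluster_db:
        return cluster_db[target]

    def best_match(candidates):
        scored = [(similarity_fn(ratings_db[target], ratings_db[u]), u)
                  for u in candidates if u != target and u in ratings_db]
        scored = [(s, u) for s, u in scored if s >= threshold]
        return max(scored, key=lambda pair: pair[0]) if scored else None

    # 1) Search among users who already belong to a cluster.
    match = best_match(list(cluster_db))
    if match:
        sim, user = match
        cluster_db[target] = cluster_db[user]
    else:
        # 2) Fall back to users with ratings who are in no cluster yet.
        match = best_match([u for u in ratings_db if u not in cluster_db])
        if match is None:
            return None
        sim, user = match
        new_id = max(cluster_db.values(), default=0) + 1
        cluster_db[user] = cluster_db[target] = new_id
    similarity_db[(target, user)] = sim
    return cluster_db[target]
```

Any of the similarity measures named earlier, such as the Pearson correlation score, can be passed as `similarity_fn`.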

Furthermore, the recommendation system using collective intelligence according to an embodiment can recommend products to the service target using the similarity between the service target and the users included in the cluster to which the service target belongs, together with the rating information of the users included in that cluster.

To this end, in the recommendation system using collective intelligence according to an embodiment, the service providing unit 800 may consist of a final rating calculation unit 810 and a product recommendation unit 820, as shown in FIG. 8.

The final rating calculation unit 810 uses the rating information database 320, the similarity database 510, and the cluster database 520 to extract, from the cluster to which the service target belongs, the similarity scores between the users included in that cluster and the users' rating information for each product. The list of users included in the cluster to which the service target belongs, the similarity score for each user, and the per-product rating information can then be organized as shown in Table 9.

Table 9
User                        Similarity   Moss rating   Moss weight   Inception rating   Inception weight   Liar rating   Liar weight
Lee Chae-hyun               0.99         0.90          0.89          0.60               0.59               0.70          0.69
Hong Gil-dong               0.38         0.40          0.15          0.40               0.15               1.00          0.38
Hong Gil-soon               0.78         0.70          0.55          -                  -                  0.90          0.70
Kim Cheol-soo               0.67         0.80          0.54          0.90               0.60               -             -
Weight sum                                             2.13                             1.35                             1.78
Similarity sum                                         2.82                             2.04                             2.15
Normalized (final rating)                              0.75                             0.66                             0.83

Specifically, the final rating calculation unit 810 multiplies each user's rating of a product by that user's similarity to obtain a weight, and sums the weights for the product (the weighted sum). It then sums the similarities of the users who left a rating on that product and divides the weighted sum by the similarity sum to obtain a normalized value. This normalized value is the final rating of the product.
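The calculation can be checked with a short sketch: for each product, the final rating is the sum of similarity times rating divided by the sum of similarities, with both sums taken only over the users who rated that product. The function name and dictionary layout are hypothetical; the sample values are those of the table above, with a product omitted for a user who did not rate it.

```python
def final_ratings(similarities, ratings):
    """similarities: {user: similarity to the service target}
    ratings:      {user: {product: rating}}; an unrated product is absent.

    Returns {product: weighted sum / similarity sum}.
    """
    weighted_sum, similarity_sum = {}, {}
    for user, sim in similarities.items():
        for product, rating in ratings.get(user, {}).items():
            weighted_sum[product] = weighted_sum.get(product, 0.0) + sim * rating
            similarity_sum[product] = similarity_sum.get(product, 0.0) + sim
    return {p: weighted_sum[p] / similarity_sum[p]
            for p in weighted_sum if similarity_sum[p] != 0}

similarities = {"Lee Chae-hyun": 0.99, "Hong Gil-dong": 0.38,
                "Hong Gil-soon": 0.78, "Kim Cheol-soo": 0.67}
ratings = {
    "Lee Chae-hyun": {"Moss": 0.90, "Inception": 0.60, "Liar": 0.70},
    "Hong Gil-dong": {"Moss": 0.40, "Inception": 0.40, "Liar": 1.00},
    "Hong Gil-soon": {"Moss": 0.70, "Liar": 0.90},
    "Kim Cheol-soo": {"Moss": 0.80, "Inception": 0.90},
}
result = final_ratings(similarities, ratings)
# Rounded to two decimals: Moss 0.75, Inception 0.66, Liar 0.83
```

Dividing by the similarity sum keeps a product from scoring higher merely because more users rated it.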

The product recommendation unit 820 recommends products to the service target based on the final rating of each product. As one example, when the service target logs in to a site that provides the recommendation service, the product recommendation unit 820 can recommend products in descending order of final rating, grouped by category (for example, movies, novels, and plays). As another example, when the service target logs in to such a site and selects a specific category, the product recommendation unit 820 can recommend the products in the selected category in descending order of final rating. As another example, when the service target searches for a product, the product recommendation unit 820 can show the general rating information for that product (that is, the average of the ratings left by all users) together with the final rating calculated from the rating information of users whose tendencies are similar to those of the service target. As yet another example, when the service target searches for a product, the product recommendation unit 820 can recommend products that are related to it and recommended by users whose tendencies are similar to those of the service target.
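The category-based ordering in the first two examples can be sketched as a filter followed by a sort. The function name, the `category_of` mapping, and the `top_n` cutoff are hypothetical, as are the category labels in the usage note.

```python
def recommend(final_rating, category_of, category=None, top_n=10):
    """Return up to top_n (product, rating) pairs, highest final rating first.

    final_rating: {product: final rating}
    category_of:  {product: category name}
    category:     restrict to one category, or None for all products
    """
    items = [(product, rating) for product, rating in final_rating.items()
             if category is None or category_of.get(product) == category]
    return sorted(items, key=lambda pair: pair[1], reverse=True)[:top_n]
```

For instance, with final ratings of 0.75, 0.66, and 0.83 for Moss, Inception, and Liar, and with Moss and Inception labeled as movies, a request for the movie category returns Moss followed by Inception.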

With the above configuration, the service providing unit 800 can calculate the final rating of each product using the rating information of users whose tendencies are similar to those of the service target and the similarity between users, and can recommend products based on those final ratings.

FIG. 9 is a flowchart illustrating a recommendation method using collective intelligence according to an embodiment of the present invention, in which a product evaluation database of groups with similar tendencies is built and used to provide a recommendation service. Each step of the recommendation method according to this embodiment may be performed by the recommendation system described with reference to FIGS. 1 and 4.

In step 910, the recommendation system builds a rating information database by associating rating information posted on the Internet in relation to a product with the user who posted it and with that product.

As one example of building the rating information database, in step 911 the recommendation system can collect rating information from a rating site, that is, a product introduction site (for example, a movie introduction site). In other words, the users' site IDs and the rating information that users leave on the rating site can be collected from rating sites such as the product introduction pages provided by a portal site.

As another example of building the rating information database, in step 912 the recommendation system can extract and collect rating information from posted documents published on microblogs. That is, the recommendation system collects the users' site IDs and the documents the users have posted from a microblogging site. In step 913, the recommendation system extracts, from the posted documents collected in step 912, those that contain a product-related keyword, based on a product keyword database that holds at least one keyword related to each product. In step 914, the recommendation system calculates the word positivity of the posted documents extracted by the filtering in step 913, based on a positive/negative keyword database that holds positive keywords with the positive-word weight assigned to each and negative keywords with the negative-word weight assigned to each. The recommendation system can extract from a posted document the words that match a positive or negative keyword and then calculate the word positivity using the positive-word or negative-word weight corresponding to each extracted word. The word positivity of the posted document can then be stored as the user's rating information for the product when the rating information database is built.
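Steps 913 and 914 can be sketched as below. Whitespace tokenization stands in for morpheme analysis, negative-word weights are assumed to be stored as positive numbers and subtracted, and the sentence distance weight is left out; these are simplifications for illustration, and the function names and sample keywords are hypothetical.

```python
def mentions_product(document, product_keywords):
    """Step 913: keep a document only if it contains a product keyword."""
    text = document.lower()
    return any(keyword.lower() in text for keyword in product_keywords)

def word_positivity(document, positive_weights, negative_weights):
    """Step 914: add the weight of each matched positive keyword and
    subtract the weight of each matched negative keyword.

    positive_weights, negative_weights: {keyword: weight}
    """
    score = 0.0
    for word in document.lower().split():  # assumption: whitespace tokens
        score += positive_weights.get(word, 0.0)
        score -= negative_weights.get(word, 0.0)
    return score

documents = ["Inception was great", "lunch was awful",
             "Inception was boring and awful"]
related = [d for d in documents if mentions_product(d, ["inception"])]
scores = [word_positivity(d, {"great": 0.8}, {"boring": 0.5, "awful": 0.9})
          for d in related]
```

Here the second document is filtered out because it mentions no product keyword, and the two remaining documents receive a positive and a negative score respectively.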

In step 920, the recommendation system builds a similarity database that maintains, for each user, the similarity to each other user, and a cluster database that maintains the user clusters formed by clustering users according to the similarity between them. As one example, the recommendation system can find other users who rated the same products as a given user and then compare the ratings the two users left on those products to calculate the similarity between them. The recommendation system can then build the per-user similarities to other users into a database and, in addition, group users whose mutual similarity is at or above a set value into clusters and build the user group of each cluster into a database. That is, the recommendation system can maintain and manage each user's similarity to other users, and can maintain and manage users with similar tendencies in product ratings as members of the same cluster.

In step 930, the recommendation system uses the rating information database built in step 910 and the similarity database and cluster database built in step 920 to recommend, to a service target who wishes to receive the recommendation service, the products recommended by users whose tendencies are similar to those of the service target. The recommendation system can extract the similarity between users and the users' rating information for each product through the cluster to which the service target belongs, and then calculate the final rating of each product by dividing the sum of the weights, each obtained by multiplying the rating information by the similarity, by the sum of the similarities between users. The recommendation system can then recommend products to the service target in descending order of final rating.

As described above, according to embodiments of the present invention, per-user rating information on products can be collected from the rating sites of portal sites or from the microblogs that individuals use, and a collective product evaluation database can be built. As a result, a recommendation service based on the ratings of users whose tendencies are similar to those of the service target can be provided even if the service target does not enter any separate preference information.

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to those embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims set out below and by their equivalents.

110: rating information collection unit
120: similar-tendency user cluster generation unit
130: service providing unit

Claims

A recommendation system using collective intelligence, comprising:
a rating information collection unit that collects rating information posted on the Internet in relation to a product, in association with the user who posted the rating information and with the product;
a similar-tendency user cluster generation unit that measures the similarity between users using the rating information and then generates clusters in which the users are clustered according to the similarity between users; and
a service providing unit that provides a recommendation service for the product to a service target based on the rating information of the users included in the cluster to which the service target belongs,
wherein the rating information collection unit comprises:
a user database that holds the site IDs of the users collected from at least one website and a unique key assigned to each site ID; and
a rating information database that manages, based on the user database, the rating information posted by the users according to the ID of the product and the unique key value of the user,
and wherein, in the user database, unique keys with different values are assigned to site IDs that have the same ID, and if those site IDs are determined through user authentication to belong to the same user, the unique keys are reassigned to the same value.

The recommendation system of claim 1,
wherein the rating information collection unit collects the word positivity of a posted document published on the Internet as the rating information of the product.

The recommendation system of claim 1, further comprising:
a document collection unit that collects posted documents published on the Internet;
a document filtering unit that extracts, from the posted documents, the posted documents related to the product; and
a positivity calculation unit that calculates a word positivity for the posted document,
wherein the rating information collection unit collects the word positivity as the rating information of the product.

The recommendation system of claim 3,
wherein the document filtering unit extracts, from the posted documents, those that include a word matching a keyword, based on a product keyword database that holds at least one keyword related to the product.

The recommendation system of claim 2, further comprising:
a positivity calculation unit that calculates the word positivity,
wherein the positivity calculation unit extracts from the posted document a word matching a positive keyword or a negative keyword, and then calculates the word positivity using the positive-word weight of the positive keyword or the negative-word weight of the negative keyword corresponding to the extracted word.

The recommendation system of claim 5,
wherein the positivity calculation unit calculates a keyword weight by dividing the posted document into morpheme-unit words, calculates a sentence distance weight by dividing the posted document into sentence-unit phrases, and then determines the word positivity by adding up the values in which the sentence distance weight is reflected in the keyword weight,
wherein the keyword weight is the positive-word weight or the negative-word weight corresponding to each morpheme-unit word,
and wherein the sentence distance weight is based on the number of phrases constituting the posted document and the distance from the phrase containing the positive keyword or the negative keyword to the phrase containing the keyword related to the product.

delete

The recommendation system of claim 1,
wherein the similarity between users is measured by comparing the users' rating information for the same product,
and wherein the similar-tendency user cluster generation unit comprises:
a similarity database that maintains, for each user, the similarity to each other user; and
a cluster database that manages users whose mutual similarity is at or above a set value as members of the same cluster.

The recommendation system of claim 8, further comprising:
a similar-tendency user search unit that, if the service target is not included in a cluster, searches the users who have posted rating information on the Internet for a user whose similarity to the service target is at or above the set value; and
a database update unit that updates the similarity database with the similarity between the service target and the found user and adds the service target to the cluster database.

The recommendation system of claim 1,
wherein the service providing unit recommends the product to the service target using the similarity between the service target and the users included in the cluster to which the service target belongs and the rating information posted by the users included in that cluster.

The recommendation system of claim 10,
wherein the service providing unit comprises:
a final rating calculation unit that, for the cluster to which the service target belongs, calculates a final rating for the product by dividing the sum of the weights, each obtained by multiplying a user's rating information by the similarity to that user, by the sum of the similarities between users; and
a product recommendation unit that recommends the product based on the final rating.

A recommendation method for recommending a product using collective intelligence, the method comprising:
a rating information collection step of collecting rating information posted on the Internet in relation to a product, in association with the user who posted the rating information and with the product;
a similar-tendency user cluster generation step of measuring the similarity between users using the rating information and then generating clusters in which the users are clustered according to the similarity between users; and
a service providing step of providing a recommendation service for the product to a service target based on the rating information of the users included in the cluster to which the service target belongs,
wherein the rating information collection step comprises:
building a user database that holds the site IDs of the users collected from at least one website and a unique key assigned to each site ID; and
building a rating information database that manages, based on the user database, the rating information posted by the users according to the ID of the product and the unique key value of the user,
and wherein, in the user database, unique keys with different values are assigned to site IDs that have the same ID, and if those site IDs are determined through user authentication to belong to the same user, the unique keys are reassigned to the same value.

The recommendation method of claim 12,
wherein the rating information collection step collects the word positivity of a posted document published on the Internet as the rating information of the product.

The recommendation method of claim 12, further comprising:
a document collection step of collecting posted documents published on the Internet;
a document filtering step of extracting, from the posted documents, the posted documents related to the product; and
a positivity calculation step of calculating a word positivity for the posted document,
wherein the rating information collection step collects the word positivity as the rating information of the product.

The recommendation method of claim 14,
wherein the document filtering step extracts, from the posted documents, those that include a word matching a keyword, based on a product keyword database that holds at least one keyword related to the product.

The recommendation method of claim 13, further comprising:
a positivity calculation step of calculating the word positivity,
wherein the positivity calculation step extracts from the posted document a word matching a positive keyword or a negative keyword, and then calculates the word positivity using the positive-word weight of the positive keyword or the negative-word weight of the negative keyword corresponding to the extracted word.

The recommendation method of claim 16,
wherein the positivity calculation step calculates a keyword weight by dividing the posted document into morpheme-unit words, calculates a sentence distance weight by dividing the posted document into sentence-unit phrases, and then determines the word positivity by adding up the values in which the sentence distance weight is reflected in the keyword weight,
wherein the keyword weight is the positive-word weight or the negative-word weight corresponding to each morpheme-unit word,
and wherein the sentence distance weight is based on the number of phrases constituting the posted document and the distance from the phrase containing the positive keyword or the negative keyword to the phrase containing the keyword related to the product.

delete

The recommendation method of claim 12,
wherein the similarity between users is measured by comparing the users' rating information for the same product,
and wherein the similar-tendency user cluster generation step comprises:
building a similarity database that maintains, for each user, the similarity to each other user; and
building a cluster database that manages users whose mutual similarity is at or above a set value as members of the same cluster.

The recommendation method of claim 19, further comprising:
a similar-tendency user search step of searching, if the service target is not included in a cluster, the users who have posted rating information on the Internet for a user whose similarity to the service target is at or above the set value; and
a database update step of updating the similarity database with the similarity between the service target and the found user and adding the service target to the cluster database.

The recommendation method of claim 12,
wherein the service providing step recommends the product to the service target using the similarity between the service target and the users included in the cluster to which the service target belongs and the rating information posted by the users included in that cluster.

The recommendation method of claim 21,
wherein the service providing step comprises:
a final rating calculation step of calculating, for the cluster to which the service target belongs, a final rating for the product by dividing the sum of the weights, each obtained by multiplying a user's rating information by the similarity to that user, by the sum of the similarities between users; and
a product recommendation step of recommending the product based on the final rating.