KR20100038272A

KR20100038272A - Method, apparatus, system and computer-readable media for determining user interest

Info

Publication number: KR20100038272A
Application number: KR1020090085510A
Authority: KR
Inventors: 송 유; 쳉 도린; 정상오; 에스 카라사풀 스워룹
Original assignee: 삼성전자주식회사
Priority date: 2008-10-05
Filing date: 2009-09-10
Publication date: 2010-04-14

Abstract

PURPOSE: A method, an apparatus, a system, and computer-readable media for determining user interest are provided to adaptively determine a user's preferences and interests according to the situation, on the basis of context-dependent recommendation. CONSTITUTION: A data item storage unit stores data items, each of which includes an interest field holding ratings of the user's interests and a context field holding information about the context to which the ratings relate. A grouping unit groups the data items into context groups. A partition generating unit generates a first partition containing the context groups whose number of data items is at or above a first threshold. A clustering unit averages the ratings for the interests in the data items that belong to each context group within the first partition, and performs clustering for each context group in the first partition.

Description

Method, apparatus, system and computer-readable media for determining user interest

The present disclosure relates to software application technology for context-dependent recommendation based on clustering.

Conceptually, a computing system (e.g., a computing device, personal computer, laptop, smartphone, or mobile phone) can receive information (e.g., content or data), manipulate it, and obtain a result based on a sequence of instructions (a computer program) that describes how the information is to be processed. Typically, the information used by a computing system is stored in computer-readable memory in digital or binary form. More complex computing systems can also store content, including the computer program itself. A computer program may be fixed, or may be built into a computer (or computing) device as logic circuitry provided on a microprocessor or computer chip. A computing system may also have a support system that manages various resources (e.g., memory and peripheral devices) and services (e.g., basic functions such as opening files) and allows the resources to be shared among multiple programs. Such a support system is generally known as an operating system (OS), which provides programmers with an interface for accessing these resources and services.

Today, many kinds of computing devices are in use. They vary widely in size, cost, storage capacity, and processing power, ranging from expensive, high-performance servers, through relatively inexpensive personal computers and laptops, to even less expensive microprocessors or computer chips provided in storage devices, automobiles, and household appliances.

Recently, computing systems have become smaller and more portable. As a result, a variety of mobile and handheld devices have become available, such as wireless phones, media players, and personal digital assistants (PDAs). In general, a handheld device may be a pocket-sized computing device. Such devices typically use a small display screen and a miniaturized keyboard. In the case of a PDA, for example, a touchscreen interface that serves for both input and output may be used.

In particular, portable communication devices (e.g., smartphones) have become so popular that nearly everyone owns one, and some of them can provide a computing environment similar to that of a typical PC. Such a smartphone can provide application developers with a complete operating system as a standardized interface and platform. Given this popularity, communication on portable devices is described in more detail below.

In general, a mobile phone or cell phone may be any of various portable electronic devices used for mobile communication. In addition to the voice function of an ordinary telephone, recent mobile phones can provide a variety of services, such as the short message service (SMS) for text messaging, e-mail, packet switching for Internet access, and the multimedia messaging service (MMS) for sending and receiving photos and video. Most mobile phones can connect to a network of base stations and can be connected to one another through the public switched telephone network (PSTN).

SMS is used as a means of sending and receiving short messages on such mobile phones. SMS was originally defined in 1985, as part of the GSM (global system for mobile communication) series of standards, as a means of sending and receiving messages of up to 160 characters on GSM mobile handsets. Support for the service has since expanded to alternative mobile standards such as ANSI CDMA networks and Digital AMPS, as well as to satellite and landline networks. Most SMS messages are mobile-to-mobile text messages, although broadcast messages can also be supported. The term SMS is frequently used as a non-technical term for text messages in general, particularly in non-English-speaking parts of Europe where the GSM system is well established.

MMS is a more modern messaging standard that allows messages to include multimedia objects such as images, audio, and video in addition to the text carried by SMS. MMS was developed for cellular networks alongside other messaging systems such as SMS, mobile instant messaging, and mobile e-mail, and most of its standardization has been carried out by 3GPP, 3GPP2, and the Open Mobile Alliance (OMA).

Personalization of the user experience has been a focus of much recent research. Personalization offers many benefits. For example, an advertiser can use personalization technology to deliver advertisements based on individual preferences or interests. Likewise, an mp3 provider can offer a music list that reflects each person's musical taste. In this way, personalization allows a service provider to offer services tailored to each user's preferences or interests.

Preference, however, is not a static concept. A user's preferences change constantly, and the change may be driven by the user's current state or situation. For example, preferences at the office may differ from preferences at home. A common approach is to have the user specify preferences explicitly, through a user profile or a similar structure. However, some users dislike specifying preferences, and preferences specified in this way are not kept complete.

This specification discloses a technique for adaptively determining a user's preferences or interests in accordance with a changing context.

According to one aspect of the present invention, a method of determining a user's interest may include: storing data items that relate to the user's usage patterns, each data item including an interest part having at least one rating indicating the degree of interest the user has in at least one interest, and a context part having information about the context associated with the rating; grouping the data items into context groups, each context group having data items and an associated context part; determining, for each context group, whether the number of data items in the context group is greater than or equal to a first threshold; creating a first partition containing the context groups whose number of data items is greater than or equal to the first threshold; averaging the ratings for the interests of the data items belonging to each context group in the first partition, and clustering each context group in the first partition; and obtaining the user's interest by comparing a current context with the context groups in the first partition.

According to another aspect of the present invention, a method of determining a user's interest may include: storing data items that relate to the user's usage patterns, each data item including an interest part having at least one rating indicating the degree of interest the user has in at least one interest, and a context part that has information about the context associated with the rating and includes a context value comprising a time of day; grouping the data items into context groups, each context group having data items and an associated context part; determining, for each context group, whether the number of data items in the context group is greater than or equal to a first threshold; creating a first partition containing the context groups whose number of data items is greater than or equal to the first threshold; averaging the ratings for the interests of the data items belonging to each context group in the first partition, and clustering each context group in the first partition; filtering a context group out of the first partition when, for that context group, the number of data items is greater than or equal to a third threshold and the number of statistically significant average ratings is less than or equal to a second threshold; and obtaining the user's interest by comparing a current context with the context groups in the first partition.

According to one aspect of the present invention, an apparatus for determining a user's interest includes an interface and a plurality of processors. The processors are able to: store data items that relate to the user's usage patterns, each data item including an interest part having a rating of the degree of interest the user has, and a context part having information about the context associated with the rating; group the data items into context groups, each having the data items associated with a context part; determine, for each context group, whether the number of data items in the context group is greater than or equal to a first threshold; create a first partition containing the context groups whose number of data items is greater than or equal to the first threshold; average the ratings for the interests of the data items belonging to each context group in the first partition, and cluster each context group in the first partition; and obtain the user's interest by comparing a current context with the context groups in the first partition.

According to one aspect of the present invention, a system for determining a user's interest may include: means for storing data items that relate to the user's usage patterns, each data item including an interest part having at least one rating indicating the degree of interest the user has in at least one interest, and a context part having information about the context associated with the rating; means for grouping the data items into context groups, each context group having data items and an associated context part; means for determining, for each context group, whether the number of data items in the context group is greater than or equal to a first threshold; means for creating a first partition containing the context groups whose number of data items is greater than or equal to the first threshold; means for averaging the ratings for the interests of the data items belonging to each context group in the first partition, and for clustering each context group in the first partition; and means for obtaining the user's interest by comparing a current context with the context groups in the first partition.

According to the disclosure, because statistically significant data items are clustered, the user's interest can be determined adaptively as the context changes.

Hereinafter, specific examples for carrying out the present invention are described in detail with reference to the accompanying drawings.

According to one embodiment of the present invention, software applications and other items can be recommended to a user automatically, based on data relating to the current context and to past usage information. The context refers to the situation or background in which the user, and/or the device the user is using, is placed. For example, the context may indicate the user's location, such as whether the user is at home or at the office. The context may also include the time of day, for example morning, afternoon, or evening. The context is not limited to these examples and may include various other kinds of context or situation information.

One way for a system to handle preferences is to have the user specify his or her preferences and then use those preferences to guide application or service recommendations. For example, a user may specify a preference for business-related software, and the system can use that preference to recommend business-related software when asked. Thus, given a choice between a game application and a word processing application, the system can recommend the word processing application.

Another approach requires the user to train the device over a period of time. This can be regarded as supervised training, which is undesirable because training the device takes considerable time and effort. A further drawback of existing approaches is that they do not adapt to changes in the user's preferences, context, or habits of device use. An improved technique for mobile devices is therefore needed.

According to one embodiment of the present invention, preference information for the user in the current context can be computed using the user's preferences from other contexts.

Data relating to past usage information can be collected and stored as data points. The information for a data point may be stored as a vector. FIG. 1 illustrates a vector that stores context and interest information for a single data point according to one embodiment of the present invention. In FIG. 1, the context information includes time 100, location 102, and temperature 104. The interest information indicates the user's interest in particular subjects and may include pop music 106 and classical music 108.
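As an illustrative sketch only, and not a representation taken from the disclosure, a single data point of the kind shown in FIG. 1 could be held in a small record with a context part and an interest part. The class name `DataPoint`, its field names, and the sample values (including the temperature) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataPoint:
    """One logged data point: a context part plus an interest part (hypothetical layout)."""
    # Context part (FIG. 1: time 100, location 102, temperature 104).
    time: str
    location: str
    temperature: float
    # Interest part: keyword -> rating; None marks a keyword the user did not rate.
    ratings: Dict[str, Optional[int]]


# Sample values are invented for illustration.
point = DataPoint(
    time="morning",
    location="office",
    temperature=21.0,
    ratings={"pop music": None, "classical music": 5},
)
```

Keeping the two parts in one record mirrors the single vector of FIG. 1 while leaving unrated keywords distinguishable from a rating of 0.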

Each data point can be visualized as a point on a two-dimensional graph. FIG. 2 illustrates such a graph according to one embodiment of the present invention. In the graph, the proximity of data points to one another represents the level of similarity between their vectors. Similar data points can therefore be grouped together into clusters. Clustering is the classification of objects into different groups: a data set is divided into subsets (i.e., clusters) so that the data in each subset share some common property. Such clusters can be used to let the system select appropriate applications to recommend.

One way to cluster data points is to attempt to determine the optimal number of clusters for the given data points. This optimal number of clusters is denoted k. How the optimal number of clusters is determined is beyond the scope of this disclosure, and conventional methods may be used. Conventional methods, however, have several drawbacks. The value of k must be determined dynamically, which requires significant processing power, and processing power may be scarce on a mobile device. Moreover, the correct k is difficult to determine, and an inaccurate estimate adversely affects both the clustering and the recommendations. Given partitioned clusters, one method of pattern extraction is to compare the current data point with the centroid of each cluster and determine which cluster the current data point belongs to. The interest pattern can then be extracted from that cluster. For an n-dimensional structure X, the centroid is the intersection of all hyperplanes that divide X into two parts of equal moment. It can be regarded as the "average" of all points of X. The centroid of a cluster is therefore the average of all points in the cluster. FIG. 3 illustrates a graph of clustered data points according to one embodiment of the present invention. In FIG. 3, the centroid of cluster 300 may be 302, and the centroid of cluster 304 may be 306.
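The centroid comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the disclosure: the function names `centroid` and `nearest_cluster` and the two-dimensional sample points are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
import math
from typing import List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Return the coordinate-wise average of all points in a cluster."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def nearest_cluster(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


# Two invented clusters on a two-dimensional graph.
cluster_a = [[1.0, 1.0], [1.0, 3.0], [3.0, 1.0], [3.0, 3.0]]
cluster_b = [[8.0, 8.0], [10.0, 8.0], [9.0, 11.0]]
centroids = [centroid(cluster_a), centroid(cluster_b)]

# The current data point is compared with every centroid, not with every stored point.
current = [2.5, 1.5]
index = nearest_cluster(current, centroids)
```

Comparing against k centroids rather than against every stored data point is what keeps this step inexpensive on a resource-constrained device.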

Apart from clustering, another way to extract a pattern is to group the data points that neighbor the current data point. Neighboring data points have contexts and interests that closely resemble those of the current data point, so an interest pattern can be extracted from them. However, this approach requires the current data point to be compared with every other data point, and it may therefore be unsuitable for online computation on resource-constrained devices such as mobile devices.

According to one embodiment of the present invention, the user's preferences can be specified by ratings given to various keywords. For example, if the user shows strong interest in classical music and rock music, the keywords "classical music" and "rock music" may be given high ratings. As one example, a rating may be an integer from 0 to 5, where a rating of 0 indicates no interest and a rating of 5 indicates strong interest. The user may rate only a few keywords and leave the remaining keywords unrated.

Once the given contexts and interest ratings are stored in a log, this information can be vectorized. For each context type, 1-in-N encoding can be used. This means that if a context type takes one of N values, it can be represented by a vector of size N. The vector position corresponding to the current context value is set to 1, and all other positions are set to 0. FIG. 4 illustrates a context set according to one embodiment of the present invention. In FIG. 4, the possible time contexts (e.g., morning, afternoon, evening) may be represented by corresponding values (e.g., 1, 2, and 3, respectively).

FIG. 5 illustrates context encoding according to one embodiment of the present invention. In FIG. 5, morning, afternoon, and evening can each be encoded as a vector containing three entries.
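The 1-in-N encoding of a context type can be sketched as below. The value lists, their ordering, and the function names `one_in_n` and `encode_context` are assumptions chosen for this example; the figures themselves are not reproduced here.

```python
from typing import List, Sequence

# Hypothetical value sets for two context types.
TIME_VALUES = ["morning", "afternoon", "evening"]
LOCATION_VALUES = ["home", "office"]


def one_in_n(value: str, values: Sequence[str]) -> List[int]:
    """Encode one of N possible values as a size-N vector with a single 1."""
    if value not in values:
        raise ValueError(f"unknown context value: {value!r}")
    return [1 if v == value else 0 for v in values]


def encode_context(time: str, location: str) -> List[int]:
    """Concatenate the 1-in-N encodings of each context type into one context vector."""
    return one_in_n(time, TIME_VALUES) + one_in_n(location, LOCATION_VALUES)


morning = one_in_n("morning", TIME_VALUES)
evening = one_in_n("evening", TIME_VALUES)
morning_office = encode_context("morning", "office")
```

With this encoding every pair of distinct values of one context type is equally far apart, which avoids implying that, say, morning is "closer" to afternoon than to evening merely because of the numeric labels 1, 2, and 3.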

For each keyword in an instance vector, "x" can be used to mark a keyword that has not been rated, and the values 0 through 5 can be used to indicate the user's level of interest in a rated keyword. FIG. 6 illustrates instance vectors according to one embodiment of the present invention. In FIG. 6, when the user is at the office in the morning, classical music has a rating of "5" and pop music has no rating. When the user is at home in the evening, the user gives pop music a rating of "1" and gives classical music no rating.
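A minimal sketch of such instance vectors follows, assuming that the "x" marker is represented in memory as `None` and printed as "x". The keyword order and the helper names `make_instance` and `render` are hypothetical.

```python
from typing import Dict, Optional, Tuple

KEYWORDS = ["classical music", "pop music"]  # hypothetical keyword order

Instance = Tuple[str, str, Tuple[Optional[int], ...]]


def make_instance(time: str, location: str, ratings: Dict[str, int]) -> Instance:
    """Build an instance; keywords missing from `ratings` stay unrated (None)."""
    for keyword, rating in ratings.items():
        if keyword not in KEYWORDS:
            raise ValueError(f"unknown keyword: {keyword!r}")
        if not (isinstance(rating, int) and 0 <= rating <= 5):
            raise ValueError(f"rating must be an integer from 0 to 5: {rating!r}")
    return (time, location, tuple(ratings.get(k) for k in KEYWORDS))


def render(instance: Instance) -> str:
    """Format an instance as text, writing unrated keywords as 'x'."""
    time, location, ratings = instance
    cells = ["x" if r is None else str(r) for r in ratings]
    return ", ".join([time, location] + cells)


# The two situations described for FIG. 6.
office_morning = make_instance("morning", "office", {"classical music": 5})
home_evening = make_instance("evening", "home", {"pop music": 1})
```

Using a distinct marker for unrated keywords matters because a rating of 0 already carries meaning (no interest) and must not be confused with a missing rating.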

Once the data are encoded, the entire log is organized by combinations of context values. For example, all instances for "morning, office" are grouped into one vector, and all instances for "morning, home" are grouped into another. During grouping, the ratings for each keyword can be combined by computing the average of the ratings given to that keyword. Ratings given as "x" are ignored in this computation. FIG. 7 illustrates grouped instance data according to an embodiment of the present invention.
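The grouping and averaging step can be sketched as follows. The log layout (a context tuple paired with a tuple of ratings, with `None` standing for "x") and the function names are assumptions made for illustration, and the sample log is invented.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Context = Tuple[str, str]                # (time, location)
Ratings = Tuple[Optional[int], ...]      # one entry per keyword; None means "x"


def group_by_context(log: Sequence[Tuple[Context, Ratings]]) -> Dict[Context, List[Ratings]]:
    """Collect all instances that share the same combination of context values."""
    groups: Dict[Context, List[Ratings]] = defaultdict(list)
    for context, ratings in log:
        groups[context].append(ratings)
    return dict(groups)


def average_ratings(rows: Sequence[Ratings]) -> List[Optional[float]]:
    """Average each keyword's ratings, ignoring unrated entries; None if nothing was rated."""
    averages: List[Optional[float]] = []
    for column in zip(*rows):
        valid = [r for r in column if r is not None]
        averages.append(sum(valid) / len(valid) if valid else None)
    return averages


# Invented log; keyword order is (classical music, pop music).
log = [
    (("morning", "office"), (5, None)),
    (("morning", "office"), (4, 2)),
    (("morning", "home"), (None, 3)),
]
groups = group_by_context(log)
office_avg = average_ratings(groups[("morning", "office")])
home_avg = average_ratings(groups[("morning", "home")])
```

Here the pop music average for "morning, office" is computed from the single valid rating of 2, because the "x" entry is skipped rather than counted as 0.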

The purpose of grouping is to find patterns on the basis of statistical significance. For example, a large number of data instances in a context group suggests that a statistically significant context pattern can be extracted from that group. A first threshold can be used to assign the context groups to two partitions. The first partition contains the context groups whose number of data instances is equal to or greater than the threshold. The second partition contains the context groups whose number of data instances is below the threshold. For example, if the first threshold is set to 5, the first partition contains the context groups that have five or more data instances. The first threshold may be set dynamically to various values, depending on the administrator or the application being run.
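The split into two partitions can be sketched as below, assuming context groups are held in a dictionary keyed by context. The function name `partition_groups` and the sample groups are hypothetical.

```python
from typing import Dict, List, Tuple

Context = Tuple[str, str]


def partition_groups(
    groups: Dict[Context, List[tuple]], first_threshold: int
) -> Tuple[Dict[Context, List[tuple]], Dict[Context, List[tuple]]]:
    """Split context groups by instance count: at or above the threshold, or below it."""
    first: Dict[Context, List[tuple]] = {}
    second: Dict[Context, List[tuple]] = {}
    for context, instances in groups.items():
        target = first if len(instances) >= first_threshold else second
        target[context] = instances
    return first, second


# Invented groups: six instances for one context, two for another.
groups = {
    ("morning", "office"): [(5, None)] * 6,
    ("evening", "home"): [(None, 1)] * 2,
}
first_partition, second_partition = partition_groups(groups, first_threshold=5)
```

With the first threshold set to 5, as in the example in the text, the six-instance group lands in the first partition and the two-instance group in the second.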

As described above, the rating data contain missing ratings as well as valid ratings. Missing ratings must be treated as unknown. If only a few valid ratings exist for a particular interest, the system cannot draw any reliable conclusion about the rating for that interest. This holds even when the number of data instances in the group exceeds the first threshold. The context groups therefore need to be filtered. For each context group in the first partition, the average rating for each keyword is computed from the valid ratings for that keyword across all data instances in the context group. The number of average ratings in the group is then counted. If the number of average ratings is smaller than a second threshold, the ratings in the context group can be regarded as not statistically significant.

In addition, for each average rating, the system must consider how many valid ratings were averaged. If an average rating is computed from too few valid ratings, the average itself is not statistically significant. A third threshold is therefore defined. The third threshold is the minimum number of valid ratings required for an average rating to be statistically significant.

With these three thresholds, the data are reduced to those instances from which meaningful interest patterns can be extracted. A meaningful partition thus consists of the context groups in which the number of data instances is greater than the first threshold, the number of average ratings is greater than the second threshold, and the number of valid ratings is greater than the third threshold.
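The filtering by the second and third thresholds can be sketched as follows. The text is not fully consistent about whether each comparison is strict, so this sketch assumes inclusive bounds: an average counts as significant when it is built from at least `third_threshold` valid ratings, and a group is kept when it has at least `second_threshold` significant averages. The function names and sample data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Context = Tuple[str, str]
Ratings = Tuple[Optional[int], ...]  # None marks an unrated keyword


def significant_averages(rows: Sequence[Ratings], third_threshold: int) -> Dict[int, float]:
    """Map keyword index -> average rating, keeping only averages with enough valid ratings."""
    result: Dict[int, float] = {}
    for index, column in enumerate(zip(*rows)):
        valid = [r for r in column if r is not None]
        if len(valid) >= third_threshold:  # assumed inclusive bound
            result[index] = sum(valid) / len(valid)
    return result


def filter_groups(
    first_partition: Dict[Context, List[Ratings]],
    second_threshold: int,
    third_threshold: int,
) -> Dict[Context, Dict[int, float]]:
    """Keep only groups that have enough significant average ratings."""
    kept: Dict[Context, Dict[int, float]] = {}
    for context, rows in first_partition.items():
        averages = significant_averages(rows, third_threshold)
        if len(averages) >= second_threshold:  # assumed inclusive bound
            kept[context] = averages
    return kept


# Invented first partition; keyword order is (classical music, pop music).
first_partition = {
    ("morning", "office"): [(5, None), (4, 2), (5, None), (4, None), (5, 1)],
    ("evening", "home"): [(None, 1), (None, None), (3, None), (None, None), (None, 2)],
}
kept = filter_groups(first_partition, second_threshold=1, third_threshold=3)
```

In this sample both groups hold five instances, yet only "morning, office" survives: its classical music average rests on five valid ratings, whereas no keyword in "evening, home" has three or more.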

The environment states and average ratings in the selected environment groups (i.e., the first partition) are used as the initial centroids for environment state and rating. The initial value of k is therefore equal to the number of statistically significant identifiable patterns (i.e., the number of environment groups in the first partition). The remaining data instances in the other environment groups (i.e., the second partition) are assigned on the basis of these initial groups. The remaining data instances may be referred to as outliers.

An outlier may be assigned to an environment group of the first partition by calculating the Euclidean distance between the outlier's environment state and the environment state of each existing environment group (in the first partition). The environment group with the minimum Euclidean distance becomes the group to which the data instance is assigned. Once assigned, the environment group may update its environment-state centroid by incorporating the newly assigned data instance.

Once an instance is assigned, the rating centroid of the environment group may be updated by recalculating the average rating for each keyword from all data instances in the group.
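
The assignment and centroid update described above might look like the following sketch. The `Instance` and `Cluster` classes and the `assign_outlier` function are hypothetical names introduced for illustration. Centroids are recomputed from the members on demand, which is one simple way to keep them up to date; the source does not prescribe how the update is carried out.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Instance:
    env: Tuple[float, ...]                # encoded environment state
    ratings: Dict[str, Optional[float]]   # keyword -> rating, None if missing


@dataclass
class Cluster:
    members: List[Instance] = field(default_factory=list)  # assumed non-empty

    def env_centroid(self) -> Tuple[float, ...]:
        """Mean of the members' environment vectors."""
        n = len(self.members)
        dims = len(self.members[0].env)
        return tuple(sum(m.env[i] for m in self.members) / n for i in range(dims))

    def rating_centroid(self) -> Dict[str, float]:
        """Per-keyword mean over the valid (non-missing) ratings."""
        out: Dict[str, float] = {}
        for kw in sorted({kw for m in self.members for kw in m.ratings}):
            valid = [m.ratings[kw] for m in self.members
                     if m.ratings.get(kw) is not None]
            if valid:
                out[kw] = sum(valid) / len(valid)
        return out


def assign_outlier(outlier: Instance, clusters: List[Cluster]) -> Cluster:
    """Add the outlier to the cluster whose environment centroid is nearest
    in Euclidean distance, and return that cluster."""
    nearest = min(clusters, key=lambda c: math.dist(outlier.env, c.env_centroid()))
    nearest.members.append(outlier)
    return nearest


a = Cluster([Instance((1, 0, 1, 0, 1, 0), {"news": 4})])
b = Cluster([Instance((0, 1, 0, 1, 0, 1), {"news": 2})])
chosen = assign_outlier(Instance((1, 0, 1, 0, 0, 1), {"news": 2}), [a, b])
print(chosen is a, a.env_centroid(), a.rating_centroid())
```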

In one embodiment of the present invention, as an alternative to placing outliers in existing clusters, a new "outlier cluster" containing all outliers may be formed. This approach may reduce the computation required.

To obtain the user's preferences in the current environment, the current environment is first acquired and converted into an encoded environment using 1-N encoding. The encoded current environment is then compared with the environment centroid of each cluster. The cluster whose centroid has the smallest distance from the current environment is selected, and the rating centroid of the selected cluster may be used as the user's preferences.
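
As a hedged illustration of this lookup, the sketch below reads "1-N encoding" as a one-hot (1-of-N) encoding per environment attribute, which is an assumption. The attribute schema, the function names `encode_1_of_n` and `preferences_for`, and the sample centroids are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def encode_1_of_n(environment: Dict[str, str],
                  schema: Dict[str, List[str]]) -> Tuple[float, ...]:
    """Encode each attribute as a block of N slots with a single 1.0
    at the position of the attribute's current value."""
    vec: List[float] = []
    for attr, values in schema.items():
        vec.extend(1.0 if environment[attr] == v else 0.0 for v in values)
    return tuple(vec)


def preferences_for(current_env: Dict[str, str],
                    schema: Dict[str, List[str]],
                    clusters: Sequence[Tuple[Tuple[float, ...], Dict[str, float]]]
                    ) -> Dict[str, float]:
    """Return the rating centroid of the cluster whose environment centroid
    is closest (Euclidean distance) to the encoded current environment."""
    encoded = encode_1_of_n(current_env, schema)
    _, rating_centroid = min(clusters, key=lambda c: math.dist(encoded, c[0]))
    return rating_centroid


schema = {"time": ["morning", "evening"], "place": ["home", "work"]}
clusters = [
    ((1.0, 0.0, 0.0, 1.0), {"news": 4.5}),    # morning at work
    ((0.0, 1.0, 1.0, 0.0), {"movies": 4.0}),  # evening at home
]
print(preferences_for({"time": "evening", "place": "home"}, schema, clusters))
```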

When a new data instance arrives (i.e., the user indicates a new preference), the system may place the new data instance in an existing cluster. If the existing cluster contains enough data instances that can be grouped with the new data instance, the cluster may be split in two. In one example, this is performed in a single iteration. However, the existing clusters may also be examined repeatedly, so that all assumptions made during the initial cluster computation are re-evaluated. For example, the system may rerun the entire clustering algorithm each time a new data instance is received.

FIG. 8 illustrates a method of determining a user's interest according to the present embodiment. At 800, data items associated with the user's usage patterns are stored. Each data item may include an interest portion, which holds one or more ratings indicating the degree of the user's interest in one or more interests, and an environment portion, which holds information about the environment associated with the ratings. At 802, the data items are grouped into environment groups. Each environment group contains data items whose environment portions are related; environment portions may be regarded as "related" when they have at least one environment value in common. At 804, for each environment group, it is determined whether the number of data items in the group is equal to or greater than the first threshold. At 806, a first partition is created, consisting of the environment groups whose number of data items is equal to or greater than the first threshold. At 808, the ratings for the interests in the data items of each environment group in the first partition are averaged, so that each environment group in the first partition forms a cluster. At 810, an environment group is filtered out of the first partition if its number of statistically significant average ratings is less than the second threshold. Here, an average rating may be statistically significant when the number of data items contributing a valid rating to it is equal to or greater than the third threshold. At 812, the data items of environment groups that do not belong to the first partition (i.e., data items that were not placed in it at 806 or that were filtered out at 810) are assigned to clusters, as described in more detail below. At 814, the user's interest is obtained by comparing the current environment with the environment groups in the first partition. This may include determining the closest cluster by computing the Euclidean distance between each cluster centroid and the current environment. The interests that have values in the average ratings of the closest cluster may then be used to make recommendations to the user.
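
Steps 802 to 806 can be sketched as follows. This is an illustrative simplification, not the method as claimed: data items are grouped only when their encoded environment vectors are identical, which is a narrower rule than "at least one environment value in common", and the function names `group_by_environment` and `partition` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Env = Tuple[float, ...]
Item = Tuple[Env, Dict[str, Optional[float]]]  # (environment portion, interest portion)


def group_by_environment(items: List[Item]) -> Dict[Env, List[Item]]:
    """Step 802 (simplified): one environment group per distinct environment vector."""
    groups: Dict[Env, List[Item]] = defaultdict(list)
    for env, ratings in items:
        groups[env].append((env, ratings))
    return dict(groups)


def partition(groups: Dict[Env, List[Item]], first_threshold: int):
    """Steps 804-806: groups with at least `first_threshold` items form the
    first partition; all other groups are left for later assignment."""
    first = {env: g for env, g in groups.items() if len(g) >= first_threshold}
    rest = {env: g for env, g in groups.items() if len(g) < first_threshold}
    return first, rest


items = [
    ((1.0, 0.0), {"news": 5}),
    ((1.0, 0.0), {"news": 4}),
    ((0.0, 1.0), {"music": 3}),
]
first, rest = partition(group_by_environment(items), first_threshold=2)
print(sorted(first), sorted(rest))
```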

FIG. 9 illustrates a method of assigning the data items of environment groups not included in the first partition, according to an embodiment of the present invention. This may correspond to step 812 of FIG. 8. At 900, the Euclidean distance between the environment portion of each data item of the environment groups not included in the first partition and the centroid of each cluster of the first partition is measured. At 902, each such data item is added to the cluster of the first partition whose centroid has the minimum Euclidean distance from the environment portion of the data item. At 904, the centroids of the clusters of the first partition are recalculated.

FIG. 10 illustrates a method of assigning the data items of environment groups not included in the first partition, according to another embodiment of the present invention. This may correspond to step 812 of FIG. 8. At 1000, all data items of the environment groups not included in the first partition are merged into a single outlier cluster. At 1002, the centroid of the outlier cluster is calculated.
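
A minimal sketch of steps 1000 and 1002, under the assumption that the centroid of the outlier cluster has the same two parts as the other centroids (an environment mean and a per-keyword rating mean over valid ratings). The function name `merge_outliers` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Env = Tuple[float, ...]
Item = Tuple[Env, Dict[str, Optional[float]]]


def merge_outliers(outliers: List[Item]) -> Tuple[Env, Dict[str, float]]:
    """Treat all leftover data items as one cluster and return its
    (environment centroid, rating centroid). `outliers` must be non-empty."""
    n = len(outliers)
    dims = len(outliers[0][0])
    env_centroid = tuple(sum(env[i] for env, _ in outliers) / n for i in range(dims))
    rating_centroid: Dict[str, float] = {}
    for kw in sorted({kw for _, ratings in outliers for kw in ratings}):
        # Missing ratings (None) are skipped rather than counted as zero.
        valid = [r[kw] for _, r in outliers if r.get(kw) is not None]
        if valid:
            rating_centroid[kw] = sum(valid) / len(valid)
    return env_centroid, rating_centroid


outliers = [
    ((1.0, 0.0), {"news": 2, "music": None}),
    ((0.0, 1.0), {"news": 4}),
]
print(merge_outliers(outliers))
```

Compared with assigning each outlier to its nearest cluster, this variant needs a single pass over the outliers and no distance computations.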

FIG. 11 illustrates a system according to an embodiment of the present invention. In FIG. 11, the system may be, but is not limited to, a mobile device. For example, the system may include a mobile device having a processor that performs the process described above, or a server that performs the process. The system may also include a desktop computer.

The system may include an interface 1100 and at least one processor 1102. The processor 1102 may perform the processes described with reference to FIGS. 8 to 10. If the system includes a mobile device, it may also include a load detection module 1104. The load detection module 1104 may measure the load level of the processor 1102 (i.e., how busy the processor is). In this case, the processor may perform the storing, clustering, determining, and similar steps while the device is idle or lightly used, and perform the selecting, computing, and recommending steps when the user actually requests a recommendation.
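
One way to picture the role of the load detection module is the sketch below, which defers clustering work until the measured load drops. Everything here is hypothetical: the class name `DeferredClusterer`, the injected `read_load` and `recluster` callables, and the `max_load` cutoff are illustrative choices, and the source does not specify how the load level is measured or compared.

```python
from typing import Any, Callable, List


class DeferredClusterer:
    """Queue new data items and run the expensive clustering step only
    when the reported processor load is at or below `max_load`."""

    def __init__(self, read_load: Callable[[], float],
                 recluster: Callable[[List[Any]], None],
                 max_load: float = 0.5) -> None:
        self.read_load = read_load      # stands in for the load detection module
        self.recluster = recluster      # stands in for storing/clustering work
        self.max_load = max_load
        self.pending: List[Any] = []

    def add(self, item: Any) -> None:
        """Record a new data item without doing any heavy work."""
        self.pending.append(item)

    def tick(self) -> bool:
        """Call periodically; returns True if clustering ran on this call."""
        if self.pending and self.read_load() <= self.max_load:
            self.recluster(list(self.pending))
            self.pending.clear()
            return True
        return False


loads = iter([0.9, 0.2])
batches: List[List[Any]] = []
clusterer = DeferredClusterer(lambda: next(loads), batches.append)
clusterer.add("item-1")
print(clusterer.tick(), clusterer.tick(), batches)
```

Recommendation requests would bypass this queue and read the most recently computed clusters directly.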

Embodiments of the present invention may also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computing system.

Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementations in the form of carrier waves (for example, transmission over the Internet). The computer-readable recording medium may also be distributed over networked computing systems so that the computer-readable code is stored and executed in a distributed fashion. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers skilled in the art to which the present invention pertains.

Specific examples for carrying out the present invention have been described above. The foregoing embodiments are intended to illustrate the present invention by way of example, and the scope of the present invention is not limited to any specific embodiment.

FIG. 1 illustrates a vector that stores environment and interest information for one data point according to an embodiment of the present invention.

FIG. 2 illustrates a graph according to an embodiment of the present invention.

FIG. 3 illustrates a graph with clustered data points according to an embodiment of the present invention.

FIG. 4 illustrates an environment set according to an embodiment of the present invention.

FIG. 5 illustrates environment encoding according to an embodiment of the present invention.

FIG. 6 illustrates an instance vector according to an embodiment of the present invention.

FIG. 7 illustrates grouped instance data according to an embodiment of the present invention.

FIG. 8 illustrates a method of determining a user's interest according to an embodiment of the present invention.

FIG. 9 illustrates a method of assigning data items of environment groups not included in the first partition according to an embodiment of the present invention.

FIG. 10 illustrates a method of assigning data items of environment groups not included in the first partition according to another embodiment of the present invention.

FIG. 11 illustrates a system according to an embodiment of the present invention.

Claims

Storing data items associated with a usage pattern of a user, each data item including an interest portion having at least one rating indicating a degree of interest the user has in at least one interest, and an environment portion having information about the environment associated with the rating;

Grouping the data items into environment groups, each environment group having data items with related environment portions;

For each environment group, determining whether the number of data items in the environment group is greater than or equal to a first threshold value;

Creating a first partition having the environment groups whose number of data items is greater than or equal to the first threshold value;

Averaging ratings for interests of data items belonging to an environment group in the first partition and clustering each environment group in the first partition; And

Comparing the current environment with an environment group in the first partition to obtain user interest.

The method of claim 1,

Further comprising filtering an environment group out of the first partition when the number of statistically significant average ratings for that environment group is less than or equal to a second threshold value,

Wherein an average rating is statistically significant when the number of data items is greater than or equal to a third threshold.

The method of claim 1,

Further comprising encoding the data items into vectors.

The method of claim 3, wherein

Each encoded vector comprises an environment portion and an interest portion, the interest portion comprising a keyword list and one or more ratings corresponding to the keywords.

The method of claim 1,

Creating a second partition having the environment groups whose number of data items is less than the first threshold value; And

For each data item in an environment group in the second partition, adding the data item to the cluster in the first partition having the minimum Euclidean distance between the centroid of the cluster and the environment portion of the data item.

The method of claim 1,

Merging all of the environment groups in the second partition into one outlier cluster.

The method of claim 5,

After adding the data item to a cluster in the first partition, recalculating centroids for each cluster in the first partition.

The method of claim 1,

Receiving a new data item;

Placing the new data item in a cluster in the first partition; And

Dividing the cluster in which the new data item is placed when the division would yield two clusters each having a number of data items equal to or greater than the first threshold value.

The method of claim 1,

Wherein each environment portion has a plurality of environment values, each environment value representing a description of a characteristic associated with a state including a usage pattern.

Storing data items associated with a usage pattern of a user, each data item comprising a preference portion having at least one rating indicating the degree of interest the user has in at least one interest, and an environment portion having information including an environment value for time of day;

Averaging ratings for interests of data items belonging to an environment group in the first partition and clustering each environment group in the first partition;

Filtering, for the environment group of the first partition, an environment group from the first partition when the number of data items is greater than or equal to a third threshold and the number of statistically significant average ratings is less than or equal to a second threshold; And

And comparing the current environment with an environment group in the first partition to obtain the user's interest.

An interface; And

Multiple processors,

Wherein the processors:

Store a data item associated with a usage pattern of the user, the data item including an interest portion having a rating of the degree of interest the user has and an environment portion having information about an environment related to the rating;

Group the data item into environment groups having a data item associated with the environment portion;

For each environment group, determine whether the number of data items in the environment group is equal to or greater than a first threshold value;

Create a first partition having an environment group having a number of data items above the first threshold;

Average the ratings for the interests of the data items belonging to the environment groups in the first partition, and cluster each environment group in the first partition; And

Compare the current environment with an environment group in the first partition to obtain the user's interest.

The device of claim 11,

Wherein the device is a mobile device.

Means for storing data items associated with a usage pattern of a user, each data item including an interest portion having at least one rating indicating a degree of interest the user has in at least one interest, and an environment portion having information about the environment associated with the rating;

Means for grouping the data items into environment groups, each environment group having data items with related environment portions;

Means for determining for each environment group whether the number of data items in the environment group is greater than or equal to a first threshold value;

Means for creating a first partition having the environment groups whose number of data items is greater than or equal to the first threshold;

Means for averaging ratings of interest of data items belonging to an environment group in the first partition and clustering each environment group in the first partition; And

Means for obtaining a user interest by comparing a current environment with an environment group within the first partition.

The method of claim 13,

Further comprising means for filtering an environment group out of the first partition when the number of statistically significant average ratings for that environment group is less than or equal to a second threshold,

Wherein an average rating is statistically significant when the number of data items is greater than or equal to a third threshold.

The method of claim 13,

Means for creating a second partition having the environment groups whose number of data items is less than the first threshold; And

Means for adding, for each data item in an environment group in the second partition, the data item to the cluster in the first partition having the minimum Euclidean distance between the centroid of the cluster and the environment portion of the data item.

The method of claim 13,

And means for merging all environment groups in the second partition into one outlier cluster.

The method of claim 15,

Means for recalculating the centroid of each cluster in the first partition after the data item is added to a cluster in the first partition.

The method of claim 13,

Means for receiving a new data item;

Means for placing the new data item in a cluster in the first partition; And

Means for dividing the cluster in which the new data item is placed when the division would yield two clusters each having a number of data items equal to or greater than the first threshold.

The method of claim 13,

Wherein each environment portion has a plurality of environment values, each environment value representing a description of a property associated with a state including a usage pattern.

And comparing a current environment with an environment group in the first partition to obtain user interest.