KR100979516B1

KR100979516B1 - Service recommendation method for network-based robot, and service recommendation apparatus

Info

Publication number: KR100979516B1
Application number: KR1020070095495A
Authority: KR
Inventors: 문애경; 김형선; 김현; 강태근
Original assignee: 한국전자통신연구원
Priority date: 2007-09-19
Filing date: 2007-09-19
Publication date: 2010-09-01
Also published as: KR20090030144A; JP2009076027A

Abstract

A service recommendation method for a network-based robot that, even without prior information about the user or detailed information about the services provided, learns the user's service usage pattern through reinforcement learning, which learns via interaction with the user, and can thereby actively and appropriately provide a variety of services.

When a user of the network-based robot requests a service, a table is trained based on context information and on the action, that is, the service the user requested. When a service recommendation is requested, a service is recommended to the user based on the contents of the table. According to the user's response to the recommended service, the table is then updated based on the context information and on the action, that is, the recommended service.

Here, the table learns the user's service usage pattern in the table learning step and the table updating step, and in the service recommendation step a service is recommended to the user based on the contents of the table learned in those two steps.

Network-based robot, learning, service recommendation

Description

Service recommendation method for network-based robot, and service recommendation apparatus

The present invention relates to a service recommendation method and a service recommendation apparatus for a network-based robot, and more particularly to a service recommendation method and apparatus that enable a network-based robot to learn a user's service usage pattern through reinforcement learning and thereby actively provide a variety of services suited to each user.

The present invention is derived from research conducted as part of the IT New Growth Engine Core Technology Development Project of the Ministry of Information and Communication [Project management number: 2006-S-026-02, Project title: Development of a URC Server Framework for Active Services].

As the industrial robot market has recently become saturated, research on network-based robots is being actively pursued to create new markets. A network-based robot is a new concept of intelligent service robot that combines network and information technology with a conventional robot; it accompanies the user anytime and anywhere and provides the appropriate services the user needs.

A robot generally has three functional elements: sensing the external environment, judging the situation based on the sensed environment, and acting appropriately according to that judgment. What network-based robots ultimately pursue is to distribute these three functions, previously handled within the robot itself, across the network, and further to make full use of external sensing and processing capabilities through the network.

That is, rather than adding hardware to the robot to extend its sensing capability, a network-based robot can use the sensing capability of externally installed sensors. Likewise, rather than raising its own processing performance, a network-based robot can use the processing performance of high-performance remote servers connected over the network, and it can offer a variety of content services through remote content servers, so it can provide the user with a wider range of functions and services.

With the advent of the ubiquitous computing environment, network-based robots need personalized recommendation technology that can provide services proactively using a variety of context information, in addition to services provided in response to the user's explicit requests. Because a network-based robot, which adds network capability to a conventional robot, can offer the user a variety of services, what is needed is technology that recognizes the user's situation and recommends services according to the user's interests, rather than services that simply respond to the user's requests.

Recommendation techniques in wide use today include content-based recommendation and collaborative recommendation. Content-based recommendation is rooted in the field of information retrieval: it compares the user's profile with the components of the candidate items and recommends those with high similarity. An example is Korean Laid-open Patent Publication No. 2006-0069143. Collaborative recommendation finds other users whose profiles are similar to that of the target user and recommends to the target user the items those other users rated highly. An example is Korean Laid-open Patent Publication No. 2006-0112723. Other techniques include demographic recommendation, which makes recommendations using demographic information such as the user's gender, age, and occupation, and hybrid recommendation, which combines content-based and collaborative recommendation.

However, most existing recommendation techniques require prior information about the user or detailed information about the items to be recommended. For the services provided by a network-based robot, the concrete components of a service are not sufficiently defined, and collecting prior information about users is difficult for reasons such as security and privacy protection. Existing recommendation techniques, which must hold a large amount of information in advance, are therefore often hard to apply, and collecting and applying such information systematically is not easy either.

The present invention is intended to solve the above problems. Its object is to provide a service recommendation method and a service recommendation apparatus for a network-based robot with which, even without prior information about the user or detailed information about the services provided, the network-based robot learns the user's service usage pattern by reinforcement learning, which learns through interaction with the user, and can thereby actively and appropriately provide a variety of services.

Here, reinforcement learning is a machine learning method in which learning takes place through the interaction of rewards and actions, without a preset model of the given environment. Compared with other machine learning methods, in reinforcement learning the learner does not decide in advance which action to take, and learning proceeds by trial-and-error search through interaction with the external environment.

Accordingly, the service recommendation method and apparatus according to the present invention learn the user's service usage pattern through interaction with the user, without prior information about the user, and can thus provide personalized recommendations suited to the user. In addition, the number of times the user uses each service is recorded by time and place and reflected in the recommendation, so that a service better suited to the context can be recommended.

To achieve the above object, a service recommendation method for a network-based robot according to the present invention includes: a table learning step in which, when the network-based robot receives a service request from a user, learning is performed on each table by updating the table value corresponding to the requested service, based on context information about the surroundings and, among the actions corresponding to the services the network-based robot provides, the action that is the service the user requested; a service recommendation step in which, when a service recommendation request is received from the user or from a context-aware server, a service is recommended to the user based on the learned contents of each table; and a table updating step in which, when the service recommended in the service recommendation step is provided through the network-based robot, the table corresponding to the recommended service is updated according to the user's response to that service.
The user's service usage pattern and service preferences are learned from the table contents learned in the table learning step and the table updating step, and the result is reflected in the service recommendation step.

The table comprises a Q-learning information table indexed by state information of the user and/or the network-based robot and by the action corresponding to a service provided by the network-based robot, and the context information includes the state information. Preferably, the table further comprises a time-based service usage frequency table indexed by day of the week, time slot, and action, and the context information further includes day-of-week and time-slot information. The table also further comprises a place-based service usage frequency table indexed by place and action, and the context information further includes place information. The table may further include a user information table that associates each of a plurality of users of the network-based robot with that user's tables.

Preferably, in the table learning step the table value corresponding to the requested service is updated with a reward value R_S, and in the table updating step the table value is updated with a reward value R_P when the user's response to the recommended service is positive and with a reward value R_N when the response is negative.

In the service recommendation step, the service to be recommended is preferably selected by normalizing, for each action, the values of the tables learned in the learning steps and summing them.
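The normalize-and-sum selection described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which normalization is used, so min-max scaling is assumed here, and the function names `min_max` and `recommend` as well as the example service names are hypothetical.

```python
def min_max(row):
    """Scale one table row {action: value} into [0, 1] (assumed normalization)."""
    lo, hi = min(row.values()), max(row.values())
    if hi == lo:
        # No preference information in this row: contribute nothing.
        return {action: 0.0 for action in row}
    return {action: (value - lo) / (hi - lo) for action, value in row.items()}


def recommend(q_row, f_row, p_row):
    """Sum the normalized rows per action and return the best-scoring action.

    q_row: Q-learning values for the current state
    f_row: usage counts for the current day of week and time slot
    p_row: usage counts for the current place
    """
    scores = {}
    for row in (q_row, f_row, p_row):
        for action, value in min_max(row).items():
            scores[action] = scores.get(action, 0.0) + value
    return max(scores, key=scores.get)
```

Normalizing each row before summing keeps a table with large raw counts from dominating a table whose values are small, which is one plausible reason for the normalization step.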

According to the present invention, the following effects can be obtained.

Because a network-based robot can provide a variety of services over the network, learning the user's service usage pattern and recommending, from among those services, the ones suited to the user reduces the overhead of service selection.

In particular, even when detailed information about the services provided and prior information about the user have not been collected, the user's service usage pattern can be learned through interaction with the user, so that a variety of services can be provided actively and appropriately.

In addition, the service usage patterns of different users can be learned separately for each user, so personalized, customized services can be provided even when several people share one network-based robot, as in a family.

In addition, by using the context information provided by the context-aware server, the user's service usage pattern can be learned from a larger body of information, which improves the likelihood of recommending an appropriate service.

In addition, when the network-based robot can recognize its location, or when the context-aware server can provide location information, a service suited to a particular place can be recommended.

In addition, learning the user's service usage pattern by day of the week and time slot and applying it to service recommendation makes efficient recommendation possible.

Preferred embodiments are described below with reference to the accompanying drawings to aid understanding of the present invention. The following embodiments are provided to make the invention easier to understand, and the invention is not limited by them.

FIG. 1 is a configuration diagram of the overall system according to the present invention. The overall system consists of a network-based robot (10), a context-aware server (20), and a content server (40), together with a user (30) who requests services from the network-based robot (10) and receives services from it.

The network-based robot (10) is connected to the context-aware server (20) and the content server (40) over a network. Although not shown in detail, it is equipped with various devices such as a display for providing services such as movies, news, and weather information to the user (30); a speaker for providing services such as music and voice; a microphone for speech recognition; a drive motor for locomotion; a communication device for communicating with the context-aware server (20) and the content server (40); and an input interface for receiving input from the user.

The context-aware server (20) is connected to the network-based robot (10) over a network. Although not shown in detail, it recognizes and stores the current context information from information supplied by various sensors, including an illuminance sensor, a voice sensor, and a temperature sensor, and from other input information.

The content server (40) stores content such as movies, news, weather information, music, and voice for the network-based robot (10) to provide to the user (30). As illustrated, the content server (40) may be installed at a remote location and connected to the network-based robot (10) over a network, or it may be integrated into the context-aware server (20). Part of the content may also be stored inside the network-based robot (10) and provided to the user (30) directly.

The operation flow of the overall system according to the present invention is described below with reference to FIG. 1.

First, when the user (30) makes a service request (30-1) through the input interface or speech recognition of the network-based robot (10), the network-based robot (10) requests (10-1) from the content server (40) the content corresponding to the service the user (30) requested. The kinds of service the user (30) may request include, for example, movies, news, weather information, music, voice, education, cooking, and games. The content server (40) provides (40-1) the requested content to the network-based robot (10), and the network-based robot (10) provides (10-2) the service (content) to the user (30). This is the service provision process that takes place when the user (30) explicitly requests a specific service.

Meanwhile, the context-aware server (20), which is aware of the current context information, can issue a service recommendation request (20-2) asking the network-based robot (10) to recommend a service to the user (30) whenever it judges this necessary, for example when the recognized context changes, as when the user gets up in the morning, or when the network-based robot (10) has not provided any service for a certain period of time. Alternatively, the user (30) may directly ask the network-based robot (10) to recommend a service suited to the current situation.

Because the context-aware server (20) recognizes and stores the current context information from various sensors and other input information, it provides (20-1) the context information to the network-based robot (10) either together with the service recommendation request (20-2) or independently of it. This context information may include, for example, the current day of the week and time, the identifier and location of the user (30), and the current temperature and illuminance.

When there is a service recommendation request, the network-based robot (10) recommends (10-3) a service suited to the user based on the context information (20-1) provided by the context-aware server (20) and on the state information it stores itself. If the user (30) gives positive feedback (30-2) on the recommended service, the robot requests the content corresponding to the recommended service from the content server (40), receives it, and then provides (10-2) the service to the user (30).

One feature of the present invention is that table learning is performed when the user requests a service (30-1), and table updating is performed when the user gives feedback (30-2) on a service recommendation (10-3), so that a service suited to the current state is recommended to the user according to the accumulated learning information. Table learning, table updating, and service recommendation are described in detail below with reference to FIGS. 2 to 4.

FIG. 2 shows the data structures used for service recommendation according to the present invention. The left column lists the contents (50) of the data structures, and the right column lists the concrete data structure (60) corresponding to each. The data is preferably stored in the network-based robot (10), but it may be stored only in the context-aware server (20), or shared through synchronization or stored in distributed form between the context-aware server (20) and the network-based robot (10).

The data structures used by the service recommendation method of the present invention comprise a Q-learning information table (Q-TBL[s][a]), a time-based service usage frequency table (F-TBL[d][t][a]), a place-based service usage frequency table (P-TBL[p][a]), a user information table (U-TBL[u]), and reward values (R_S, R_P, R_N). The network-based robot (10) recommends a service to the user (30) by referring to the data in the Q-learning information table, the time-based service usage frequency table, and the place-based service usage frequency table. Each data structure is described in detail below.
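As a rough illustration of how these tables might be laid out in memory, the sketch below uses dictionaries keyed by tuples. The class name `UserTables`, the field names, the tuple-key layout, and the example user, state, and service names are all assumptions made for illustration; the patent only specifies the table names and their indices.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UserTables:
    """The three learned tables kept for one user (hypothetical layout)."""
    # Q-TBL[s][a]: preference of action a in state s, keyed by (s, a)
    q_tbl: dict = field(default_factory=lambda: defaultdict(float))
    # F-TBL[d][t][a]: usage count keyed by (day, time slot, action)
    f_tbl: dict = field(default_factory=lambda: defaultdict(int))
    # P-TBL[p][a]: usage count keyed by (place, action)
    p_tbl: dict = field(default_factory=lambda: defaultdict(int))


# U-TBL[u]: maps each user identifier to that user's own set of tables.
u_tbl = defaultdict(UserTables)

# Example: user "alice" asks for the news after waking up in the bedroom.
tables = u_tbl["alice"]
tables.q_tbl[("wake_up", "news")] += 1.0
tables.f_tbl[("weekday", "morning", "news")] += 1
tables.p_tbl[("bedroom", "news")] += 1
```

Keeping one `UserTables` instance per user mirrors the role of U-TBL[u]: what is learned for one family member never leaks into another member's recommendations.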

First, the Q-learning information table (Q-TBL[s][a]) records, by means of learning, the services the user has used, and has at least two indices: state (s) and action (a).

Here, the state (s) means state information of the user (30) and/or the network-based robot (10). For example, when the user gets up in the morning the state is the 'wake-up state', when the user has come home it is the 'return-home state', and when the network-based robot (10) is providing a news service it is the 'news state'. This state information is determined from the context information provided by the context-aware server (20) and from the state information the robot stores itself. The state (s) includes the service provision state of the network-based robot (for example, the news state) and the state of the user (for example, the wake-up state), and it is formed from these either singly or in combination.

The action (a) means a service provided by the network-based robot (10). It is what may be recommended, and examples include movies, news, weather information, music, voice, education, cooking, and games.

The value stored in Q-TBL[s][a] of the Q-learning information table represents the preference for action (a) in state (s). The larger the value of Q-TBL[s][a], the more often the user (30) has used that action (service) or the more often the user responded positively when it was recommended.

Next, the time-based service usage frequency table (F-TBL[d][t][a]) is a frequency table that stores the number of times each service has been used per day of the week and time slot, and it has at least three indices: day of the week (d), time slot (t), and action (a). The day of the week (d) may be divided into Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday, or into weekdays and weekends. The time slot (t) may be divided by the hour, or into morning, afternoon, evening, and night. The action (a) is the same as in the Q-learning information table.
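A minimal sketch of how a timestamp could be mapped to the (d, t) indices of F-TBL is shown below, using the coarser weekday/weekend and morning/afternoon/evening/night divisions mentioned above. The hour boundaries and the names `day_type`, `time_slot`, and `record_use` are assumptions; the patent does not fix them.

```python
from datetime import datetime


def day_type(ts: datetime) -> str:
    """Map a timestamp to the day index d (weekday/weekend variant)."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return "weekend" if ts.weekday() >= 5 else "weekday"


def time_slot(ts: datetime) -> str:
    """Map a timestamp to the time-slot index t (boundaries are assumed)."""
    hour = ts.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def record_use(f_tbl: dict, ts: datetime, action: str) -> None:
    """Increment the usage count F-TBL[d][t][a] for one use of a service."""
    key = (day_type(ts), time_slot(ts), action)
    f_tbl[key] = f_tbl.get(key, 0) + 1
```

The per-day, per-hour variant described in the text would only change the two mapping functions; the counting logic stays the same.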

Each user (30) of the network-based robot (10) may prefer different services on different days of the week and at different times of day, and the services used on weekends in particular tend to differ markedly. The time-based service usage frequency table is therefore learned to reflect this and to record a particular user's service preferences by day of the week and time slot. By learning and recording, per day and time slot, the actions (services) the user has used and the actions to which the user responded positively when recommended, an appropriate service that the user is likely to prefer can be recommended according to the day and time slot at the moment of recommendation.

Next, the place-based service usage frequency table (P-TBL[p][a]) is a frequency table that stores the number of times each service has been used per place, and it has at least two indices: place (p) and action (a). The place (p) may be divided into, for example, bedroom, kitchen, child's room, and living room. The place (p) can be distinguished when the network-based robot (10) is able to recognize its location, or when the context-aware server (20) recognizes the location of the user (30) and/or the network-based robot (10). The action (a) is the same as in the Q-learning information table.

Because the services a user (30) prefers may differ from place to place, the place-based service usage frequency table is learned to reflect this and to record a particular user's service preferences by place. By learning and recording, per place, the actions (services) the user has used and the actions to which the user responded positively when recommended, an appropriate service that the user is likely to prefer can be recommended according to the place at the moment of recommendation.

Next, the user information table (U-TBL[u]) holds the information that matches each user to that user's Q-learning information table, time-based service usage frequency table, and place-based service usage frequency table. When a network-based robot is used by several users, the tables can be managed separately for each user, which makes it possible to recommend services that reflect each user's characteristics.

Next, the reward value is the value used when updating the tables described above. It represents the reward or penalty the network-based robot (10) receives from the user (30) when it takes some action in the current state. This embodiment defines three reward values, R_S, R_P, and R_N, each of which is given an appropriate value according to the design.

First, R_S is the reward value applied when the user explicitly selects the action (service). Second, R_P is the reward value applied when the user responds positively to a service (action) recommended by the network-based robot. Third, R_N is the reward value applied when the user responds negatively to a service (action) recommended by the network-based robot, and it corresponds to a penalty. R_S and R_P are positive because they must increase the preference, whereas R_N is negative because it must decrease the preference. The relative magnitude of R_S and R_P is a design choice: if explicit selection by the user is considered more important, R_S is defined to be larger than R_P; if a positive response to a recommended service is considered more important, R_S is defined to be smaller than R_P; the two may also be set to the same value.
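The sign constraints on the three reward values can be captured in a small container. The sketch below is illustrative only: the class name `Rewards` and the numeric values are assumptions, since the description leaves the actual values to the design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rewards:
    r_s: float  # user explicitly selected the service
    r_p: float  # positive response to a recommended service
    r_n: float  # negative response to a recommended service (penalty)

    def __post_init__(self):
        # R_S and R_P raise the preference, R_N lowers it.
        if self.r_s <= 0 or self.r_p <= 0:
            raise ValueError("R_S and R_P must be positive")
        if self.r_n >= 0:
            raise ValueError("R_N must be negative")


# Illustrative values: explicit selection is weighted above a positive response.
rewards = Rewards(r_s=1.0, r_p=0.5, r_n=-0.5)
```

Swapping the magnitudes of `r_s` and `r_p`, or making them equal, corresponds to the other two design options described above.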

The detailed procedure of the service recommendation method according to the present invention, carried out using the overall system configuration of FIG. 1 and the data structures of FIG. 2, is described below with reference to FIGS. 3 and 4.

FIG. 3 is a schematic flowchart of the service recommendation method for a network-based robot according to the present invention. The method consists broadly of a learning information initialization step (S20), a table learning step (S40), a service recommendation step (S60), and a table update step (S70). In outline, the network-based robot initializes the learning information in step S20; when there is a service request from the user in step S30, it trains the tables in step S40; and when there is a service recommendation request in step S50, it recommends a service in step S60 and updates the tables in step S70 according to the user's response. By repeating this process, the learned tables allow a service suited to the user to be recommended.

First, when the network-based robot is powered on for the first time and the system starts (S10), the learning information of each table is initialized in the learning information initialization step (S20). The learning information initialization step (S20) may also be performed when a new user of the network-based robot is added, in order to initialize the learning information of the tables corresponding to that user.

The network-based robot then determines whether there is a service request from the user (S30). If there is an explicit service request from the user (Y in S30), the process proceeds to the table learning step (S40), in which each table is trained by updating the table values corresponding to the requested service. When the table learning step (S40) is complete, the process returns to step S30.

If it is determined in step S30 that there is no service request from the user (N in S30), the process proceeds to step S50 to determine whether there is a service recommendation request from the context-aware server. A service recommendation request may of course also come directly from the user, in which case step S50 determines whether the user has made such a request.

If it is determined in step S50 that there is a service recommendation request (Y in S50), the process proceeds to the service recommendation step (S60), in which a service is recommended to the user on the basis of what the tables have learned so far. When the user responds positively or negatively to the recommended service, the process proceeds to the table update step (S70), in which the table values corresponding to the recommended service are updated so that they can be used for the next recommendation. When the table update step (S70) is complete, the process returns to step S30. If it is determined in step S50 that there is no service recommendation request (N in S50), the process likewise returns to step S30.
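The control flow of FIG. 3 can be sketched as a simple dispatch loop. This is a sketch under stated assumptions: the event dictionaries, their `type` values, and the three callback names are hypothetical, and a real robot would receive events asynchronously rather than from a list.

```python
def run_loop(events, on_request, on_recommend, on_feedback):
    """Dispatch events following the S30 -> S40 and S50 -> S60 -> S70 flow."""
    for event in events:
        if event["type"] == "service_request":         # S30: Y
            on_request(event["action"])                 # S40: table learning
        elif event["type"] == "recommend_request":     # S50: Y
            action = on_recommend()                     # S60: recommend a service
            on_feedback(action, event.get("reaction"))  # S70: table update
        # Any other event: S50 is N, so control returns to S30.


log = []
run_loop(
    [
        {"type": "service_request", "action": "weather guide"},
        {"type": "recommend_request", "reaction": "positive"},
        {"type": "idle"},
    ],
    on_request=lambda a: log.append(("learn", a)),
    on_recommend=lambda: "movie",
    on_feedback=lambda a, r: log.append(("update", a, r)),
)
```

After the run, `log` holds one learning entry for the explicit request and one update entry for the recommendation; the idle event leaves the tables untouched.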

Following the flow shown in FIG. 3, the service recommendation method for a network-based robot according to the present invention learns the user's service preferences and reflects the learning result when recommending a service, so that a service suited to the user can be recommended. The details of the service recommendation method shown schematically in FIG. 3 are described below with reference to FIG. 4.

FIG. 4 is a detailed flowchart of the service recommendation method for a network-based robot according to the present invention. Steps that are the same as in the service recommendation method of FIG. 3 are given the same reference numerals.

First, in the learning information initialization step (S20), the learning information of the Q-learning information table (Q-TBL), the time-based service usage frequency table (F-TBL), and the per-place service usage frequency table (P-TBL) is initialized to '0'. The reward values R_S, R_P, and R_N are also initialized to specific values, and the current state (s) is initialized on the basis of the context information provided by the context-aware server (20). For example, the current state (s) is initialized to the 'wake-up' state.
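Step S20 can be sketched as building three zero-filled tables from the known label sets. The function name `init_learning_info`, the tuple keys, and the label lists below are assumptions made for illustration; the description does not fix a storage format.

```python
from itertools import product


def init_learning_info(states, days, time_slots, places, actions):
    """Zero-initialize Q-TBL, F-TBL and P-TBL (step S20)."""
    q_tbl = {(s, a): 0.0 for s, a in product(states, actions)}
    f_tbl = {(d, t, a): 0.0 for d, t, a in product(days, time_slots, actions)}
    p_tbl = {(p, a): 0.0 for p, a in product(places, actions)}
    return q_tbl, f_tbl, p_tbl


# Hypothetical label sets, loosely based on the examples in the description.
q_tbl, f_tbl, p_tbl = init_learning_info(
    states=["news", "music"],
    days=["weekday", "weekend"],
    time_slots=["morning", "evening"],
    places=["bedroom", "living room"],
    actions=["weather guide", "movie"],
)
```

The same function could be called again when a new user is added, so that the new user starts from tables with no learned preference.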

The network-based robot then determines whether there is a service request from the user (S30). If there is an explicit service request from the user (Y in S30), the process proceeds to the table learning step (S40). The table learning step (S40) trains Q-TBL, F-TBL, and P-TBL according to the content of the user's service request.

In the table learning step (S40), the action (a), that is, the service selected by the user, is first identified (step S41). Because the user has explicitly requested the service, R_S is selected as the reward value for updating the tables (step S42). Then, in step S43, table learning is performed using the action (a) requested by the user and the reward value (R_S).

Table learning in step S43 is described in detail using the example of a user who, on a weekend morning, is listening to the news in the living room and requests the 'weather guide' service. First, Q-TBL learning is performed using the current state (s) and the action (a) requested by the user. Here the current state (s) is 'news' and the requested action (a) is 'weather guide'. Accordingly, in Q-TBL[s][a], the value of Q-TBL[news][weather guide] is updated by the reward value (R_S). The update is, for example, Q-TBL[s][a] = Q-TBL[s][a] + λ_q R_S, where λ_q is an adjustment factor that sets the proportion in which the reward value R_S is reflected; it is determined by design and may have the value 1. The value of Q-TBL[news][weather guide] increases as a result of learning with the reward value R_S, which means that the user's preference for requesting the weather guide service while listening to the news has increased.

Next, F-TBL learning is performed using the current day (d), time slot (t), and action (a). Here the current day (d) is 'weekend', the current time slot (t) is 'morning', and the requested action (a) is 'weather guide'. Accordingly, in F-TBL[d][t][a], the value of F-TBL[weekend][morning][weather guide] is updated by the reward value (R_S). The update is, for example, F-TBL[d][t][a] = F-TBL[d][t][a] + λ_f R_S, where λ_f is an adjustment factor that sets the proportion in which the reward value R_S is reflected; it is determined by design and may be equal to λ_q used for Q-TBL. The value of F-TBL[weekend][morning][weather guide] increases as a result of learning with the reward value R_S, which means that the user's preference for requesting the weather guide service on weekend mornings has increased.

Next, P-TBL learning is performed using the current place (p) and action (a). Here the current place (p) is 'living room' and the requested action (a) is 'weather guide'. Accordingly, in P-TBL[p][a], the value of P-TBL[living room][weather guide] is updated by the reward value (R_S). The update is, for example, P-TBL[p][a] = P-TBL[p][a] + λ_p R_S, where λ_p is an adjustment factor that sets the proportion in which the reward value R_S is reflected; it is determined by design and may be equal to λ_q used for Q-TBL. The value of P-TBL[living room][weather guide] increases as a result of learning with the reward value R_S, which means that the user's preference for requesting the weather guide service in the living room has increased.
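The three updates of step S43 can be sketched together for the weekend-morning example. The update rule follows the expressions given above; the function name `learn_tables`, the context dictionary, the tuple keys, and the value chosen for R_S are assumptions for illustration.

```python
from collections import defaultdict


def learn_tables(q_tbl, f_tbl, p_tbl, ctx, action, reward,
                 lam_q=1.0, lam_f=1.0, lam_p=1.0):
    """Add the scaled reward to the entry of each table selected by the context."""
    q_tbl[(ctx["state"], action)] += lam_q * reward                     # Q-TBL[s][a]
    f_tbl[(ctx["day"], ctx["time_slot"], action)] += lam_f * reward     # F-TBL[d][t][a]
    p_tbl[(ctx["place"], action)] += lam_p * reward                     # P-TBL[p][a]


q_tbl, f_tbl, p_tbl = defaultdict(float), defaultdict(float), defaultdict(float)
R_S = 1.0  # illustrative value; the description leaves it to the design
ctx = {"state": "news", "day": "weekend", "time_slot": "morning", "place": "living room"}

# The user explicitly requests 'weather guide' while listening to the news.
learn_tables(q_tbl, f_tbl, p_tbl, ctx, "weather guide", R_S)
```

With all three adjustment factors at 1, each of the three selected entries increases by exactly R_S, and every other entry is unchanged.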

In this way, each table is trained by updating the table values corresponding to the service requested by the user, and when the table learning step (S40) is complete, the process returns to step S30.

If it is determined in step S30 that there is no service request from the user (N in S30), the process proceeds to step S50 to determine whether there is a service recommendation request from the context-aware server.

If it is determined in step S50 that there is a service recommendation request (Y in S50), the process proceeds to the service recommendation step (S60), in which the user's preferences are determined from what Q-TBL, F-TBL, and P-TBL have learned so far and a service is recommended to the user. The service recommendation step (S60) has two sub-steps: a preference calculation step (S61) and an action recommendation step (S62).

In the preference calculation step (S61), the user's preferences are determined from the learned contents. To illustrate the preference calculation, assume that the user is listening to music in the bedroom on a weekday evening.

This embodiment uses Q-TBL, F-TBL, and P-TBL as an example, but depending on the implementation there may be only Q-TBL, or only Q-TBL and F-TBL, or other tables may be present. Such cases can readily be handled by adapting the embodiment described below.

First, normalization is performed to balance the relative weight of Q-TBL, F-TBL, and P-TBL. The normalization described below scales the values of Q-TBL, F-TBL, and P-TBL to lie between 0 and 1. W_q in Equation (1) is the normalized Q-TBL, W_f in Equation (2) is the normalized F-TBL, and W_p in Equation (3) is the normalized P-TBL. In this example, the state (s) is 'music', the day (d) is 'weekday', the time slot (t) is 'evening', and the place (p) is 'bedroom'.

Equation (1)

Equation (2)

Equation (3)

For example, W_q is the result of normalizing, using Q-TBL, the preference for each action (a) that is a candidate service for recommendation in the current state (s). W_q, W_f, and W_p are each obtained for as many kinds of action (a) as have been learned for the user in the current state (s). The normalized values W_q, W_f, and W_p obtained from Equations (1) to (3) are then used to calculate the preference for each candidate action (a).
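Equations (1) to (3) are not reproduced in this text, so the exact normalization is not known here. As one possible reading of "values between 0 and 1", the sketch below applies min-max scaling to the row of a table selected by the current context. The choice of min-max scaling and the function name `normalize` are assumptions, not the formulas of the original equations.

```python
def normalize(scores):
    """Min-max scale a {action: value} mapping into the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # No action stands out, so every action gets the same weight.
        return {a: 0.0 for a in scores}
    return {a: (v - lo) / (hi - lo) for a, v in scores.items()}


# Hypothetical Q-TBL row for the state 'music'; negative values can arise from R_N.
row = {"movie": 3.0, "news": 1.0, "weather guide": -1.0}
w_q = normalize(row)
```

Min-max scaling is used here because table values can become negative after penalties, and it still maps every row into the range from 0 to 1.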

Equation (4) below calculates the preference of the user (u) for each action (a) from the normalized table values for that user (u), namely W_q, W_f, and W_p. Here α, β, and γ are constants that set the proportion in which the normalized value of each table is reflected; they may all have the same value of 1/3, or they may differ according to the importance of each table.

Equation (4)
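Equation (4) itself is not reproduced in this text. Since α, β, and γ are described as the proportions in which the normalized value of each table is reflected, and the claims speak of normalizing and summing the table values for each action, one plausible form is a weighted sum. The symbol P_u(a) for the preference is an assumption introduced here.

```latex
P_u(a) = \alpha \, W_q(a) + \beta \, W_f(a) + \gamma \, W_p(a)
```

With α = β = γ = 1/3, this reduces to the plain average of the three normalized values for the action (a).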

With reference to the preference for each action (a) calculated by Equation (4), the action (a) for which the user's preference is highest in the current context, that is, while the user is listening to music in the bedroom on a weekday evening, is selected, and this determines the service to be recommended. In step S62, the network-based robot recommends the selected service, the action (a), to the user (u). Here it is assumed that the network-based robot recommends the 'movie' service to the user in step S62.
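Selecting the action with the highest preference can be sketched as follows. The weighted sum used to combine the normalized values is an assumed reading of Equation (4), and the function name `recommend` and the sample values are hypothetical.

```python
def recommend(w_q, w_f, w_p, alpha=1/3, beta=1/3, gamma=1/3):
    """Return the action with the highest weighted sum of normalized values."""
    # Only actions present in all three tables are candidates in this sketch.
    actions = w_q.keys() & w_f.keys() & w_p.keys()
    prefs = {a: alpha * w_q[a] + beta * w_f[a] + gamma * w_p[a] for a in actions}
    return max(prefs, key=prefs.get), prefs


# Hypothetical normalized values for: state 'music', weekday evening, bedroom.
w_q = {"movie": 1.0, "news": 0.5, "weather guide": 0.0}
w_f = {"movie": 0.8, "news": 1.0, "weather guide": 0.0}
w_p = {"movie": 1.0, "news": 0.0, "weather guide": 0.2}

best, prefs = recommend(w_q, w_f, w_p)  # 'movie' scores highest for these values
```

Raising one of `alpha`, `beta`, or `gamma` relative to the others makes the corresponding table dominate the choice, which matches the description of the constants as importance weights.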

Next, in the table update step (S70), a reward value is determined according to the user's response to the action (a) recommended in step S62, and the tables are updated. Specifically, if the user's response in step S71 is positive, the process proceeds to step S72 and R_P is selected as the reward value R; if the user's response in step S71 is negative, or if the user does not respond within a certain time, the process proceeds to step S73 and R_N is selected as the reward value R. Here R_P is positive and R_N is negative. Then, in step S74, Q-TBL, F-TBL, and P-TBL are updated according to the user's response to the recommended service.

First, Q-TBL is updated using the current state (s) and the recommended action (a). Here the current state (s) is 'music' and the recommended action (a) is 'movie'. Accordingly, in Q-TBL[s][a], the value of Q-TBL[music][movie] is updated by the reward value R (R_P or R_N). The update is, for example, Q-TBL[s][a] = Q-TBL[s][a] + λ_q R, where λ_q is an adjustment factor that sets the proportion in which the reward value R is reflected; it is determined by design and may have the value 1. The λ_q used in step S74 may be the same as or different from the λ_q used in step S43. The value of Q-TBL[music][movie] increases or decreases as a result of learning with the reward value R, which means that the user's preference for watching a movie while listening to music has increased or decreased.

F-TBL[weekday][evening][movie] and P-TBL[bedroom][movie] are updated in the same way as Q-TBL. The details can readily be understood from the description of step S43 and of the Q-TBL update, and are therefore omitted.
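Steps S71 to S74 can be sketched as a single function that picks the reward from the user's reaction and then applies it to all three tables. The function name, the reaction labels, the table container, and the reward values are assumptions; a single adjustment factor `lam` is used for all tables for brevity, although the description allows a separate factor per table.

```python
from collections import defaultdict

R_P, R_N = 0.5, -0.5  # illustrative values; the description leaves them to the design


def update_on_feedback(tables, ctx, action, reaction, lam=1.0):
    """Choose R from the reaction (S71 to S73), then update the tables (S74)."""
    # A missing reaction (None, no response in time) is treated as negative.
    reward = R_P if reaction == "positive" else R_N
    tables["Q"][(ctx["state"], action)] += lam * reward
    tables["F"][(ctx["day"], ctx["time_slot"], action)] += lam * reward
    tables["P"][(ctx["place"], action)] += lam * reward
    return reward


tables = {name: defaultdict(float) for name in ("Q", "F", "P")}
ctx = {"state": "music", "day": "weekday", "time_slot": "evening", "place": "bedroom"}

update_on_feedback(tables, ctx, "movie", "positive")  # preference rises by R_P
update_on_feedback(tables, ctx, "movie", None)        # no response: falls by |R_N|
```

Because R_P and R_N have equal magnitude in this example, one positive response followed by one ignored recommendation returns the three entries to their starting value.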

In this way, in step S70 the table values are updated according to the user's response to the recommended service so that they can be used for the next recommendation. When the table update step (S70) is complete, the process returns to step S30. If it is determined in step S50 that there is no service recommendation request (N in S50), the process likewise returns to step S30.

The service recommendation method and service recommendation apparatus for a network-based robot according to the present invention can recommend services personalized for each user through reinforcement learning, and are therefore highly applicable to the development of user-friendly, proactive robots.

FIG. 1 is a configuration diagram of the overall system according to the present invention;

FIG. 2 shows the data structures for service recommendation according to the present invention;

FIG. 3 is a schematic flowchart of the service recommendation method for a network-based robot according to the present invention; and

FIG. 4 is a detailed flowchart of the service recommendation method for a network-based robot according to the present invention.

Claims

A service recommendation method for a network-based robot, comprising:

a table learning step of, when there is a service request from a user of the network-based robot, training each table by updating the table value corresponding to the requested service, based on context information about the surrounding situation and on the action that, among the actions corresponding to the services provided by the network-based robot, is the service requested by the user;

a service recommendation step of, when there is a service recommendation request from the user or a context-aware server, recommending a service to the user based on the learned contents of each table; and

a table update step of, when the service recommended in the service recommendation step is offered through the network-based robot, updating the table corresponding to the recommended service according to the user's response to that service,

wherein the user's service usage pattern and service preferences are learned from the table learning contents of the table learning step and the table update step, and the learning result is reflected in the service recommendation step.

delete

A service recommendation apparatus connected to a context-aware server over a network, comprising:

table learning means for, when there is a service request from a user, training each table by updating the table value corresponding to the service requested by the user, based on context information about the surrounding situation provided by the context-aware server and on the action that, among the actions corresponding to the services provided by the network-based robot, is the service requested by the user;

service recommendation means for recommending a service to the user based on the learned contents of each table when a service recommendation request is input from the user or the context-aware server; and

table updating means for updating the table corresponding to the recommended service according to the user's response to the service recommended by the service recommendation means,

wherein the table learning means and the table updating means learn the user's service usage pattern and service preferences based on the contents of the tables, and

the service recommendation means recommends a service to the user using the learned service usage pattern and service preferences of the user.

The service recommendation apparatus according to claim 8,

wherein the tables include a Q-learning information table having, as factors, state information of the user and the network-based robot and the actions corresponding to the services provided by the network-based robot, and

the context information includes the state information.

The service recommendation apparatus according to claim 9,

wherein the tables further include a time-based service usage frequency table having, as factors, day-of-week information, time-slot information, and the action, and

the context information further includes the day-of-week information and the time-slot information.

The service recommendation apparatus according to claim 9,

wherein the tables further include a per-place service usage frequency table having, as factors, place information and the action, and

the context information further includes the place information.

The service recommendation apparatus according to claim 8,

wherein the tables further include a user information table that associates each of a plurality of users with that user's tables.

The service recommendation apparatus according to claim 8,

wherein the table learning means updates the table value corresponding to the requested service by the reward value R_S, and

the table updating means updates the table value by the reward value R_P when the user's response to the recommended service is positive, and by the reward value R_N when the user's response to the recommended service is negative.

The service recommendation apparatus according to claim 8,

wherein the service recommendation means selects the service to be recommended by normalizing and summing, for each action, the values of the tables learned by the table learning means.