KR100429975B1 - Behavior learning heighten method of robot

Info

Publication number: KR100429975B1
Application number: KR10-2001-0082810A
Authority: KR
Inventors: 장성준
Original assignee: 엘지전자 주식회사
Priority date: 2001-12-21
Filing date: 2001-12-21
Publication date: 2004-05-03
Also published as: KR20030052776A

Abstract

The present invention relates to a method for reinforcing the behavior learning of a robot. User feedback reinforces not only the single behavior that was just performed but also other behaviors associated with it, which increases the robot's general learning ability. The method applies to a robot with a built-in microprocessor that determines the robot's movement from external stimuli input through a touch panel and a microphone, and consists of: a first process in which the microprocessor sets an expression probability for each action according to the robot's internal state and determines the action to perform according to those settings; a second process in which, when the user inputs an external command signal through the touch panel in response to a specific action of the robot, the microprocessor analyzes that signal; a third process in which the microprocessor increases or decreases the selection probability of the corresponding behavior according to the result of analyzing the command signal fed back by the user; and a fourth process in which, if behaviors associated with the behavior that received feedback are stored in RAM, the microcomputer increases or decreases the selection probabilities of those associated behaviors and then returns the robot's behavior mode to autonomous mode.

Description

BEHAVIOR LEARNING HEIGHTEN METHOD OF ROBOT

The present invention relates to a method for reinforcing the behavior learning of a robot, and in particular to a method in which reinforcement learning affects not only the single specific behavior concerned but also other associated behaviors, thereby increasing the robot's general learning ability.

In general, service robots and pet robots perceive user feedback and perform simple reinforcement learning on a specific behavior.

That is, when the touch sensor on its head or neck is stroked, or when a spoken word with a positive meaning is recognized, the robot interprets this as praise from the user; when the touch sensor is struck hard, or when a spoken word with a negative meaning is recognized, the robot interprets this as scolding.

As described above, when feedback corresponding to praise is received, the robot strengthens only the behavior it has just performed by raising that behavior's future expression rate; when feedback corresponding to scolding is received, it weakens only the behavior it has just performed by lowering that behavior's future expression rate.

That is, the conventional method reinforces only the behavior for which the user's feedback was received. This makes the robot simply obey the user's words, producing a passive robot that performs only fragmentary learning. In addition, because behaviors are separated and each one is reinforced individually, it is difficult to keep the robot's behavior consistent: the behaviors become unsystematic and meaningless, and generalization performance is degraded.

The present invention has been devised to solve the above problems. Its object is to provide a method for reinforcing the behavior learning of a robot in which reinforcement learning affects not only the single specific behavior concerned but also other associated behaviors, thereby increasing the robot's general learning ability.

FIG. 1 is a block diagram showing the configuration of an apparatus to which the robot behavior learning reinforcement method of the present invention is applied.

FIG. 2 is an operational flowchart of the robot behavior learning reinforcement method of the present invention.

FIG. 3 is a schematic diagram showing the operation of the robot behavior learning reinforcement method of FIG. 2.

*** Explanation of reference numerals for the main parts of the drawings ***

110: motor control unit    111: digital signal processing unit

112: PWM generator    113-1 to 113-N: motors

1 to N: potentiometers    120: motion control unit

121: microprocessor    122: flash memory

To achieve the above object, the present invention provides, for a robot with a built-in microprocessor that determines the robot's movement from external stimuli input through a touch panel and a microphone, a method characterized by performing: a first process in which the microprocessor sets an expression probability for each action according to the robot's internal state and determines the action to perform according to those settings; a second process in which, when the user inputs an external command signal through the touch panel in response to a specific action of the robot, the microprocessor analyzes that signal; a third process in which the microprocessor increases or decreases the selection probability of the corresponding behavior according to the result of analyzing the command signal fed back by the user; and a fourth process in which, if behaviors associated with the behavior that received feedback are stored in RAM, the microcomputer increases or decreases the selection probabilities of those associated behaviors and then returns the robot's behavior mode to autonomous mode.

Hereinafter, the operation and effects of the robot behavior learning reinforcement method according to the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of an apparatus to which the robot behavior learning reinforcement method of the present invention is applied. As shown, the apparatus comprises: a motor control unit (110) that receives external images and external stimuli, transmits them to the motion control unit (120) through a physical link, and controls the driving of the motors (113-1 to 113-N) according to command signals from the motion control unit (120); and a motion control unit (120) that determines the robot's movement from the external images and stimuli received from the motor control unit (110) and from external stimuli input through the touch panel and the microphone, and transmits the corresponding command signal to the motor control unit (110).

The motor control unit (110) comprises: a sensor (114) that senses external stimuli; an image capture unit (116) that captures external images through a CCD (115); a digital signal processing unit (111) that receives the stimulus from the sensor (114) and the image from the image capture unit (116), transmits them to the motion control unit (120) through the physical link, and digitally processes the command signal for a specific action sent from the motion control unit (120) to output a corresponding motor drive control signal; a PWM generator (112) that receives the motor drive control signal from the digital signal processing unit (111) and outputs a pulse-width-modulated signal for producing the operating speed corresponding to that signal, together with a position signal indicating the position of the specific action corresponding to that signal; a plurality of motors (113-1 to 113-N), each of which operates its link according to the pulse-width-modulated signal and the position signal from the PWM generator (112); and potentiometers that determine the current joint position through the link, detect errors in the commanded motion, and correct the detected errors.
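The error correction attributed to the potentiometers can be pictured with a short sketch. The text does not say how the detected error is corrected, so the proportional adjustment below, along with the function name, the gain, and the tolerance, is purely an assumption made for illustration.

```python
def correct_joint_command(commanded: float, measured: float,
                          gain: float = 0.5, tolerance: float = 0.5) -> float:
    """Return an adjusted joint command based on a potentiometer reading.

    `commanded` is the joint value requested by the motion control unit and
    `measured` is the value read back from the potentiometer. The
    proportional rule, `gain`, and `tolerance` are assumptions of this
    sketch, not values taken from the patent.
    """
    error = commanded - measured
    if abs(error) <= tolerance:
        # Close enough: leave the command unchanged.
        return commanded
    # Push the command further in the direction of the remaining error.
    return commanded + gain * error
```

With these assumed defaults, a joint commanded to 30.0 but measured at 26.0 would receive an adjusted command of 32.0, while a reading of 30.2 would leave the command unchanged.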

The motion control unit (120) comprises: an audio codec unit (124) that receives a voice signal through a microphone (125) and encodes it; a flash memory (122) that stores the operating program and application programs; a touch panel (123), integrated with an LCD panel, that receives external stimuli; a RAM (127) that stores the voice signals and external stimuli input through the audio codec unit (124) and the touch panel (123); and a microprocessor (121) that processes the external stimuli and voice signals input through the audio codec unit (124) and the touch panel (123), determines the robot's behavior accordingly, and transmits a command signal for that behavior to the motor control unit (110) through the physical link.

The physical link is an I/O bus, USB, or an RS232-C cable.

FIG. 2 is an operational flowchart of the robot behavior learning reinforcement method. As shown, the method consists of: a first process of setting an expression probability for each action according to the internal state and determining the action to perform according to those settings; a second process of determining whether an external input fed back by the user is praise or scolding; a third process of increasing or decreasing the selection probability of the corresponding behavior if the second process determines that the input is praise or scolding; and a fourth process of, if there are behaviors associated with the behavior that received feedback, increasing or decreasing the selection probabilities of those associated behaviors and then returning to autonomous mode. The operation of the present invention is described below.

First, the robot has a behavior selection process that selects a behavior according to its internal state. The internal state consists of emotions, such as anger, joy, and surprise, and motivations, such as appetite and sexual desire.

The behavior selection process is performed aperiodically by the microprocessor. The motion information of the selected action is approximated as a series of the robot's static postures at specific time intervals, on the same principle as film or animation, in which still images are played back in succession to produce a moving image.

However, whereas a film is expressed as image information, a static posture of the robot is expressed as the current command values of all of the robot's joints: for a revolute joint the command value is an angle, and for a prismatic joint it is a displacement.

The command values of all joints that represent one static posture of the robot are defined as a frame, and a time series of frames is defined as motion information.
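The frame and motion-information definitions map naturally onto two small data structures. The sketch below is one possible representation; the class names, joint names, units, and the fixed frame interval are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    """One static posture: a command value for every joint.

    Revolute joints hold an angle and prismatic joints a displacement.
    The units (degrees, millimetres) are assumptions of this sketch.
    """
    joint_values: Dict[str, float]


@dataclass
class MotionInfo:
    """Motion information: frames ordered in time at a fixed interval."""
    interval_s: float                      # assumed time between frames
    frames: List[Frame] = field(default_factory=list)

    def duration_s(self) -> float:
        """Time from the first frame to the last frame."""
        return self.interval_s * max(len(self.frames) - 1, 0)


# A hypothetical "nod" motion built from three static postures.
nod = MotionInfo(
    interval_s=0.1,
    frames=[
        Frame({"neck_pitch": 0.0, "torso_lift": 0.0}),
        Frame({"neck_pitch": 15.0, "torso_lift": 2.0}),
        Frame({"neck_pitch": 0.0, "torso_lift": 0.0}),
    ],
)
```

Here `neck_pitch` stands in for a revolute joint and `torso_lift` for a prismatic joint; playing the frames back in order at the given interval reproduces the motion in the same way that still images reproduce a film.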

The operation of the present invention is as follows. First, the behaviors that the robot can execute with its current mechanical hardware are analyzed, and the connection ratio of each behavior is preset to build a Common-Sense Stereotype DB. Specific entries in the Common-Sense Stereotype DB can be modified or added to adjust the connection ratio between any behaviors.

The Common-Sense Stereotype DB is connected to a PC over a wireless LAN or an RS232C cable, so that the user can replace its contents to adjust the connection ratios between behaviors.

For example, among the robot's behaviors, barking and digging in the ground can be grouped as dog-like behaviors, and saluting and standing at attention can be grouped as soldier-like behaviors.
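The grouping in this example can be sketched as a lookup table. The storage format of the Common-Sense Stereotype DB is not specified, so the dictionary layout, the behavior names, and the helper `associated_behaviors` are hypothetical; the connection ratios are also left out here and every member of a group is treated as equally associated.

```python
from typing import Dict, Set

# Hypothetical contents: group label -> behaviors belonging to that group.
STEREOTYPE_DB: Dict[str, Set[str]] = {
    "dog": {"bark", "dig"},
    "soldier": {"salute", "stand_at_attention"},
}


def associated_behaviors(behavior: str, db: Dict[str, Set[str]]) -> Set[str]:
    """Return every other behavior that shares a group with `behavior`."""
    related: Set[str] = set()
    for members in db.values():
        if behavior in members:
            related |= members
    related.discard(behavior)  # a behavior is not associated with itself
    return related
```

Looking up `"bark"` returns `{"dig"}`, and a behavior that belongs to no group returns an empty set.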

The robot then sets the expression probability of each action according to its internal state and determines the action to perform according to those settings.

That is, the expression probabilities of the actions are first set according to the robot's internal state, and the action to perform is then determined from them.
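Choosing an action from expression probabilities can be sketched as weighted random sampling. The text does not describe how the internal state is turned into probabilities, so the sketch takes the probabilities as given; the function name and the behavior names are assumptions.

```python
import random
from typing import Dict, Optional


def select_behavior(probabilities: Dict[str, float],
                    rng: Optional[random.Random] = None) -> str:
    """Pick one behavior with a chance proportional to its probability.

    `probabilities` maps a behavior name to its expression probability.
    The values do not have to sum to 1; they are used as relative weights.
    """
    rng = rng or random.Random()
    names = list(probabilities)
    weights = [probabilities[name] for name in names]
    if sum(weights) <= 0:
        raise ValueError("at least one behavior needs a positive probability")
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical probabilities for a robot in a playful internal state.
choice = select_behavior({"bark": 0.6, "dig": 0.3, "salute": 0.1})
```

Passing a seeded `random.Random` instance makes the selection repeatable, which can be convenient when testing.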

The internal state is continuously updated by combining external inputs with the status of the robot's behavior execution.

Next, when the user provides an external feedback input, it is determined whether that input is praise or scolding.

That is, praise or scolding is determined from the user's input through the touch sensor: the two are distinguished either by where on the robot the touch sensor that receives the feedback is located, or by how long the user keeps the touch screen pressed.
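The two touch-based criteria can be sketched as follows. The text does not say which sensor locations or press durations count as praise, so the location sets, the threshold, and the choice that a long press means praise are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical mapping of sensor locations to feedback types.
PRAISE_LOCATIONS = {"head", "neck"}
SCOLD_LOCATIONS = {"back"}


def classify_by_location(location: str) -> Optional[str]:
    """Classify feedback by which touch sensor was touched."""
    if location in PRAISE_LOCATIONS:
        return "praise"
    if location in SCOLD_LOCATIONS:
        return "scold"
    return None  # touch on a sensor that carries no feedback meaning


def classify_by_duration(press_s: float, threshold_s: float = 0.5) -> str:
    """Classify feedback by how long the touch screen was pressed.

    Assumption of this sketch: a long press (a stroke) is praise and a
    short press (a tap or hit) is scolding.
    """
    return "praise" if press_s >= threshold_s else "scold"
```

Either function could feed the later probability update; which criterion is used, and with what thresholds, would depend on the actual sensor layout.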

As another way of determining praise or scolding, the user's voice input is compared against a word DB that stores words corresponding to praise and scolding.
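The word DB comparison can be sketched as a dictionary lookup. The sketch assumes the voice input has already been converted to text by a speech recognizer, which is outside its scope; the word list and the function name are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical word DB: recognized word -> feedback type.
WORD_DB: Dict[str, str] = {
    "good": "praise",
    "great": "praise",
    "bad": "scold",
    "no": "scold",
}


def classify_utterance(text: str, word_db: Dict[str, str] = WORD_DB) -> Optional[str]:
    """Return "praise" or "scold" for the first matching word, else None."""
    for raw in text.lower().split():
        word = raw.strip(".,!?")  # drop trailing punctuation
        if word in word_db:
            return word_db[word]
    return None  # no praise or scolding word was recognized
```

An utterance that contains none of the stored words is treated as neither praise nor scolding.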

Next, if the external input fed back is praise or scolding from the user, the selection probability of the corresponding behavior is increased or decreased.

That is, if the input is praise, the current robot behavior is identified from the stored time-series data of joint command values, and the expression probability of the identified behavior is increased by a specific value. If the input is scolding, the current robot behavior is identified in the same way, and the expression probability of the identified behavior is decreased by a specific value.
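The update by a specific value can be sketched as a fixed-step adjustment. The step size, the clamping to the range 0 to 1, and the function name are assumptions; identifying the current behavior from joint time-series data is not shown and the behavior name is simply passed in.

```python
from typing import Dict


def apply_feedback(probabilities: Dict[str, float], behavior: str,
                   feedback: str, delta: float = 0.1) -> Dict[str, float]:
    """Return a copy of `probabilities` with `behavior` adjusted by `delta`.

    `feedback` must be "praise" (increase) or "scold" (decrease). The step
    size `delta` and the clamping to [0, 1] are assumptions of this sketch.
    """
    sign = {"praise": 1.0, "scold": -1.0}[feedback]
    updated = dict(probabilities)
    updated[behavior] = min(1.0, max(0.0, updated[behavior] + sign * delta))
    return updated
```

Returning a new dictionary instead of modifying the input keeps the previous probabilities available, for example for logging or undoing an update.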

Next, if there are behaviors associated with the behavior that received feedback, their selection probabilities are increased or decreased: for behaviors associated with praise, the occurrence probability is increased, and for behaviors associated with scolding, it is decreased.

Described in more detail with reference to FIG. 3, the behavior selection process sets the expression probability of each action according to the state of the Emotion Modeling unit and determines the action to perform according to the set probabilities. The Emotion Modeling unit is continuously updated by combining external inputs with the status of the robot's behavior execution.

While the robot is operating normally in this way, any external stimulus input through the touch sensor or the microphone is stored in RAM, and the feedback processing unit determines whether the input is praise or scolding. If it is determined to be neither, the external input stored in RAM is not used, and the behavior selection process continues.

Conversely, if the input is determined to be praise or scolding, the pre-stored time-series data of joint command values is used to identify the robot's current behavior, the expression probability of that behavior is increased or decreased by a specific value according to the praise or scolding, and the updated expression probability is stored.

The microprocessor then uses the Common-Sense Stereotype DB to determine whether there are behaviors associated with the behavior that received praise or scolding. If there are, the future expression probabilities of those behaviors are increased or decreased according to the praise or scolding and stored, and the robot returns to autonomous mode, in which the behavior selection process continues.
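The feedback handling described above, from the praise-or-scolding check through the update of associated behaviors, can be sketched in one function. The group-based DB layout, the two step sizes, the clamping, and all names are assumptions; the text does not state how large the change for associated behaviors is relative to the change for the behavior itself.

```python
from typing import Dict, Optional, Set


def _clamp(p: float) -> float:
    """Keep a probability inside the range 0 to 1."""
    return min(1.0, max(0.0, p))


def handle_feedback(probabilities: Dict[str, float], current_behavior: str,
                    feedback: Optional[str], stereotype_db: Dict[str, Set[str]],
                    delta: float = 0.1, related_delta: float = 0.05) -> Dict[str, float]:
    """Apply praise or scolding to a behavior and to its associated behaviors.

    Any `feedback` other than "praise" or "scold" leaves the probabilities
    unchanged, mirroring the case where the stored input is not used.
    """
    if feedback not in ("praise", "scold"):
        return dict(probabilities)
    sign = 1.0 if feedback == "praise" else -1.0
    updated = dict(probabilities)
    # Third process: adjust the behavior that received the feedback.
    updated[current_behavior] = _clamp(updated[current_behavior] + sign * delta)
    # Fourth process: collect associated behaviors once, then adjust each.
    related: Set[str] = set()
    for members in stereotype_db.values():
        if current_behavior in members:
            related |= members
    related.discard(current_behavior)
    for other in related:
        if other in updated:
            updated[other] = _clamp(updated[other] + sign * related_delta)
    return updated
```

After this function returns, the caller would resume the ordinary behavior selection loop, which corresponds to the return to autonomous mode. Using a smaller step for associated behaviors than for the behavior itself is a design choice of this sketch, not something the text prescribes.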

The specific embodiments and examples given in the detailed description of the present invention are intended only to clarify its technical content. The invention should not be construed narrowly as limited to these specific examples, and various modifications may be made within the spirit of the invention and the scope of the claims set out below.

As described in detail above, when the robot's behavior is changed, the present invention also reinforces associated behaviors, so that the change in the robot's behavior appears consistent and meaningful and the user perceives the robot as intelligent.

Claims

A method for reinforcing behavior learning of a robot with a built-in microprocessor that determines the movement of the robot from external stimuli input through a touch panel and a microphone, the method comprising:

a first process in which the microprocessor sets an expression probability of each action according to an internal state of the robot and determines an action to be performed according to the setting;

a second process in which, when a user inputs an external command signal through the touch panel with respect to a specific action of the robot, the microprocessor analyzes the external command signal;

a third process in which the microprocessor increases or decreases the selection probability of the corresponding specific behavior according to a result of analyzing the external command signal fed back by the user; and

a fourth process in which, if behaviors associated with the behavior that received feedback are stored in RAM, the microcomputer increases or decreases the selection probabilities of the associated behaviors and then returns the robot's behavior mode to autonomous mode.

The method of claim 1, wherein the command signal comprises praise and scolding for the currently selected behavior.

The method of claim 2, wherein the praise and scolding are determined from the user's input through a touch sensor.

The method of claim 3, wherein the robot distinguishes praise from scolding according to where the touch sensor that receives the user's feedback is located.

The method of claim 3, wherein the robot distinguishes praise from scolding according to how long the touch screen is pressed by the user.

The method of claim 2, wherein the praise or scolding is determined by comparing the user's voice input with a word DB that stores words corresponding to praise and scolding.

The method of claim 1, wherein the third process comprises, if the external input fed back is praise from the user, recognizing the current robot behavior using the currently stored time-series data of joint command values and then increasing the expression probability of the recognized robot behavior by a specific value.

The method of claim 1, wherein the third process comprises, if the external input fed back is scolding from the user, recognizing the current robot behavior using the currently stored time-series data of joint command values and then decreasing the expression probability of the recognized robot behavior by a specific value.

The method of claim 1, further comprising a step in which the microcomputer analyzes the behaviors that the robot can execute with its current mechanical hardware and then builds a Common-Sense Stereotype DB by presetting the connection ratio of each behavior.

The method of claim 9, wherein specific entries of the Common-Sense Stereotype DB are modified or added to adjust the connection ratio between arbitrary behaviors.

The method of claim 9 or 10, wherein the Common-Sense Stereotype DB is connected to a PC by a wireless LAN or an RS232C cable and its contents are replaced by the user.