KR20070071087A

KR20070071087A - Control method of group robot using local communication

Info

Publication number: KR20070071087A
Application number: KR1020050134260A
Authority: KR
Inventors: 이동욱; 이호길; 김홍석
Original assignee: 한국생산기술연구원
Priority date: 2005-12-29
Filing date: 2005-12-29
Publication date: 2007-07-04

Abstract

A group robot control method is provided that allows robots to cooperate with one another through the distributed learning function of each robot. The method includes a step (S10) of initializing the learning and evolution parameters of all robots in a group; a step (S11) in which each robot learns behavior rules; a step (S13) in which each robot evaluates its fitness through the behavior learning; a step (S14) in which each robot searches for another robot with which it can communicate; and steps (S15, S16) in which each robot updates its Q-table if the fitness of the other robot is higher than its own.

Description

Control Method of Group Robot Using Local Communication

FIG. 1 is a block diagram of a robot, for explaining the group robot control method using local communication according to the present invention.

FIG. 2 is an exemplary diagram for explaining the communication range of the group robots in the present invention.

FIG. 3 is a flowchart for explaining the group robot control method using local communication according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

10: robot 1    11: drive unit

12: sensor    13: behavior learning unit

14: Q-table    16: evolution unit

17: communication unit    20: robot 2

22: robot 3    23: robot 4

The present invention relates to a group robot control method using local communication. More particularly, in a group robot system composed of several autonomous mobile robots, each robot learns in its own local environment and communicates with other robots to evolve what it has learned, thereby improving the task performance of the group as a whole.

In general, methods of controlling group robots are divided into centralized and autonomous distributed methods, depending on whether a central manager is present.

In the centralized method, a central manager achieves the purpose of the system by controlling the individual robots. This is efficient when the number of robots is small, but as the number of robots grows the complexity of the system increases, making control difficult.

In the autonomous distributed method, by contrast, there is no central manager. Each robot in the system individually recognizes the purpose of the system, the environment, and the behavior of other robots, and autonomously decides its own actions so that the elements cooperate. When the number of robots is small this is less efficient than the centralized method, but the complexity of the system does not increase as the number of robots grows.

In particular, in an autonomous distributed group robot system, an individual robot does not actually need to know the information of all the other robots; it only needs to perceive its own surroundings and act accordingly.

To this end, in the prior art the user designed the control scheme of each robot to build the system.

In general, however, in a dynamic system containing many robots it is very difficult to design each robot so that it takes the best action for cooperation.

The user therefore has to design through much trial and error, and unexpected problems often arise even after such a design.

An object of the present invention is to solve the above problems of the prior art by providing a group robot control method using local communication in which the behavior system of each robot is not designed in advance but is generated by the robot itself through learning, and in which performance is improved by exchanging learned data with other robots through communication.

To achieve this object, the present invention provides a group robot control method using local communication, the method comprising: initializing the learning and evolution parameters of all robots forming the group; performing behavior learning for a preset time; calculating a fitness through the behavior learning; finding another robot with which communication is possible; and updating if the fitness of the other robot with which communication is established is higher than the robot's own fitness.

(Embodiment)

The group robot control method using local communication according to the present invention is described in detail below with reference to the accompanying drawings, which show a preferred embodiment of the invention.

In the accompanying drawings, FIG. 1 is a block diagram of a robot, for explaining the group robot control method using local communication according to the present invention; FIG. 2 is an exemplary diagram for explaining the communication range of the group robots in the present invention; and FIG. 3 is a flowchart for explaining the group robot control method using local communication according to the present invention.

The present invention uses a distributed evolutionary algorithm suited to a group robot system that learns through reinforcement learning.

In an evolutionary algorithm, operations such as fitness evaluation, selection, crossover, and mutation are performed sequentially and as a batch.

In a group robot system, however, there is no central manager or robot dedicated to running the evolutionary algorithm; as with evolution in nature, each robot must itself be the agent of evolution.

That is, each robot must evaluate its own fitness, select another robot, and perform the crossover and mutation operations on chromosomes.

Accordingly, the present invention proposes, for group robot systems, a distributed evolutionary algorithm based on reinforcement learning and evolutionary algorithms, a chromosome representation, and an experience-based crossover method that makes use of learning counts.

In the present invention, the autonomous distributed learning system of the robots forming the group is configured as shown in FIG. 1. Each robot (10, 20, 22, 23) has a sensor (12) for perceiving the environment, a drive unit (11) for acting, and a communication unit (17) for exchanging information with other robots.

In addition, each robot (10, 20, 22, 23) learns state-action rules in its own local environment through Q-learning performed by the Q-learning unit (13), and evolves those rules using information obtained by communicating with other robots.

Communication in an autonomous distributed group robot system is broadly divided, according to the communication means used, into global communication and local communication.

The former, global communication, generally uses a wide-area medium such as radio and is effective when the number of robots involved is small. As the number of robots increases, however, problems such as limited communication capacity, mutual interference between robots, and rising cost make it difficult for all robots to communicate with one another, and in that case the latter, local communication, is advantageous.

With local communication, information can be passed only to nearby robots within the communication radius, so it takes time for acquired information to spread through the whole system. When a robot does not need to know the information of every other robot, however, local communication prevents a flood of unnecessary information.

FIG. 2 shows the local communication method of the robots used in the present invention. Robot 1 (10) can communicate only with robot 2 (20), which is inside the communication range (Rc), and cannot communicate with robot 3 (22) or robot 4 (23), which are outside it.
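
The range check of FIG. 2 can be illustrated with a minimal sketch. It assumes robots at known 2-D positions and a Euclidean distance test against Rc; the function name, the coordinates, and the value of Rc are hypothetical and do not come from the specification, which does not say how a robot detects its neighbors.

```python
import math


def robots_in_range(positions, self_id, rc):
    """Return the ids of robots within distance rc of robot `self_id`.

    positions: dict mapping robot id -> (x, y); an assumed representation.
    """
    sx, sy = positions[self_id]
    return [
        rid
        for rid, (x, y) in positions.items()
        if rid != self_id and math.hypot(x - sx, y - sy) <= rc
    ]


# Hypothetical layout mirroring FIG. 2: only robot 2 lies inside Rc of robot 1.
positions = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (5.0, 0.0), 4: (0.0, 6.0)}
neighbors = robots_in_range(positions, self_id=1, rc=2.0)
```

With these assumed positions, robot 1 finds robot 2 as its only communication partner, which matches the situation drawn in FIG. 2.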

Let the states the robot can recognize be s_i ∈ S and the actions it can take be a_j ∈ A. The Q-value representing a state-action rule used in Q-learning can then be written Q(s_i, a_j), where i = 1, 2, …, N_s and j = 1, 2, …, N_a, N_s is the total number of states, and N_a is the total number of actions. The collection of the N_s × N_a Q-values is called the Q-table (14).
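
As an illustration, a Q-table of this shape can be held as an N_s × N_a array and updated with the standard one-step Q-learning rule. The specification does not state the update rule or its parameters, so the learning rate, the discount factor, and the table sizes below are assumptions chosen only for the sketch.

```python
N_S, N_A = 4, 3  # hypothetical numbers of states and actions

# Q-table: one row per state s_i, one entry per action a_j.
q_table = [[0.0] * N_A for _ in range(N_S)]


def q_update(q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update (alpha and gamma are assumed values)."""
    td_target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])


# One update: in state 0 the robot takes action 1, receives reward 1.0,
# and arrives in state 2.
q_update(q_table, s=0, a=1, reward=1.0, s_next=2)
```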

Each robot receives the Q-tables of other robots through communication and uses them to improve its own Q-table.

FIG. 3 is a flowchart of the distributed evolutionary algorithm for the autonomous distributed learning of the robots.

While performing Q-learning on its own, a robot calculates its fitness from the reward values it receives during a specified evaluation time (T_eval).

The algorithm proceeds as follows.

First, every robot initializes the parameters related to learning and evolution (S10). Each robot then learns through Q-learning while calculating its fitness from the reward values received during the specified evaluation time (T_eval) (S11).

A parameter u that counts the evaluation time is used to determine whether the evaluation time has elapsed.

Once the evaluation time has elapsed (that is, u > T_eval) and the fitness has been calculated (S13), the robot looks for robots in its vicinity with which it can communicate (S14).

If the fitness of a communicable robot is higher than the robot's own (S15), the robot receives that robot's Q-table and performs an experience-based crossover with its own Q-table to generate a new Q-table (S16).

The evaluation-time parameter u is then reset to 0, and the fitness is not calculated again until T_eval has elapsed (S17).
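
The control flow of steps S11 to S17 can be sketched for a single robot as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Robot` class, `learn_step`, and `evolve_if_due` are hypothetical names, the Q-learning step is reduced to recording the reward, the crossover is passed in as a function, and choosing the fittest neighbor when several qualify is an assumption the specification does not make.

```python
from dataclasses import dataclass, field


@dataclass
class Robot:
    q_table: list
    u: int = 0  # evaluation-time counter
    rewards: list = field(default_factory=list)
    fitness: float = 0.0


def learn_step(robot, reward):
    """S11: stand-in for one Q-learning step; here it only records the reward."""
    robot.rewards.append(reward)
    robot.u += 1


def evolve_if_due(robot, neighbors, t_eval, crossover):
    """S13 to S17: evaluate fitness once u > t_eval, then cross over if warranted."""
    if robot.u <= t_eval:
        return False
    robot.fitness = sum(robot.rewards)  # S13: rewards since the last reset
    # S14, S15: `neighbors` are the robots in range; keep those with higher fitness.
    better = [n for n in neighbors if n.fitness > robot.fitness]
    if better:
        partner = max(better, key=lambda n: n.fitness)  # assumed tie-breaking
        robot.q_table = crossover(robot.q_table, partner.q_table)  # S16
    robot.u = 0  # S17
    robot.rewards.clear()
    return True


def take_partner(own, other):
    """Placeholder crossover for the demo: copy the partner's table."""
    return [row[:] for row in other]


a = Robot(q_table=[[0.0, 0.0]])
b = Robot(q_table=[[1.0, 2.0]], fitness=10.0)
for _ in range(4):
    learn_step(a, reward=1.0)
done = evolve_if_due(a, [b], t_eval=3, crossover=take_partner)
```

In this run robot `a` accumulates a fitness of 4.0, finds that `b` has a higher fitness, and replaces its Q-table through the supplied crossover before resetting u.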

The fitness (F) of a robot can be expressed, as in Equation 1, as the cumulative sum of the rewards (R_i) received during the time T_eval.

In Equation 1, t denotes the current time.
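
Equation 1 is not reproduced in this text. One form consistent with the description, assuming discrete time steps with R_i the reward received at step i and a window covering the most recent T_eval steps, is sketched below; the summation limits are an assumption and are not taken from the source.

```latex
F(t) = \sum_{i = t - T_{eval} + 1}^{t} R_i
```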

In the present invention, an experience-based crossover operator was developed to pass the more thoroughly learned Q-values on to the next generation.

In a typical evolutionary algorithm, two offspring are obtained from two parents.

A robot, however, can hold only one individual at a time, and arbitrarily choosing one of the two offspring risks losing good genes.

The experience-based crossover method overcomes this problem by generating a single offspring from the genes with the higher learning counts.

The present invention develops a chromosome representation that includes, together with the Q-values, the learning count (defined as the L-value).

The chromosome H used in the present invention can be expressed, as in Equation 2, as a matrix of Q-values and L-values.

In Equation 2, x_ij = Q(s_i, a_j) denotes the Q-value of the state-action pair (s_i, a_j), and l_i = L(s_i) denotes the number of times the Q-values of state s_i have been learned (updated).

The column of chromosome H that corresponds to a single state, that is, the i-th column, is expressed as in Equation 3 and is called a gene.

In Equation 3, i = 1, 2, …, N_s.
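
Equations 2 and 3 are not reproduced in this text. A layout consistent with the description, in which each column of H collects the Q-values of one state followed by that state's learning count, might look as follows; the ordering of the rows is an assumption.

```latex
H =
\begin{bmatrix}
x_{11}    & x_{21}    & \cdots & x_{N_s 1}   \\
x_{12}    & x_{22}    & \cdots & x_{N_s 2}   \\
\vdots    & \vdots    & \ddots & \vdots      \\
x_{1 N_a} & x_{2 N_a} & \cdots & x_{N_s N_a} \\
l_1       & l_2       & \cdots & l_{N_s}
\end{bmatrix},
\qquad
g_i = \begin{bmatrix} x_{i1} & x_{i2} & \cdots & x_{i N_a} & l_i \end{bmatrix}^{T},
\quad i = 1, 2, \ldots, N_s
```

Under this reading H = [g_1 g_2 … g_{N_s}], so a chromosome is simply the Q-table with one learning count attached to each state.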

The experience-based crossover method generates an offspring from the parents according to their learning counts. Each gene g_i^* of the offspring is inherited from the genes g_i^0 and g_i^1 of the two parents with the probability given in Equation 4.

In Equation 4, p_i is a uniform random number in [0, 1].

If l_i^0 is greater than l_i^1, g_i^0 is more likely to be selected than g_i^1.
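
Equation 4 is not reproduced in this text, so the selection rule in the following sketch is an assumption: the offspring inherits g_i^0 when p_i < l_i^0 / (l_i^0 + l_i^1) and g_i^1 otherwise, which satisfies the stated property that the gene with the larger learning count is more likely to be chosen. The even split when both counts are zero, the gene representation as a (Q-values, L-value) pair, and the function name are likewise hypothetical.

```python
import random


def experience_crossover(parent0, parent1, rng=random):
    """Build one offspring, gene by gene, favoring the better-learned gene.

    Each parent is a list of genes; a gene is (q_values_for_state, l_value).
    """
    child = []
    for (q0, l0), (q1, l1) in zip(parent0, parent1):
        total = l0 + l1
        # Assumed probability of inheriting from parent 0.
        prob0 = 0.5 if total == 0 else l0 / total
        p = rng.random()  # uniform random number in [0, 1)
        q, l = (q0, l0) if p < prob0 else (q1, l1)
        child.append((list(q), l))  # copy so the child does not alias a parent
    return child


# Each robot has learned only one of the two states, so the outcome is certain.
own = [([0.2, 0.8], 5), ([0.0, 0.0], 0)]
other = [([0.0, 0.0], 0), ([0.6, 0.1], 3)]
child = experience_crossover(own, other)
```

In this example the offspring takes state 1 from `own` and state 2 from `other`, so each robot's knowledge of a state the other has never visited is preserved.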

This experience-based crossover method passes the robots' good genes on to the next generation and has the effect of sharing data about states that a robot has not yet learned, thereby improving the learning ability of the group as a whole.

The present invention as described above has the effect that, through the cooperative behavior of a system of small, functionally limited group robots equipped with a local communication system, problems (of place, time, energy, and so on) that are difficult to solve with a single macro system can be solved.

Furthermore, the cooperative behavior of functionally limited robots can contribute to research on micro-robots, which may be regarded as next-generation robots.

In addition, providing the group robots with a distributed learning function makes it possible to realize cooperative behavior through adaptation to unfamiliar environments. This in turn can be applied to mine detection in unknown environments, natural resource exploration, cleanup of contaminated areas, seabed exploration, space development, and the like.

Claims

A method of controlling group robots using local communication, the method comprising:

initializing the learning and evolution parameters of all robots forming the group;

performing behavior learning for a preset time;

calculating a fitness through the behavior learning;

finding another robot with which communication is possible; and

updating, if the fitness of the other robot with which communication is established is higher than the robot's own fitness.