KR20200055495A

KR20200055495A - Reinforcement learning model creation method for automatic control of PTZ camera, and the reinforcement learning model

Info

Publication number: KR20200055495A
Application number: KR1020180139202A
Authority: KR
Inventors: 김동칠; 박성주; 김경만
Original assignee: 전자부품연구원
Priority date: 2018-11-13
Filing date: 2018-11-13
Publication date: 2020-05-21
Also published as: KR102142651B1; WO2020101062A1

Abstract

The present invention discloses a method for generating a reinforcement learning model that can improve the video monitoring efficiency of an intelligent security control system by generating an efficient reinforcement learning model for automatic control of a PTZ camera used in that system. The method comprises: (1) a training image data acquisition and analysis step, in which training image data including object information is acquired to perform reinforcement learning and the position and size of an object in the image are analyzed; (2) a step of selecting an action value of the PTZ camera for controlling the PTZ camera; (3) a step of calculating a reward by estimating the control direction of the PTZ camera; (4) a step of storing a data set including the states before and after the camera action and the reward; (5) a step of determining whether the number of stored data sets is equal to or greater than a predetermined number; and (6) a step of generating a reinforcement learning model based on the stored data sets.

Description

Reinforcement learning model creation method for automatic control of PTZ camera, and the reinforcement learning model

The present invention relates to a method for generating a reinforcement learning model for automatic control of a PTZ camera, in order to improve video monitoring efficiency in environments such as a general situation room or an integrated CCTV control center to which an intelligent security control system is applied, and to a reinforcement learning model generated by that method.

The present invention was carried out as a result of research under the 2017 ICT and Broadcasting R&D Program of the Ministry of Science and ICT and the Information and Communication Technology Promotion Center (project number 2017-0-00250; project title: Development of core technologies for an intelligent defense perimeter surveillance system based on collaborative reinforcement learning between an edge-camera embedded system and a video analysis system).

Many integrated CCTV control centers have been built and are in operation, and many cameras are connected to them (as of 2018, more than about 190 integrated CCTV control centers are operating in the Republic of Korea, using about 409,000 cameras).

Because demand for CCTV keeps growing, local governments and industrial sites report difficulties due to a shortage of monitoring personnel after building integrated CCTV control centers. As a result, each operator must monitor tens to hundreds of cameras, which is a major cause of reduced monitoring efficiency. Deploying additional operators, in turn, requires securing a large budget.

Smart monitoring has recently been introduced to alleviate these problems and raise the efficiency of the monitoring systems in integrated CCTV control centers. Smart monitoring applies several technologies, such as video analysis services, reducing the number of channels each operator monitors, and prioritizing cameras. However, when an event such as a dangerous situation is detected, the camera's PTZ (pan, tilt, zoom) must still be controlled manually.

Conventional PTZ camera control technology controls the camera based on the distance between the object and the camera's center point, in order to place the object at the center of the camera screen. However, this is a simple camera movement method that does not consider object size, and it has the drawback of being applicable only to object tracking. The PTZ control method using homography does consider object size, but because it controls Zoom based on the physical distance between the object and the camera, its Zoom control is not accurate.

To overcome the above problems, the present invention proposes a method for generating a reinforcement learning model that improves the video monitoring efficiency of an intelligent security control system by generating an efficient reinforcement learning model for automatic control of the PTZ camera used in that system.

According to the present invention, video monitoring in an intelligent security control system can be performed efficiently by generating, for automatic PTZ camera control using reinforcement learning, a reinforcement learning model that predicts camera movement and assigns Reward adaptively to the situation.

Reinforcement learning (RL) is a new machine learning methodology for solving stochastic decision-making problems. Unlike supervised learning, it is given a reward function and seeks the policy function that maximizes the average of the reward values to be obtained in the future. That is, unlike supervised learning, the target value is the reward and the predicted value is a policy or an action.

According to one aspect for solving the above problem, a method for generating a reinforcement learning model for automatic control of a PTZ camera is provided. The method includes: 1) a training image data acquisition and analysis step of acquiring training image data including object information in order to perform reinforcement learning, and analyzing the position and size information of an object in the image; 2) selecting an action value of the PTZ camera for PTZ camera control; 3) calculating a Reward by estimating the control direction of the PTZ camera, which includes Pan Left, Pan Right, Tilt Up, Tilt Down, and Zoom, moving the PTZ camera in the estimated control direction using the selected action value, and assigning the Reward adaptively to the change in the position of the object in the image after the PTZ camera has moved; 4) storing a data set including the states before and after the camera action and the Reward; 5) determining whether the number of stored data sets is equal to or greater than a predetermined number; and 6) generating a reinforcement learning model based on the stored data sets.

According to another aspect for solving the above problem, a reinforcement learning model is provided that includes an artificial neural network comprising an input layer, an output layer, and hidden layers. This reinforcement learning model includes: means for setting a current learning target for PTZ camera control by computing a deep Q-learning function using the weights of the artificial neural network; means for setting a next-step learning target for PTZ camera control by performing an actual PTZ camera action according to the set current learning target; and means for performing reinforcement learning on PTZ camera control by updating the weights so as to reduce the error between the current learning target and the next-step learning target.

The configuration and operation of the present invention introduced above will become clearer through the specific embodiments described below with reference to the drawings.

According to the present invention, using reinforcement learning enables PTZ camera control optimized for the position and size of the object. Such reinforcement-learning-based automatic PTZ camera control improves the video monitoring efficiency of an intelligent security control system and can reduce the workload.

FIG. 1 shows the artificial neural network structure of the reinforcement learning model of the present invention.
FIG. 2 shows the parameters applied to the artificial neural network of FIG. 1.
FIG. 3 is a flowchart of the method for generating a reinforcement learning model for automatic control of a PTZ camera in the intelligent video monitoring system proposed in the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that a person of ordinary skill in the art to which the invention pertains is fully informed of the scope of the invention. The technical scope of the present invention is defined by the claims.

The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the text specifically states otherwise. As used in this specification, "comprises" or "comprising" does not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related well-known configurations or functions are omitted where they could obscure the gist of the invention.

FIG. 1 shows the artificial neural network structure used in the reinforcement learning model of the present invention. The artificial neural network shown in FIG. 1 has four hidden layers (30-1, 30-2, 30-3, 30-4) between the input layer (10) and the output layer (20).

FIG. 2 shows the parameters applied to the artificial neural network of FIG. 1. Each parameter in FIG. 2 is briefly described below.

- Optimizer: Adam is used. Optimization means searching for hyperparameter values in the direction that minimizes the value of the loss function resulting from training the neural network model. Adam is one of the various techniques available for this.

- Loss function: the loss function computes the difference between the value produced by the model and the value that was actually desired. Several kinds of function exist depending on the purpose, and the present invention uses the widely used MSE (mean squared error). MSE measures the mean of the squared errors or deviations in order to measure the difference between the estimated value and the expected value. Because squared values are used, the value changes sharply with the error, which has the advantage of increasing the accuracy of the estimate.

- Activation function: the activation function receives a signal, processes it appropriately, and outputs it, thereby determining whether the output signal is activated in the next stage. The ReLU (Rectified Linear Unit) function is used for the hidden layers and a Linear function for the output layer.

- Learning rate: the learning rate is the amount learned in a single training step, and the weight parameters are updated after each step by that amount. It must be fixed in advance to a specific value such as 0.01 or 0.001, and in general, if it is too large or too small it is difficult to reach a suitable point. In neural network training, this value is usually varied while checking whether training is proceeding correctly. If the learning rate is too large, training produces overly large values. If it is too small, training ends with almost no update. In the present invention, the learning rate is set to 0.001.

- Batch size: the number of data items used in a single training step, set here to 128.

- Epochs: the number of complete training passes (forward and backward) over the entire data set, set to 35 in the present invention.

- Episodes: an episode is the unit interval over which the reward is returned. The number of episodes is set to 1000.
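The parameters above can be collected into a small configuration object, together with a forward pass through four ReLU hidden layers and a linear output layer. This is only a minimal sketch in plain Python: the hidden-layer width (`hidden_units`), the use of the four object values as input, and one output per action value are assumptions, since the text does not state the layer sizes.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values stated in the text.
    optimizer: str = "adam"
    loss: str = "mse"
    hidden_activation: str = "relu"
    output_activation: str = "linear"
    learning_rate: float = 0.001
    batch_size: int = 128
    epochs: int = 35
    episodes: int = 1000
    hidden_layers: int = 4
    # Assumptions for illustration only (layer sizes are not given in the text).
    state_size: int = 4      # x, y, width, height
    hidden_units: int = 32   # hypothetical width
    action_count: int = 8    # one output per candidate action value


def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, biases):
    # One fully connected layer: weights is a list of rows, one row per output unit.
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def init_network(cfg, seed=0):
    rng = random.Random(seed)
    sizes = [cfg.state_size] + [cfg.hidden_units] * cfg.hidden_layers + [cfg.action_count]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, state):
    values = list(state)
    for weights, biases in layers[:-1]:
        values = relu(dense(values, weights, biases))   # hidden layers use ReLU
    weights, biases = layers[-1]
    return dense(values, weights, biases)               # output layer is linear


cfg = TrainingConfig()
layers = init_network(cfg)
q_values = forward(layers, [0.5, 0.5, 0.1, 0.2])
```

Training with Adam and the MSE loss is left out here. In practice a deep learning library would supply both.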

The reinforcement learning model that includes this artificial neural network computes a deep Q-learning function using the weights of the network to set the current learning target for PTZ camera control, and then performs an actual PTZ camera control action to set the next-step learning target for PTZ camera control. Reinforcement learning on PTZ camera control is performed by updating the weights so as to reduce the error between the current learning target and the next-step learning target. The reinforcement learning model is described further below.

FIG. 3 is a flowchart of the method for generating a reinforcement learning model for automatic control of a PTZ camera in the intelligent video monitoring system proposed in the present invention. The method for generating a reinforcement learning model for automatic control of a PTZ camera according to the present invention is described with reference to FIG. 3.

110: A training image data acquisition and analysis step, in which training image data including object information is acquired in order to perform reinforcement learning, and the position and size [x, y, w (width), h (height)] of the object in the image are analyzed.

120: After step 110, a step of selecting an action value of the PTZ camera for PTZ camera control.

Here, the action value of the camera is defined as one of (0.02, 0.06, 0.1, 0.14, 0.18, 0.22, 0.26, 0.3), and the action value is selected through Equation 1 (equation not reproduced in this text). According to Equation 1, the action value of the PTZ camera is either selected at random, depending on an exploration value, or selected by the reinforcement learning model that has not yet been trained. The exploration value is initially 1, and it decreases under the constraint of Equation 2 as the iteration count increases. That is, the action value is selected at random only while the decreased exploration value is equal to or greater than a threshold. Otherwise, the action value is selected from the current state using the not-yet-trained reinforcement learning model.

Equation 2 (not reproduced in this text) is the constraint. It contains one value by which the exploration value is decreased, and one positive value that keeps the exploration value from falling below zero.
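The selection rule described for step 120 can be sketched as follows. Because Equations 1 and 2 are not reproduced, the exact decay form is an assumption: here the exploration value (`epsilon`) is reduced by a fixed `decay` and clamped at a positive `floor`, and `threshold`, `decay`, `floor`, and `q_values_fn` are hypothetical names introduced for illustration.

```python
import random

# Candidate action values listed in the text.
ACTION_VALUES = (0.02, 0.06, 0.1, 0.14, 0.18, 0.22, 0.26, 0.3)


def decay_epsilon(epsilon, decay=0.001, floor=0.01):
    """Reduce epsilon by `decay`, never below the positive `floor` (assumed form)."""
    return max(floor, epsilon - decay)


def select_action(epsilon, threshold, state, q_values_fn, rng=random):
    """Return an index (0..7) into ACTION_VALUES.

    Random choice while epsilon >= threshold, as described in the text;
    otherwise the action with the largest model output for `state`.
    """
    if epsilon >= threshold:
        return rng.randrange(len(ACTION_VALUES))
    q_values = q_values_fn(state)
    return max(range(len(q_values)), key=q_values.__getitem__)


# Usage: epsilon starts at 1 and shrinks each iteration.
epsilon = 1.0
for _ in range(3):
    epsilon = decay_epsilon(epsilon)
```

Returning an index rather than the action value itself matches the stored "action value estimation information (0 to 7)" described later in the text.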

130: After step 120, a step of calculating a Reward by estimating the automatic control direction of the PTZ camera.

The center coordinates of the screen and the current coordinates of the object are analyzed using Equation 3 (not reproduced in this text) to estimate the control direction of the PTZ camera (that is, Pan Left, Pan Right, Tilt Up, Tilt Down, or Zoom).

Here, the horizontal offset is the difference between the current horizontal position of the object and the horizontal center of the acquired image, and the vertical offset is the difference between the current vertical position of the object and the vertical center of the acquired image. A threshold constant is also defined: when the horizontal offset and the vertical offset are both smaller than this threshold, Zoom In is estimated as the camera control direction. Otherwise Pan and Tilt are performed. If the horizontal offset is positive, Pan Right is estimated as the movement direction, and if it is negative, Pan Left. If the vertical offset is positive, Tilt Down is estimated as the camera control direction, and if it is negative, Tilt Up.
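The direction rule for step 130 can be sketched in a few lines. Equation 3 itself is not reproduced, so several details are assumptions: the offsets are taken as object position minus image center, the threshold is compared against their absolute values, and Pan and Tilt are both returned when both offsets are non-zero. The function and parameter names are hypothetical.

```python
def estimate_direction(obj_x, obj_y, frame_width, frame_height, threshold):
    """Estimate PTZ control directions from the object's offset to the image center."""
    dx = obj_x - frame_width / 2    # horizontal offset (positive: object right of center)
    dy = obj_y - frame_height / 2   # vertical offset (positive: object below center)

    # Object already near the center in both axes: zoom in.
    if abs(dx) < threshold and abs(dy) < threshold:
        return ["Zoom In"]

    directions = []
    if dx > 0:
        directions.append("Pan Right")
    elif dx < 0:
        directions.append("Pan Left")
    if dy > 0:
        directions.append("Tilt Down")
    elif dy < 0:
        directions.append("Tilt Up")
    return directions


# Usage with a hypothetical 640x480 frame and a 20-pixel threshold.
print(estimate_direction(500, 100, 640, 480, 20))
```

Image coordinates are assumed to grow downward, which is why a positive vertical offset maps to Tilt Down.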

In this step, the PTZ camera is moved in the estimated camera control direction using the action value selected in step 120. After the PTZ camera has moved, Reward is assigned adaptively to the change in the position of the object in the image.

To this end, when a PTZ camera movement related to Pan or Tilt occurs, the Reward r_t corresponding to the movement direction and action value is first calculated using Equations 4, 5, and 6 (not reproduced in this text). In these equations, one term is the distance between the object position and the screen center before the camera movement, and another is the distance between the changed object position and the screen center after the camera movement. A further term is a normalization constant, and τ₁ is the target size for Pan and Tilt.

Next, the Reward r_t for Zoom is calculated using Equations 7 and 8 (not reproduced in this text). In these equations, two terms are the object size information before and after the camera movement, respectively. The normalization constant introduced above and the constant of Equations 5 and 6 are normalization constants that can be set to, for example, 100 and 10, respectively. A further constant, which reflects another term of the equations, can be set to, for example, 1.2, and the target size for Zoom can be set to, for example, 70.

With Equations 4 to 8, Pan, Tilt, and Zoom movements receive a positive Reward when they bring the object closer to the horizontal center, the vertical center, and the target size, respectively, and a negative Reward in the opposite case. Reward can therefore be assigned adaptively to the movement amount and direction.
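The sign behaviour described above can be illustrated with a deliberately simplified reward. This is not Equations 4 to 8, which are not reproduced in this text. It is a stand-in that is positive when a movement brings the object closer to the center or to the target size and negative otherwise. The defaults 100 and 70 reuse example values from the text, but how the original equations apply them is an assumption, as are the function names.

```python
def pan_tilt_reward(dist_before, dist_after, norm=100.0):
    """Positive if the object moved closer to the screen center, negative if farther.

    dist_before / dist_after: distance between object and screen center
    before and after the camera movement. `norm` is a hypothetical scaling.
    """
    return (dist_before - dist_after) / norm


def zoom_reward(size_before, size_after, target_size=70.0, norm=100.0):
    """Positive if the object size moved closer to the target size, negative if farther."""
    gap_before = abs(target_size - size_before)
    gap_after = abs(target_size - size_after)
    return (gap_before - gap_after) / norm


# Usage: a pan that reduces the center distance from 120 to 80 is rewarded,
# while a zoom that overshoots the target size is penalized.
print(pan_tilt_reward(120, 80), zoom_reward(60, 90))
```

A difference-based form is used only because it reproduces the stated positive and negative cases with the fewest assumptions.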

140: After step 130, a step of storing the states before and after the action and the Reward.

The data set to be stored, containing the states before and after the action and the Reward, consists of 11 data fields, as follows.

- Object state information before camera movement (x, y, width, height)

- Action value estimation information (0 to 7)

- Reward

- Object state information after camera movement (x, y, width, height)

- Termination flag
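The 11 fields listed above (4 + 1 + 1 + 4 + 1) map naturally onto one record per camera action. The sketch below shows one possible layout. The class and field names are hypothetical, and the flat row order is simply the order in which the text lists the items.

```python
from dataclasses import dataclass
from typing import List, Tuple

ObjectState = Tuple[float, float, float, float]  # x, y, width, height


@dataclass(frozen=True)
class Transition:
    state: ObjectState        # object state before the camera moves
    action_index: int         # 0..7, index of the chosen action value
    reward: float
    next_state: ObjectState   # object state after the camera moves
    done: bool                # termination flag

    def to_row(self) -> List[float]:
        """Flatten to the 11 stored values in the order listed in the text."""
        return [*self.state, self.action_index, self.reward,
                *self.next_state, int(self.done)]


# Usage with made-up numbers.
t = Transition((100, 80, 40, 60), 3, 0.4, (90, 80, 40, 60), False)
row = t.to_row()
```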

150: After step 140, a step of determining whether the number of stored data sets is equal to or greater than a given number.

In one embodiment, it is checked whether the number of data sets to be used for reinforcement learning, each consisting of the states before and after the action and the Reward, is 3000 or more.
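The check in step 150 amounts to gating model training on the size of the stored collection. A minimal sketch, assuming an in-memory list and a hypothetical `TransitionStore` class, with the default of 3000 taken from the embodiment above:

```python
class TransitionStore:
    """Collects stored data sets and reports when enough have accumulated."""

    def __init__(self, min_size=3000):
        self.min_size = min_size
        self.rows = []

    def add(self, row):
        self.rows.append(row)

    def ready(self):
        # Training (step 160) starts only once this returns True.
        return len(self.rows) >= self.min_size


# Usage with a small threshold so the behaviour is easy to see.
store = TransitionStore(min_size=3)
store.add([0] * 11)
store.add([0] * 11)
ready_after_two = store.ready()
store.add([0] * 11)
ready_after_three = store.ready()
```

Until `ready()` is true, the flow returns to acquiring and analyzing further training images, as in FIG. 3.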

160: After step 150, a step of generating a reinforcement learning model based on the stored data sets.

Equation 9 is used to generate the reinforcement learning model.
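The image of Equation 9 is not reproduced in this text. Judging from the term-by-term description that accompanies it, it appears to follow the standard deep Q-learning objective, which can be sketched as below. The symbols (s for state, a for action value, r for Reward, γ for the weight applied to the next-step Q-value, θ and θ⁻ for the weights of the network and of its copy) are assumed here rather than taken from the source, and any summation over iterations in the original is omitted.

```latex
L(\theta) = \Bigl(
  \underbrace{Q(s_t, a_t; \theta)}_{\text{term (1)}}
  - \underbrace{\bigl[\, r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta^{-}) \,\bigr]}_{\text{term (2)}}
\Bigr)^{2}
```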

Equation 9 is the formula for setting the learning target in reinforcement learning, and its variables are defined as follows.

- The iteration count, and the total number of iterations.

- The state at the current step, and the state at the next step. More specifically, the current state is the current coordinates and size of the object, and the next state is the coordinates and size of the object after the PTZ camera has been moved with the action value and direction estimated in steps 120 and 130.

- The action value at the current step, and the action value at the next step (in the present invention, an action value is a value selected from (0.02, 0.06, 0.1, 0.14, 0.18, 0.22, 0.26, 0.3)).

- The weight that controls randomness (using the ε-greedy method).

- The Reward at the current step.

- The weights of the artificial neural network, and the weights of the copied artificial neural network.

- The deep Q-learning function (which yields the Q-value).

In Equation 9, term ① computes the Q-value (the value of the deep Q-learning function) from the current state of the PTZ camera and the current action value, using the artificial neural network (this sets the current learning target for the PTZ camera). Term ② performs the actual PTZ camera action with the action value selected in term ①, and then uses the resulting next-step PTZ camera state and a newly selected action value to compute the maximum Q-value through the copied artificial neural network. The computed Q-value is scaled by a weight and combined with the Reward obtained after term ① is carried out (this sets the next-step learning target for the PTZ camera). Reinforcement learning for the PTZ camera is performed by training the artificial neural network so as to reduce the error between terms ① and ②.
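The two terms described above correspond to the usual deep Q-learning update with a copied (target) network, which can be sketched as follows. The value of `gamma`, the handling of the termination flag, and all function names are assumptions. The text only states that the next-step maximum Q-value is scaled by a weight and combined with the Reward.

```python
import copy


def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Term 2: Reward plus the weighted maximum Q-value from the copied network.

    next_q_values: Q-values of the copied network for the next state.
    On a terminal step no next-step value is added (an assumption).
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_value, target):
    """Squared error between term 1 (current Q-value) and term 2 (target)."""
    return (q_value - target) ** 2


def sync_target(online_weights):
    """Copy the trained network's weights into the copied network."""
    return copy.deepcopy(online_weights)


# Usage with made-up numbers: Q(s, a) = 1.5, Reward = 1.0, next-state Q-values [0.5, 2.0].
target = td_target(1.0, [0.5, 2.0], gamma=0.5)
loss = squared_td_error(1.5, target)
```

Averaging `squared_td_error` over a batch gives the MSE loss named in the parameter list, which the optimizer then minimizes with respect to the network weights.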

The configuration of the present invention has been described above in detail through preferred embodiments. A person of ordinary skill in the art to which the present invention pertains will understand that the invention may be practiced in specific forms other than those disclosed in this specification without changing its technical idea or essential features. The embodiments described above should be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modified forms derived from the scope of the claims and their equivalents should be interpreted as falling within the technical scope of the present invention.

Artificial neural network: input layer (10), output layer (20), hidden layers (30-1, 30-2, 30-3, 30-4)

Claims

A method for generating a reinforcement learning model for automatic control of a PTZ camera, the method comprising:
1) a training image data acquisition and analysis step of acquiring training image data including object information in order to perform reinforcement learning, and analyzing position and size information of an object in the image;
2) selecting an action value of the PTZ camera for PTZ camera control;
3) calculating a Reward by estimating a control direction of the PTZ camera, the control direction including Pan Left, Pan Right, Tilt Up, Tilt Down, and Zoom, moving the PTZ camera in the estimated control direction using the selected action value, and assigning the Reward adaptively to the change in position of the object in the image after the PTZ camera has moved;
4) storing a data set including the states before and after the camera action and the Reward;
5) determining whether the number of stored data sets is equal to or greater than a predetermined number; and
6) generating a reinforcement learning model based on the stored data sets.

The method of claim 1, wherein, in step 2) of selecting the action value of the PTZ camera for PTZ camera control, the action value is selected through an equation (not reproduced in this text) such that the action value is selected at random when a constant value is equal to or greater than a predetermined threshold, and is otherwise selected from the current state using the reinforcement learning model that has not yet been trained.

The method of claim 1, wherein, in the step of calculating the Reward by estimating the control direction of the PTZ camera, the control direction is estimated by an equation (not reproduced in this text) such that Zoom In is estimated as the camera control direction when a horizontal offset and a vertical offset are both smaller than a threshold, and Pan and Tilt are estimated as the camera control direction otherwise,
where the horizontal offset is the difference between the current horizontal position of the object and the horizontal center of the acquired image, the vertical offset is the difference between the current vertical position of the object and the vertical center of the acquired image, and the threshold is a predetermined threshold constant.

The method of claim 3, wherein Pan Right is estimated as the camera control direction when the horizontal offset is positive and Pan Left when it is negative, and Tilt Down is estimated as the camera control direction when the vertical offset is positive and Tilt Up when it is negative.

The method of claim 3, wherein, for Pan and Tilt of the PTZ camera, the Reward r_t is calculated using three equations (not reproduced in this text),
in which one term is the distance between the changed object position after the camera movement and the screen center, another term is a normalization constant, and τ₁ is the target size for Pan and Tilt.

The method of claim 3, wherein, for Zoom of the PTZ camera, the Reward r_t is calculated using two equations (not reproduced in this text),
in which two terms are the object size information before and after the camera movement, respectively, one term is a normalization constant, one term is a constant reflecting another term of the equations, and one term is the target size for Zoom.

The method of claim 1, wherein, in step 4) of storing the data set including the states before and after the camera action and the Reward, the stored data set includes object state information before the camera movement, action value estimation information, the Reward, and object state information after the camera movement.

The method of claim 1, wherein, in step 5) of determining whether the number of stored data sets is equal to or greater than a predetermined number, the predetermined number of data sets is 3000.

The method of claim 1, wherein step 6) of generating the reinforcement learning model based on the stored data sets comprises:
setting a current learning target for PTZ camera control by computing a deep Q-learning function from the current state and the current action value using an artificial neural network;
obtaining a Reward after the current learning target is set; and
setting a next-step learning target by performing the actual action with the current action value, computing the maximum deep Q-learning function value through a copied artificial neural network using the changed next-step state and a newly selected action value, scaling the computed value by a weight, and combining it with the Reward obtained after the current learning target was set.

A reinforcement learning model used for automatic control of a PTZ camera according to any one of claims 1 to 9, the reinforcement learning model comprising:
an artificial neural network including an input layer, an output layer, and hidden layers;
means for setting a current learning target for PTZ camera control by computing a deep Q-learning function using the weights of the artificial neural network;
means for setting a next-step learning target for PTZ camera control by performing an actual PTZ camera action according to the set current learning target; and
means for performing reinforcement learning on PTZ camera control by updating the weights so as to reduce the error between the current learning target and the next-step learning target.

The reinforcement learning model of claim 10, wherein, in the artificial neural network,
the loss function uses MSE (mean squared error),
the activation function uses the ReLU (Rectified Linear Unit) function for the hidden layers and a Linear function for the output layer,
the learning rate is set to 0.001, and
the batch size is set to 128.