KR20190127261A

KR20190127261A - Method and Apparatus for Learning with Class Score of Two Stream Network for Behavior Recognition

Info

Publication number: KR20190127261A
Application number: KR1020180051667A
Authority: KR
Inventors: 변혜란; 김호성; 어영정; 김태형; 홍종광; 황선희; 기민송; 홍용원
Original assignee: 연세대학교 산학협력단
Priority date: 2018-05-04
Filing date: 2018-05-04
Publication date: 2019-11-13
Also published as: KR102123388B1

Abstract

Embodiments of the present invention provide a method and an apparatus for learning class scores. By applying an expectation function over state and action pairs to the association between the class scores of a plurality of learning models and integrating those class scores, the method and apparatus may improve the behavior recognition rate of the learning models as well as their extensibility and compatibility.

Description

Method and Apparatus for Learning with Class Score of Two Stream Network for Behavior Recognition

The technical field to which the present invention belongs relates to a method and apparatus for integrating and learning, on a device, the class scores of networks for a plurality of streams.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

Machine learning is classified, according to the learning method, into supervised learning, semi-supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning trains a model using pre-built training data, while semi-supervised learning uses both training data and unorganized data for training. Unsupervised learning does not build separate training data; it learns by analyzing or clustering the data itself. Reinforcement learning learns through feedback, with appropriate rewards given for the results of learning.

Appearance and motion are the elements that characterize human behavior extractable from video, and they can be extracted from the spatial stream and the temporal stream, respectively. The spatial network extracts appearance information about behaviors and objects from each frame of the video, and the temporal network extracts the motion features of complex behaviors from the optical flow field between consecutive frames.

A main object of embodiments of the present invention is to improve the association between the class scores of a plurality of already-trained learning models by integrating the class scores of networks trained through a plurality of learning paths.

Other objects of the present invention that are not stated here may be additionally considered to the extent that they can be readily inferred from the following detailed description and its effects.

According to one aspect of the present embodiment, there is provided a class score learning method performed by a computing device, the method comprising: obtaining a first class score from an output layer of a first learning model for media; obtaining a second class score from an output layer of a second learning model for the media; and generating an integrated class score by learning a first weight for the first class score and a second weight for the second class score.

According to another aspect of the present embodiment, there is provided a class score learning apparatus comprising: a first score acquisition unit that obtains a first class score from an output layer of a first learning model for media; a second score acquisition unit that obtains a second class score from an output layer of a second learning model for the media; and an integrated score generation unit that generates an integrated class score by learning a first weight for the first class score and a second weight for the second class score.

According to yet another aspect of the present embodiment, there is provided a computer program for class score learning, recorded on a non-transitory computer-readable medium and comprising computer program instructions executable by a processor, wherein the computer program instructions, when executed by at least one processor of a computing device, perform operations comprising: obtaining a first class score from an output layer of a first learning model for media; obtaining a second class score from an output layer of a second learning model for the media; and generating an integrated class score by learning a first weight for the first class score and a second weight for the second class score.

As described above, according to embodiments of the present invention, the class scores of a plurality of learning models are integrated by applying an expectation function over state and action pairs to the association between those class scores, which has the effect of improving the behavior recognition rate of the learning models and enhancing their extensibility and compatibility.

Even effects that are not explicitly mentioned here, namely the effects described in the following specification that are expected from the technical features of the present invention and their provisional effects, are treated as if they were described in the specification of the present invention.

FIGS. 1 and 2 are block diagrams illustrating class score learning apparatuses according to embodiments of the present invention.
FIGS. 3 and 4 are diagrams illustrating learning models and class scores of the class score learning apparatus according to embodiments of the present invention.
FIG. 5 is a graph showing the relationship between the Q-value and the recognition accuracy of the class score learning apparatus according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a class score learning method according to another embodiment of the present invention.
FIG. 7 shows the results of simulations performed according to embodiments of the present invention.

Hereinafter, in describing the present invention, detailed descriptions of related known functions are omitted where they are obvious to those skilled in the art and are judged likely to obscure the gist of the invention unnecessarily, and some embodiments of the present invention are described in detail with reference to exemplary drawings.

FIGS. 1 and 2 are block diagrams illustrating class score learning apparatuses.

As shown in FIG. 1, the class score learning apparatus 100 includes a first score acquisition unit 110, a second score acquisition unit 120, and an integrated score generation unit 130. The class score learning apparatus 100 may omit some of the components illustrated by way of example in FIG. 1 or may further include other components. For example, the class score learning apparatus 200 may further include a third score acquisition unit 230, a behavior recognition unit 250, or a combination thereof.

The class score learning apparatuses 100 and 200 integrate class scores by applying weights to the class scores of networks trained through a plurality of learning paths. They learn class-dependent weights and update the integrated class score accordingly.

The first score acquisition units 110 and 210 obtain a first class score from the output layer of a first learning model for media. The first learning model may be implemented as a convolutional network that classifies classes based on appearance features to generate the first class score.

The second score acquisition units 120 and 220 obtain a second class score from the output layer of a second learning model for the media. The second learning model may be implemented as a convolutional network that classifies classes based on motion features to generate the second class score.

The third score acquisition unit 230 obtains a third class score for the classes from the output layer of a third learning model for the media. The third learning model may classify classes based on sound features to generate the third class score.

The integrated score generation unit 130 generates an integrated class score by learning a first weight for the first class score and a second weight for the second class score. The integrated score generation unit 240 may generate the integrated class score by learning a first weight for the first class score, a second weight for the second class score, and a third weight for the third class score.

The integrated score generation unit 130 defines an expectation function over state and action pairs, sets the states to the classes, and sets the actions to the first weight and the second weight. The integrated score generation unit 240 may set the actions to the first weight, the second weight, and the third weight.

The integrated score generation units 130 and 240 set the reward for an action in the current state to the recognition accuracy, and update the expectation function by repeating an explore operation, which selects an arbitrary action, and an exploit operation, which selects an action whose expectation function value for the current state is greater than a preset value.

The behavior recognition unit 250 may classify the class based on the integrated class score to recognize the behavior represented in the media.
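The integration and recognition steps described above can be sketched as a weighted sum of per-class scores followed by an argmax. This is a minimal sketch under stated assumptions: the function names `fuse_scores` and `recognize`, the example scores, and the choice of a plain weighted sum are illustrative and are not given in the specification.

```python
def fuse_scores(stream_scores, stream_weights):
    """Weighted sum of per-class scores over all streams.

    stream_scores[m][k]  : score of class k from stream m
    stream_weights[m][k] : weight of stream m for class k (hypothetical layout)
    """
    num_classes = len(stream_scores[0])
    return [
        sum(w[k] * s[k] for s, w in zip(stream_scores, stream_weights))
        for k in range(num_classes)
    ]


def recognize(stream_scores, stream_weights):
    """Return the index of the class with the largest integrated score."""
    fused = fuse_scores(stream_scores, stream_weights)
    return max(range(len(fused)), key=lambda k: fused[k])


# Toy example with three classes and two streams (values are made up).
spatial = [0.6, 0.3, 0.1]
temporal = [0.2, 0.7, 0.1]
w_spatial = [0.7, 0.2, 0.5]
# Assumption: the two weights of each class sum to 1.
w_temporal = [1.0 - w for w in w_spatial]

predicted = recognize([spatial, temporal], [w_spatial, w_temporal])
```

A third stream, such as the sound-based model, would be added as one more entry in each of the two lists.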

Hereinafter, the operation by which the class score learning apparatus integrates the class scores of the learning models is described with reference to FIGS. 3 to 5.

The media may be multimedia including images, video, audio, or a combination thereof.

The first to third learning models extract features with convolution operators in one or more layers to generate feature maps. The nodes of the one or more layers are connected as a network, the extracted features are passed to other layers, and subsampling pools the extracted features to reduce the spatial dimensions.

The class score learning apparatus learns the parameters of the first to third learning models. A layer may include parameters, and the parameters of a layer include a set of learnable filters. The parameters include weights and/or biases between nodes.

The first learning model extracts appearance features based on the visual information of a frame.

The second learning model may extract motion vectors between consecutive frames using an optical flow method. A motion vector has a direction and a magnitude. The second learning model sets blocks for each region within a frame and extracts the motion vectors at the pixels inside each block. The second learning model may use a dense optical flow method to extract the motion vectors at the pixels, and may use Farneback optical flow. Dense optical flow measures motion at every pixel rather than at feature points.

The third learning model converts sound information over time into image form and extracts sound features.

The architecture may consist of eight layers in total: five convolutional layers and three fully connected layers. The first convolutional layer may use filters of size 11x11x96 (filter width x filter height x number of filter channels) with a stride of 4. The second convolutional layer has 5x5x256 filters and performs max pooling, the third convolutional layer has 3x3x384 filters and performs max pooling, the fourth convolutional layer has 3x3x384 filters, and the fifth convolutional layer has 3x3x256 filters and performs max pooling.

The first fully connected layer may have 4,096 dimensions, the second fully connected layer 4,096 dimensions, and the third fully connected layer 7 or 15 dimensions. After the third and final fully connected layer, softmax is applied, and the error between the predicted result and the ground truth is used to learn the parameters of the entire network by backpropagation based on SGD (stochastic gradient descent with momentum).
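As a small illustration of the final softmax step, the sketch below turns the outputs of the last fully connected layer into class probabilities and measures the error against a ground-truth label. The specification only speaks of "the error between the predicted result and the ground truth"; using cross-entropy as that error, and the function names used here, are assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert raw class outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability of the ground-truth class (assumed loss)."""
    return -math.log(probs[label])


# Example with a 7-dimensional output, one of the sizes mentioned above.
logits = [0.5, 2.0, -1.0, 0.0, 0.3, 1.2, -0.4]
probs = softmax(logits)
loss = cross_entropy(probs, label=1)
```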

The input of the spatial network is an RGB image of size 224x224x3 (image width x image height x number of channels), and the input of the temporal network is an image of size 224x224x3 composed of optical flow.

For the spatial network, the mini-batch size is 64, the SGD momentum is 0.9, the SGD weight decay is 0.0005, the step size for reducing the learning rate is 20,000, and the initial learning rate is 0.001. The learning rate is reduced to 1/10 at every step size, and training is repeated for up to 50,000 iterations.

For the temporal network, the mini-batch size is 64, the SGD momentum is 0.9, the SGD weight decay is 0.0005, and the initial learning rate is 0.001. The learning rate is reduced to 0.0001 at 50,000 iterations and to 0.00001 at 100,000 iterations, and training is repeated for up to 125,000 iterations. Because no network pre-trained on a large dataset is available for the temporal network, the number of training iterations may be increased by about 2.5 times.
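The two learning-rate schedules above can be written as small functions of the iteration count. This is a sketch that assumes the reduction takes effect exactly at the stated iteration; the function names and parameter names are illustrative.

```python
import bisect


def spatial_lr(iteration, base_lr=0.001, step_size=20000, factor=0.1):
    """Step schedule: multiply the rate by `factor` every `step_size` iterations."""
    return base_lr * factor ** (iteration // step_size)


def temporal_lr(iteration, base_lr=0.001, boundaries=(50000, 100000), factor=0.1):
    """Multi-step schedule: multiply the rate by `factor` at each boundary."""
    return base_lr * factor ** bisect.bisect_right(boundaries, iteration)


# Spatial network trains for 50,000 iterations, temporal for 125,000.
spatial_rates = [spatial_lr(i) for i in (0, 20000, 40000)]
temporal_rates = [temporal_lr(i) for i in (0, 50000, 100000)]
```

Both lists step through 0.001, 0.0001, and 0.00001 (up to floating-point rounding), which matches the values given in the text.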

The class score learning apparatus assumes that each behavior class depends on appearance and on motion to a different degree. It improves recognition performance by learning, for each class, different weights for the spatial stream and the temporal stream that correspond to those dependencies.

Let the training set of N samples from the dataset be X = {(x_i, y_i) | i = 1, ..., N}, where x_i is a d-dimensional feature, y_i ∈ {1, ..., C} is the ground-truth label of x_i, and C is the number of classes. Let w_k ∈ [0, 1] be the weight of the spatial stream for the k-th class, and write the spatial-stream weights of all classes as W = [w_1, ..., w_k, ..., w_C]. In the case of stream-wise fusion, w_1 ≠ w_2 ≠ ... ≠ w_C.

The class score learning apparatus may learn the weight of each class by reinforcement learning using Q-learning. First, Q(s,a) is initialized to 0 for every state s and action a. The current state s is then observed, and the following procedure is repeated: an action is selected using an explore and exploit policy, the immediate reward r is received, the new state s' is observed, and Q(s,a) is updated as in Equation 1. Finally, the current state s is updated to the new state s'.
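Equation 1 itself is not reproduced in this text. The standard Q-learning update, which uses the same symbols (α, γ, r, s, s') as the surrounding description, reads as follows; it is offered as a likely reconstruction, not as the formula printed in the original publication.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```

Here α is the learning rate and γ is the discount factor, and a' ranges over the actions available in the new state s'.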

In Equation 1, γ is the discount factor, a parameter that gives rewards arising in the future a lower weight than the current reward. The discount factor may be set to 0.9. The discounted future reward is used because, although a result can be reached with the plain future reward alone, discounting shortens the time required for convergence.

The plain future reward is expressed as in Equation 2.

Expanding Equation 2 with respect to time t gives Equation 3.

In Equation 3, n is assumed to be an unbounded number of steps repeated until convergence.

Applying the discounted future reward to Equation 3 gives Equation 4.

Applying the Bellman equation to Equation 4 gives Equations 5 and 6.
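Equations 2 to 6 are not reproduced in this text. Following the usual derivation of the discounted return and the Bellman equation, they would plausibly take the form below; both the formulas and their mapping to the equation numbers are a reconstruction under that assumption.

```latex
\begin{align}
R   &= r_1 + r_2 + \cdots + r_n                                          \tag{2} \\
R_t &= r_t + r_{t+1} + \cdots + r_n                                      \tag{3} \\
R_t &= r_t + \gamma\, r_{t+1} + \gamma^{2} r_{t+2} + \cdots + \gamma^{\,n-t} r_n \tag{4} \\
R_t &= r_t + \gamma\, R_{t+1}                                            \tag{5} \\
Q(s,a) &= r + \gamma \max_{a'} Q(s',a')                                  \tag{6}
\end{align}
```

Equation (5) follows from (4) by factoring γ out of every term after the first, and (6) restates (5) in terms of the best action available in the next state.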

By applying the discounted future reward, the agent learns a policy that maximizes the reward, finds the Q(s,a) that converges to a final value, and from it determines the learned action for each state.

In Equation 1, α is the learning rate, introduced to handle the randomness that arises in stochastic or non-deterministic environments; it may be set to 0.1. Incremental learning is performed by giving a weight of 0.9 to the Q learned so far and a weight of 0.1 to the newly obtained reward. This amounts to learning cautiously.
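The 0.9 and 0.1 weighting described above can be seen directly in a tabular update. The sketch below assumes the standard Q-learning form, in which the 0.1 weight applies to the reward plus the discounted best value of the next state; the function name `q_update` and the table layout are illustrative.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update; q is a list of rows, one row per state."""
    # New information: immediate reward plus discounted best next-state value.
    target = reward + gamma * max(q[next_state])
    # Keep (1 - alpha) of the old estimate and blend in alpha of the target.
    q[state][action] = (1.0 - alpha) * q[state][action] + alpha * target
    return q[state][action]


# Two states with two actions each (values are made up).
q_table = [[0.0, 0.0], [1.0, 2.0]]
first = q_update(q_table, state=0, action=1, reward=0.5, next_state=1)
second = q_update(q_table, state=0, action=1, reward=0.5, next_state=1)
```

With these numbers the target is 0.5 + 0.9 x 2.0 = 2.3, so the estimate moves from 0 to 0.23 and then to 0.437, approaching the target gradually rather than jumping to it.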

The class score learning apparatus combines the model learned through the Q-learning algorithm with a previously developed two-stream convolutional neural network. The Q-learning algorithm is applied to learn class-based weights: the validation accuracy is set as the Q value, and setting each class-based weight is defined as the action. Training was run for 450,000 iterations, and the epsilon value of the explore operation may be set to 0.1. The sum of the weights corresponding to the respective class scores may be set to a constant value; for example, the sum of the first weight and the second weight may be set to 1, or the sum of the first weight, the second weight, and the third weight may be set to 1.

Recognition accuracy may be used to evaluate the performance of the class score learning apparatus. All initial weights may start at the same value of 0.7, the number of states is set to 15, which is the number of classes, and the actions are defined as {-0.001, 0.001}. In the explore phase an action is chosen at random, and in the exploit phase the action with the largest Q-value for the current state is performed. The reward for an action in the current state may be set to the recognition accuracy.
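The settings above (15 states, actions {-0.001, 0.001}, initial weights of 0.7, epsilon 0.1, accuracy as reward) can be put together in a short loop. This is a sketch under several assumptions that the text does not specify: states are visited in a fixed cycle, the temporal weight of a class is one minus its spatial weight, weights are clipped to [0, 1], and `samples` is a hypothetical list of (spatial scores, temporal scores, label) triples.

```python
import random

NUM_CLASSES = 15            # one state per class
ACTIONS = (-0.001, 0.001)   # decrease or increase the class weight


def accuracy(weights, samples):
    """Fraction of samples whose fused argmax equals the label."""
    correct = 0
    for spatial, temporal, label in samples:
        fused = [w * s + (1.0 - w) * t
                 for w, s, t in zip(weights, spatial, temporal)]
        if max(range(len(fused)), key=lambda k: fused[k]) == label:
            correct += 1
    return correct / len(samples)


def learn_class_weights(samples, iterations=1000, epsilon=0.1,
                        alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    weights = [0.7] * NUM_CLASSES
    q = [[0.0] * len(ACTIONS) for _ in range(NUM_CLASSES)]
    state = 0
    for _ in range(iterations):
        if rng.random() < epsilon:
            # Explore: pick an action at random.
            action = rng.randrange(len(ACTIONS))
        else:
            # Exploit: pick the action with the largest Q-value.
            action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        # Apply the action to the weight of the current class (clipped).
        weights[state] = min(1.0, max(0.0, weights[state] + ACTIONS[action]))
        reward = accuracy(weights, samples)
        next_state = (state + 1) % NUM_CLASSES  # assumption: cycle over classes
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return weights, q
```

In practice the reward would come from validation accuracy of the two-stream network, and the iteration count would be far larger than the default used here.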

FIG. 5 shows a graph of the Q-value and the accuracy for the model at 10,000 iterations, at which the Q-value converged, out of the full experiment of 450,000 iterations run by the class score learning apparatus. As training progresses, the Q-value and the accuracy increase with similar trends. Even with a relatively small number of iterations, training converges with a performance improvement of 3.6%.

Although the components of the class score learning apparatus are shown separately in FIGS. 1 and 2, a plurality of components may be combined with one another and implemented as at least one module. The components are connected to a communication path that links software modules or hardware modules inside the apparatus, and they operate organically with one another. These components communicate using one or more communication buses or signal lines.

The class score learning apparatus may be implemented in logic circuitry by hardware, firmware, software, or a combination thereof, and may also be implemented using a general-purpose or special-purpose computer. The apparatus may be implemented using a hardwired device, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or the like. The apparatus may also be implemented as a system on chip (SoC) including one or more processors and controllers.

The class score learning apparatus may be mounted, in the form of software, hardware, or a combination thereof, on a computing device provided with hardware elements. The computing device may refer to any of various devices that include all or some of the following: a communication device, such as a communication modem, for communicating with other devices or with wired or wireless communication networks; a memory that stores data for executing a program; and a microprocessor that executes the program to perform computation and issue commands.

FIG. 6 is a flowchart illustrating a class score learning method according to another embodiment of the present invention. The class score learning method may be performed by a computing device, and descriptions that overlap with the detailed description of the operations performed by the class score learning apparatus are omitted.

In step S610, the computing device obtains a first class score from the output layer of a first learning model for media. Here, the media may be multimedia including images, video, audio, or a combination thereof. The first learning model may be implemented as a convolutional network that classifies classes based on appearance features to generate the first class score.

In step S620, the computing device obtains a second class score from the output layer of a second learning model for the media. The second learning model may be implemented as a convolutional network that classifies classes based on motion features to generate the second class score.

In step S630, the computing device generates an integrated class score by learning a first weight for the first class score and a second weight for the second class score.

The class score learning method may further include obtaining a third class score for the classes from the output layer of a third learning model for the media. The third learning model may classify the classes based on sound features to generate the third class score.

The step of generating the integrated class score (S630) may generate the integrated class score by learning a first weight for the first class score, a second weight for the second class score, and a third weight for the third class score.

The step of generating the integrated class score (S630) defines an expectation function over state and action pairs, sets the states to the classes, and sets the actions to the first weight and the second weight. The actions may instead be set to the first weight, the second weight, and the third weight.

The step of generating the integrated class score (S630) sets the reward for an action in the current state to the recognition accuracy, and may update the expectation function by repeating an explore operation, which selects an arbitrary action, and an exploit operation, which selects an action whose expectation function value for the current state is greater than a preset value.

The class score learning method may further include classifying the class based on the integrated class score to recognize the behavior represented in the media.

FIG. 7 shows the results of simulations performed according to embodiments of the present invention. It can be seen that the two-stream convolutional network achieves a high behavior recognition rate for most classes.

Although FIG. 6 describes the steps as being executed sequentially, this is merely illustrative. Those skilled in the art may apply various modifications and variations without departing from the essential characteristics of the embodiments of the present invention, such as changing the order shown in FIG. 6, executing one or more steps in parallel, or adding other steps.

The operations according to the present embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. A computer-readable medium is any medium that participates in providing instructions to a processor for execution. The computer-readable medium may include program instructions, data files, data structures, or a combination thereof. Examples include magnetic media, optical recording media, and memory. The computer program may also be distributed over computer systems connected by a network, so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present embodiments can be readily inferred by programmers in the technical field to which the embodiments belong.

The present embodiments are intended to describe the technical idea of the embodiments, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiments should be interpreted according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiments.

100, 200: class score learning apparatus
110, 210: first score acquisition unit
120, 220: second score acquisition unit
230: third score acquisition unit
130, 240: integrated score generation unit

Claims

A class score learning method performed by a computing device, the method comprising:
obtaining a first class score from an output layer of a first learning model relating to media;
obtaining a second class score from an output layer of a second learning model relating to the media; and
generating an integrated class score by learning a first weight associated with the first class score and a second weight associated with the second class score.

The method of claim 1,
wherein the media is multimedia including an image, video, audio, or a combination thereof.

The method of claim 1,
wherein the first learning model is implemented as a convolutional network and classifies the class based on an appearance feature to generate the first class score, and
the second learning model is implemented as a convolutional network and classifies the class based on a motion feature to generate the second class score.

The method of claim 1,
further comprising obtaining a third class score for the class from an output layer of a third learning model relating to the media,
wherein generating the integrated class score comprises generating the integrated class score by learning the first weight associated with the first class score, the second weight associated with the second class score, and a third weight associated with the third class score.

The method of claim 4,
wherein the third learning model classifies the class based on a sound feature to generate the third class score.

The method of claim 1,
wherein generating the integrated class score comprises
defining an expectation function for a state and action pair, setting the state to the class, and setting the action to the first weight and the second weight.

The method of claim 6,
wherein generating the integrated class score comprises
setting a reward for the action in a current state among the states to a recognition accuracy, and updating the expectation function by repeating an exploration operation of selecting an arbitrary action from among the actions and an exploitation operation of selecting an action for which the expectation function has a value greater than a preset value for the current state.

The method of claim 1,
further comprising classifying the class based on the integrated class score to recognize a behavior expressed in the media.

A class score learning apparatus comprising:
a first score acquisition unit configured to obtain a first class score from an output layer of a first learning model relating to media;
a second score acquisition unit configured to obtain a second class score from an output layer of a second learning model relating to the media; and
an integrated score generation unit configured to generate an integrated class score by learning a first weight associated with the first class score and a second weight associated with the second class score.

The apparatus of claim 9,
wherein the first learning model is implemented as a convolutional network and classifies the class based on an appearance feature to generate the first class score, and
the second learning model is implemented as a convolutional network and classifies the class based on a motion feature to generate the second class score.

The apparatus of claim 9,
further comprising a third score acquisition unit configured to obtain a third class score for the class from an output layer of a third learning model relating to the media,
wherein the integrated score generation unit generates the integrated class score by learning the first weight associated with the first class score, the second weight associated with the second class score, and a third weight associated with the third class score.

The apparatus of claim 9,
wherein the integrated score generation unit
defines an expectation function for a state and action pair, sets the state to the class, and sets the action to the first weight and the second weight.

The apparatus of claim 12,
wherein the integrated score generation unit
sets a reward for the action in a current state among the states to a recognition accuracy, and updates the expectation function by repeating an exploration operation of selecting an arbitrary action from among the actions and an exploitation operation of selecting an action for which the expectation function has a value greater than a preset value for the current state.

The apparatus of claim 9,
wherein the apparatus classifies the class based on the integrated class score to recognize a behavior expressed in the media.

A computer program for class score learning, recorded on a non-transitory computer-readable medium and comprising computer program instructions executable by a processor, wherein the computer program instructions, when executed by at least one processor of a computing device, perform operations comprising:
obtaining a first class score from an output layer of a first learning model relating to media;
obtaining a second class score from an output layer of a second learning model relating to the media; and
generating an integrated class score by learning a first weight associated with the first class score and a second weight associated with the second class score.