KR20230071705A

KR20230071705A - Learning Method and System for Object Tracking Based on Hybrid Neural Network

Info

Publication number: KR20230071705A
Application number: KR1020220086571A
Authority: KR
Inventors: 차문현; 정일채; 한보형; 박대영; 정창욱
Original assignee: 삼성전자주식회사; 서울대학교산학협력단
Priority date: 2021-11-16
Filing date: 2022-07-13
Publication date: 2023-05-23

Abstract

Disclosed are a hybrid neural network-based object tracking learning method and system. The object tracking learning system according to an embodiment of the present invention comprises: a first neural network module that represents a first parameter for an input image in a second type converted from a first type, performs learning, and outputs a first learning result; a second neural network module that removes some connections of a second parameter for the input image, performs learning, and outputs a second learning result; a prediction module that generates a predicted value for an object in the input image from a summation result obtained by summing the first learning result and the second learning result; and an optimization module that updates the first parameter and the second parameter based on the predicted value. Accordingly, computation can be performed quickly with few resources while maintaining accuracy.

Description

Learning Method and System for Object Tracking Based on Hybrid Neural Network

The present invention relates to an object tracking learning method and system, and more particularly to a hybrid neural network-based object tracking learning method and system.

Object tracking technology based on neural networks, for example deep neural networks (DNNs), is being actively developed. However, the number of parameters a deep neural network needs in order to guarantee object detection accuracy keeps growing. For example, the 2014 winning model of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) achieved a top-1 accuracy of 74.8% with 4 million parameters, whereas the 2017 winning model achieved a top-1 accuracy of 82.7% with 145.8 million parameters, an increase of more than about 36 times in the number of parameters.

Accordingly, there is a need for a way to lighten or accelerate a neural network so that it can perform computation quickly with few resources while maintaining object detection accuracy or minimizing accuracy loss.

The present invention is intended to solve the above problem by providing a hybrid neural network-based object tracking learning method and system capable of fast computation with few resources while maintaining accuracy.

A hybrid neural network-based object tracking learning system according to an embodiment of the present invention for solving the above technical problem includes: a first neural network module that represents a first parameter for an input image in a second type converted from a first type, performs learning, and outputs the result as a first learning result; a second neural network module that removes some of the connections of a second parameter for the input image, performs learning, and outputs the result as a second learning result; a prediction module that generates a predicted value for an object in the input image from a summation result obtained by summing the first learning result and the second learning result; and an optimization module that updates the first parameter and the second parameter based on the predicted value.

The first neural network module may include a 1-1 quantization unit that quantizes the parameter of the first type, which is a real-number type, into the second type, which is an integer type.

The 1-1 quantization unit may quantize the first-type parameter into the second type according to the center and the width of a target interval for the first-type parameter.

The first neural network module may further include a 1-2 quantization unit that quantizes an activation value of the first type into the second type.

The second neural network module may include a channel pruning unit that prunes channels corresponding to a part of the second parameter of the first type.

The first neural network module may include a 1-1 quantization unit that quantizes the first parameter of the real-number first type into the integer second type, and the second neural network module may include a channel pruning unit that prunes channels corresponding to a part of the second parameter of the first type.
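As a rough illustration of channel pruning, the sketch below zeroes out whole output channels of a filter bank. The selection rule (keep the channels with the largest L1 norm), the `keep_ratio` parameter, and the function names are assumptions made for this example only; the text above does not state how the channel pruning unit chooses the channels to remove.

```python
def channel_l1_norms(filters):
    """filters[c] is the flat list of weights belonging to output channel c."""
    return [sum(abs(w) for w in ch) for ch in filters]


def prune_channels(filters, keep_ratio):
    """Return (mask, pruned); mask[c] is 1 for a kept channel and 0 for a pruned one.

    Assumed rule (not taken from the patent text): keep the channels whose
    weights have the largest L1 norm and zero out the rest.
    """
    norms = channel_l1_norms(filters)
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Indices of the n_keep channels with the largest L1 norm.
    order = sorted(range(len(filters)), key=lambda c: norms[c], reverse=True)
    keep = set(order[:n_keep])
    mask = [1 if c in keep else 0 for c in range(len(filters))]
    pruned = [ch if m else [0.0] * len(ch) for ch, m in zip(filters, mask)]
    return mask, pruned


# Four hypothetical channels with two weights each; half of them are kept.
filters = [[0.9, -0.8], [0.01, 0.02], [-0.5, 0.4], [0.0, 0.03]]
mask, pruned = prune_channels(filters, keep_ratio=0.5)
```

The kept channels stay in full precision, which matches the statement that the second parameter remains of the first (real-number) type.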

The optimization module may include: a loss adjustment unit that adjusts the losses of the first learning result, the second learning result, and the predicted value; and a backpropagation unit that performs backpropagation based on the adjusted loss.

The system may further include a pre-learning module that pre-trains the first neural network module and the second neural network module on a test image to set initial values of the parameters for the input image.

The system may further include an online tracking module that controls real-time learning of the second neural network module for the input image, which is a streaming video.

The prediction module may include a learning result summing unit that sums the first learning result of the pre-trained first neural network module and the second learning result of the online-trained second neural network module to generate the summation result.

The system may further include a neural network design module that sets the combination of the first neural network module and the second neural network module differently based on a first criterion.

The first learning result may be expressed as an integer type, the second learning result may be expressed as a real-number type, and the summation result may be expressed as a mixture of integer and real-number types.

A hybrid neural network-based object tracking learning system according to an embodiment of the present invention for solving the above technical problem includes: at least two heterogeneous neural network modules; a prediction module that generates a predicted value from a summation result obtained by summing the learning results of the at least two neural network modules; and an optimization module that updates the parameters of the at least two neural networks based on the predicted value, wherein the summation result may include at least two elements expressed in heterogeneous formats.

The at least two neural network modules may include: a first neural network module that quantizes a parameter for the input image into an integer type; and a channel pruning unit that prunes channels corresponding to a part of the parameters for the input image.

The at least two neural network modules may include: a first neural network module that is pre-trained; and a second neural network module that is both pre-trained and trained in real time.

The at least two neural network modules may include: a first neural network module that outputs a coarse-scale first learning result for the main information of the input image; and a second neural network module that outputs a fine-scale second learning result for the supplementary information of the input image.

A hybrid neural network-based object tracking learning method according to an embodiment of the present invention for solving the above technical problem includes: representing a first parameter for an input image in a second type converted from a first type and outputting a first learning result; removing some of the connections of a second parameter for the input image and outputting a second learning result; generating a predicted value based on a summation result obtained by summing the first learning result and the second learning result; and updating the first parameter and the second parameter based on the predicted value.

The outputting of the first learning result may include quantizing the parameter of the real-number first type into the integer second type, and the outputting of the second learning result may include pruning channels corresponding to a part of the second parameter.

The generating of the predicted value may include summing, into the summation result, the first learning result of the pre-trained first neural network module and the second learning result of the online-trained second neural network module.

The first learning result may be expressed as an integer type, the second learning result may be expressed as a real-number type, and the summation result obtained by summing the first learning result and the second learning result may be expressed as a mixture of integer and real-number types.

According to the hybrid neural network-based object tracking learning method and system of the present invention, separate neural networks are applied in parallel to an image frame, so that computation can be performed quickly with few resources while maintaining accuracy.

FIG. 1 is a diagram showing a hybrid neural network-based object tracking learning system according to an embodiment of the present invention.
FIG. 2 is a diagram showing a hybrid neural network-based object tracking learning method according to an embodiment of the present invention.
FIG. 3 is a diagram showing a first neural network module according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the operation of a 1-1 quantization unit according to an embodiment of the present invention.
FIG. 5 is a diagram showing a first neural network module according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining the operation of a 1-2 quantization unit according to an embodiment of the present invention.
FIG. 7 is a diagram showing a second neural network module according to an embodiment of the present invention.
FIG. 8 is a diagram showing a prediction module according to an embodiment of the present invention.
FIGS. 9 and 10 are diagrams each showing an optimization module according to an embodiment of the present invention.
FIGS. 11 and 12 are diagrams each showing a hybrid neural network-based object tracking learning system according to an embodiment of the present invention.
FIGS. 13 and 14 are diagrams each showing a hybrid neural network-based object tracking learning method according to an embodiment of the present invention.
FIG. 15 is a table showing a performance analysis of an object tracking learning system according to an embodiment of the present invention.
FIGS. 16 to 18 are diagrams each showing a hybrid neural network-based object tracking learning system according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described clearly and in detail so that a person of ordinary skill in the art can readily practice the invention.

FIG. 1 is a diagram showing a hybrid neural network-based object tracking learning system according to an embodiment of the present invention, and FIG. 2 is a diagram showing a hybrid neural network-based object tracking learning method according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, a hybrid neural network-based object tracking learning system 100 according to an embodiment of the present invention includes a first neural network module 120, a second neural network module 140, a prediction module 160, and an optimization module 180, and can thereby perform tracking learning for an object in an input image IVD accurately and quickly. A hybrid neural network-based object tracking learning method 200 according to an embodiment of the present invention includes outputting a first learning result (S220), outputting a second learning result (S240), generating a predicted value (S260), and updating parameters (S280), and can thereby perform object tracking learning for the input image IVD accurately and quickly.
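The four steps S220 to S280 can be pictured with a toy training step. The sketch below is only an illustration under strong assumptions: both branches are reduced to per-element linear models, the first branch uses a simple uniform rounding quantizer, the second branch uses a fixed 0/1 connection mask, the loss is a squared error, and gradients pass through the rounding step unchanged (a straight-through estimator). None of these choices, nor the names `quantize`, `train_step`, `par1`, `par2`, and `mask2`, come from the patent text.

```python
def quantize(w, levels=7):
    """Toy uniform quantizer: clip w to [-1, 1] and snap it to a grid of 2*levels+1 points."""
    w = max(-1.0, min(1.0, w))
    return round(w * levels) / levels


def train_step(x, target, par1, par2, mask2, lr=0.1):
    # S220: the first branch computes with low-precision (quantized) parameters.
    lr1 = [quantize(w) * xi for w, xi in zip(par1, x)]
    # S240: the second branch keeps full precision but has some connections removed.
    lr2 = [w * m * xi for w, m, xi in zip(par2, mask2, x)]
    # S260: the two learning results are summed to form the prediction.
    pred = sum(lr1) + sum(lr2)
    # S280: both parameter sets are updated from the prediction error.
    err = pred - target
    # Straight-through assumption: the rounding step is ignored in the gradient.
    new_par1 = [w - lr * err * xi for w, xi in zip(par1, x)]
    # Removed connections (mask 0) receive no update.
    new_par2 = [w - lr * err * xi * m for w, m, xi in zip(par2, mask2, x)]
    return pred, new_par1, new_par2


x, target = [1.0, 0.5], 1.0
pred1, p1, p2 = train_step(x, target, [0.2, -0.3], [0.1, 0.4], [1, 0])
pred2, _, _ = train_step(x, target, p1, p2, [1, 0])
```

In this toy run the second prediction lies closer to the target than the first, and the masked parameter of the second branch is left unchanged.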

The object tracking learning system 100 according to an embodiment of the present invention may operate according to the object tracking learning method 200 according to an embodiment of the present invention, and the object tracking learning method 200 may be executed in the object tracking learning system 100. However, the invention is not limited thereto: the object tracking learning system 100 may operate according to a method other than the object tracking learning method 200, and the object tracking learning method 200 may be executed in a system other than the object tracking learning system 100. For convenience of explanation, the following description is limited to the example in which the object tracking learning system 100 operates according to the object tracking learning method 200 and the object tracking learning method 200 is executed in the object tracking learning system 100.

Continuing to refer to FIGS. 1 and 2, the first neural network module 120 according to an embodiment of the present invention represents the first parameter PAR1 for the input image IVD in the second type converted from the first type, performs learning, and outputs the first learning result LR1 (S220).

The first neural network module 120 may include a first neural network. For example, the first neural network module 120 may include a first neural network that extracts useful features from the input image IVD, classifies the input image IVD into classes or the like, and has a first lightweight algorithm applied to it. The first neural network consists of multiple layers, and in each layer a convolution operation may be performed between the input image IVD or the output of the previous layer (for example, a feature map) and a corresponding filter or kernel. The first neural network may be implemented as a deep neural network with multiple hidden layers between an input layer and an output layer.

For example, for an arbitrary frame $A_0$ of the input image, the first neural network may learn a feature extractor $f_{conv}(\cdot;\cdot)$ and output, as a learning result, a feature map $A_{conv}$ given by Equation (1) below.

$$A_{conv} = f_{conv}(A_0; \mathcal{W}_{conv}) \qquad (1)$$

Here, $\mathcal{W}_{conv}$ denotes the set $\{W_1, \ldots, W_N\}$ of all convolution filters up to the $N$-th layer.

By applying the first lightweight algorithm, the first neural network module 120 may lower the precision of the first parameter PAR1 for the input image IVD and thereby reduce the amount of computation in each layer of the deep neural network. The first parameter PAR1 may be a convolution filter, that is, a weight, applied differently to the corresponding node or unit of each layer.

Here, the output of an arbitrary i-th layer, that is, the feature map $A_i$, may be generated as in Equation (2) below.

$$A_i = \sigma(A_{i-1} * W_i) \qquad (2)$$

In Equation (2), the function $\sigma(\cdot)$ denotes a nonlinear activation function and the operator $*$ denotes a convolution operation. That is, the output $A_i$ of the i-th layer is obtained by applying an activation function to the result of convolving the output $A_{i-1}$ of the (i-1)-th layer, which is the layer preceding the i-th layer, with the weight $W_i$ of the i-th layer, and may be passed to the (i+1)-th layer, which is the next layer. For example, the function $\sigma(\cdot)$ may be implemented as either a sigmoid function or a rectified linear unit (ReLU) function. However, it is not limited thereto and may be implemented with another activation function.
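The per-layer computation described for Equation (2), an activation function applied to the convolution of the previous layer's output with the layer's weight, can be sketched in plain Python. This is a minimal single-channel illustration with no padding, stride, or bias; the function names are hypothetical, and, as in most deep learning frameworks, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(a, w):
    """2-D 'valid' cross-correlation of feature map a with kernel w (lists of lists)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(a) - kh + 1, len(a[0]) - kw + 1
    return [[sum(a[i + u][j + v] * w[u][v] for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(a):
    """Element-wise ReLU, one of the activation functions named in the text."""
    return [[max(0.0, v) for v in row] for row in a]


def layer_forward(a_prev, w_i):
    """Output of layer i: activation applied to (previous output convolved with W_i)."""
    return relu(conv2d_valid(a_prev, w_i))


a_prev = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
w_i = [[-1, 0],
       [0, 1]]
a_i = layer_forward(a_prev, w_i)  # each entry is a[i+1][j+1] - a[i][j] = 4
```

Negating the kernel makes every pre-activation value negative, so ReLU maps the whole output to zero, which shows where the nonlinearity acts.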

Accordingly, the first learning result LR1 of the first neural network module 120, that is, the output of the output layer of the first neural network, may be given by Equation (3) below.

(3)

The first learning result LR1 is the result of applying the first lightweight algorithm in order to reduce the amount of computation during learning. This is described in more detail below. For reference, the terms parameter, weight, kernel, and filter may be used with the same meaning or interchangeably. The same applies to the terms output of each layer, feature map, and activation map.

FIG. 3 is a diagram showing a first neural network module according to an embodiment of the present invention, and FIG. 4 is a diagram for explaining the operation of a 1-1 quantization unit according to an embodiment of the present invention.

Referring to FIGS. 1, 3, and 4, the first lightweight algorithm may be applied to the first neural network module 120 according to an embodiment of the present invention, as described above. The first lightweight algorithm may relate to quantization, as described below, and in particular to a low-bit quantization algorithm.

To this end, the first neural network module 120 according to an embodiment of the present invention may include a 1-1 quantization unit 122. That is, to reduce the amount of computation, the first neural network module 120 may include the 1-1 quantization unit 122, which quantizes a 1-1 parameter PAR1-1 of the first type into a 1-1 parameter PAR1-1 of the second type, which requires less computation than the first type.

The 1-1 parameter PAR1-1 may be one kind of the first parameter PAR1. For example, the 1-1 parameter PAR1-1 may be a weight of the first neural network.

Here, the first type may be a real-number type and the second type may be an integer type. For example, the first type may be represented as a 32-bit floating-point number and the second type as a 4-bit integer. FIG. 3 shows the 32-bit floating-point format according to the IEEE 754 standard, but the invention is not limited thereto.
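To make the two representations concrete, the sketch below splits a value stored in the IEEE 754 32-bit format into its sign, exponent, and fraction fields and lists the value range of a 4-bit integer. The helper names are hypothetical, and treating the 4-bit integer as a two's-complement signed value is an assumption; the text does not specify the integer encoding.

```python
import struct


def float32_fields(x):
    """Split x, stored as IEEE 754 binary32, into (sign, exponent, fraction) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23-bit fraction
    return sign, exponent, fraction


def signed_int_range(n_bits):
    """Value range of an n-bit two's-complement integer (assumed encoding)."""
    return -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1


fields = float32_fields(-0.75)       # -0.75 = -1.5 * 2**-1
int4_range = signed_int_range(4)
```

A 32-bit float spends 1 + 8 + 23 bits per value, while a 4-bit integer can only take 16 distinct values, which is why the mapping from one to the other has to be chosen carefully.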

In this case, when quantizing the 1-1 parameter PAR1-1 of the real-number first type into the 1-1 parameter PAR1-1 of the integer second type, the 1-1 quantization unit 122 of the first neural network module 120 may perform the quantization in accordance with the center and the width of a target interval for the 1-1 parameter PAR1-1 of the first type.

For example, when the 1-1 parameter PAR1-1 of the first type is the weight of the l-th layer of the first neural network, the 1-1 quantization unit 122 may take an arbitrary weight among the weights of the l-th layer as the input of a weight quantization function, perform the quantization operation, and output the result. For example, the weight of the l-th layer may be a convolution filter in matrix form, and the arbitrary weight may be any element of that matrix.

In the 1-1 quantization unit 122, the weight quantization function may generate its output for the weight by performing the following two steps.

First, an arbitrary weight of the l-th layer may be linearly transformed into a value within a fixed interval according to the center and the width of a learnable target interval. That is, only the values of the weight that fall within a first quantization interval are quantized, and the remaining values are fixed to a constant value or to "0". This is expressed as Equation (4) below.

(4)

In Equation (4), the sign function denotes the sign of its input, and the two remaining coefficients each denote a separately defined expression.

Next, the weight of the l-th layer quantized by Equation (4) may be normalized by Equation (5) below.

(5)

In Equation (5), the rounding function denotes element-wise rounding of a matrix, that is, rounding the fractional part of each matrix element up or down, and the bit-width symbol denotes the bit width (number of bits) corresponding to the quantization level. For example, for quantization to 4 bits, the bit width is 4.
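The two-step weight quantization described above (an interval-based transform followed by rounding at a given bit width) can be sketched as follows. Because the formulas of Equations (4) and (5) are not reproduced in this text, everything specific here is an assumption modeled on common learned-interval quantizers: `c` is taken as the interval center, `d` as its half-width, the transform maps magnitudes in `[c - d, c + d]` linearly onto `[0, 1]`, magnitudes below the interval become 0, magnitudes above it become the sign of the weight, and `2**(n_bits - 1) - 1` rounding levels are used per sign.

```python
def sign(x):
    """Sign of x as -1, 0, or 1."""
    return (x > 0) - (x < 0)


def transform_weight(w, c, d):
    """Step 1 (assumed form): zero out, saturate, or linearly map |w| using the interval."""
    if abs(w) < c - d:
        return 0.0                    # below the interval: fixed to 0
    if abs(w) > c + d:
        return float(sign(w))         # above the interval: fixed to +1 or -1
    alpha = 0.5 / d                   # assumed slope of the linear map
    beta = -0.5 * c / d + 0.5         # assumed offset so that c - d maps to 0
    return (alpha * abs(w) + beta) * sign(w)


def discretize(w_hat, n_bits):
    """Step 2 (assumed form): round to the nearest level representable with n_bits."""
    q = 2 ** (n_bits - 1) - 1         # 7 positive levels for 4 bits
    # Python's round() resolves exact ties to the nearest even integer.
    return round(w_hat * q) / q


def quantize_weight(w, c, d, n_bits=4):
    return discretize(transform_weight(w, c, d), n_bits)


# Hypothetical interval [0.25, 0.75]:
small = quantize_weight(0.1, c=0.5, d=0.25)    # pruned to 0.0
large = quantize_weight(-0.9, c=0.5, d=0.25)   # saturated to -1.0
inner = quantize_weight(0.6, c=0.5, d=0.25)    # mapped to 0.7, rounded to 5/7
```

With 4 bits the quantized weight takes one of 15 values between -1 and 1, so the integer `round(w_hat * q)` is what would actually be stored.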

The target interval for the 1-1 parameter PAR1-1 may be set based on the accuracy or the amount of computation required of the object tracking learning method and system according to an embodiment of the present invention.

The first neural network module 120 according to an embodiment of the present invention may further include a first result output unit 124 that generates the output of an arbitrary layer corresponding to the weights quantized by the 1-1 quantization unit 122. For example, the first result output unit 124 may substitute the weight of Equation (5) for the l-th layer into Equation (2) and output the result for the l-th layer. The first result output unit 124 may output the output of the final layer of the first neural network, that is, the output layer, as the first learning result LR1 of the first neural network module 120.

The outputs of the 1-1 quantization unit 122 and the first result output unit 124 for each layer according to an embodiment of the present invention may be stored in a storage means (not shown) inside or outside the first neural network module 120 and then referenced when the output for the next layer is generated. For example, the 1-1 quantization unit 122 may generate the set of quantized weights for the layers in Equation (3). In this case, the first result output unit 124 may generate the first learning result LR1 based on the set of quantized weights.

FIG. 5 is a diagram showing a first neural network module according to an embodiment of the present invention, and FIG. 6 is a diagram for explaining the operation of a 1-2 quantization unit according to an embodiment of the present invention.

Referring to FIGS. 1, 5, and 6, the first neural network module 120 according to an embodiment of the present invention may include the 1-1 quantization unit 122, a 1-2 quantization unit 126, and the first result output unit 124. The 1-1 quantization unit 122 may operate in the same manner as described with reference to FIGS. 3 and 4.

The 1-2 quantization unit 126 may be provided so that the first neural network module 120 according to an embodiment of the present invention further performs quantization on a 1-2 parameter PAR1-2 in addition to the 1-1 parameter PAR1-1. The 1-2 parameter PAR1-2 may be, for example, the activation map or an activation value of each layer of the first neural network.

For example, the 1-2 quantization unit 126 may perform quantization to reduce the amount of computation for the 1-2 parameter PAR1-2. The 1-2 quantization unit 126 may quantize the 1-2 parameter PAR1-2 of the first type into the second type. As described above, the first type may be a real-number type and the second type may be an integer type.
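As a rough illustration of turning a real-valued activation into an integer code, the sketch below clips the activation to an interval, rescales it to `[0, 1]`, and rounds it to one of `2**n_bits` levels. The interval parameters `c` (center) and `d` (half-width), the unsigned level count, and the function names are assumptions made for this example; they are not taken from the patent text.

```python
def transform_activation(x, c, d):
    """Clip x to [c - d, c + d] and rescale it linearly to [0, 1] (assumed form)."""
    lo, hi = c - d, c + d
    if x < lo:
        return 0.0                    # below the interval: fixed to 0
    if x > hi:
        return 1.0                    # above the interval: fixed to 1
    return (x - lo) / (hi - lo)


def quantize_activation(x, c, d, n_bits=4):
    """Return (integer code, de-quantized value in [0, 1])."""
    q = 2 ** n_bits - 1               # 15 for 4 bits: codes 0..15
    level = round(transform_activation(x, c, d) * q)
    return level, level / q


# Hypothetical interval [0, 2], as might suit non-negative ReLU outputs:
below = quantize_activation(-0.3, c=1.0, d=1.0)   # (0, 0.0)
above = quantize_activation(3.0, c=1.0, d=1.0)    # (15, 1.0)
inside = quantize_activation(0.8, c=1.0, d=1.0)   # code 6, value 0.4
```

Only the integer code needs to be stored or passed to the next layer; the scaled value is shown to make the loss of precision visible.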

When quantizing the 1-2 parameter PAR1-2 of the real-number first type into the 1-2 parameter PAR1-2 of the integer second type, the 1-2 quantization unit 126 of the first neural network module 120 may perform the quantization in accordance with the center and the width of a target interval for the 1-2 parameter PAR1-2 of the first type.

For example, the 1-2 quantization unit 126 may use an activation quantization function that takes an arbitrary activation value from the activation map of the l-th layer as input, performs a quantization operation on it, and outputs the quantized result. For example, the activation map of the l-th layer may be expressed as a matrix, and the arbitrary activation value may be any element of that matrix.

The activation quantization function of the 1-2 quantization unit 126 may generate the output for the activation map by performing the following two steps.

First, an arbitrary activation value of the l-th layer may be linearly transformed into a value within a fixed interval according to the center and the width of a learnable target interval. For example, when the activation function is the ReLU function, the activation value of the l-th layer may be linearly transformed into a value within that interval. That is, only activation values inside the second quantization interval are quantized, and the remaining values may be fixed to "1" or "0". This is expressed by Equation (6) below.

(6)

In Equation (6), two shorthand symbols are used; they denote "…" and "…", respectively.

Next, the 1-2 parameter PAR1-2 may be normalized by Equation (7) below.

(7)

In Equation (7), the rounding function rounds up or truncates the fractional part of each individual output element of its argument, and the remaining symbol in the equation may be 4.
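The two steps described above can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the method as claimed: it assumes the target interval is [center - width, center + width], that values inside it are mapped linearly onto [0, 1], and that the value 4 mentioned for Equation (7) is a bit width giving 2^4 - 1 uniform steps. The names `quantize_activation`, `center`, `width`, and `n_bits` are hypothetical.

```python
import math


def quantize_activation(a, center, width, n_bits=4):
    """Two-step activation quantization sketch (assumed form).

    Step 1: clip-and-scale `a` onto [0, 1] using the interval
            [center - width, center + width] (values outside map to 0 or 1).
    Step 2: round the scaled value to one of 2**n_bits - 1 uniform steps.
    Returns (integer code, dequantized value in [0, 1]).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    lo, hi = center - width, center + width
    if a < lo:
        a_hat = 0.0          # below the interval: fixed to 0
    elif a > hi:
        a_hat = 1.0          # above the interval: fixed to 1
    else:
        a_hat = (a - lo) / (2.0 * width)  # linear transform inside the interval
    levels = 2 ** n_bits - 1
    code = math.floor(a_hat * levels + 0.5)  # round half up
    return code, code / levels


if __name__ == "__main__":
    for value in (0.0, 1.0, 2.0):
        print(value, quantize_activation(value, center=1.0, width=0.5))
```

In this sketch `center` and `width` are plain arguments; in a trainable setting they would be the learnable interval parameters updated by backpropagation.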

The target interval for the first-type 1-2 parameter PAR1-2 may be set based on the accuracy or the amount of computation required of the object tracking learning method and system according to an embodiment of the present invention.

The first result output unit 124 may receive the quantization result of the 1-1 parameter of the (l+1)-th layer from the 1-1 quantization unit 122, receive the quantization result of the 1-2 parameter of the l-th layer from the 1-2 quantization unit 126, and output the corresponding output of the (l+1)-th layer. For example, as in Equation (8) below, the first result output unit 124 may generate the output of the (l+1)-th layer by applying an activation function to the result of convolving the quantization result of the 1-2 parameter of the l-th layer with the quantization result of the 1-1 parameter of the (l+1)-th layer.

(8)
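The relation described for Equation (8) can be written compactly as follows. The symbols are assumptions chosen for illustration, because the original notation is not available in the text: A_l is the activation map of the l-th layer, W_{l+1} the weights of the (l+1)-th layer, Q_A and Q_W the activation and weight quantization functions, * the convolution, and \sigma the activation function.

```latex
A_{l+1} = \sigma\bigl( Q_W(W_{l+1}) * Q_A(A_l) \bigr)
```

Because both operands of the convolution are quantized, the multiply-accumulate work in each layer can be carried out on low-bit integers, which is where the reduction in computation comes from.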

In this case, the first result output unit 124 may output the output of the output layer of the first neural network as the first learning result LR1 of the first neural network module 120.

Referring again to FIGS. 1 and 2, the second neural network module 140 according to an embodiment of the present invention removes some of the connections of the second parameter PAR2 for the input image IVD, performs learning, and outputs the result as the second learning result LR2 (S240).

The second neural network module 140 may include a second neural network. For example, the second neural network module 140 may include a second neural network that extracts useful features from the input image IVD, classifies the input image IVD into classes or the like, and has a second lightweight algorithm applied to it.

The second neural network may have a configuration identical or similar to that of the first neural network except for the lightweight algorithm. That is, the second neural network consists of multiple layers, and in each layer a convolution operation with the corresponding filter or kernel may be performed on the input image IVD or on the output (for example, a feature map) of the previous layer. The second neural network may also be implemented as a deep neural network having multiple hidden layers between an input layer and an output layer.

By applying the second lightweight algorithm, the second neural network module 140 may remove the connections of some of the second parameters PAR2 for the input image IVD, thereby reducing the amount of computation in each layer of the deep neural network.

The 2-1 parameter PAR2-1 may be one type of the second parameter PAR2. For example, the 2-1 parameter PAR2-1 may be a weight of the second neural network, that is, a weight applied differently to the corresponding node or unit of each layer of the second neural network. The 2-1 parameter PAR2-1 may have the same value as the first-type 1-1 parameter PAR1-1 of the first neural network.

Accordingly, the second learning result LR2 of the second neural network module 140, for example the output of the output layer of the second neural network, may be given by Equation (9) below.

(9)

This second learning result LR2 of the second neural network module 140 is the result of applying the second lightweight algorithm in order to reduce the amount of computation in the learning process. The weight term in Equation (9) denotes the set of weights of all convolution filters up to the indicated layer.

FIG. 7 is a diagram showing a second neural network module according to an embodiment of the present invention.

Referring to FIG. 7, the second lightweight algorithm is applied to the second neural network module 140 according to an embodiment of the present invention, as described above. For example, the second lightweight algorithm may relate to pruning, and in particular to a compact, full-precision channel pruning algorithm. Each layer of the second neural network consists of connections of nodes or units, and some of the connections of the 2-1 parameter PAR2-1 may be removed by a channel pruning algorithm that masks some channels of each layer. To this end, the second neural network module 140 according to an embodiment of the present invention may include a channel selector 142.

The channel selector 142 may sample, from among the channels of each layer of the second neural network, the channels to be masked. The channel selector 142 may learn a set of channel selection probability vectors for the layers. Here, B_l denotes the channels sampled in the l-th layer. The channel selection probability vector may be the 2-2 parameter among the second parameters PAR2.

The channel selector 142 may apply the Gumbel-Softmax technique to generate a discrete channel selection mask, that is, the pruning mask for the l-th layer, by normalizing Equation (10) below as in Equation (11) below.

(10)

(11)

Here, two of the symbols denote random noise samples drawn from the Gumbel distribution, and a third denotes the temperature. That is, when the channel selection probability for a given channel of the l-th layer is smaller than a threshold, the pruning mask of that channel may have the value "0".
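The mask generation described above can be sketched as follows. This is a minimal Python illustration of the general Gumbel-Softmax technique, not the exact form of Equations (10) and (11): it assumes a two-way (keep or drop) relaxation per channel, and the names `sample_gumbel`, `channel_mask`, `keep_probs`, `temperature`, and `threshold` are hypothetical.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one sample from the standard Gumbel distribution."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
    return -math.log(-math.log(u))


def channel_mask(keep_probs, temperature=1.0, threshold=0.5, rng=None):
    """Return (soft, hard) masks, one entry per channel.

    soft: relaxed keep score in [0, 1] from a two-way Gumbel-Softmax.
    hard: 1 if the soft score reaches the threshold, otherwise 0.
    """
    rng = rng or random.Random()
    soft, hard = [], []
    for p in keep_probs:
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        z_keep = (math.log(p) + sample_gumbel(rng)) / temperature
        z_drop = (math.log(1.0 - p) + sample_gumbel(rng)) / temperature
        m = max(z_keep, z_drop)                      # subtract max for stability
        e_keep, e_drop = math.exp(z_keep - m), math.exp(z_drop - m)
        s = e_keep / (e_keep + e_drop)
        soft.append(s)
        hard.append(1 if s >= threshold else 0)
    return soft, hard


if __name__ == "__main__":
    print(channel_mask([0.9, 0.1, 0.5], temperature=0.5, rng=random.Random(0)))
```

A lower temperature pushes the soft scores toward 0 or 1, so the relaxed mask approaches the discrete mask while remaining differentiable with respect to the selection probabilities.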

The second neural network module 140 according to an embodiment of the present invention may further include a second result output unit 144 that receives the weights for the l-th layer, receives the pruning mask for the l-th layer from the channel selector 142, and generates the output of the l-th layer. The weights for the l-th layer may be generated inside or outside the second neural network module 140 and delivered to the second result output unit 144. For example, the weights for the l-th layer may be identical to the first-type 1-1 parameter PAR1-1 used in the first neural network module 120.

The second result output unit 144 may generate the output of the l-th layer, for example, as in Equation (12) below.

(12)

Here, the operation ⊙ denotes channel-wise multiplication. The output of the l-th layer may be generated by applying the activation function to the result of convolving the weights for the l-th layer with the output of the (l-1)-th layer, and then multiplying that value channel-wise by the pruning mask for the l-th layer. Accordingly, a channel of the l-th layer whose pruning mask is "0" may have no effect on the output of the l-th layer.
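The channel-wise masking described above can be illustrated with a small sketch. The example assumes the convolution result is already available as a list of channels, uses ReLU as the activation function, and uses hypothetical names (`relu`, `masked_layer_output`); it only demonstrates how a mask value of 0 removes a channel's contribution.

```python
def relu(x):
    """ReLU activation for a single value."""
    return x if x > 0 else 0.0


def masked_layer_output(pre_activation, mask):
    """Apply the activation, then multiply each channel by its mask entry.

    pre_activation: list of channels, each a flat list of convolution results.
    mask: one 0/1 entry per channel (the pruning mask of the layer).
    """
    if len(pre_activation) != len(mask):
        raise ValueError("mask must have one entry per channel")
    return [[relu(v) * m for v in channel]
            for channel, m in zip(pre_activation, mask)]


if __name__ == "__main__":
    conv_result = [[1.0, -2.0], [3.0, 4.0]]   # two channels, two values each
    print(masked_layer_output(conv_result, [1, 0]))
```

In an actual pruned network the masked channels would typically not be computed at all, which is what saves computation; multiplying by zero here only makes the effect visible.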

The second result output unit 144 may output the output of the final layer of the second neural network, that is, the output layer of the second neural network, as the second learning result LR2 of the second neural network module 140.

Referring again to FIGS. 1 and 2, the prediction module 160 according to an embodiment of the present invention generates a prediction value PVL for an object in the input image IVD from the summation result SR obtained by summing the first learning result LR1 and the second learning result LR2 (S260).

FIG. 8 is a diagram showing a prediction module according to an embodiment of the present invention.

Referring to FIG. 8, the prediction module 160 according to an embodiment of the present invention may include a learning result summation unit 162 and a prediction value generation unit 164.

The learning result summation unit 162 may sum the first learning result LR1 and the second learning result LR2 to generate the summation result SR. For example, when the first learning result LR1 is given by Equation (3) above and the second learning result LR2 is given by Equation (9) above, the summation result SR may be given by Equation (13) below.

(13)

In the example described above, the first learning result LR1 may be generated by a quantization network and the second learning result LR2 by a channel pruning network, and each learning result (LR1, LR2) may be expressed in the form of an activation map. The quantized first learning result LR1 is a map of integer-type elements and the channel-pruned second learning result LR2 is a map of real-number-type elements, so the activation map that forms the summation result SR may contain a mixture of integer-type and real-number-type elements.

The prediction value generation unit 164 may generate the prediction value PVL by predicting the object to be tracked from the summation result SR. For example, the prediction value PVL may relate to the position of the object to be tracked. For example, a prediction value PVL corresponding to information (coordinates, width, depth, and the like) of a sampled candidate window or box for the object to be tracked may be generated from the summation result SR.

The prediction value generation unit 164 may generate the prediction value PVL by applying at least one of a regression model algorithm, which predicts continuous values for the object to be tracked, and a classification model algorithm, which predicts the type of the object.

Referring again to FIGS. 1 and 2, the optimization module 180 according to an embodiment of the present invention updates the first parameter PAR1 and the second parameter PAR2 based on the prediction value PVL (S280). When the prediction value PVL satisfies a certain condition, the optimization module 180 may output the prediction value PVL as the tracking result TR for the object.

FIGS. 9 and 10 are diagrams each showing an optimization module according to an embodiment of the present invention.

First, referring to FIGS. 1 and 9, the optimization module 180 according to an embodiment of the present invention may include a loss adjustment unit 182 and a backpropagation unit 184.

The loss adjustment unit 182 may adjust the loss LS for the prediction value PVL so that it reaches a minimum. When the loss LS reaches the minimum, the prediction value PVL may be output as the tracking result for the object.

The loss adjustment unit 182 may apply a loss function such as MSE (Mean Squared Error), MAE (Mean Absolute Error), or RMSE (Root Mean Squared Error) for a regression model, and a loss function such as CEE (Cross Entropy Error) for a classification model. Furthermore, regularization may be applied to the loss function in the loss adjustment unit 182 according to an embodiment of the present invention.
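For reference, the loss functions named above have standard definitions, sketched below in plain Python. The function names and the list-based inputs are illustrative choices rather than part of the described system.

```python
import math


def mse(pred, target):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error over paired predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def rmse(pred, target):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(pred, target))


def cross_entropy(probs, label, eps=1e-12):
    """Cross entropy for one sample: -log of the probability of the true class."""
    return -math.log(max(probs[label], eps))


if __name__ == "__main__":
    box_pred, box_true = [1.0, 2.0], [1.0, 4.0]   # e.g. regressed box values
    print(mse(box_pred, box_true), mae(box_pred, box_true), rmse(box_pred, box_true))
    print(cross_entropy([0.5, 0.5], 0))            # e.g. object vs. background
```

The regression losses suit continuous outputs such as box coordinates, while cross entropy suits class scores.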

The backpropagation unit 184 may perform backpropagation based on the loss LS adjusted by the loss adjustment unit 182. For example, the backpropagation unit 184 may update the first parameter PAR1 and the second parameter PAR2 for each layer of the neural network using the gradient (derivative) of the loss function.

By repeating learning with the updated first parameter PAR1 and second parameter PAR2, the first neural network module 120 and the second neural network module 140 may output more accurate learning results (LR1, LR2) for object tracking.

Next, referring to FIGS. 1 and 10, the optimization module 180 according to an embodiment of the present invention may include a loss adjustment unit 182 and a backpropagation unit 184, as in FIG. 9. Here, the loss adjustment unit 182 of FIG. 10 may include a first loss adjustment unit 182-2, a second loss adjustment unit 182-4, and a third loss adjustment unit 182-6.

The first loss adjustment unit 182-2 may calculate the loss for the first neural network module 120, that is, the first loss LS1 for the first learning result LR1. In the example described above, the first loss adjustment unit 182-2 may calculate a quantization regularization loss as the first loss LS1.

As in Equation (14) below, the first loss LS1 may consist of a 1-1 loss LS1-1 for the 1-1 parameter and a 1-2 loss LS1-2 for the 1-2 parameter.

(14)

For example, the first loss adjustment unit 182-2 may calculate the 1-1 loss LS1-1 for the 1-1 parameter, which is a weight, using Equation (15) below, thereby regularizing the 1-1 parameter PAR1-1 so that it lies within a specific interval (for example, the first quantization interval). That is, the 1-1 parameters may be learned so as to have optimal values. Accordingly, errors that may be caused when the 1-1 parameter PAR1-1 lies outside the specific interval and is clipped can be reduced.

(15)

In Equation (15), the two functions denote the mean and the standard deviation, respectively, of all elements of the weight matrix, and the two constants are hyperparameters for the mean and the standard deviation.
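A regularizer of the kind described above can be sketched as follows. The exact form of Equation (15) is not available in the text, so this example assumes one plausible form: the squared distance between the mean and standard deviation of the weights and two target hyperparameters. The names `weight_regularization_loss`, `mu0`, and `sigma0` are hypothetical.

```python
import statistics


def weight_regularization_loss(weights, mu0, sigma0):
    """Assumed form: penalize deviation of the weight statistics from targets.

    weights: flat list of all elements of one layer's weight matrix.
    mu0, sigma0: hyperparameters for the target mean and standard deviation.
    """
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights)   # population standard deviation
    return (mean - mu0) ** 2 + (std - sigma0) ** 2


if __name__ == "__main__":
    layer_weights = [1.0, 3.0]
    print(weight_regularization_loss(layer_weights, mu0=2.0, sigma0=1.0))
```

Keeping the weight statistics near fixed targets is one way to keep most weights inside the quantization interval, so that fewer of them are clipped.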

For example, the first loss adjustment unit 182-2 may calculate the 1-2 loss LS1-2 for the 1-2 parameter, which is an activation value, using Equation (16) below, thereby regularizing the 1-2 parameter PAR1-2 so that it lies within a specific interval (for example, the second quantization interval). That is, the 1-2 parameters may be learned so as to have optimal values. Accordingly, errors that may be caused when the 1-2 parameter PAR1-2 lies outside the specific interval and is clipped can be reduced.

(16)

In Equation (16), one symbol denotes the set of activation values after batch normalization, and one function denotes the ReLU function. Two further terms may be determined by the combination of the current quantization interval parameters. When the activation values follow a Gaussian distribution, Equation (16) forces activation values that are larger than the Gaussian mean and smaller than "…" to lie within the activation range.

The second loss adjustment unit 182-4 may calculate and adjust the loss for the second neural network module 140, that is, the second loss LS2 for the second learning result LR2. In the example described above, the second loss adjustment unit 182-4 may calculate a channel pruning regularization loss as the second loss LS2. In this case, the second loss adjustment unit 182-4 may train the second neural network module 140 so that the number of channels used to generate the second learning result LR2 is minimized.

For example, the second loss adjustment unit 182-4 may train the channel selection probability vector b_l of Equation (10) above so as to minimize the loss of Equation (17) below.

(17)

The third loss adjustment unit 182-6 may calculate a third loss LS3 for the prediction value PVL. In the example described above, the third loss adjustment unit 182-6 may calculate an object tracking loss, an object classification loss, a bounding box loss, and the like for the summation result SR as the third loss LS3, and may adjust them so that each loss is minimized.

The backpropagation unit 184 may perform backpropagation based on the losses adjusted by the first to third loss adjustment units 182-2, 182-4, and 182-6. For example, the backpropagation unit 184 may update the first parameter PAR1 and the second parameter PAR2 for each layer of the neural network using the gradient (derivative) of each loss function.
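How the three losses might drive a parameter update can be sketched as follows. The text does not state how the losses are weighted or which optimizer is used, so this example assumes a weighted sum and a plain gradient-descent step; `total_loss`, `sgd_step`, `w_q`, `w_p`, and `lr` are hypothetical names.

```python
def total_loss(ls1_1, ls1_2, ls2, ls3, w_q=1.0, w_p=1.0):
    """Combine the losses into one objective (assumed weighted sum).

    ls1_1, ls1_2: quantization regularization losses (weights, activations).
    ls2: channel pruning regularization loss.
    ls3: task loss on the prediction value (tracking, classification, box).
    """
    ls1 = ls1_1 + ls1_2            # first loss from its two components
    return ls3 + w_q * ls1 + w_p * ls2


def sgd_step(params, grads, lr=0.01):
    """One gradient-descent update: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    print(total_loss(0.1, 0.2, 0.3, 1.0))
    print(sgd_step([1.0, 2.0], [10.0, -10.0], lr=0.1))
```

In practice the gradients would come from automatic differentiation of the combined objective with respect to PAR1 and PAR2; here they are passed in directly to keep the sketch self-contained.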

Referring again to FIGS. 1 and 2, in the object tracking learning system 100 and the object tracking learning method 200 according to an embodiment of the present invention, with the structure and operation described above, the first neural network module 120 learns coarse primary information at low cost while the second neural network module 140 learns complementary or residual information that increases fidelity to the original image frame, so that fast computation can be performed while accuracy is maintained. That is, by using the results of separately training an efficient low-bit-width quantization network and a precise channel pruning network, the object tracking learning system 100 and the object tracking learning method 200 can perform object tracking learning that is both accurate and efficient.

Although the case in which the parameter according to an embodiment of the present invention is a weight, an activation value, or a channel selection probability vector has been described above, the invention is not limited thereto. The parameter according to an embodiment of the present invention may also be a bias or the like. The same applies hereinafter.

FIGS. 11 and 12 are diagrams each showing a hybrid neural network-based object tracking learning system according to an embodiment of the present invention.

First, referring to FIGS. 1 and 11, the object tracking learning system 100 according to an embodiment of the present invention includes, as in FIG. 1, the first neural network module 120, the second neural network module 140, the prediction module 160, and the optimization module 180, and can track objects in the input image IVD accurately and quickly.

Furthermore, the object tracking learning system 100 of FIG. 11 may further include a pre-learning module 110. The pre-learning module 110 may perform pre-learning on the first neural network module 120 and the second neural network module 140 with a test image TVD to set the initial values of the first parameter PAR1 and the second parameter PAR2.

The first neural network module 120 and the second neural network module 140 may start object tracking learning on the input image IVD using the initial values (PAR1-0, PAR2-0) of the first parameter PAR1 and the second parameter PAR2 pre-learned by the pre-learning module 110. As described above, because the first parameter PAR1 and the second parameter PAR2 are updated by the optimization module 180, the object tracking learning system 100 according to an embodiment of the present invention can track the object more accurately.

Next, referring to FIGS. 1 and 12, the object tracking learning system 100 according to an embodiment of the present invention includes, as in FIG. 11, the first neural network module 120, the second neural network module 140, the prediction module 160, the optimization module 180, and the pre-learning module 110, and can track objects in the input image IVD accurately and quickly.

Furthermore, the object tracking learning system 100 of FIG. 12 may further include an online tracking module 190. In this case, the input image IVD may be a streaming video. The online tracking module 190 may control the second neural network module 140 so that it is trained in real time.

In the example described above, the first neural network module 120 performs model lightweighting by applying a quantization algorithm, and the second neural network module 140 performs model lightweighting by applying a channel pruning algorithm. Here, the online tracking module 190 may process the primary or basic information of the input image IVD through the first neural network module 120, whose pre-learning result is maintained, while training the second neural network module 140 in real time to update its pre-learning result, so that visual variations in the details of the input image IVD are reflected. Accordingly, the position of an object in the input image IVD, which changes in real time, can be tracked sequentially.

In this case, the prediction module 160 may generate the prediction value PVL by summing the first learning result LR1, which is the pre-learning result, and the second learning result LR2, which is learned in real time, into the summation result SR, and the optimization module 180 may perform the optimization operation based on the prediction value PVL in which the real-time second learning result LR2 is reflected.
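The division of labor in the online phase can be sketched as a simple loop: the pre-learned first parameters stay fixed while the second parameters are updated frame by frame. This is an illustration only; the per-frame gradients are supplied as inputs, plain gradient descent is assumed, and `online_update`, `par1`, `par2`, and `grads2_per_frame` are hypothetical names.

```python
def online_update(par1, par2, grads2_per_frame, lr=0.01):
    """Update only the second network's parameters over a stream of frames.

    par1: pre-learned parameters of the quantized network (kept unchanged).
    par2: parameters of the pruned network (updated online).
    grads2_per_frame: one gradient list for par2 per incoming frame.
    """
    par2 = list(par2)
    for grads in grads2_per_frame:
        par2 = [p - lr * g for p, g in zip(par2, grads)]
    return list(par1), par2


if __name__ == "__main__":
    frozen, adapted = online_update([1.0], [2.0], [[1.0], [1.0]], lr=0.5)
    print(frozen, adapted)
```

Restricting the online updates to the smaller pruned network keeps the per-frame training cost low, which is consistent with the real-time requirement described for streaming video.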

FIGS. 13 and 14 are diagrams each showing a hybrid neural network-based object tracking learning method according to an embodiment of the present invention.

Referring to FIGS. 13 and 14, the object tracking learning method 200 according to an embodiment of the present invention may include a pre-learning step (S210), a quantization step (S220-2), a channel pruning step (S240-2), a first learning result generation step (S220-4), a second learning result generation step (S240-4), a learning result summation step (S260-2), a prediction step (S260-4), an optimization step (S280), and an online update step (S290). The details of each step may be as described above.

As described above, the object tracking learning method and system according to an embodiment of the present invention enable object tracking learning on streaming video that satisfies accuracy requirements with few resources, and can therefore be applied to fields that require real-time processing, such as false alarm detection or object detection for autonomous driving.

FIG. 15 is a table showing a performance analysis of an object tracking learning system according to an embodiment of the present invention.

Referring to FIGS. 1 and 15, the object tracking learning system 100 according to an embodiment of the present invention may be implemented by plugging the above-described lightweighting algorithms into an RT-MDNet (Real Time Multi-Domain Convolution Neural Network Tracker) or SiamRPN++ (Siamese Region Proposal Network) model. The table of FIG. 15 shows the performance of RT-MDNet and SiamRPN++ in three cases each: no lightweighting applied, the quantization (Q) algorithm applied, and both the quantization (Q) and pruning (P) algorithms applied. In FIG. 15, quantization (Q) and pruning (P) may refer to cases in which normalization has been performed.

In FIG. 15, computation overhead may be calculated using bitwise convolution operations (BOPs). TotalHsize denotes the computation, measured in BOPs, when quantization (Q) and/or pruning (P) is applied, relative to the computation of the unmodified RT-MDNet or SiamRPN++, which is taken as "1". When the quantization (Q) and/or pruning (P) algorithms are applied, the number of bits falls from 32 to 4 or 5, and the computation falls markedly from 1 to between 0.2 and 0.32. The precision rate (Prec) changes from 85.3 to 83.7 and 84.9, respectively, for RT-MDNet, and from 87.6 or 90.5 to 87.3 and 89.5, respectively, for SiamRPN++. The success rate (Succ) shows a trend similar to that of precision. For reference, the precision and success rate of SiamRPN++ differ between the results reported in research and actual tests; the values in parentheses are the actual test results.
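How a relative computation figure of this kind could arise can be sketched as follows. This is a rough proxy only: it assumes that BOPs scale as the number of multiply-accumulates times the weight bit width times the activation bit width, that pruning scales the multiply-accumulate count linearly with the kept-channel ratio, and that the hybrid cost is the sum of the two branches. The function name `relative_bops` and all numeric settings are hypothetical and are not the settings behind FIG. 15.

```python
def relative_bops(w_bits, a_bits, keep_ratio=1.0, base_bits=32):
    """Relative cost of a branch versus a full-precision, unpruned baseline.

    Assumes cost ~ MACs * weight_bits * activation_bits, and that keeping a
    fraction of the channels scales the MAC count by keep_ratio.
    """
    return keep_ratio * (w_bits * a_bits) / (base_bits * base_bits)


# Illustrative hybrid: a low-bit quantized branch plus a small
# full-precision pruned branch (values are not taken from FIG. 15).
quantized_branch = relative_bops(w_bits=4, a_bits=4)
pruned_branch = relative_bops(w_bits=32, a_bits=32, keep_ratio=0.2)
total = quantized_branch + pruned_branch
```

Under these assumptions the quantized branch contributes very little to the total, so the relative cost is dominated by how many channels the full-precision pruned branch keeps.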

That is, when only the quantization (Q) algorithm is applied, the computation is reduced by a factor of 4 to 5, but accuracy degrades considerably. In contrast, when the quantization (Q) and pruning (P) algorithms are applied together, as in the object tracking learning system 100 according to an embodiment of the present invention, the accuracy of RT-MDNet and SiamRPN++ was confirmed to be almost fully recovered.

FIGS. 16 to 18 are diagrams each illustrating a hybrid neural network-based object tracking learning system according to an embodiment of the present invention.

Referring first to FIG. 16, an object tracking learning system 1600 according to an embodiment of the present invention includes a time-invariant learning module 1620, a time-varying learning module 1640, a prediction module 1660, and an optimization module 1680. The time-invariant learning module 1620 may be modeled by pre-learning and output a first learning result (LR1) for the input image (IVD), and the time-varying learning module 1640 may undergo pre-learning and online real-time learning and output a second learning result (LR2) for the input image (IVD). The time-invariant learning module 1620 may be a deep neural network made lightweight by quantizing the first parameter (PAR1) into integers, and the time-varying learning module 1640 may be a deep neural network made lightweight by channel pruning, which removes some of the connections of the second parameter (PAR2).
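The two lightweighting operations named above can be sketched in a few lines. This is only an illustration under stated assumptions: the quantizer maps values in an interval given by a center and a width onto signed integers with uniform steps, and the pruner keeps the channels with the largest L1 norm. The uniform mapping, the L1 criterion, and the names `quantize_to_int` and `prune_channels` are hypothetical choices, not details confirmed by this passage.

```python
def quantize_to_int(values, center, width, bits=4):
    """Map real values in [center - width, center + width] to signed integers."""
    levels = 2 ** (bits - 1) - 1                  # e.g. 7 for 4 bits
    out = []
    for v in values:
        clipped = max(center - width, min(center + width, v))
        out.append(round((clipped - center) / width * levels))
    return out


def prune_channels(channels, keep_ratio):
    """Keep the channels with the largest L1 norm and zero out the rest."""
    norms = [sum(abs(w) for w in ch) for ch in channels]
    n_keep = max(1, int(len(channels) * keep_ratio))
    order = sorted(range(len(channels)), key=lambda i: norms[i], reverse=True)
    keep = set(order[:n_keep])
    return [ch if i in keep else [0.0] * len(ch) for i, ch in enumerate(channels)]


par1_int = quantize_to_int([-0.9, -0.2, 0.0, 0.35, 1.4], center=0.0, width=1.0)
par2_pruned = prune_channels(
    [[0.5, -0.4], [0.01, 0.02], [1.0, 0.3], [0.0, 0.1]], keep_ratio=0.5
)
```

Here the value 1.4 lies outside the interval and is clipped before quantization, and the two channels with the smallest norms are removed entirely while the remaining channels keep their real-valued weights.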

The prediction module 160 may generate a prediction value (PVL) for the object to be tracked in the input image (IVD), based on the summation result (SR) obtained by summing the first learning result (LR1) and the second learning result (LR2). The optimization module 1680 may update the first parameter (PAR1) and the second parameter (PAR2) based on the prediction value (PVL) received from the prediction module 160.

Next, referring to FIG. 17, an object tracking learning system 1700 according to an embodiment of the present invention may include a first-type neural network 1720, a second-type neural network 1740, a prediction module 1760, and an optimization module 1780.

Here, the first-type neural network 1720 and the second-type neural network 1740 may be heterogeneous neural network modules. For example, the first-type neural network 1720 may be implemented as a lightweight deep neural network to which a quantization technique is applied, and the second-type neural network 1740 as a lightweight deep neural network to which a pruning technique is applied. Alternatively, the first-type neural network 1720 may be implemented as a deep neural network trained in a time-invariant manner, and the second-type neural network 1740 as a lightweight deep neural network trained in a time-varying manner.

Alternatively, the first-type neural network 1720 and the second-type neural network 1740 may be neural network modules that differ in performance. For example, the bit width of the first learning result (LR1) may differ from that of the second learning result (LR2). As another example, the first-type neural network 1720 may output a coarse-scale first learning result (LR1) for the input image (IVD), and the second-type neural network 1740 may output a fine-scale second learning result (LR2) for the input image (IVD).

The object tracking learning system 1700 of FIG. 17 may further include a neural network setting unit 1790. The neural network setting unit 1790 may set different combinations of the first neural network module 120 and the second neural network module 140 based on the accuracy or amount of computation required for object tracking learning on the input image (IVD). For example, when the object tracking learning system 100 is resource-constrained and a certain level of accuracy is required, the neural network setting unit 1790 may set the first neural network module 120 as a low-bit quantization network and the second neural network module 140 as a high-precision pruning network. As another example, the neural network setting unit 1790 may apply a neural architecture search (NAS) algorithm to set optimal heterogeneous neural network modules.
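One simple way to picture the selection performed by such a setting unit is a filter over candidate combinations under a computation budget and an accuracy floor. The sketch below is not a NAS algorithm; it is a plain exhaustive selection used only for illustration. The `Combo` class, the `select_combo` function, and every number in the candidate list are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Combo:
    name: str
    cost: float       # relative computation, baseline = 1.0
    accuracy: float   # estimated accuracy on some validation set


def select_combo(candidates, max_cost, min_accuracy):
    """Return the cheapest candidate meeting both constraints, or None."""
    feasible = [
        c for c in candidates
        if c.cost <= max_cost and c.accuracy >= min_accuracy
    ]
    return min(feasible, key=lambda c: c.cost) if feasible else None


# Illustrative numbers only; they are not taken from the disclosure.
candidates = [
    Combo("4-bit quantized + 20% channels", cost=0.22, accuracy=0.840),
    Combo("5-bit quantized + 30% channels", cost=0.32, accuracy=0.850),
    Combo("full precision", cost=1.00, accuracy=0.853),
]
chosen = select_combo(candidates, max_cost=0.5, min_accuracy=0.845)
```

With these made-up figures the cheapest combination fails the accuracy floor and the full-precision model exceeds the budget, so the middle combination is selected.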

In addition, although not shown, the object tracking learning system 100 of FIG. 17 may further include the pre-learning module 110 of FIG. 11 or the online tracking module 190 of FIG. 12.

Next, referring to FIG. 18, an object tracking learning system 1800 according to an embodiment of the present invention may include a first neural network module 1820, a second neural network module 1840, a prediction module 1860, and an optimization module 1880. These may have structures identical or similar to those of the first neural network module 120, the second neural network module 140, the prediction module 160, and the optimization module 180 of FIG. 1. Furthermore, the object tracking learning system 100 of FIG. 18 may further include a third neural network module 1850, depending on the accuracy or amount of computation required for object tracking learning on the input image (IVD). In this case, the prediction module 160 may receive the learning results (LR1, TR2, TR3) from the first to third neural network modules 1850, respectively, and sum them to generate the prediction value (PVL).
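Summation over a variable number of branches can be sketched as below. The sketch assumes that each branch scores the same set of candidate regions and that the prediction is the index of the candidate with the highest summed score; both are assumptions made for illustration, as are the names `sum_learning_results` and `predict_index` and the sample values.

```python
def sum_learning_results(*branch_outputs):
    """Element-wise sum of per-candidate scores from any number of branches."""
    lengths = {len(b) for b in branch_outputs}
    if len(lengths) != 1:
        raise ValueError("all branches must score the same candidates")
    return [sum(scores) for scores in zip(*branch_outputs)]


def predict_index(summed):
    """Index of the candidate with the highest summed score."""
    return max(range(len(summed)), key=lambda i: summed[i])


lr1 = [3, 1, 2]            # integer-valued result, e.g. from a quantized branch
lr2 = [0.25, 0.5, 1.5]     # real-valued result, e.g. from a pruned branch
lr3 = [0.0, 2.5, 0.125]    # result from an optional third branch
sr = sum_learning_results(lr1, lr2, lr3)
pvl = predict_index(sr)
```

Calling `sum_learning_results(lr1, lr2)` without the third branch changes which candidate wins in this example, which shows how an added branch can alter the prediction.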

As such, the object tracking learning method and system according to an embodiment of the present invention enable accurate and efficient learning across a variety of situations.

Representative embodiments of the present invention have been described in detail above, but those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of rights of the present invention should not be limited to the described embodiments, but should be defined by the claims set out below and their equivalents.

100: object tracking learning system
120: first neural network module
140: second neural network module
160: prediction module
180: optimization module
200: object tracking learning method

Claims

An object tracking learning system comprising:
a first neural network module that expresses a first parameter for an input image by converting it from a first type to a second type, learns, and outputs a first learning result;
a second neural network module that removes some connections of a second parameter for the input image, learns, and outputs a second learning result;
a prediction module that generates a predicted value for an object of the input image from a summation result obtained by summing the first learning result and the second learning result; and
an optimization module that updates the first parameter and the second parameter based on the predicted value.

The object tracking learning system of claim 1,
wherein the first neural network module comprises
a 1-1 quantization unit that quantizes the first parameter from the first type, which is a real-number type, into the second type, which is an integer type.

The object tracking learning system of claim 2,
wherein the 1-1 quantization unit
quantizes the first parameter of the first type into the second type in correspondence with the center and width of a target interval for the first parameter of the first type.

The object tracking learning system of claim 2,
wherein the first neural network module further comprises
a 1-2 quantization unit that quantizes an activation value of the first type into the second type.

The object tracking learning system of claim 1,
wherein the second neural network module comprises
a channel pruning unit that prunes channels corresponding to a part of the second parameter of the first type.

The object tracking learning system of claim 1,
wherein the first neural network module comprises
a 1-1 quantization unit that quantizes the first parameter from the first type, which is a real-number type, into the second type, which is an integer type, and
wherein the second neural network module comprises
a channel pruning unit that prunes channels corresponding to a part of the second parameter of the first type.

The object tracking learning system of claim 1,
wherein the optimization module comprises:
a loss adjusting unit that adjusts losses of the first learning result, the second learning result, and the predicted value; and
a backpropagation unit that performs backpropagation based on the adjusted losses.

The object tracking learning system of claim 1,
further comprising a pre-learning module that pre-trains the first neural network module and the second neural network module by performing pre-learning on a test image, and sets initial values of the parameters for the input image.

The object tracking learning system of claim 8,
further comprising an online tracking module that controls real-time learning of the second neural network module for the input image, which is a streaming video.

The object tracking learning system of claim 9,
wherein the prediction module comprises
a learning result summing unit that sums the first learning result of the pre-trained first neural network module and the second learning result of the online-trained second neural network module, and outputs the summation result.

The object tracking learning system of claim 1,
further comprising a neural network design module that sets different combinations of the first neural network module and the second neural network module based on a first criterion.

The object tracking learning system of claim 1, wherein
the first learning result is expressed as an integer type,
the second learning result is expressed as a real-number type, and
the summation result is expressed as a mixture of the integer type and the real-number type.

An object tracking learning system comprising:
at least two heterogeneous neural network modules;
a prediction module that generates a predicted value from a summation result obtained by summing learning results of the at least two neural network modules; and
an optimization module that updates parameters of the at least two neural network modules based on the predicted value,
wherein the summation result includes at least two elements expressed in heterogeneous forms.

The object tracking learning system of claim 13,
wherein the at least two neural network modules comprise:
a first neural network module that quantizes parameters for an input image into integers; and
a channel pruning unit that prunes channels corresponding to some of the parameters for the input image.

The object tracking learning system of claim 13,
wherein the at least two neural network modules comprise:
a first neural network module that is pre-trained; and
a second neural network module that is pre-trained and trained in real time.

The object tracking learning system of claim 13,
wherein the at least two neural network modules comprise:
a first neural network module that outputs a coarse-scale first learning result for main information of an input image; and
a second neural network module that outputs a fine-scale second learning result for supplementary information of the input image.

A hybrid neural network-based object tracking learning method comprising:
expressing a first parameter for an input image by converting it from a first type to a second type, and outputting the result as a first learning result;
removing some connections of a second parameter for the input image, and outputting the result as a second learning result;
generating a predicted value based on a summation result obtained by summing the first learning result and the second learning result; and
updating the first parameter and the second parameter based on the predicted value.

The object tracking learning method of claim 17,
wherein the outputting as the first learning result comprises
quantizing the first parameter from the first type, which is a real-number type, into the second type, which is an integer type, and
wherein the outputting as the second learning result comprises
pruning channels corresponding to a part of the second parameter.

The object tracking learning method of claim 17,
wherein the generating of the predicted value comprises
summing, as the summation result, the first learning result of a pre-trained first neural network module and the second learning result of an online-trained second neural network module.

The object tracking learning method of claim 17, wherein
the first learning result is expressed as an integer type,
the second learning result is expressed as a real-number type, and
the summation result obtained by summing the first learning result and the second learning result is expressed as a mixture of the integer type and the real-number type.