KR102351433B1

KR102351433B1 - Apparatus for Regularization of Video Overfitting and Driving Method Thereof

Info

Publication number: KR102351433B1
Application number: KR1020210083690A
Authority: KR
Inventors: 김종목; 나종근
Original assignee: 주식회사 스누아이랩
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2022-01-17

Abstract

The present invention relates to a video overfitting regularization apparatus and a method of driving the same. According to an embodiment of the present invention, the video overfitting regularization apparatus comprises: a communication interface unit that receives video frames of a captured image; and a control unit that determines a degree of overfitting for each specified number of the received video frames and performs training by dropping the pixel values of arbitrary video frames to a meaningless value during data computation, based on a probability value (p) adjusted according to the determined degree of overfitting.

Description

Apparatus for Regularization of Video Overfitting and Driving Method Thereof

The present invention relates to a video overfitting regularization apparatus and a method of driving the apparatus. More particularly, it relates to a video overfitting regularization apparatus and driving method that prevent overfitting in video (for example, in an environment prone to overfitting, or when the class must be identified from continuous motion owing to the nature of video) by dropping all pixel values of arbitrary video frames to a meaningless value (e.g., 0, gray) in the training stage, so that the video frames are learned accordingly.

Various computer vision technologies using CCTV (Closed Circuit Television) and mobile phone cameras have been applied to photos (images) and videos, making it possible to detect and recognize desired objects such as people, cars, and animals. Efforts have been made to improve the accuracy of object detection and recognition by combining these technologies with machine learning methods such as deep learning, but a satisfactory level has not yet been reached.

For example, training existing deep learning networks suffers from overfitting, and various regularization techniques have been developed to address it. In image classification, one of the simplest and most widely used of these is dropout. Dropout trains the network while setting units (neurons) inside it to 0 with probability p. Because units are cut with probability p each time the same input (an image or otherwise) is presented, the network learns from different features and overfitting is avoided. Similar techniques include DropConnect and DropBlock.
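For reference, the neuron-level dropout described above can be sketched in a few lines. This is a minimal pure-Python illustration on a flat list of activations, not part of the patent; the `dropout` function is hypothetical, and the optional "inverted" scaling of surviving values by 1/(1 - p) is an assumption added because common frameworks (e.g., PyTorch's `nn.Dropout`) apply it during training.

```python
import random


def dropout(values, p, rng=None, scale=True):
    """Set each element to 0 with probability p (standard dropout).

    With scale=True this is "inverted" dropout: surviving values are
    divided by (1 - p) so each output has the same expected value as
    its input.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = rng or random.Random()
    if p == 1.0:
        # Everything is dropped; avoid dividing by zero.
        return [0.0 for _ in values]
    factor = 1.0 / (1.0 - p) if scale else 1.0
    return [0.0 if rng.random() < p else v * factor for v in values]


rng = random.Random(0)
activations = [0.5, 1.2, -0.3, 2.0, 0.7]
print(dropout(activations, p=0.4, rng=rng))
```

Each call draws a fresh random pattern, which is what forces the network to rely on different features for the same input.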

Classifying a sequence of consecutive images is the field of video classification, which is similar to image classification but suffers from more severe overfitting. Naturally, many regularization techniques have also been studied to prevent overfitting in video classification.

As described above, although regularization techniques for preventing overfitting in video classification are still being studied, they have not yet produced satisfactory results when applied to deep learning in artificial intelligence.

Korean Patent Publication No. 10-2020-0041154 (2020.04.21)
Korean Patent Publication No. 10-2021-0024935 (2021.03.08)
Korean Patent Publication No. 10-2018-0075368 (2018.07.04)

An object of embodiments of the present invention is to provide a video overfitting regularization apparatus, and a method of driving it, that prevent overfitting in video (for example, in an environment prone to overfitting, or when the class must be identified from continuous motion owing to the nature of video) by dropping all pixel values of arbitrary video frames to a meaningless value (e.g., 0, gray) in the training stage, so that the video frames are learned accordingly.

A video overfitting regularization apparatus according to an embodiment of the present invention includes a communication interface unit that receives video frames of a captured image, and a control unit that determines a degree of overfitting for each specified number of the received video frames and performs training by dropping the pixel values of arbitrary video frames to a meaningless value during data computation, based on a probability value (p) adjusted according to the determined degree of overfitting.

Based on the probability value, the control unit may determine both the number of video frames to be kept for training and the video frames to be dropped among the specified number of video frames.

The control unit may determine the degree of overfitting (val_loss(t)/train_loss(t)) as the ratio of the validation loss to the training loss for the specified number of video frames.

The control unit may increase the number of video frames dropped from the specified number of video frames as the degree of overfitting increases.

The control unit may reduce the number of dropped video frames when, owing to the nature of the video, the class of the specified number of video frames must be determined from continuous motion.

To change the pixel values of a dropped video frame to a meaningless value, the control unit may set all of its pixel values to the same value.

To change the pixel values of a dropped video frame, the control unit may apply a pre-generated masking video frame to the dropped video frame and perform the computation on the result.

In addition, a method of driving the video overfitting regularization apparatus according to an embodiment of the present invention includes: receiving, by a communication interface unit, video frames of a captured image; and determining, by a control unit, a degree of overfitting for each specified number of the received video frames, and performing training by dropping the pixel values of arbitrary video frames to a meaningless value during data computation, based on a probability value (p) adjusted according to the determined degree of overfitting.

According to embodiments of the present invention, the overfitting problem that arises when training a deep learning network can be addressed.

In addition, embodiments of the present invention bring network training closer to regularization that prevents overfitting on video.

FIG. 1 is a diagram illustrating a deep learning network system according to an embodiment of the present invention;
FIG. 2 is a diagram for explaining a technique for preventing overfitting in video classification;
FIG. 3 is a block diagram illustrating the detailed structure of the video overfitting regularization apparatus of FIG. 1; and
FIG. 4 is a flowchart illustrating a driving process of the video overfitting regularization apparatus of FIG. 1.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating a deep learning network system according to an embodiment of the present invention, and FIG. 2 is a diagram for explaining a technique for preventing overfitting in video classification.

As shown in FIG. 1, the deep learning network system 90 according to an embodiment of the present invention includes some or all of a user device 100, a communication network 110, and a video overfitting regularization apparatus 120.

Here, "includes some or all" means that the communication network 110 or the video overfitting regularization apparatus 120 may be omitted so that the user device 100 operates standalone and performs the video overfitting regularization operation according to the embodiment, or that some or all components of the video overfitting regularization apparatus 120 may be integrated into a network device (e.g., a wireless switching device) of the communication network 110. To aid a thorough understanding of the invention, the system is described as including all of them.

The user device 100 may include a photographing device, such as a camera, installed at various places in an industrial site to capture images. For example, it may be installed and operated at various places for autonomous vehicles, smart factories, intelligent CCTV, intelligent crime analysis, AI-based disease prediction, and the like. The photographing device may also be installed in spaces where children are active, such as a daycare center or kindergarten, or in a home to observe an infant and a nanny. As a surveillance camera, the photographing device includes a general CCTV (Closed Circuit Television) camera or an IP (Internet Protocol) camera. The photographing apparatus 100 may include not only a fixed camera but also a PTZ (Pan-Tilt-Zoom) camera capable of pan, tilt, and zoom operations. It may also include a camera mounted on a user terminal such as a smartphone or tablet PC, so embodiments of the present invention are not limited to any particular form.

A photographing device installed at an industrial site may also work with an edge device. In the era of the Fourth Industrial Revolution, the amount of data that network devices of the communication network 110, or the video overfitting regularization apparatus 120, must process has exploded, placing a heavy computational burden on them. Installing an edge device near the photographing device to perform primary image analysis can drastically reduce the amount of data to be processed. The edge device may perform various operations together with the video overfitting regularization apparatus 120; for example, it may change the settings for analysis (e.g., the objects to observe, or the hyperparameters for preventing overfitting) and provide the corresponding analysis results to the apparatus 120. Alternatively, the edge device may itself operate as the video overfitting regularization apparatus 120 according to an embodiment of the present invention.

Furthermore, the user device 100 may include various other devices such as a smartphone, tablet PC, laptop computer, desktop computer, wrist-worn wearable device, or a storage medium such as a USB drive that stores captured video. Strictly speaking, a user terminal or image display device equipped with a photographing device may be preferable. In this way, the user device 100 according to an embodiment of the present invention encompasses a variety of devices.

The communication network 110 includes both wired and wireless communication networks. For example, a wired or wireless Internet network may be used as, or linked with, the communication network 110. Here, the wired network includes networks such as a cable network or the public switched telephone network (PSTN), and the wireless network includes CDMA, WCDMA, GSM, EPC (Evolved Packet Core), LTE (Long Term Evolution), WiBro, and the like. The communication network 110 according to embodiments of the present invention is not limited thereto and may be, for example, a cloud computing network in a cloud computing environment or a 5G network. When the communication network 110 is a wired network, an access point in the network may connect to a telephone company's switching center; when it is a wireless network, the access point may connect to an SGSN (Serving GPRS Support Node) or GGSN (Gateway GPRS Support Node) operated by the carrier to process data, or to various relay stations such as a BTS (Base Transceiver Station), NodeB, or eNodeB.

The communication network 110 may include an access point (AP). The access point here includes a small base station, such as a femto or pico base station, of the kind often installed inside buildings. Femto and pico base stations are distinguished, within the classification of small base stations, by the maximum number of user devices 100 that can connect to them. The access point may also include a short-range communication module for communicating with the user device 100 via ZigBee, Wi-Fi, and the like. The access point may use TCP/IP or RTSP (Real-Time Streaming Protocol) for wireless communication. Besides Wi-Fi, short-range communication may follow various standards such as Bluetooth, ZigBee, infrared, RF (Radio Frequency) including UHF (Ultra High Frequency) and VHF (Very High Frequency), and ultra-wideband (UWB). Accordingly, the access point can extract the location of a data packet, designate the best communication path for that location, and forward the packet along the designated path to the next device, such as the video overfitting regularization apparatus 120. The access point may share several lines in a general network environment and includes, for example, routers, repeaters, and relays.

The video overfitting regularization apparatus 120 mitigates the overfitting that arises in video classification, for example when training a deep learning network. Here, "overfitting" means that a machine learning model learns its training data excessively. Because training data is generally a subset of real data, the error on the training data decreases while the error on real data increases; that is, the model adapts too closely to the training data and its generalization performance degrades. The video overfitting regularization apparatus 120 therefore performs, more precisely, a regularization (or generalization) operation to solve this problem. In other words, when a video is provided from the user device 100 via the communication network 110, the apparatus 120 may perform deep learning on that video and, before doing so, may perform a regularization operation against overfitting as a kind of preprocessing.

As shown in FIG. 2, to prevent overfitting in video classification, the video overfitting regularization apparatus 120 probabilistically drops arbitrary video frames 200 and 201 in their entirety to a meaningless value, for example 0 or gray, in the training stage (FIG. 2(b)). In FIG. 2(b), a multiplexer symbol is used symbolically to represent training on the data. In other words, all pixel values of a single (unit) video frame are converted into a meaningless value, so that every pixel in that frame has the same value. For example, the pixels may be set to 0, unified to a gray value of 127, or expressed as the mean of the frame's red (R), green (G), and blue (B) pixel values or as a luminance value. Accordingly, by checking the pixel values of such a frame, the frame can be excluded from training.
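The frame-level drop described above, and the check that lets a dropped frame be excluded from training, might look like the following minimal sketch. It assumes a frame is a list of rows of (R, G, B) tuples (a real pipeline would use arrays); `fill_frame`, `is_dropped`, and `frames_for_training` are hypothetical names, not taken from the patent.

```python
def fill_frame(frame, pad_value):
    """Return a frame of the same size in which every pixel equals pad_value."""
    pad = tuple(pad_value)
    return [[pad for _ in row] for row in frame]


def is_dropped(frame, pad_value):
    """True if every pixel equals pad_value, i.e. the frame carries no information."""
    pad = tuple(pad_value)
    return all(tuple(px) == pad for row in frame for px in row)


def frames_for_training(sequence, pad_value):
    """Keep only the frames that were not dropped."""
    return [f for f in sequence if not is_dropped(f, pad_value)]


GRAY = (127, 127, 127)
frame = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (1, 2, 3)]]
dropped = fill_frame(frame, GRAY)
print(is_dropped(dropped, GRAY), is_dropped(frame, GRAY))  # True False
```

Detecting drops by pixel value alone would misclassify a genuine frame that happens to be uniformly gray; tagging dropped frames explicitly, which the description also allows, avoids that ambiguity.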

More specifically, the video overfitting regularization apparatus 120 according to an embodiment of the present invention may perform the overfitting determination in units of a specified number of the received video frames. Here, let n denote the number of live video frames, that is, frames that are not converted to the meaningless value while others are converted according to the probability; n can then be regarded as the minimum number of frames that must remain in one video sequence. For example, when the specified sequence length is 7 and n is 2, at least two video frames must not be dropped and must keep their RGB values. In other words, at least two frames must remain live.

Before describing the operation of the video overfitting regularization apparatus 120 further, the hyperparameters need to be defined. Hyperparameters are distinct from ordinary parameters. A hyperparameter is a value that the user sets directly when building a model (a program, etc.), and it can be changed freely, for example through a user interface; there is therefore no fixed optimal value. A parameter, by contrast, is a variable determined inside the model, and its value is determined by the data. Hyperparameters and parameters can thus be distinguished by whether or not the user sets them.

Accordingly, several hyperparameters must be adjusted for the operation according to an embodiment of the present invention. p is the probability that each frame is dropped. n is the minimum number of frames that must remain in one video sequence, as described above. pad_value is the value used to fill the dropped region: the RGB pixel values may be filled with 0, as in (0, 0, 0), with a specific gray value, as in (127, 127, 127), or with a mean value. The mean value is obtained by computing the R, G, and B averages over the entire training data in advance and using them as the pad values (e.g., r_mean, g_mean, b_mean). Because the drop of a particular video frame can be represented by pixel values in various forms, embodiments of the present invention are not limited to any one form; for example, it is equally possible to tag the dropped frames.
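The three pad_value options listed above (zero, a fixed gray, and the per-channel mean of the training data computed in advance) can be sketched as follows. The frame layout (rows of (R, G, B) tuples), the `make_pad_value` name, the mode strings, and rounding the mean to integers are all assumptions for illustration.

```python
def make_pad_value(mode, training_frames=None, gray=127):
    """Return an (R, G, B) fill value for dropped frames.

    mode: "zero" -> (0, 0, 0); "gray" -> (gray, gray, gray);
          "mean" -> per-channel mean over all pixels of training_frames
          (the r_mean, g_mean, b_mean of the description), rounded to int.
    """
    if mode == "zero":
        return (0, 0, 0)
    if mode == "gray":
        return (gray, gray, gray)
    if mode == "mean":
        totals = [0, 0, 0]
        count = 0
        for frame in training_frames or []:
            for row in frame:
                for r, g, b in row:
                    totals[0] += r
                    totals[1] += g
                    totals[2] += b
                    count += 1
        if count == 0:
            raise ValueError("mean padding needs at least one training pixel")
        return tuple(round(t / count) for t in totals)
    raise ValueError(f"unknown pad mode: {mode!r}")


train = [[[(0, 0, 0), (10, 20, 30)]], [[(20, 40, 60), (10, 20, 30)]]]
print(make_pad_value("mean", train))  # (10, 20, 30)
```

Because the mean is precomputed once over the whole training set, every dropped frame in every sequence receives the same fill value.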

In addition, N may be defined as the length of the video sequence, in frames. For example, in FIG. 2(a), N is 7 and n is 2, and for every two video frames there is one video frame converted to the meaningless value. In each of the frames 200 and 201 converted to the meaningless value, the values of all pixels are replaced with pad_value, after which the computation is either performed on them or skipped. P_d denotes the frame drop probability, and P_live may denote the minimum percentage (%) of the full sequence length that must remain live.

Accordingly, the video overfitting regularization apparatus 120 according to an embodiment of the present invention may perform the following operations to drop video frames. First, it generates a drop-frame mask whose entries are 0 with probability p; such a mask may also be generated in advance. A mask may be created with the same size as a video frame (e.g., 20 × 20 pixels), overlaid on the frame, and combined pixel by pixel with the corresponding pixel values so that they are converted to pad_value. For example, when P_d is 50%, the mask may take the form df_mask = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0], where 0 marks a frame to drop and 1 a frame to keep.
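The mask generation step above can be sketched as follows, assuming each frame independently gets a 0 with probability p_d. The pixel-level combination is modeled here as out = keep × pixel + (1 - keep) × pad, which is equivalent to overlaying a same-sized mask filled with `keep`; that formula, the frame layout, and the function names are assumptions, since the description does not spell out the exact operation.

```python
import random


def make_drop_mask(num_frames, p_d, rng=None):
    """Per-frame mask: 0 = drop (with probability p_d), 1 = keep."""
    if not 0.0 <= p_d <= 1.0:
        raise ValueError("p_d must be in [0, 1]")
    rng = rng or random.Random()
    return [0 if rng.random() < p_d else 1 for _ in range(num_frames)]


def apply_frame_mask(frame, keep, pad_value):
    """Combine a frame with a same-sized mask whose entries all equal `keep`.

    Channel by channel: out = keep * pixel + (1 - keep) * pad, so keep=1
    leaves the frame unchanged and keep=0 turns every pixel into pad_value.
    """
    return [
        [tuple(keep * c + (1 - keep) * p for c, p in zip(px, pad_value)) for px in row]
        for row in frame
    ]


rng = random.Random(42)
df_mask = make_drop_mask(10, p_d=0.5, rng=rng)
print(df_mask)
```

With p_d = 0.5 the mask keeps about half the frames on average, but any single draw may keep far fewer, which is why the minimum-live correction that follows is needed.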

Next, the mask is checked against the following condition; if the condition is not satisfied, the mask is passed through unchanged.

If sum(df_mask) < N × p_live:

1) random_index = (N × p_live - sum(df_mask)) indices chosen at random among the indices where df_mask is 0

2) df_mask[random_index] = 1 (e.g., for random_index = 0, df_mask = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0])

The RGB values of the frames at positions where df_mask is 0 are then replaced with pad_value.

In the condition sum(df_mask) < N × p_live, p_live is the minimum proportion of frames that must not be dropped and N is the number of frames, so N × p_live is the minimum number of frames that must not be dropped. Since sum(df_mask) is the number of frames that were not dropped, the condition reads: the drop mask was generated randomly from p_d, but it drops more frames than the minimum number of frames to keep allows. If so, the procedure restores (e.g., changes 0 → 1 in the mask) as many of the positions already marked as dropped (0 in the mask) as are needed to reach that minimum.

Thus, step 1) randomly selects, among the frames set to 0 in the mask (i.e., the frames that were to be dropped), as many frames as are needed to reach the minimum number of frames to keep, and step 2) brings those randomly selected frames back to life (changing 0 → 1 in the mask), so that the minimum number of frames that must not be dropped is restored.
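Steps 1) and 2) and the final replacement with pad_value can be sketched as below. The description does not say how a fractional N × p_live is rounded; this sketch assumes rounding up, after first rounding away floating-point error (10 * 0.3 evaluates to 3.0000000000000004 in Python). Function names and the frame layout are hypothetical.

```python
import math
import random


def enforce_min_live(df_mask, p_live, rng=None):
    """Re-enable randomly chosen dropped frames until sum(mask) >= N * p_live."""
    if not 0.0 <= p_live <= 1.0:
        raise ValueError("p_live must be in [0, 1]")
    rng = rng or random.Random()
    mask = list(df_mask)
    n_frames = len(mask)
    # Assumption: round up; the inner round() absorbs float error.
    min_live = math.ceil(round(n_frames * p_live, 9))
    live = sum(mask)
    if live < min_live:
        # Step 1): pick (min_live - live) random positions among the zeros.
        dropped = [i for i, v in enumerate(mask) if v == 0]
        for i in rng.sample(dropped, min_live - live):
            mask[i] = 1  # Step 2): bring the frame back to life (0 -> 1).
    return mask  # Condition not met: the mask passes through unchanged.


def apply_drop_mask(frames, df_mask, pad_value):
    """Replace every pixel of frames whose mask entry is 0 with pad_value."""
    pad = tuple(pad_value)
    return [
        frame if keep else [[pad for _ in row] for row in frame]
        for frame, keep in zip(frames, df_mask)
    ]


mask = enforce_min_live([0, 0, 1, 1, 0, 0, 1, 1, 0, 0], p_live=0.5, rng=random.Random(0))
print(mask, sum(mask))
```

With the example mask from the description (4 live frames out of N = 10) and p_live = 0.5, exactly one dropped position is restored, matching the worked example df_mask = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0] up to which index is chosen.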

Furthermore, under the above assumptions, the video overfitting regularization apparatus 120 according to an embodiment of the present invention drops frames with a relatively high probability (e.g., p_d: 0.5 to 0.8) in environments where overfitting is likely, and with a relatively low probability (e.g., p_d: 0.2 to 0.5) in environments where overfitting is relatively unlikely. To this end, the apparatus 120 may determine the overfitting state for a specified number of the received video frames. For example, because attribute information about the video frames is received together with their pixel data, the apparatus can check that information to determine whether the environment is prone to overfitting.

In addition, for problems in which, owing to the nature of video, the class must be identified from continuous motion, the apparatus 120 should keep relatively many frames (e.g., p_live: 0.5 to 0.8), whereas for problems in which the class is easy to identify from fragmentary frame information, it need not keep as many (e.g., p_live: 0.2 to 0.5). Typically, when similar scenes repeat over several frames, relatively few frames need to be kept, but when the scene changes rapidly from one scene to the next, relatively many frames need to be kept.
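The example ranges for p_d and p_live given above could be turned into concrete settings as in this sketch. How the apparatus decides that an environment is overfitting-prone, or that a task needs motion context, is left open by the description, so both are plain boolean inputs here; sampling uniformly within each range is likewise an assumption, and all names are hypothetical.

```python
import random

# Example ranges from the description: (low, high).
P_D_RANGES = {True: (0.5, 0.8), False: (0.2, 0.5)}     # keyed by "prone to overfitting?"
P_LIVE_RANGES = {True: (0.5, 0.8), False: (0.2, 0.5)}  # keyed by "needs continuous motion?"


def choose_drop_settings(prone_to_overfit, needs_motion_context, rng=None):
    """Return (p_d, p_live) drawn uniformly from the matching example ranges."""
    rng = rng or random.Random()
    p_d = rng.uniform(*P_D_RANGES[prone_to_overfit])
    p_live = rng.uniform(*P_LIVE_RANGES[needs_motion_context])
    return p_d, p_live


# E.g. a small, repetitive data set (prone to overfit) of a fast-changing action.
p_d, p_live = choose_drop_settings(True, True, rng=random.Random(7))
print(round(p_d, 3), round(p_live, 3))
```

Keeping the two choices separate mirrors the description: p_d answers "how much to drop", while p_live sets a floor that protects temporal context.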

In this process, the video overfitting regularization apparatus 120 may also adjust the p_d value dynamically, for example adaptively according to the environment of the received video frames. "Adaptive" here means that the value changes to suit a specific environment. To adjust p_d dynamically, the apparatus 120 determines the degree of overfitting while learning is in progress; this degree can be obtained from the ratio of the validation loss value to the training loss value, as in the equation below. On this basis, the apparatus 120 may analyze the pixel values of the video frames and calculate the training loss value and the validation loss value.

Overfitting = validation loss (val_loss[t]) / training loss (train_loss[t])

As learning proceeds and the overfitting value grows beyond a threshold (overfitting > threshold, e.g., threshold = 3), the video overfitting regularization apparatus 120 increases p_d (e.g., p_d = 0.4 if overfitting < 3, otherwise p_d = 0.6).
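The overfitting measure and the threshold rule above translate directly into code. This sketch uses the example values from the text (threshold 3, p_d of 0.4 or 0.6); the function names and the small `eps` guard against a zero training loss are additions of this sketch, not part of the description.

```python
def overfitting_degree(val_loss: float, train_loss: float, eps: float = 1e-12) -> float:
    """Overfitting = val_loss[t] / train_loss[t]; eps avoids division by zero."""
    return val_loss / max(train_loss, eps)


def update_drop_probability(val_loss: float, train_loss: float,
                            threshold: float = 3.0,
                            low_p: float = 0.4, high_p: float = 0.6) -> float:
    """Return p_d: low_p while the loss ratio is below threshold, else high_p."""
    return low_p if overfitting_degree(val_loss, train_loss) < threshold else high_p


if __name__ == "__main__":
    print(update_drop_probability(0.5, 0.25))  # 0.4 (ratio 2 < 3)
    print(update_drop_probability(3.0, 1.0))   # 0.6 (ratio 3 is not < 3)
```

In a training loop, `update_drop_probability` would be called at each evaluation step with the latest validation and training losses, and the returned p_d would be used for the next round of frame dropping.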

To determine the training loss and validation loss described above, a training data set and a validation data set may, for example, be stored in advance, and the loss value (or loss rate) may be calculated by comparison against these data.

A loss value is the output of a loss function that the developer defines so that the network achieves the desired objective. When training a deep learning network, the data are usually divided into training data and validation data, because deep learning networks easily overfit the training data: the loss on the training data can be driven down almost without limit, while the loss on real test data not used in training tends instead to diverge.

It is therefore necessary to monitor the training loss and the validation loss together to check whether learning is progressing properly. If the validation loss has not decreased even though the training loss has become very low, overfitting can be considered to have occurred. When overfitting becomes large, p_d can be increased to regularize it more strongly.

As described above, when, for example, 60 video frames are received per second, the video overfitting regularization apparatus 120 according to an embodiment of the present invention may analyze the characteristics of the frames in units of 7 or 10 frames and, based on the probability value determined from the analysis result, determine the minimum percentage of frames that must stay alive, i.e., that are not converted to a meaningless value. A frame drop operation is then performed on arbitrary frames; a frame drop can be seen as setting all pixel values of that frame to a single meaningless value. The 7 or 10 video frames on which this overfitting prevention operation has been performed may then be supplied to a deep learning network, for example a software module for deep learning. This deep learning software module may be implemented within the apparatus 120 or as a separate device, and is not particularly limited to either form.
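The windowing step can be illustrated as follows, assuming frames are indexed 0 to N-1 and that the minimum number of live frames is obtained by rounding p_live * window length up. The description specifies neither the rounding rule nor how a final partial window is handled (60 frames do not divide evenly into windows of 7); both are assumptions of this sketch.

```python
import math
from typing import List


def split_into_windows(num_frames: int, window: int) -> List[range]:
    """Split frame indices 0..num_frames-1 into consecutive windows.

    The final window is shorter when num_frames is not a multiple of window.
    """
    return [range(start, min(start + window, num_frames))
            for start in range(0, num_frames, window)]


def min_live_frames(window_len: int, p_live: float) -> int:
    """Minimum number of frames in a window that must not be dropped.

    Rounds p_live * window_len up; rounding to 9 decimals first removes
    floating-point noise before the ceiling is taken.
    """
    return min(window_len, math.ceil(round(p_live * window_len, 9)))


if __name__ == "__main__":
    windows = split_into_windows(60, 7)
    print(len(windows), len(windows[-1]))  # 9 4
    print(min_live_frames(7, 0.3))         # 3
```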

FIG. 3 is a block diagram illustrating the detailed structure of the video overfitting regularization apparatus of FIG. 1.

As shown in FIG. 3, the video overfitting regularization apparatus 120 according to an embodiment of the present invention includes some or all of a communication interface unit 300, a control unit 310, a frame drop processing unit 320, and a storage unit 330.

Here, "includes some or all" means that the apparatus 120 may be configured with some components, such as the storage unit 330, omitted, or with some components, such as the frame drop processing unit 320, integrated into another component such as the control unit 310. To aid a full understanding of the invention, the apparatus is described as including all of them.

The communication interface unit 300 communicates with the user device 100 via the communication network 110 of FIG. 1. It may receive captured images from a user device 100 such as a smartphone or tablet PC, or video files previously captured by a camera and stored. For example, when a person in charge at a police station or similar organization requests analysis of a video file, the communication interface unit 300 may receive the video file and transfer it to the control unit 310.

In communicating with the user device 100, the communication interface unit 300 may perform various operations such as modulation/demodulation, muxing/demuxing, encoding/decoding, and scaling to convert resolution; since these are obvious to those skilled in the art, further description is omitted.

The control unit 310 is responsible for the overall control of the communication interface unit 300, the frame drop processing unit 320, and the storage unit 330 of FIG. 3, which constitute the video overfitting regularization apparatus 120 of FIG. 1. Typically, to prevent overfitting of a video according to an embodiment of the present invention, the control unit 310 may control the frame drop processing unit 320 to execute a program installed in it, and may also operate the frame drop processing unit 320 selectively. That is, when a video is received through the communication interface unit 300, metadata such as additional information about the video may be received with it, so the control unit 310 can use this metadata to determine, for example, whether the environment is prone to overfitting, and control the frame drop processing unit 320 accordingly. Because this determination is preferably made at the outset, before the frame drop processing unit 320 performs frame drop processing, it is not limited to being performed by the control unit 310 as described above.

In addition, when a captured image or a prerecorded video is received from the user device 100 of FIG. 1, the control unit 310 may temporarily store it in the storage unit 330 and then retrieve it and provide it to the frame drop processing unit 320. Likewise, data that the frame drop processing unit 320 generates in a specified format through image processing may be temporarily stored in the storage unit 330, then retrieved and stored, systematically classified, in the DB 120a of FIG. 1. For this purpose, the control unit 310 may control the communication interface unit 300.

The frame drop processing unit 320 can broadly perform an image analysis operation and a learning operation such as artificial intelligence deep learning; more precisely, it performs the frame drop operation, and it may include a software module for each of these operations. The image analysis operation prepares for frame drop processing before the learning stage. To this end, it determines the degree of overfitting of the received video for every specified number of video frames among the received frames. The degree of overfitting may be determined by pixel analysis of the initial video frames, or by checking metadata that includes additional information, subtitles, or other information received with the video frames. As mentioned above, the frame drop processing unit 320 may determine various states before dropping frames, and which states are determined may be preset by a user, an administrator, or a programmer.

For example, the unit may be set to determine only whether the environment is prone to overfitting, or whether, owing to the nature of the video, the class must be identified from continuous motion, or it may evaluate several kinds of overfitting degree and decide dynamically, that is, adaptively according to the situation at hand. The degree of overfitting determined in this way influences how the probability value is selected and changed. For example, if the degree of overfitting is severe, frames are dropped with a high probability. Depending on whether the probability value is defined as a drop probability (p_d) or a keep probability (p_live), however, a high probability value may instead mean that fewer frames are dropped; either convention is possible, so the embodiments of the present invention are not limited to any one form.

For example, for the 60 video frames received per second, the frame drop processing unit 320 may perform the frame drop operation on groups of up to 7 or 10 frames, i.e., for every specified number of video frames. To this end, it determines the degree of overfitting of the video frames, calculates a probability value according to that degree, and drops arbitrary video frames according to the calculated probability value. For example, based on the calculated probability value, the frames within a 7-frame window may be kept in runs of two. The first two video frames are then kept and the third is dropped; that is, all of its pixel values are set to 0 or converted to the same value. Because frames are kept in runs of two, the fourth and fifth video frames are then kept for training in the subsequent learning network, and the sixth is dropped again.
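The worked example above (keep two, drop one, within a 7-frame window) can be expressed as a per-frame keep mask. This sketch assumes the keep/drop pattern simply repeats, so the seventh frame, which the text does not mention, is kept; frames are modeled as flattened lists of pixel values, and dropped frames are set to 0.

```python
from typing import List, Sequence


def periodic_keep_mask(seq_len: int, keep_run: int, drop_run: int = 1) -> List[int]:
    """Per-frame mask (1 = keep, 0 = drop) repeating `keep_run` kept frames
    followed by `drop_run` dropped frames."""
    period = keep_run + drop_run
    return [1 if i % period < keep_run else 0 for i in range(seq_len)]


def zero_dropped_frames(frames: Sequence[Sequence[float]],
                        mask: Sequence[int]) -> List[List[float]]:
    """Set every pixel of each dropped frame to 0; kept frames are copied as-is."""
    return [list(f) if keep else [0.0] * len(f) for f, keep in zip(frames, mask)]


if __name__ == "__main__":
    mask = periodic_keep_mask(7, keep_run=2)
    print(mask)  # [1, 1, 0, 1, 1, 0, 1]
```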

In this way, the frame drop processing unit 320 may determine the sequence length among the received video frames and the minimum number of frames that must remain in one video sequence, drop the arbitrary video frames selected accordingly, and then provide the video frames of that sequence to the learning network, i.e., the learning software. As described above, dropping an arbitrary video frame, that is, setting its pixel values to 0 or converting them to the same value, is done by generating and applying a drop frame mask. Masking means applying the value specified by the mask to the input video frame; more precisely, each pixel value is combined with the corresponding mask value to convert it to the desired value. Masking with a drop frame mask in this way can greatly increase computation speed. Since the conversion method may vary, the embodiments of the present invention are not limited to any one form. In particular, the frame drop processing unit 320 may include a drop mask generation unit (e.g., a software module for generating masks) through which this operation is performed.
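The drop frame mask can be sketched as an element-wise arithmetic operation, out = pixel * m + fill * (1 - m), where m is 1 for a kept frame and 0 for a dropped one. This is an assumed formulation consistent with the description, not the patent's exact computation; the fill value of 128 (mid-gray for 8-bit images) is illustrative. With an array library the same expression would run over a whole batch at once, which is the kind of vectorized computation that makes masking fast.

```python
from typing import List, Sequence


def apply_drop_mask(frames: Sequence[Sequence[float]], keep: Sequence[int],
                    fill_value: float = 128.0) -> List[List[float]]:
    """Apply a per-frame drop mask element-wise.

    For each pixel p of frame i: out = p * m_i + fill_value * (1 - m_i),
    with m_i = 1 (keep) or 0 (drop), so every dropped frame becomes uniform.
    """
    if len(frames) != len(keep):
        raise ValueError("one mask value is required per frame")
    return [[p * m + fill_value * (1 - m) for p in frame]
            for frame, m in zip(frames, keep)]


if __name__ == "__main__":
    print(apply_drop_mask([[10.0, 20.0], [30.0, 40.0]], [1, 0]))
    # [[10.0, 20.0], [128.0, 128.0]]
```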

The frame drop processing unit 320 may also perform various other operations; since the details have been fully described above, that description applies here.

The storage unit 330 may temporarily store various types of data processed under the control of the control unit 310. For example, under the control of the control unit 310, the storage unit 330 may receive and store captured video images. The storage unit 330 may include various hardware memories such as ROM, RAM, and an HDD, and may also store data permanently. For example, when a captured image is received, metadata containing various information such as attributes of the captured image may be received with it, and this metadata may be stored in the storage unit 330.

In addition to the above, the communication interface unit 300, the control unit 310, the frame drop processing unit 320, and the storage unit 330 of FIG. 3 may perform various other operations; since the details have been fully described above, that description applies here.

The communication interface unit 300, the frame drop processing unit 320, and the storage unit 330 according to an embodiment of the present invention are configured as physically separate hardware modules, each of which may store and execute software for performing the above operations. However, since such software is a set of software modules and each module may equally be implemented in hardware, the configuration is not particularly limited to software or hardware. For example, the storage unit 330 may be hardware storage or memory, but it may equally be a software repository that stores information, so it is not limited to the above.

Meanwhile, in another embodiment of the present invention, the control unit 310 may include a CPU and a memory and may be formed as a single chip. The CPU includes a control circuit, an arithmetic logic unit (ALU), an instruction interpreter, and registers, and the memory may include RAM. The control circuit performs control operations, the ALU performs operations on binary data, the instruction interpreter, which may include an interpreter or compiler, converts high-level language into machine language and vice versa, and the registers may be involved in storing data for software. With this configuration, for example, when the apparatus 120 starts operating, the program stored in the frame drop processing unit 320 can be copied into the memory (RAM) and executed from there, which can substantially increase data processing speed.

Furthermore, although for convenience of explanation a device operating as a server in FIG. 1 has been designated as the video overfitting regularization apparatus 120, various types of user devices operating standalone, such as smartphones, laptop computers, desktop computers, and tablet PCs, may also operate as the apparatus 120, so the invention is not particularly limited to any one form. Above all, it is preferable that the device run a program for training a deep learning network and be usable for that purpose.

FIG. 4 is a flowchart illustrating the operation of the video overfitting regularization apparatus of FIG. 1.

Referring to FIG. 4 together with FIG. 1 for convenience of explanation, the video overfitting regularization apparatus 120 according to an embodiment of the present invention receives video frames of a captured image (S400). Video frames are typically received sequentially at 60 frames per second (or 45, 80, etc.), although the rate is not particularly limited. The rate may depend on factors such as CPU performance; for example, with a driving frequency of 60 Hz, 60 video frames can be regarded as being received per second.

Next, for every specified number of received video frames (e.g., 7 to 10), the video overfitting regularization apparatus 120 determines the degree of overfitting and, based on a probability value (p) adjusted according to the determined degree, drops the pixel values of arbitrary video frames to a meaningless value during data operation processing, so that learning is performed on the result (S410).
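Step S410 for a single window can be sketched end to end: compute the overfitting degree, map it to p_d with the example threshold rule, then drop frames at random while guaranteeing a minimum number of live frames. This is a hedged illustration only; the independent per-frame draw, the restoring of excess drops, the default `min_live=2`, and the zero fill value are assumptions of this sketch rather than details from the description.

```python
import random
from typing import List, Sequence


def drop_window(frames: Sequence[Sequence[float]], p_drop: float, min_live: int,
                rng: random.Random, fill_value: float = 0.0) -> List[List[float]]:
    """Drop each frame of one window independently with probability p_drop,
    while guaranteeing that at least `min_live` frames stay un-dropped."""
    n = len(frames)
    dropped = [i for i in range(n) if rng.random() < p_drop]
    max_drops = max(0, n - min_live)
    if len(dropped) > max_drops:
        # Too many frames selected: keep only a random subset of the drops.
        dropped = rng.sample(dropped, max_drops)
    dropped_set = set(dropped)
    return [[fill_value] * len(f) if i in dropped_set else list(f)
            for i, f in enumerate(frames)]


def process_window(frames: Sequence[Sequence[float]], val_loss: float,
                   train_loss: float, rng: random.Random,
                   threshold: float = 3.0, min_live: int = 2) -> List[List[float]]:
    """One pass of step S410 for a single window (hypothetical parameters)."""
    degree = val_loss / max(train_loss, 1e-12)    # overfitting degree
    p_drop = 0.4 if degree < threshold else 0.6  # example values from the text
    return drop_window(frames, p_drop, min_live, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    window = [[float(i)] * 4 for i in range(1, 8)]  # 7 dummy frames of 4 pixels
    out = process_window(window, val_loss=1.2, train_loss=0.3, rng=rng)
    print(sum(o == f for o, f in zip(out, window)), "frames kept")
```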

Since the probability value has a maximum of 1, specific ranges within [0, 1] can be designated and matched to predefined forms of video frames to be dropped. Here, defining a form means deciding how many frames within a video sequence are kept for the subsequent deep learning and how many are dropped. Such an operation preferably follows a regular pattern, but is not limited thereto. For example, when the sequence length is 7 and frames are to be kept in runs of two, the first and second input video frames are kept, i.e., provided as-is to the learning network, the third is dropped, the fourth and fifth are kept again, and the sixth is dropped, giving the regular pattern 2-1-2-1. However, when the video frames must be recognized as a class of continuous motion due to the nature of the video, a non-uniform, i.e., asymmetric, pattern such as 3-1-2-1 may be used instead, so the embodiments of the present invention are not limited to any one form. As mentioned above, the sequence length can also be adjusted in various ways, and the numbers of dropped and kept frames are determined accordingly.
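Both the regular 2-1-2-1 pattern and the asymmetric 3-1-2-1 pattern can be generated from a list of alternating keep/drop run lengths. The sketch below assumes the pattern is repeated (and truncated) to fill the sequence length; the function name and the run-length encoding are illustrative choices, not taken from the description.

```python
from itertools import cycle
from typing import List, Sequence


def pattern_mask(seq_len: int, runs: Sequence[int]) -> List[int]:
    """Expand alternating keep/drop run lengths into a per-frame mask.

    runs = (keep, drop, keep, drop, ...); the pattern repeats until seq_len
    frames are covered, and the result is truncated to seq_len.
    1 = keep the frame, 0 = drop it.
    """
    if len(runs) % 2 or any(r < 0 for r in runs) or sum(runs[::2]) == 0:
        raise ValueError("runs must alternate keep/drop and keep at least one frame")
    mask: List[int] = []
    keep_flag = 1
    for run in cycle(runs):
        mask.extend([keep_flag] * run)
        if len(mask) >= seq_len:
            break
        keep_flag ^= 1  # alternate between keep and drop runs
    return mask[:seq_len]


if __name__ == "__main__":
    print(pattern_mask(7, (2, 1)))        # [1, 1, 0, 1, 1, 0, 1]
    print(pattern_mask(7, (3, 1, 2, 1)))  # [1, 1, 1, 0, 1, 1, 0]
```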

In addition to the above, the video overfitting regularization apparatus 120 of FIG. 1 may perform various other operations; since the details have been fully described above, that description applies here.

Although all components constituting the embodiments of the present invention have been described as being combined into one or operating in combination, the present invention is not necessarily limited to these embodiments. That is, within the scope of the object of the present invention, the components may be selectively combined into one or more units and operated. In addition, although each component may be implemented as independent hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The codes and code segments constituting such a computer program can easily be inferred by those skilled in the art. Such a computer program may be stored in a non-transitory computer-readable medium and read and executed by a computer, thereby implementing the embodiments of the present invention.

Here, a non-transitory readable recording medium refers not to a medium that stores data for a short moment, such as a register, cache, or memory, but to a medium that stores data semi-permanently and can be read by a device. Specifically, the above-described programs may be stored and provided on a non-transitory readable recording medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, or ROM.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed, and such modifications should not be understood separately from the technical spirit or scope of the present invention.

100: user device 110: communication network
120: video overfitting regularization apparatus 300: communication interface unit
310: control unit 320: frame drop processing unit
330: storage unit

Claims

A video overfitting regularization apparatus, comprising: a communication interface unit configured to receive video frames of a captured image; and
a control unit configured to determine a degree of overfitting for each specified number of video frames among the received video frames and to perform learning by dropping the pixel values of an arbitrary video frame to a meaningless value during data operation processing, based on a probability value (p) adjusted according to the determined degree of overfitting,
wherein the control unit determines the degree of overfitting (val_loss(t)/train_loss(t)) as the ratio of a validation loss value to a training loss value for the specified number of video frames.

The apparatus of claim 1,
wherein the control unit determines, based on the probability value, the number of video frames to be kept for the learning and the video frames to be dropped from the specified number of video frames.

(Deleted)

The apparatus of claim 2,
wherein the control unit increases the number of video frames dropped from the specified number of video frames as the degree of overfitting increases.

The apparatus of claim 2,
wherein the control unit reduces the number of dropped video frames when, owing to the nature of the video, the class of the specified number of video frames must be determined from continuous motion.

The apparatus of claim 1,
wherein the control unit changes the pixel values of a dropped video frame to a meaningless value such that all of its pixel values are the same.

The apparatus of claim 6,
wherein, to change the pixel values of the dropped video frame, the control unit performs the operation by masking the dropped video frame with a previously generated masking video frame.

A method of driving a video overfitting regularization apparatus, the method comprising:
receiving, by a communication interface unit, video frames of a captured image; and
determining, by a control unit, a degree of overfitting for each specified number of video frames among the received video frames, and controlling learning to be performed by dropping the pixel values of an arbitrary video frame to a meaningless value during data operation processing, based on a probability value (p) adjusted according to the determined degree of overfitting;
the method further comprising:
determining, by the control unit, based on the probability value, the number of video frames to be kept for the learning and the video frames to be dropped from the specified number of video frames; and
determining, by the control unit, the degree of overfitting (val_loss(t)/train_loss(t)) as the ratio of a validation loss value to a training loss value for the specified number of video frames.