KR20220056599A

KR20220056599A - Learning-data generation method and apparatus for improving CCTV image quality

Info

Publication number: KR20220056599A
Application number: KR1020200141322A
Authority: KR
Inventors: 신은영; 김경화; 원인수
Original assignee: 주식회사 케이티
Priority date: 2020-10-28
Filing date: 2020-10-28
Publication date: 2022-05-06

Abstract

The present invention relates to a method by which a computing device operated by at least one processor generates learning data. The method comprises: collecting low-definition and high-definition images captured under the same shooting conditions; extracting frames from each of the low-definition and high-definition images and calculating shooting time information for each extracted frame; calculating, for each low-definition frame, similarities with the high-definition frames and matching each low-definition frame with the high-definition frame having the highest similarity to generate a plurality of pieces of candidate learning data; and extracting, for each piece of candidate learning data, the difference value of the shooting time information between the low-definition frame and the high-definition frame, and generating, as final learning data, the candidate learning data whose difference value is the most frequent among the difference values. The present invention can thereby generate learning data that reflects the characteristics of real images.

Description

Learning data generation method and device for image quality improvement

The present invention relates to a method and apparatus for generating learning data for an image quality improvement model.

As image content services have developed, demand for image quality improvement has grown. In particular, with the recent rise of deep learning, techniques for improving image quality without expensive equipment are being actively studied.

To improve the performance of a deep learning-based system, it is important to construct the learning data well so that training reflects the characteristics of the data. In image quality improvement, however, the low-quality image (the input) is available but the high-quality image that would serve as the correct answer is not, which makes those characteristics difficult to teach.

Because there is no exact correct answer for a low-quality image, conventional deep learning-based image quality improvement techniques work around this problem by digitally converting a high-quality image into a low-quality one to build a high-quality/low-quality learning set, and then restoring the original from it. This approach is effective for animation and media images with regular transformations, and it is applied in services.

However, images used in real-life settings such as CCTV, dashboard cameras (black boxes), drones, and medical imaging suffer distortion and degradation caused by changes in illuminance, the communication environment, and similar factors, so their characteristics differ from those of general content. Conventional image quality improvement techniques have little effect on such images, and improving images used in real environments to high quality requires a technique that learns the characteristics of real images.

One problem to be solved is to provide a method and apparatus that match low-quality frames with high-quality frames based on similarity and, when the difference in shooting time between a matched low-quality frame and high-quality frame corresponds to the delay time between the low-quality image and the high-quality image, generate the matched frames as learning data.

Another problem to be solved is to provide a technique that matches high-quality and low-quality images captured by a real camera to generate learning data for a deep learning model that improves image quality while reflecting the characteristics of real images.

According to an embodiment of the present invention, a method by which a computing device operated by at least one processor generates learning data includes: collecting a low-quality image and a high-quality image actually captured under the same shooting conditions; extracting frames from each of the low-quality image and the high-quality image and calculating shooting time information for each extracted frame; calculating, for each low-quality frame, similarities with the high-quality frames and matching each low-quality frame with the high-quality frame having the highest similarity to generate a plurality of candidate learning data; and extracting, for each candidate learning data, the difference value of the shooting time information between the low-quality frame and the high-quality frame, and generating, as final learning data, the candidate learning data whose difference value is the most frequent among the difference values.

The collecting step may parse VOD information in which the low-quality image and the high-quality image are stored, collect format information for each image, and identify the start time, the image address, and the frames per second of the entire image based on the format information.

The step of calculating the shooting time information may identify the start time of the image included in the format information, calculate the elapsed time from the start time to the extracted frame, and take the sum of the elapsed time and the start time as the shooting time information for the frame.

The step of generating the final learning data may estimate the most frequent difference value as the delay time between the low-quality image and the high-quality image.

The step of generating the plurality of candidate learning data may calculate the average similarity of the high-quality frames with respect to one low-quality frame, compare the average similarity with the highest similarity, and exclude the low-quality frame from the candidate learning data when the difference between them is less than or equal to a threshold.

According to an embodiment of the present invention, an apparatus for generating learning data includes a memory and at least one processor that executes instructions of a program loaded into the memory. The program includes instructions written to execute the steps of: extracting frames from each of a plurality of real images of different image quality captured under the same shooting conditions, grouping frames having the same image quality, and classifying the groups into a reference group and a comparison group; calculating, for each frame of the reference group, the similarity with the frames of the comparison group and matching the frames of different image quality that have the highest similarity; and estimating the delay time between the images of different image quality based on the time difference values of the shooting times of the matched frames, and generating, as learning data, the matched frames whose time difference value equals the delay time.

The shooting time may be obtained by identifying the start time of the image included in the format information of each of the images of different image quality, calculating the elapsed time from the start time to the extracted frame, and adding the elapsed time to the start time.

The step of generating the learning data may calculate, for each frame of the reference group, the time difference value between its shooting time and that of the matched frame of the comparison group, and estimate the time difference value with the highest frequency among the calculated values as the delay time.

The grouping step may select the frames of the image having the relatively smaller number of frames as the reference group and select the frames of the image whose image quality differs from that of the reference group as the comparison group.

The step of generating the learning data may calculate, for each frame of the reference group, the average similarity with the frames of the comparison group, compare the average similarity with the highest similarity for that frame, and exclude the frame when the difference is less than or equal to a threshold.

According to an embodiment, learning data reflecting the characteristics of real images can be generated by matching high-quality and low-quality images, in order to train a deep learning model that improves the quality of actually captured images.

According to an embodiment, the delay time of the entire image is estimated based on the time difference values between low-quality frames and high-quality frames, and learning data with high similarity is generated in consideration of the delay time, so that a large amount of learning data suited to the deep learning model can be secured.

FIG. 1 is a structural diagram of an apparatus for generating learning data according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of generating learning data according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram illustrating a process of generating frame information for an extracted frame according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram illustrating a process of calculating the similarity between a low-quality frame and a high-quality frame according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram illustrating time differences calculated based on similarity measurement according to an embodiment of the present invention.
FIG. 6 is a hardware configuration diagram of a computing device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily carry them out. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted to describe the present invention clearly, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to "include" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "...er", and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented as hardware, software, or a combination of hardware and software.

The devices described in the present invention consist of hardware including at least one processor, a memory device, a communication device, and the like, and a program that is executed in combination with the hardware is stored in a designated location. The hardware has the configuration and performance needed to carry out the method of the present invention. The program includes instructions that implement the operating method of the present invention described with reference to the drawings, and carries out the present invention in combination with hardware such as the processor and the memory device.

In this specification, "transmission or provision" may include not only direct transmission or provision but also indirect transmission or provision through another device or via a detour path.

In this specification, an expression written in the singular may be interpreted as singular or plural unless an explicit expression such as "one" or "single" is used.

FIG. 1 is a structural diagram of an apparatus for generating learning data according to an embodiment of the present invention.

As shown in FIG. 1, the learning data generating apparatus 200 includes an image collector 210 that collects actually captured low-quality and high-quality images, a frame extractor 220 that extracts frames from the collected images and stores them, and a learning data generator 230 that generates learning data by matching the low-quality image with the high-quality image.

For convenience of description they are referred to as the image collector 210, the frame extractor 220, and the learning data generator 230, but each is a computing device operated by at least one processor. The image collector 210, the frame extractor 220, and the learning data generator 230 may be implemented in a single computing device or distributed across separate computing devices. When distributed across separate computing devices, they may communicate with one another through a communication interface. Any device capable of executing a software program written to carry out the present invention suffices as the computing device; it may be, for example, a server or a laptop computer.

The learning data generating apparatus 200 generates learning data for training a deep learning model that improves image quality. It does so by matching images of different image quality captured under the same shooting conditions.

To be usable as input to the deep learning model, the learning data must consist of images that were captured under the same shooting conditions and differ only in image quality.

However, even when a low-quality image and a high-quality image captured under the same shooting conditions are requested, the shooting time information of the stored images actually received may not match. For example, the shooting time information of each captured image may be stored differently because of differences in the performance of the storage devices in which the low-quality and high-quality images are stored, differences in network performance, differences in the performance of the cameras themselves, and so on.

The learning data generating apparatus 200 receives the low-quality image and the high-quality image as a plurality of transport stream (TS) segments covering the requested shooting time. The TS information delivered with them is a kind of format description that is composed differently for each image stream, so the shooting time information can be identified from the TS information.

Because the actually captured low-quality and high-quality images each arrive as a plurality of TS segments, the learning data generating apparatus 200 must match the low-quality image and the high-quality image that correspond to each other based on the TS information of each TS segment.

However, as described above, errors in the stored shooting time information mean that a high-quality image and a low-quality image may turn out to differ on visual inspection even when their shooting time information is identical.

Therefore, to generate learning data from images captured under the same shooting conditions (the same time, the same subject, the same camera angle of view, and so on), the learning data generating apparatus 200 calculates the shooting times and generates the learning data based on the similarity and the shooting time difference between low-quality frames and high-quality frames.

First, the image collector 210 requests and receives recorded images from the image platform 100, in which actually captured images are stored.

Here, the image platform 100 is a platform that stores and manages images captured by one or more cameras in an image repository (database), and each image in the repository is stored together with data such as the unique ID of the camera that captured it and the shooting time.

Here, the unique camera ID means an ID set based on the camera type, the capture module, the image quality, and the like.

A camera contains both a capture module that captures high-quality images and a capture module that captures low-quality images, so it can capture images of different image quality simultaneously in real time. Images of different image quality can therefore be obtained under the same shooting conditions.

The camera may be a CCTV camera, a camera inside a dashboard camera (black box), a camera mounted on a drone, a camera mounted on a user terminal, and so on, and is not limited to any specific camera.

When the image collector 210 requests video on demand (VOD) information that includes a camera ID (capture module ID), the time at which the image was captured, and the like, it receives a VOD storage URL from the image platform 100.

The image collector 210 parses the VOD storage URL to collect TS information, including the TS address of the actual image, the start time, and the total TS playback time.

The image collector 210 may deliver the collected TS information to the frame extractor 220 in real time, or it may generate a TS list for the entire collected image and then deliver it to the frame extractor 220 sequentially.

The frame extractor 220 receives the TS information, retrieves the actual image through the TS address, and extracts frames from that image. The frame extractor 220 may then calculate the shooting time of each extracted frame to generate frame information.

The start time collected by the image collector 210 is the start time of the image for the TS. The shooting time of a frame is calculated by obtaining the elapsed time from the TS start time to the extracted frame and adding that elapsed time to the TS start time.

Specifically, the elapsed time is obtained by multiplying the time per frame (1/FPS) by the current frame index, which gives Equation 1 below.

[Equation 1]

Frame Time = (1 / FPS * Frame idx) + TS start time

The image collector 210 calculates the shooting time of each frame using Equation 1 and stores it in that frame's frame information (Frame Info).
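Equation 1 can be illustrated with a short Python sketch. The function name `frame_time`, the use of `datetime` for the TS start time, and the zero-based frame index are assumptions made for illustration; the specification does not state how times or indices are represented.

```python
from datetime import datetime, timedelta


def frame_time(ts_start: datetime, fps: float, frame_idx: int) -> datetime:
    """Equation 1: Frame Time = (1 / FPS * Frame idx) + TS start time.

    Assumes frame_idx counts from 0, so frame 0 is shot at ts_start.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    elapsed = timedelta(seconds=frame_idx / fps)  # elapsed time since TS start
    return ts_start + elapsed


# Hypothetical values: a TS segment starting at 09:00:00 and recorded at 10 FPS.
start = datetime(2020, 10, 28, 9, 0, 0)
print(frame_time(start, fps=10.0, frame_idx=25))  # 2.5 s after the TS start
```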

At this time, the image collector 210 may store the frames of the low-quality image and the frames of the high-quality image in their respective storages based on the collected camera ID.

The learning data generator 230 may determine the similarity between the low-quality image and the high-quality image for the extracted frames, and match a low-quality frame and a high-quality frame of the same point in time that have high similarity to generate learning data.

To generate learning data by matching the collected low-quality frames with the high-quality frames, the learning data generator 230 may divide the frames into a reference group (A) and a comparison group (B).

The learning data generator 230 then calculates, for each frame in the reference group (A), its similarity with the frames in the comparison group (B), and selects the comparison frame with the highest calculated similarity.
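The matching of each reference frame to its most similar comparison frame might be sketched as follows. This passage does not name a similarity measure, so normalized cross-correlation is used here purely as an assumed stand-in. Frames are assumed to be flattened grayscale pixel sequences of equal length (for example, after resizing both qualities to a common size). The names `similarity` and `best_matches` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # assumed: flattened grayscale pixels of equal length


def similarity(a: Frame, b: Frame) -> float:
    """Normalized cross-correlation in [-1, 1]; an assumed metric."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a constant frame carries no structure to compare
    return sum(x * y for x, y in zip(da, db)) / denom


def best_matches(
    reference: List[Frame], comparison: List[Frame]
) -> List[Tuple[int, List[float]]]:
    """For each reference frame, return (index of best comparison frame, all similarities)."""
    results = []
    for ref in reference:
        sims = [similarity(ref, cmp) for cmp in comparison]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((best, sims))
    return results


# Toy 4-pixel frames: group B (comparison) and group A (reference).
comparison = [[0, 0, 9, 9], [9, 0, 0, 9], [9, 9, 0, 0]]
reference = [[8, 1, 0, 8], [1, 0, 7, 8]]
matches = best_matches(reference, comparison)
print([best for best, _ in matches])
```

Keeping the full similarity list per reference frame is a deliberate choice in this sketch, since the average similarity is needed later when deciding whether a match is distinctive enough to keep.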

The learning data generator 230 then compares the shooting time of the frame in the reference group (A) with the shooting time of the selected frame with the highest similarity to calculate a time difference value.

In other words, because a time difference from the selected frame of the comparison group (B) is calculated for each frame in the reference group (A), the number of time difference values calculated depends on the number of frames in the reference group (A).

The learning data generator 230 may select the most frequent time difference value among the calculated values, extract the low-quality and high-quality frame pairs that have that time difference value, and generate them as learning data.

In other words, the learning data generator 230 generates learning data from the pairs that share the same shooting time difference. In this way, the learning data generator 230 may generate learning data from each reference group frame and the comparison group frame matched to it one-to-one, and store the learning data in a database.
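The delay estimation and final selection described above can be sketched as a mode computation over the time differences. The representation of a matched pair as two timestamps in seconds, the rounding precision used to bucket differences, and the name `select_by_delay` are all assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple

# (reference frame time, matched comparison frame time), both in seconds
Pair = Tuple[float, float]


def select_by_delay(pairs: List[Pair], precision: int = 3) -> Tuple[float, List[Pair]]:
    """Estimate the delay as the most frequent time difference and keep only
    the pairs that show exactly that difference.

    Rounding to `precision` decimals is an assumption, used so that
    floating-point noise does not split one delay into several buckets.
    """
    if not pairs:
        raise ValueError("no candidate pairs")
    diffs = [round(cmp_t - ref_t, precision) for ref_t, cmp_t in pairs]
    delay, _count = Counter(diffs).most_common(1)[0]
    kept = [pair for pair, diff in zip(pairs, diffs) if diff == delay]
    return delay, kept


# Hypothetical candidates: three pairs agree on a 0.5 s delay, two are mismatches.
pairs = [(0.0, 0.5), (1.0, 1.5), (2.0, 2.7), (3.0, 3.5), (4.0, 3.9)]
delay, kept = select_by_delay(pairs)
print(delay, kept)
```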

Although FIG. 1 shows the storage DBs as included in the learning data generating apparatus 200, the invention is not limited to this, and the storage DBs may be implemented in a separate device or server.

Hereinafter, the method by which the learning data generating apparatus generates learning data is described in detail with reference to FIGS. 2 and 3.

FIG. 2 is a flowchart illustrating a method of generating learning data according to an embodiment of the present invention.

As shown in FIG. 2, the learning data generating apparatus 200 collects a low-quality image and a high-quality image captured under the same shooting conditions (S110).

The learning data generating apparatus 200 may collect, through the image platform, a low-quality image and a high-quality image captured under the same shooting conditions, such as shooting time and camera ID.

Here, the image platform is a platform that manages a repository for storing captured images; it stores and manages the captured images by camera ID and provides the images that a user requests.

The learning data generating apparatus 200 may receive from the image platform a VOD storage URL corresponding to the requested VOD information. The learning data generating apparatus 200 then parses the VOD storage URL to obtain TS information, including the TS address of the actual image, the start time, and the total TS playback time.

Based on the obtained TS information, the learning data generating apparatus 200 sequentially retrieves the actual images and extracts their frames.

At this time, the learning data generating apparatus 200 may extract frames from the low-quality image and the high-quality image simultaneously, or it may extract the frames of the low-quality image first and then the frames of the high-quality image.

Next, the learning data generating apparatus 200 extracts frames from the low-quality image along with the shooting time information of each frame (S120).

The learning data generating apparatus 200 calculates the shooting time of each extracted frame to compose frame information. The frame information includes, but is not limited to, the camera ID, the frame index (idx), the actual frame image, and the calculated shooting time.

Here, the frame index (idx) is a number assigned to each extracted frame based on the total number of frames in the TS, which is obtained first.

The learning data generating apparatus 200 may calculate the shooting time of each frame by adding the elapsed time to the TS start time.

The learning data generating apparatus 200 may store the obtained frame information in a database organized by camera ID.
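One possible shape for the frame information, and for keeping it organized by camera ID, is sketched below. The field names, the `bytes` type for the frame image, the camera ID strings, and the in-memory dictionary standing in for the database are all assumptions; the specification lists only the kinds of fields the frame information contains.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import DefaultDict, Iterable, List


@dataclass(frozen=True)
class FrameInfo:
    camera_id: str     # per the text, set from camera type, capture module, quality
    frame_idx: int     # index of the frame within its TS
    image: bytes       # assumed: encoded frame image
    shot_at: datetime  # shooting time calculated from the TS start time


def group_by_camera(frames: Iterable[FrameInfo]) -> DefaultDict[str, List[FrameInfo]]:
    """Stand-in for the database: frame information grouped by camera ID."""
    store: DefaultDict[str, List[FrameInfo]] = defaultdict(list)
    for frame in frames:
        store[frame.camera_id].append(frame)
    return store


# Hypothetical IDs for the low- and high-quality capture modules of one camera.
t0 = datetime(2020, 10, 28, 9, 0, 0)
frames = [
    FrameInfo("cam01-low", 0, b"", t0),
    FrameInfo("cam01-high", 0, b"", t0),
    FrameInfo("cam01-low", 1, b"", t0 + timedelta(milliseconds=100)),
]
store = group_by_camera(frames)
print({camera: len(items) for camera, items in store.items()})
```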

In the same manner as in step S120, the learning data generating apparatus 200 extracts frames from the high-quality image along with the shooting time information of each frame (S130).

The learning data generating apparatus 200 composes frame information for the high-quality image in the same way and stores it in the database by camera ID.

Although steps S120 and S130 are described sequentially for convenience, they may be performed simultaneously, or the high-quality image may be processed first and the low-quality image afterward.

학습데이터 생성 장치(200)는 저화질 프레임마다 고화질 프레임들과의 각각의 유사도와 평균 유사도를 산출한다(S140).The learning data generating apparatus 200 calculates each similarity and average similarity with high-quality frames for each low-resolution frame (S140).

이때, 학습데이터 생성 장치(200)는 저화질 영상과 고화질 영상에 대한 기준 그룹을 선정할 수 있다. In this case, the learning data generating apparatus 200 may select a reference group for the low-quality image and the high-quality image.

학습데이터 생성 장치(200)는 각 영상들의 프레임 정보를 탐색하여 두 영상 간에 동일한 시작 시간과 종료 시간을 수집 시간으로 선정하고 영상 비교에 필요한 기준그룹과 비교그룹을 선정할 수 있다. 그리고 수집 시간 내에 수집 프레임의 수가 상대적으로 적은 영상의 프레임들을 기준 그룹으로 선정한다. The learning data generating apparatus 200 may search for frame information of each image, select the same start time and end time between two images as a collection time, and select a reference group and a comparison group necessary for image comparison. In addition, frames of an image having a relatively small number of collected frames within the collection time are selected as a reference group.
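The selection of the collection time and of the reference group can be sketched as follows. The function name `select_groups` is hypothetical, shooting times are represented as plain seconds, and breaking a tie in favor of the low-quality image is an assumption made only for this example.

```python
def select_groups(low_times, high_times):
    """Return (reference, comparison, label) for two lists of frame shooting times.

    The collection time is the interval covered by both images; the image with
    fewer frames inside that interval becomes the reference group.
    """
    if not low_times or not high_times:
        raise ValueError("both images need at least one frame")
    start = max(min(low_times), min(high_times))
    end = min(max(low_times), max(high_times))
    low = [t for t in low_times if start <= t <= end]
    high = [t for t in high_times if start <= t <= end]
    if len(low) <= len(high):  # tie goes to the low-quality image (assumption)
        return low, high, "low"
    return high, low, "high"


low_times = [0.0, 0.5, 1.0, 1.5, 2.0]
high_times = [0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
reference, comparison, label = select_groups(low_times, high_times)
```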

In general, a low-quality image has a lower frame rate (frames per second, FPS) than a high-quality image covering the same time span, so the low-quality image is usually selected.

However, if the high-quality image happens to contain fewer frames, the high-quality image may be selected as the reference group. In the following description, the reference group is assumed to be the low-quality image.

The learning data generating apparatus 200 calculates, for each of the plurality of low-quality frames, its similarity to each of the plurality of high-quality frames, and calculates the average similarity for one low-quality frame from its similarities to the plurality of high-quality frames.

Next, the learning data generating apparatus 200 matches each low-quality frame with the high-quality frame having the highest similarity, checks whether each matched frame falls in a false positive section, and excludes it if so (S150).

The learning data generating apparatus 200 may generate candidate learning data by matching a low-quality frame with the high-quality frame of highest similarity. By matching a high-quality frame to each low-quality frame, it generates a plurality of candidate learning data.

In this case, the learning data generating apparatus 200 may identify a false positive section by comparing, for one low-quality frame, the average similarity over the plurality of high-quality frames with the highest similarity.

Here, the false positive section is a condition used to select only valid learning data, and refers to the case in which the average similarity and the highest similarity differ only slightly.

For example, in a static scene without change, the same picture repeats for a certain period. In this case, even a low-quality frame and a high-quality frame captured at different points in time yield a high similarity and are matched, which lowers the accuracy of the learning data and may cause errors.

Therefore, the learning data generating apparatus 200 compares the average similarity and the highest similarity for each frame of the reference group, and if their difference is less than or equal to a threshold, it may regard the frame as belonging to a false positive section and exclude it.

In this case, the definition of a difference at or below the threshold can be set later by the user; for example, it may mean that the integer parts of the two similarity values are equal, or that the values differ only after the decimal point.
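The two example criteria can be sketched as a single check. The function name `is_false_positive` and its `threshold` parameter are hypothetical; the sketch assumes finite similarity values and simply switches between comparing integer parts and comparing the difference against a user-set threshold.

```python
import math


def is_false_positive(average, highest, threshold=None):
    """Return True if a reference frame should be excluded as a false positive.

    threshold=None : integer parts of the two similarities are equal.
    threshold=x    : highest - average is less than or equal to x.
    Assumes finite similarity values.
    """
    if threshold is None:
        return math.floor(average) == math.floor(highest)
    return (highest - average) <= threshold


static_scene = is_false_positive(31.2, 31.9)               # nearly identical scores
moving_scene = is_false_positive(24.7, 38.1)               # one clear best match
with_threshold = is_false_positive(30.9, 31.1, threshold=1.0)
```

Note that the two criteria can disagree: 30.9 and 31.1 have different integer parts but differ by less than 1, which is why the criterion is left configurable.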

Next, the learning data generating apparatus 200 calculates the difference in shooting time information between each matched low-quality frame and high-quality frame (S160).

For each candidate learning data (matched frame pair), the learning data generating apparatus 200 may calculate the difference between the shooting time information of the low-quality frame and that of the high-quality frame.

The learning data generating apparatus 200 selects the time difference value with the highest frequency among the calculated difference values, and generates the low-quality and high-quality frame pairs having that time difference value as learning data (S170).

The learning data generating apparatus 200 records the time difference value calculated for each matched pair and increments a count each time the same value is recorded. It then selects the most frequently counted difference value and generates the matched pairs having that value as the final learning data.
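The counting step can be sketched with a frequency table. The function name `select_final_pairs`, the tuple layout, and the rounding precision `ndigits` are assumptions made for illustration; rounding is used only so that floating-point time differences that should be equal are counted together.

```python
from collections import Counter


def select_final_pairs(candidates, ndigits=5):
    """Keep the candidate pairs whose time difference is the most frequent one.

    candidates: list of (low_id, low_time, high_id, high_time), times in seconds.
    Returns (delay, final_pairs).
    """
    if not candidates:
        raise ValueError("no candidate pairs")
    diffs = [round(low_t - high_t, ndigits) for _, low_t, _, high_t in candidates]
    # most_common(1) returns the first-seen value when counts are tied.
    delay, _count = Counter(diffs).most_common(1)[0]
    final = [pair for pair, d in zip(candidates, diffs) if d == delay]
    return delay, final


# Hypothetical candidate pairs; frame IDs and times are illustrative only.
candidates = [
    ("A1", 1.000, "B3", 0.996),
    ("A2", 2.000, "B13", 1.996),
    ("A3", 3.000, "B20", 2.900),  # mismatched pair with a different time difference
    ("A4", 4.000, "B33", 3.996),
]
delay, final_pairs = select_final_pairs(candidates)
```

Here three of the four pairs share the same difference, so the outlier pair is dropped and the shared difference is taken as the delay.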

Accordingly, the final learning data all share the same time difference value, and the largest possible number of pairs from the matched low-quality and high-quality images can be secured as learning data.

In this way, the learning data generating apparatus 200 may generate the final learning data by matching low-quality frames and high-quality frames captured at the same point in time.

In other words, the learning data generating apparatus 200 may generate a plurality of candidate learning data based on similarity, and may generate the final learning data from among them based on the time difference value.

FIG. 3 is an exemplary diagram illustrating a process of generating frame information for extracted frames according to an embodiment of the present invention.

As shown in FIG. 3, the learning data generating apparatus 200 receives, from the video platform 100, the VOD storage URL corresponding to the requested VOD information.

The learning data generating apparatus 200 parses the VOD storage URL to obtain TS information including the TS address of the actual image, the start time, the total TS playback time, the overall FPS, and the like. It then acquires the actual image based on the TS address.

The learning data generating apparatus 200 generates frame information (Frame Info) while extracting frames from each low-quality or high-quality image.

Because the acquired TS information describes the low-quality or high-quality image as a whole, the learning data generating apparatus 200 separately generates, for each frame, frame information including the camera ID (camID), the frame idx, the actual frame, and the shooting time information for the frame.

Here, the camera ID identifies whether the frame belongs to the high-quality image or the low-quality image, and the frame idx identifies the position of the frame within the image.

The shooting time information for a frame can be calculated using Equation 1 described above, so the description is not repeated.

Once frame information has been generated for each frame of each image, the learning data generating apparatus 200 stores it in the A frame storage DB or the B frame storage DB according to whether the image is high quality or low quality.

In this way, by generating frame information for each frame of each image, the learning data generating apparatus 200 can match low-quality frames and high-quality frames with each other based on that frame information.

FIG. 4 is an exemplary diagram for explaining a process of calculating the similarity between low-quality frames and high-quality frames according to an embodiment of the present invention, and FIG. 5 is an exemplary diagram showing time differences calculated based on the similarity measurement according to an embodiment of the present invention.

As shown in FIG. 4, the reference group contains M frames, and the calculated shooting time is recorded for each of the frames A1, A2, …, AM.

The comparison group contains N frames, and the calculated shooting time is recorded for each of the frames B1, B2, …, BN (M and N are different natural numbers).

The learning data generating apparatus 200 calculates similarities by comparing each frame of the reference group, which has the relatively smaller number of frames, with the frames of the comparison group.

Here, the similarity between frames may be calculated based on the formula for the peak signal-to-noise ratio (PSNR). Because PSNR is measured on a logarithmic scale, it is usually expressed in decibels [dB]; the smaller the loss, the higher the value, so a high value suggests that the two frames are similar to each other.
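A minimal PSNR sketch in plain Python is given below, using the standard definition PSNR = 10·log10(MAX² / MSE). The function name `psnr` is hypothetical, frames are represented as flat sequences of pixel intensities, and the two frames are assumed to have already been brought to the same size; how frames of different resolution are aligned is not covered here.

```python
import math


def psnr(frame_a, frame_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    frame_a, frame_b: flat sequences of pixel intensities in [0, max_value].
    Returns math.inf when the frames are identical (zero mean squared error).
    """
    if len(frame_a) != len(frame_b) or len(frame_a) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [10, 20, 30, 40]
close = [11, 19, 30, 41]    # small pixel differences
far = [60, 90, 5, 200]      # large pixel differences
close_score = psnr(reference, close)
far_score = psnr(reference, far)
```

In practice an array library would be used for full-size frames; the plain-Python form is kept here only so the sketch has no dependencies.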

This similarity measure is only an example. PSNR need not be used; any inter-frame similarity measure best suited to the situation may be used, such as mean square error (MSE), root mean square error (RMSE), or the structural similarity index (SSIM).

Specifically, for A1, a total of N similarity scores are calculated, one for each frame from B1 to BN, and both the average of the N similarities and the largest of the similarity scores are obtained.

The learning data generating apparatus 200 then calculates the difference between the shooting time of the comparison group frame having the largest similarity and the shooting time of A1.
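The per-frame step for a single reference frame such as A1 can be sketched as follows, given similarity scores that have already been computed. The function name `match_reference_frame` and the returned dictionary keys are hypothetical, and the sign convention (reference time minus comparison time) is an assumption.

```python
def match_reference_frame(ref_time, scores, comp_times):
    """Summarize the comparison of one reference frame against N comparison frames.

    scores[i] is the similarity to comparison frame i, comp_times[i] its shooting time.
    """
    if not scores or len(scores) != len(comp_times):
        raise ValueError("scores and comp_times must be non-empty and equally long")
    average = sum(scores) / len(scores)
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return {
        "average": average,
        "highest": scores[best_index],
        "best_index": best_index,
        "time_diff": ref_time - comp_times[best_index],  # sign convention is an assumption
    }


# Illustrative values: three comparison frames, the second one matches best.
result = match_reference_frame(
    ref_time=1.25,
    scores=[20.0, 35.0, 22.0],
    comp_times=[0.9, 1.0, 1.1],
)
```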

Repeating this process for all frames in the reference group yields results such as those shown in FIG. 5.

The matched frames in FIG. 5 are candidate learning data matched on the basis of similarity, and the final learning data are generated from among the candidate learning data based on the time difference.

In this case, the learning data generating apparatus 200 uses only the integer parts of the average similarity and the highest similarity, and if the integer parts are equal, it determines that the frame is in a false positive section and excludes it.

Alternatively, the learning data generating apparatus 200 may determine that a frame is in a false positive section and exclude it when the difference between the average similarity and the highest similarity is less than or equal to a threshold. For example, if the threshold is set below the decimal point, a case in which the average similarity and the highest similarity differ only after the decimal point may be determined to be a false positive section.

In addition, other criteria for determining a false positive section may be set, such as rounding the similarity values at the decimal point.

Based on the comparison results for the frames that remain after the exclusion, the learning data generating apparatus 200 may compare the time differences of the frames of the reference group and estimate the most frequent time difference value as the overall delay time.

Accordingly, the matched data having the most frequent time difference value can be regarded as matched data to which the overall delay time applies.

In FIG. 5, the value 0.00423 occurs 3 times and the value 0.00342 occurs 2 times, so the time difference of 0.00423 is estimated as the delay time.

In this way, the learning data generating apparatus 200 estimates the delay time from the time differences and, among the highest-similarity frame pairs matched between the reference group and the comparison group, generates those having that delay time as learning data.

FIG. 6 is a hardware configuration diagram of a computing device according to an embodiment of the present invention.

As shown in FIG. 6, the hardware of the computing device 300 may include at least one processor 310, a memory 320, a storage 330, and a communication interface 340, which may be connected through a bus. Hardware such as an input device and an output device may also be included. The computing device 300 may be loaded with various software, including an operating system capable of running programs.

The processor 310 is a device that controls the operation of the computing device 300 and may be any of various types of processor that process instructions included in a program, for example, a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), or a GPU (Graphic Processing Unit). The memory 320 loads the program so that the instructions written to execute the operations of the present invention are processed by the processor 310. The memory 320 may be, for example, a ROM (read only memory) or a RAM (random access memory). The storage 330 stores the various data and programs required to execute the operations of the present invention. The communication interface 340 may be a wired/wireless communication module.

According to the embodiment, learning data reflecting the characteristics of actual images can be generated by matching high-quality and low-quality images, in order to train a deep learning model that improves the quality of actually captured images.

In addition, by correcting for the difference in actual capture time between the low-quality image and the high-quality image and matching frames captured at the same time, the accuracy of the similarity between matched frames can be ensured even for images containing fast motion.

The embodiments of the present invention described above are not implemented only through the apparatus and method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded.

Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto, and various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

Claims

A method of generating learning data by a computing device operated by at least one processor, the method comprising:
collecting a low-quality image and a high-quality image actually captured under the same shooting conditions;
extracting frames from each of the low-quality image and the high-quality image, and calculating shooting time information for each extracted frame;
calculating, for each low-quality frame, similarities with the high-quality frames, and generating a plurality of candidate learning data by matching each low-quality frame with the high-quality frame having the highest similarity; and
extracting, for each candidate learning data, a difference value of the shooting time information between the low-quality frame and the high-quality frame, and generating, as final learning data, the candidate learning data having the difference value with the highest frequency among the difference values.

The method of claim 1,
wherein the collecting comprises
parsing VOD information in which the low-quality image and the high-quality image are stored, collecting format information of each image, and identifying, based on the format information, the start time, the image address, and the overall frames per second of the image.

The method of claim 2,
wherein calculating the shooting time information comprises
identifying the start time of the image included in the format information, calculating the elapsed time from the start time to the extracted frame, and calculating the sum of the elapsed time and the start time as the shooting time information for the frame.

The method of claim 1,
wherein generating the final learning data comprises
estimating the difference value with the highest frequency as a delay time between the low-quality image and the high-quality image.

The method of claim 1,
wherein generating the plurality of candidate learning data comprises
calculating the average similarity of the high-quality frames with respect to one low-quality frame, and comparing the average similarity with the highest similarity.

A device for generating learning data, comprising:
a memory; and
at least one processor executing instructions of a program loaded into the memory,
wherein the program comprises instructions written to execute:
extracting frames from actual images of different image quality captured under the same shooting conditions, grouping frames of the same image quality, and classifying the groups into a reference group and a comparison group;
calculating, for each frame of the reference group, similarities with the frames of the comparison group, and matching the frames of different image quality that have the highest similarity; and
estimating a delay time between the images of different image quality based on the time difference values of the shooting times between the matched frames, and generating, as learning data, the matched frames whose time difference value equals the delay time.

The device of claim 6,
wherein the shooting time is
a time obtained by identifying the start time of the image included in the format information of each of the images of different image quality, calculating the elapsed time from the start time to the extracted frame, and adding the elapsed time to the start time.

The device of claim 7,
wherein generating the learning data comprises
calculating, for each frame of the reference group, a time difference value between its shooting time and that of the matched comparison group frame, and taking the time difference value with the highest frequency among the calculated time difference values as the delay time.

The device of claim 6,
wherein the grouping comprises
selecting the frames of the image having the relatively smaller number of frames as the reference group, and selecting the frames of the other image as the comparison group.

The device of claim 6,
wherein generating the learning data comprises
calculating, for each frame of the reference group, the average similarity with the frames of the comparison group, and comparing the average similarity and the highest similarity for each frame of the reference group to determine frames to be excluded.