KR102573511B1

KR102573511B1 - Image data processing apparatus and method

Info

Publication number: KR102573511B1
Application number: KR1020200158262A
Authority: KR
Inventors: 이경한; 남우승
Original assignee: 서울대학교산학협력단
Priority date: 2020-11-23
Filing date: 2020-11-23
Publication date: 2023-08-31
Also published as: KR20220071056A

Abstract

The present invention relates to an artificial intelligence (AI)-based image data processing apparatus and method for real-time video communication. An image data transmission processing device, among the image data processing devices according to an embodiment of the present invention, may include: a storage unit that stores deep neural network models trained, using pre-stored image data, to perform optimal compression according to the characteristics of objects included in the image data; a determination unit that determines, based on information on target image data to be transmitted, whether a deep neural network model to be used for compressing the image data is stored in the storage unit; a compression unit that, when the determination unit finds that such a model is stored in the storage unit, selects the deep neural network model from the storage unit and generates compressed image data by compressing the target image data using the selected model; and a transmission unit that transmits the compressed image data to an image data reception processing device.

Description

Image data processing apparatus and method {IMAGE DATA PROCESSING APPARATUS AND METHOD}

The present invention relates to an artificial intelligence (AI)-based image data processing apparatus and method for real-time video communication.

In network communication between a transmitting end and a receiving end, video data, which is large compared with relatively small text data or still images, accounts for a large share of traffic. In particular, high-volume video data such as high-quality video at 4K, 8K, or above requires a high compression ratio to support real-time streaming. Even recent video compression technologies such as H.265/HEVC and VP9 are limited in their ability to support real-time streaming of high-quality video. AI-based video compression techniques have been proposed to improve compression efficiency, but typical AI-based techniques cannot be applied to real-time streaming because of the latency incurred in training an AI model for each piece of video content.

The background art described above is technical information that the inventors possessed for the purpose of deriving the present invention or acquired in the course of deriving it, and is not necessarily prior art disclosed to the general public before the filing of the present application.

Korean Patent Application Publication No. 10-2019-0033541 (2019.03.29)

One object of the present invention is to propose a deep neural network-based video compression technique that removes redundancy and similarity from a target video to be transmitted by means of a deep neural network model that has learned the characteristics of objects from pre-stored video or from video acquired through an initialization process.

Another object of the present invention is to support real-time streaming, such as video communication, by learning the characteristics of each user in advance from pre-stored video or from video acquired through an initialization process and storing the trained deep neural network model.

The problems to be solved by the present invention are not limited to those mentioned above. Other objects and advantages of the present invention that are not mentioned can be understood from the following description and will be more clearly understood from the embodiments of the present invention. It will also be appreciated that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

An image data transmission processing device, among the image data processing devices according to an embodiment of the present invention, may include: a storage unit that stores deep neural network models trained, using pre-stored image data, to perform optimal compression according to the characteristics of objects included in the image data; a determination unit that determines, based on information on target image data to be transmitted, whether a deep neural network model to be used for compressing the image data is stored in the storage unit; a compression unit that, when the determination unit finds that a deep neural network model to be used for the target image data is stored in the storage unit, selects from the storage unit the deep neural network model with which to compress the image data and generates compressed image data by compressing the target image data using the selected model; and a transmission unit that transmits the compressed image data to an image data reception processing device.

An image data reception processing device, among the image data processing devices according to an embodiment of the present invention, may include: a reception unit that receives compressed image data from the image data transmission processing device; a storage unit that stores deep neural network models associated with user accounts; and a restoration unit that selects, from among the deep neural network models stored in the storage unit and according to information on the user account that transmitted the compressed image data, a deep neural network model with which to restore the compressed image data, and restores the original image data from the compressed image data using the selected model.

An image data transmission processing method, among the image data processing methods according to an embodiment of the present invention, may include: building a storage unit that stores deep neural network models trained, using pre-stored image data, to perform optimal compression according to the characteristics of objects included in the image data; determining, based on information on target image data to be transmitted, whether a deep neural network model to be used for compressing the image data is stored in the storage unit; when the determination finds that a deep neural network model to be used for the target image data is stored in the storage unit, selecting from the storage unit the deep neural network model with which to compress the image data and generating compressed image data by compressing the target image data using the selected model; and transmitting the compressed image data to an image data reception processing device.

An image data reception processing method, among the image data processing methods according to an embodiment of the present invention, may include: receiving compressed image data from an image data transmission processing device; selecting, from a storage unit that stores deep neural network models associated with user accounts and according to information on the user account that transmitted the compressed image data, a deep neural network model with which to restore the compressed image data; and restoring the original image data from the compressed image data using the selected deep neural network model.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may be further provided.

Aspects, features, and advantages other than those described above will become apparent from the following drawings, claims, and detailed description of the invention.

According to the present invention, redundancy and similarity in the image data to be transmitted can be removed by compressing and transmitting the target video through a deep neural network model that has learned the characteristics of objects from pre-stored video or from video acquired through an initialization process.

In addition, unlike existing deep neural network-based video compression techniques, which require a deep neural network model to be trained for each piece of video content, the present invention learns the characteristics of each user (video sender or video producer) in advance from pre-stored video or from video acquired through an initialization process and stores the trained deep neural network model, and can thereby support real-time streaming such as video communication.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is an exemplary diagram of an image data processing environment according to the present embodiment.
FIG. 2 is an exemplary diagram of an image data processing environment for transmitting and receiving video content according to another embodiment.
FIG. 3 is a block diagram schematically illustrating the configurations of the image data transmission device and the image data reception device included in the image data processing environment of FIG. 1.
FIGS. 4 to 7 are examples schematically illustrating the operation of the compression unit of the image data transmission device according to the present embodiment.
FIG. 8 is a flowchart illustrating an image data processing method according to the present embodiment.

Advantages and features of the present invention, and methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments presented below; it may be implemented in various different forms and should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention. The embodiments presented below are provided so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention. In describing the present invention, a detailed description of related known technology is omitted where it is determined that it could obscure the gist of the present invention.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

Hereinafter, embodiments according to the present invention are described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is an exemplary diagram of an image data processing environment according to the present embodiment. Referring to FIG. 1, the image data processing environment 1 may include an image data transmission processing device 100, an image data reception processing device 200, and a network 300.

The image data transmission processing device 100 may select, from among pre-stored deep neural network models, a deep neural network model to be used for compressing image data, based on information on the target image data to be transmitted. The image data transmission processing device 100 may compress the target image data using the selected deep neural network model and transmit the resulting compressed image data to the image data reception processing device 200. Here, the information on the image data may include at least one of the user account that generates the image data, the time at which the image data is generated, and the location at which the image data is generated.
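By way of illustration only, the selection step described above might be sketched in Python as follows. The class names, field names, and the scoring rule are hypothetical and are not taken from the disclosure; the sketch merely assumes that each stored model is tagged with the user account, time window, and location of its training data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ImageInfo:
    """Information on the target image data (field names are assumptions)."""
    user_account: str
    hour: int        # hour of day at which the image data is generated, 0-23
    location: str


@dataclass(frozen=True)
class StoredModel:
    """A pre-trained compression model and the conditions it was trained for."""
    model_id: str
    user_account: str
    hours: range     # time window covered by the training data
    location: str


def select_model(info: ImageInfo, store: List[StoredModel]) -> Optional[StoredModel]:
    """Return the best-matching stored model, or None if the account has none."""
    candidates = [m for m in store if m.user_account == info.user_account]
    if not candidates:
        return None  # no stored model; a new one would have to be generated

    # Hypothetical scoring: one point each for a matching time window and location.
    def score(model: StoredModel) -> int:
        return int(info.hour in model.hours) + int(model.location == info.location)

    return max(candidates, key=score)


store = [
    StoredModel("alice-morning", "alice", range(6, 12), "home"),
    StoredModel("alice-evening", "alice", range(18, 24), "home"),
]
chosen = select_model(ImageInfo("alice", 20, "home"), store)
```

Returning None corresponds to the case in which no suitable model is stored, so that a model would have to be newly generated before compression.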

The image data transmission processing device 100 may select, from pre-stored image data, image data similar to the target image data to be transmitted, and may generate a new deep neural network model by learning, as training data, the selected similar image data and the characteristics of the objects included in it. The image data transmission processing device 100 may transmit, to the image data reception processing device 200, the compressed image data obtained by compressing the target image data with the newly generated deep neural network model, together with the newly generated deep neural network model itself.

The image data transmission processing device 100 may transmit, to the image data reception processing device 200, the compressed image data together with the training manual used to train the selected deep neural network model.

Upon receiving compressed image data from the image data transmission processing device 100, the image data reception processing device 200 may select, from among deep neural network models stored in association with pre-stored user accounts, a deep neural network model with which to restore the compressed image data, and may restore the original image data from the compressed image data using the selected deep neural network model.

When the image data reception processing device 200 receives, from the image data transmission processing device 100, compressed image data and the training manual used to train the deep neural network model, it may build a deep neural network model using the received training manual and restore the original image data from the compressed image data using the model it has built. Here, the point at which construction of the deep neural network model from the training manual is completed may be before the original image data is restored.
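The idea of rebuilding a model on the receiving side from a training manual can be illustrated with a deliberately tiny sketch. It assumes that both sides hold the same training samples and that training is fully deterministic given the random seed; the function name `build_model`, the one-parameter "model", and the update rule are all hypothetical stand-ins for real deep neural network training. In practice, reproducing a real network exactly on a different device would additionally depend on deterministic numerical behavior of the training software and hardware, which this sketch ignores.

```python
import random
from typing import Sequence


def build_model(samples: Sequence[float], num_steps: int, seed: int,
                learning_rate: float = 0.1) -> float:
    """Train a one-parameter toy 'model' reproducibly from a seed."""
    rng = random.Random(seed)            # all randomness comes from the seed
    weight = rng.uniform(-1.0, 1.0)      # seeded initialization
    for _ in range(num_steps):
        target = rng.choice(samples)     # seeded sampling order
        weight -= learning_rate * (weight - target)  # step toward the sample
    return weight


# Both sides are assumed to hold the same training data.
samples = [0.2, 0.4, 0.9]
# The sender trains; the receiver repeats the recipe from the manual.
sender_model = build_model(samples, num_steps=50, seed=7)
receiver_model = build_model(samples, num_steps=50, seed=7)
```

Because every random choice is drawn from a generator seeded with the same value, the two calls produce the same parameter, which is what allows a short recipe to be sent in place of the model itself.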

In this embodiment, the image data transmission processing device 100 may include a caller terminal (not shown), and the image data reception processing device 200 may include a callee terminal (not shown). Accordingly, the transmission and reception of image data between the image data transmission processing device 100 and the image data reception processing device 200 may include video communication between the caller terminal and the callee terminal.

In this embodiment, a deep neural network model may include a compression deep neural network model that uses an encoder and a restoration deep neural network model that uses a decoder. That is, one deep neural network model may consist of a set made up of a compression deep neural network model and a restoration deep neural network model. Depending on the type of data input to the deep neural network model, either the compression model or the restoration model is selected, and the output data differs accordingly. When the input data is ordinary raw image data, the compression deep neural network model may be selected and may output compressed image data. When the input data is compressed image data, the restoration deep neural network model may be selected and may output restored image data.
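The input-dependent choice between the compression part and the restoration part can be sketched as below. This is an illustration under stated assumptions, not the disclosed implementation: the class names are hypothetical, and the standard-library `zlib` codec stands in for the trained encoder and decoder networks purely so that the example runs.

```python
import zlib
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RawImage:
    pixels: bytes


@dataclass(frozen=True)
class CompressedImage:
    payload: bytes


class CodecModel:
    """One 'model' holding both a compression part and a restoration part."""

    def encode(self, image: RawImage) -> CompressedImage:
        # Stand-in for the encoder-based compression network.
        return CompressedImage(zlib.compress(image.pixels))

    def decode(self, image: CompressedImage) -> RawImage:
        # Stand-in for the decoder-based restoration network.
        return RawImage(zlib.decompress(image.payload))

    def __call__(self, data: Union[RawImage, CompressedImage]):
        # The type of the input decides which half of the model is used.
        if isinstance(data, RawImage):
            return self.encode(data)
        if isinstance(data, CompressedImage):
            return self.decode(data)
        raise TypeError(f"unsupported input type: {type(data).__name__}")


model = CodecModel()
raw = RawImage(b"\x00" * 1024)
compressed = model(raw)        # raw input selects the compression part
restored = model(compressed)   # compressed input selects the restoration part
```

Keeping both halves behind one callable mirrors the description of a single model made up of a compression model and a restoration model, while a sender would only ever exercise `encode` and a receiver only `decode`.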

The deep neural network model used or generated by the image data transmission processing device 100 may be a compression deep neural network model. The compression deep neural network model may be trained in advance to output compressed image data containing an object from image data containing that object. Here, image data containing an object may be image data that contains user-specific characteristics depending on situation and/or time (for example, in a video call, the face and background of the user making the call, the user in the morning, the user in the evening, or the types of video the user mainly shoots, such as soccer or basketball videos).

The deep neural network model used or built by the image data reception processing device 200 may be a restoration deep neural network model. The restoration deep neural network model may be trained in advance to output original image data containing an object from compressed image data containing that object.

The network 300 may serve to connect the image data transmission processing device 100 and the image data reception processing device 200. The network 300 may encompass, for example, wired networks such as local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), and integrated service digital networks (ISDNs), and wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of the present invention is not limited thereto. The network 300 may also transmit and receive information using short-range communication and/or long-range communication. Here, short-range communication may include Bluetooth, radio frequency identification (RFID), infrared data association (IrDA), ultra-wideband (UWB), ZigBee, and wireless fidelity (Wi-Fi) technologies, and long-range communication may include code division multiple access (CDMA), frequency division multiple access (FDMA), time division multiple access (TDMA), orthogonal frequency division multiple access (OFDMA), and single carrier frequency division multiple access (SC-FDMA) technologies.

The network 300 may include connections of network elements such as hubs, bridges, routers, and switches. The network 300 may include one or more connected networks, for example a multi-network environment, including a public network such as the Internet and a private network such as a secure corporate private network. Access to the network 300 may be provided through one or more wired or wireless access networks. Furthermore, the network 300 may support an Internet of Things (IoT) network, in which information is exchanged and processed between distributed components such as things, and/or 5G communication.

FIG. 2 is an exemplary diagram of an image data processing environment for transmitting and receiving video content according to another embodiment. Referring to FIG. 2, the image data transmission processing device 100 may include a server (not shown) that stores and/or provides video content, or a video content provider terminal (not shown) that generates and/or stores and provides video content. The image data reception processing device 200 may include a video content receiver terminal (not shown) that connects to the server and/or the video content provider terminal to receive and play video content.

FIG. 2 shows that, when the video content currently being transmitted by the image data transmission processing device 100 (the server or video content sender terminal) is compared with the video content that the image data reception processing device 200 (the video content receiver terminal) received three days earlier, the background and the person are the same and only the food the person is eating differs.

In such a case, the image data transmission processing device 100 may compress the target video content to be transmitted ("On air" in FIG. 2) using, from among a plurality of pre-stored deep neural network models, the deep neural network model trained on the object characteristics (for example, the background and the person) of the video content stored three days earlier. Because the image data transmission processing device 100 compresses video content using a deep neural network model trained on the characteristics of the objects, it removes redundancy and similarity during compression and can thereby transmit video content in real time.
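The benefit of compressing against previously seen content can be illustrated with a classical analogy. The sketch below does not use a deep neural network: it substitutes a DEFLATE preset dictionary (the `zdict` parameter of Python's `zlib`) for the trained model, and the byte strings are toy stand-ins for video content. It only shows that content sharing most of its bytes with earlier content compresses far better when both sides already hold that earlier content.

```python
import random
import zlib


def compress_with_prior(current: bytes, prior: bytes) -> bytes:
    """Compress `current`, letting the codec reference bytes seen in `prior`."""
    compressor = zlib.compressobj(zdict=prior)
    return compressor.compress(current) + compressor.flush()


def restore_with_prior(payload: bytes, prior: bytes) -> bytes:
    """Restore the original bytes; the receiver must hold the same `prior`."""
    decompressor = zlib.decompressobj(zdict=prior)
    return decompressor.decompress(payload) + decompressor.flush()


# Toy "content stored three days ago": random bytes, incompressible on its own.
rng = random.Random(0)
prior_content = bytes(rng.getrandbits(8) for _ in range(2000))
# Toy "on-air content": same background and person, different food.
current_content = prior_content[:1500] + b"different food on the table"

with_prior = compress_with_prior(current_content, prior_content)
without_prior = zlib.compress(current_content)
restored = restore_with_prior(with_prior, prior_content)
```

Here `with_prior` is much shorter than `without_prior` because the shared bytes are encoded as references into the prior content rather than sent again, which is the same redundancy that the trained model is described as exploiting.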

The image data transmission processing device 100 may transmit, to the image data reception processing device 200, the compressed target video content together with the deep neural network model used to compress it (or the training manual used to train the deep neural network).

Upon receiving the compressed target video content, the image data reception processing device 200 may restore the original video content from the compressed target video content using a deep neural network model stored in association with the user account.

Because the image data reception processing device 200 restores the original video content using a deep neural network model trained on the characteristics of the objects, redundancy and similarity are removed when the video content is restored, and the video content can therefore be played in real time.

Alternatively, upon receiving the compressed target video content and the training manual used to train the deep neural network, the image data reception processing device 200 may build a deep neural network model using the training manual and restore the original video content from the compressed target video content using the model it has built.

FIG. 3 is a block diagram schematically illustrating the configurations of the image data transmission device and the image data reception device included in the image data processing environment of FIG. 1. In the following description, parts that overlap with the description of FIGS. 1 and 2 are omitted.

Referring to FIG. 3, the image data transmission processing device 100 may include a first storage unit 110, a determination unit 120, a compression unit 130, a transmission unit 140, a database 150, a generation unit 160, and a first control unit 170.

Also referring to FIG. 3, the image data reception processing device 200 may include a reception unit 210, a second storage unit 220, a search unit 230, a restoration unit 240, a construction unit 250, and a second control unit 260. In this specification, a "unit" may be a hardware component such as a processor or a circuit, and/or a software component executed by a hardware component such as a processor.

First, the image data transmission processing apparatus 100 will be described.

The first storage unit 110 may store deep neural network models that have been trained, using the image data stored in the database 150, to perform optimal compression according to the characteristics of objects included in the image data. Here, the deep neural network models are learning models trained to perform compression using image data stored in advance for each user account that generates image data. They may include a first deep neural network model trained in advance, for compressing image data of a first user account, using existing image data previously generated under the first user account as training data.

In one embodiment, the first storage unit 110 may store the training manual used when training a deep neural network model. Here, the training manual may hold all the information needed to create a compression deep neural network model, for example information on the training data set, the number of training iterations, the random seed, and the type of optimizer.

Here, the first storage unit 110 may include magnetic storage media or flash storage media, but the scope of the present invention is not limited thereto. The first storage unit 110 may include internal memory and/or external memory, and may include volatile memory such as DRAM, SRAM, or SDRAM; non-volatile memory such as OTPROM (one-time programmable ROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, NAND flash memory, or NOR flash memory; a flash drive such as an SSD, CF (compact flash) card, SD card, Micro-SD card, Mini-SD card, xD card, or memory stick; or a storage device such as an HDD.

The determination unit 120 may determine, based on information on the target image data to be transmitted to the image data reception processing apparatus 200, whether a deep neural network model to be used for compressing the image data is stored in the first storage unit 110.

Here, the information on the target image data may include at least one of the user account that generates the image data, the time at which the image data is generated, and the location at which the image data is generated. This information may be tagged to the deep neural network models stored in the first storage unit 110.

Accordingly, based on the information on the target image data to be transmitted, the determination unit 120 may determine whether the first storage unit 110 holds a deep neural network model having the same tagging information.
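The tag-based check described above can be sketched as a dictionary lookup. This is a minimal illustration only: the `ModelTag` class, its field names, the string model identifiers, and the idea of keying the store by an exact tag are assumptions made for the example, not details given in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelTag:
    """Hypothetical tag attached to a stored model; fields follow the text."""
    user_account: str
    time_slot: Optional[str] = None   # granularity of "time" is an assumption
    location: Optional[str] = None


def find_tagged_model(store: dict, target: ModelTag):
    """Return the stored model whose tag equals the target's information, or None."""
    return store.get(target)


# Hypothetical store: tag -> model identifier.
store = {ModelTag("alice", "evening", "office"): "model_alice_office"}
hit = find_tagged_model(store, ModelTag("alice", "evening", "office"))
miss = find_tagged_model(store, ModelTag("bob"))
```

A `None` result corresponds to the case in which no matching model is stored and a new model would have to be generated.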

In another embodiment, the determination unit 120 may determine whether a deep neural network model to be used for compressing the image data is stored in the first storage unit 110, based on first similarity information calculated by extracting features of the target image data to be transmitted to the image data reception processing apparatus 200.

In this embodiment, image data having similar features may be used as training data to train a deep neural network model, and second similarity information, calculated by extracting the features of the image data used, may be stored in the first storage unit 110 together with the trained model. Here, the similarity information may be calculated using one of the techniques for calculating similarity between images, such as image feature matching (which calculates similarity by extracting image features such as the contours and color characteristics contained in an image), context-based semantic matching, or context-based visual matching.

Accordingly, the determination unit 120 may determine whether the first storage unit 110 holds a deep neural network model whose second similarity information is most similar to the first similarity information.
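One way to picture the similarity-based selection is shown below. The specification does not say how similarity information is represented or compared, so this sketch assumes that both the first and second similarity information are plain feature vectors, that cosine similarity is the comparison, and that a hypothetical `threshold` decides whether any stored model counts as a match.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_model(target_features, stored, threshold=0.9):
    """stored maps a model name to the feature vector saved with that model.

    Returns the name of the most similar model, or None when no stored
    model reaches the (assumed) threshold.
    """
    best_name, best_score = None, threshold
    for name, features in stored.items():
        score = cosine_similarity(target_features, features)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


stored = {"face": [1.0, 0.0], "landscape": [0.0, 1.0]}
chosen = select_model([0.9, 0.1], stored)
```

Returning `None` when nothing is similar enough mirrors the branch in which the generation unit trains a new model.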

When the determination unit 120 determines that a deep neural network model to be used for the target image data is stored in the first storage unit 110, the compression unit 130 may select from the first storage unit 110 the deep neural network model with which to compress the image data, and may generate compressed image data by compressing the target image data using the selected model. The compression unit is described in detail below with reference to FIGS. 4 to 7.

The transmission unit 140 may transmit the compressed image data generated by the compression unit 130 to the image data reception processing apparatus 200.

In an optional embodiment, the transmission unit 140 may transmit the deep neural network model used by the compression unit 130 to the image data reception processing apparatus 200. In this case, the image data transmission processing apparatus 100 incurs an additional transmission cost, but the image data reception processing apparatus 200 is spared the computing cost of training the deep neural network model.

In an optional embodiment, the transmission unit 140 may select from the first storage unit 110 a training manual with which the deep neural network model can be trained, and may transmit the compressed image data and the training manual to the image data reception processing apparatus 200. In this case, the image data transmission processing apparatus 100 saves transmission cost, but the image data reception processing apparatus 200 incurs the computing cost of generating the deep neural network model.
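The trade-off between the two optional embodiments (send the model, or send only the training manual) can be made concrete with a rough time estimate. The specification describes the trade-off but gives no decision rule; the function below, its parameter names, and the "whichever finishes sooner" criterion are purely illustrative assumptions.

```python
def choose_payload(model_bytes, manual_bytes, link_bytes_per_s, receiver_train_s):
    """Return "model" or "manual", whichever is estimated to be ready sooner.

    Sending the model costs transfer time only; sending the manual costs a
    small transfer plus the receiver's (assumed known) training time.
    """
    send_model_s = model_bytes / link_bytes_per_s
    send_manual_s = manual_bytes / link_bytes_per_s + receiver_train_s
    return "model" if send_model_s <= send_manual_s else "manual"


# Hypothetical numbers: a 50 MB model, a 2 kB manual, 10 minutes of training.
fast_link = choose_payload(50_000_000, 2_000, 1_000_000, 600)
slow_link = choose_payload(50_000_000, 2_000, 10_000, 600)
```

On the fast link the model transfer takes about 50 seconds, so sending the model wins; on the slow link the transfer would take about 5,000 seconds, so sending the manual and training at the receiver is quicker.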

The database 150 stores image data. Here, image data may include an image, a video, or image content that combines text, images, video, sound sources, and the like.

The image data stored in the database 150 may include one or more of previously transmitted image data, image data synchronized with the image data reception processing apparatus 200, and image data obtained through an initialization process with the image data reception processing apparatus 200.

When the determination unit 120 determines that no deep neural network model to be used for the target image data is stored in the first storage unit 110, the generation unit 160 may select from the database 150 image data similar to the target image data, and may generate a deep neural network model by learning, as training data, the selected similar image data and the characteristics of the objects it contains.

In this case, the compression unit 130 may generate compressed image data by compressing the target image data using the deep neural network model generated by the generation unit 160, and the transmission unit 140 may transmit the compressed image data together with the generated model to the image data reception processing apparatus 200.

The first control unit 170 may control the overall operation of the image data transmission processing apparatus 100. The first control unit 170 may include any type of device capable of processing data, such as a processor. Here, a "processor" may refer to a data processing device embedded in hardware that has circuitry physically structured to perform functions expressed as code or instructions contained in a program. Examples of such a hardware-embedded data processing device include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

Next, the image data reception processing apparatus 200 will be described.

The reception unit 210 may receive compressed image data from the image data transmission processing apparatus 100.

The second storage unit 220 may store deep neural network models associated with user accounts. Here, the deep neural network models are learning models trained to perform restoration using image data stored in advance for each user account that transmits image data. They may include first restoration deep neural network models trained in advance, for restoring compressed image data of a first user account, using existing image data received from the first user account as training data.

When the reception unit 210 receives, from the image data transmission processing apparatus 100, compressed image data and the sender's user account information included in the compressed image data, the search unit 230 may search the second storage unit 220 for a deep neural network model stored in association with that user account.

The restoration unit 240 may select, according to the information of the user account that transmitted the compressed image data, the deep neural network model with which to restore the compressed image data from among the models stored in the second storage unit 220, and may restore the original image data from the compressed image data using the selected model.

In one embodiment, the reception unit 210 may receive compressed image data and a deep neural network model from the image data transmission processing apparatus 100. The restoration unit 240 may restore the original image data from the compressed image data using the received deep neural network model.

In one embodiment, the reception unit 210 may receive, from the image data transmission processing apparatus 100, compressed image data and the training manual used to train the deep neural network model.

When the reception unit 210 receives the training manual used to train the deep neural network model, the construction unit 250 may build a deep neural network model using the training manual. The model built by the construction unit 250 may be stored in the second storage unit 220. Here, the cases in which a deep neural network model is built from the training manual may include the case in which no deep neural network for restoring the compressed image data is stored in the second storage unit 220 and the reception unit 210 has received the training manual used to train the model.

The restoration unit 240 may restore the original image data from the compressed image data using the deep neural network model built by the construction unit 250.
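The receiver-side alternatives described above (use a received model, look up a stored model for the sender's account, or build one from a received training manual) can be sketched as a single selection function. The order of precedence, the function and parameter names, and storing the built model under the sender's account are assumptions made for illustration; the specification presents these as separate embodiments without fixing a priority.

```python
def pick_restoration_model(account, stored_models, received_model=None,
                           received_manual=None, build=None):
    """Choose the model used to restore compressed data from `account`.

    stored_models: dict mapping a user account to a stored model.
    build: hypothetical callable that trains a model from a training manual.
    """
    if received_model is not None:        # the sender shipped the model itself
        return received_model
    if account in stored_models:          # a model is already stored for this account
        return stored_models[account]
    if received_manual is not None and build is not None:
        model = build(received_manual)    # rebuild the model from the manual
        stored_models[account] = model    # keep it for later transmissions
        return model
    return None                           # nothing available to restore with


models = {"alice": "restore_alice"}
from_store = pick_restoration_model("alice", models)
built = pick_restoration_model("bob", models, received_manual={"seed": 1},
                               build=lambda manual: ("built", manual["seed"]))
```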

The second control unit 260 may control the overall operation of the image data reception processing apparatus 200. Its details are the same as those of the first control unit 170 and are therefore omitted.

FIGS. 4 to 7 are examples schematically illustrating the operation of the compression unit of the image data transmission apparatus according to this embodiment. In the following description, portions that overlap with the descriptions of FIGS. 1 to 3 are omitted.

In general, video compression may consist of spatial redundancy elimination and temporal redundancy elimination. For temporal redundancy elimination, the frames of a video may be classified into three types: I frames, P frames, and B frames.

To guarantee random access into the video, an I frame does not consider temporal redundancy elimination and undergoes only spatial redundancy elimination, as in JPEG. It can serve as a reference frame for other frames.

A P frame increases compression efficiency by eliminating temporal redundancy between pictures with reference to a previously played I frame or P frame.

A B frame increases compression efficiency through bidirectional prediction using a previously played I frame or P frame and a frame to be played later. A B frame cannot serve as a reference frame for other frames.

As shown in FIG. 4, video is encoded in units of a GOP (group of pictures), and each GOP generally has one I frame, or exceptionally more. Because I frames, which cannot undergo temporal redundancy elimination, account for a large share of the whole video, there is room to compress the video further (with H.265 compression using a 30-frame GOP, I frames account for about 40% of the total video size).

Video compression technologies to date (for example VP8, VP9, AV1, H.264/AVC, and H.265/HEVC) compress using only the elimination of spatial and temporal redundancy within the video data itself, and, to preserve random access, I frames can undergo only spatial redundancy elimination.

This embodiment goes beyond the elimination of temporal and spatial redundancy within a single video performed by existing video compression techniques. Through a deep neural network model that has learned per-user characteristics from image data transmitted, synchronized, or stored in advance, or obtained through an initialization process, the redundancy and similarity between pieces of image data, that is, between video files, can be exploited for compressed video transmission.

As shown in FIG. 5, existing video compression technology finds matching patterns in units of macroblocks (H.264) or their modified form, CTUs (H.265). In general, an exactly matching pattern is hard to find except between positions that are extremely close in time within a single GOP, or extremely close in space within a single image.

In addition, when searching for a matching macroblock/CTU between temporally adjacent frames, searching the entire adjacent frame is too computationally complex, so to minimize search time the search range is restricted to the neighborhood of the target macroblock/CTU (for example, 64×64 or 128×128).
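The restricted search described above is what a conventional block-matching motion search does. The sketch below is a simplified, generic illustration of that conventional approach (exhaustive search over a small window using the sum of absolute differences); it is not code from any codec, and the function names and the tiny 8×8 test frames are assumptions made for the example.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a block in cur and a block in ref."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(size)
        for j in range(size)
    )


def best_motion_vector(cur, ref, y, x, size, search):
    """Search ref within +/-search pixels of (y, x); return the (dy, dx) with lowest SAD."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, y, x, y, x, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= height - size and 0 <= rx <= width - size:
                cost = sad(cur, ref, y, x, ry, rx, size)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best


# A 2x2 bright block sits at (2, 2) in the reference frame and at (3, 4) in the current frame.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for i in (2, 3):
    for j in (2, 3):
        ref[i][j] = 9
for i in (3, 4):
    for j in (4, 5):
        cur[i][j] = 9

mv = best_motion_vector(cur, ref, 3, 4, 2, 2)
```

The cost grows with the square of `search`, which is why conventional encoders keep the window small, and a match lying outside the window is simply never found.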

Accordingly, rather than searching for redundancy within a single GOP (a short timescale) or restricting the search range when eliminating inter-frame redundancy, this embodiment proposes an AI-based video compression technique that eliminates the redundancy and similarity of the target image data through a deep neural network model that has learned per-user characteristics from image data transmitted in advance or obtained through an initialization process.

Also, because this embodiment compresses and restores with a deep neural network model learned from data transmitted, synchronized, or stored in advance, the I frame does not lose its existing random access function, and redundancy and similarity between files can be exploited.

Unlike existing AI-based video compression techniques, which require training an AI model for each piece of video content, this embodiment learns per-user characteristics in advance from pre-stored video or video obtained through an initialization process and stores the learned deep neural network model. It can therefore support real-time streaming such as video calls.

In this embodiment, the video compression performed by the compression unit 130 can be broadly classified into intra frame (I frame) compression and inter frame (P frame or B frame) compression.

The compression unit 130 may first convert the target image data to be transmitted into groups of image frames, that is, GOPs (groups of pictures). Each image frame group may include an intra frame and inter frames.

Here, an intra frame is also called an I frame or a key frame, and every image frame group must have at least one I frame. To guarantee random access into the video, an I frame does not consider temporal redundancy elimination; it is either stored as the original input or formed by eliminating only spatial redundancy, as in JPEG. It does not reference other frames, but other frames can reference it.

Among inter frames, a P frame is a forward-predicted frame: it stores only the predicted data for the parts that differ from the immediately preceding frame, and increases compression efficiency by eliminating temporal redundancy between frames.

Among inter frames, a B frame is a bidirectionally predicted frame: it lies between an I frame and a P frame, references both, and increases compression efficiency by storing the motion between the two frames as predicted data.
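The grouping of frames into GOPs can be illustrated with a short sketch. It is deliberately simplified: each group starts with one I frame and every other frame is labeled P, so B frames and any encoder-specific ordering are left out. The function name and the fixed `gop_size` are assumptions made for the example.

```python
def split_into_gops(num_frames, gop_size):
    """Split frame indices 0..num_frames-1 into groups and label each frame.

    The first frame of every group is the I frame; the rest are labeled P
    (B frames are omitted in this simplified sketch).
    """
    gops = []
    for start in range(0, num_frames, gop_size):
        end = min(start + gop_size, num_frames)
        gops.append(["I"] + ["P"] * (end - start - 1))
    return gops


labels = split_into_gops(7, 3)
```

With seven frames and a group size of three, the last group holds a single I frame, which shows why short or frequent groups raise the share of I frames in the stream.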

In compressing the target image data with a deep neural network model generated to reflect user characteristics, the compression unit 130 may be configured to generate compressed image data by compressing the intra frame in each image frame group with the deep neural network model, and compressing the inter frames with reference to the intra frame.

Referring to FIG. 6, for intra frame compression in the compression unit 130, an AI-based deep neural network model that compresses and restores frames may be generated by learning per-user characteristics from image data previously transmitted, synchronized, or stored, or obtained through an initialization process. The intra frame, that is, the I frame, among the frames of a GOP may be compressed through the encoder of the generated model, and the compressed intra frame may be restored through the decoder of the generated model. Here, the per-user characteristics may be the background, objects, faces, and the like that appear in the video a given user produces.

One example of training the intra frame compression deep neural network model is the SR (super resolution) technique, one of the deep learning techniques for frame restoration. An SR model is a deep neural network model that restores a low-quality frame to a high-quality frame. In this embodiment, an SR model that has learned object characteristics from similar images/videos may be used as the intra frame compression model.

Another example is the autoencoder (AE) technique, which compresses and restores an original frame with an encoder and a decoder that are deep neural networks. In this embodiment, an autoencoder model that has learned object characteristics may be used as the intra frame compression model. The difference from existing deep learning neural networks lies in training intensively on images of the object and sharing the model in advance.

In this embodiment, the ways of training a deep neural network such as SR or AE fall broadly into two: 1) a lossless compression scheme, which trains to minimize the sum of the offset bits (the difference between the original image and the restored image) and the compressed image size; and 2) a lossy compression scheme, which trains to maximize the PSNR or SSIM between the original image and the restored image.
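The two training objectives can be written down as simple quantities. The sketch below uses the standard PSNR definition for the lossy case and, for the lossless case, just adds the two bit counts named in the text. Treating frames as flat lists of 8-bit pixel values, and the function names, are assumptions made for illustration; a real training loop would compute these on tensors.

```python
import math


def psnr(original, restored, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((o - r) ** 2 for o, r in zip(original, restored)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def lossless_cost(compressed_bits, offset_bits):
    """Quantity the lossless scheme minimizes: compressed size plus offset bits."""
    return compressed_bits + offset_bits


identical = psnr([10, 20, 30], [10, 20, 30])
noisy = psnr([255, 0], [0, 0])
```

A higher PSNR means the restored frame is closer to the original, so the lossy scheme pushes this value up, while the lossless scheme pushes `lossless_cost` down because the offset bits must be sent alongside the compressed frame to recover the original exactly.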

Referring to FIG. 7, for inter frame compression in the compression unit 130, a deep neural network model may be generated that has learned, based on per-user characteristics, the relationship between matching macroblocks/CTUs in an adjacent reference frame and the current frame (searching the whole frame rather than a restricted search range). That is, by performing the search for inter-frame redundancy over the whole frame through the trained model, the search can be done with low computational complexity. Through the encoder of the generated model, a matching block may be found in a reference frame selected from among the previous frames, and an inter frame, that is, a P frame or B frame, may be compressed by means of motion vectors. Through the decoder of the generated model, the inter frame may be restored from a previously restored reference frame by means of the received motion vectors.

One example of training the inter frame compression deep neural network model is a technique in which a deep neural network prediction model performs unidirectional prediction for P frames, predicting the motion vectors of the current frame from the two consecutive preceding frames, and bidirectional prediction for B frames, predicting the motion vectors of the current frame from the immediately preceding and immediately following frames.

Another example is a technique that performs motion estimation, finding the matching motion vector through a model that has learned the patterns of motion vectors matched between adjacent frames.

In this embodiment, the DNN models of techniques that apply deep neural networks to motion estimation, such as the two techniques above, are trained on similar images/videos so as to minimize the difference between the estimated motion vector and the actually matching motion vector. By sharing this deep neural network model in advance, AI-based inter frame compression can be applied to real-time streaming.

FIG. 8 is a flowchart illustrating an image data processing method according to this embodiment. In the following description, portions that overlap with the descriptions of FIGS. 1 to 7 are omitted.

Referring to FIG. 8, in step S801, the image data transmission processing apparatus 100 builds a first storage unit that stores deep neural network models trained, using pre-stored image data, to perform optimal compression according to the characteristics of objects included in the image data. Here, the deep neural network models are learning models trained to perform compression using image data stored in advance for each user account that generates image data. They may include a first deep neural network model trained in advance, for compressing image data of a first user account, using existing image data previously generated under the first user account as training data.

In step S803, the image data transmission processing apparatus 100 determines, based on information on the target image data to be transmitted, whether a deep neural network model to be used for compressing the image data is stored in the first storage unit. Here, the information on the image data may include at least one of the user account that generates the image data, the time at which the image data is generated, and the location where the image data is generated, and may be tagged to the deep neural network models stored in the first storage unit.
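
The tag-based check in step S803 can be sketched as a keyed lookup. The class and field names below (`ModelTag`, `FirstStorage`, `time_slot`, and so on) are hypothetical, and matching on all three tags at once is an assumption; the text only requires at least one of them.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelTag:
    """Hypothetical tag attached to a stored model (frozen so it can be a dict key)."""
    user_account: str
    time_slot: str   # assumed coarse label such as "morning"
    location: str    # assumed coarse label such as "office"


class FirstStorage:
    """Toy stand-in for the first storage unit: maps tags to model identifiers."""

    def __init__(self) -> None:
        self._models: Dict[ModelTag, str] = {}

    def register(self, tag: ModelTag, model_id: str) -> None:
        self._models[tag] = model_id

    def find(self, tag: ModelTag) -> Optional[str]:
        # Returns None when no model is stored for this tag (the "not stored" branch).
        return self._models.get(tag)


store = FirstStorage()
tag = ModelTag("user-1", "morning", "office")
store.register(tag, "dnn-user-1-office")
```

A `None` result from `find` corresponds to the case where no suitable model is stored and a new one has to be generated.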

In an optional embodiment, the image data transmission processing apparatus 100 may determine whether a deep neural network model to be used for compressing the image data is stored in the first storage unit based on first similarity information calculated by extracting features of the target image data to be transmitted. In this embodiment, image data having similar features may be used as training data to train a deep neural network model, and second similarity information calculated by extracting features of that image data may be stored in the first storage unit together with the trained deep neural network model. Accordingly, the image data transmission processing apparatus 100 may determine whether the first storage unit holds a deep neural network model whose second similarity information is most similar to the first similarity information.
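
One way to picture this similarity-based check is a nearest-match search over stored feature vectors. The sketch below is illustrative only: cosine similarity, the `threshold` cutoff for deciding that no model matches, and the names `cosine_similarity` and `find_model` are assumptions, since the text does not define how the similarity information is computed or compared.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_model(
    target_features: List[float],
    stored: Dict[str, List[float]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the id of the stored model most similar to the target, or None.

    `stored` maps a model id to the (second) similarity information saved with it;
    `target_features` plays the role of the first similarity information.
    """
    best_id, best_score = None, threshold
    for model_id, features in stored.items():
        score = cosine_similarity(target_features, features)
        if score >= best_score:
            best_id, best_score = model_id, score
    return best_id


stored_models = {"face": [1.0, 0.0], "landscape": [0.0, 1.0]}
match = find_model([0.9, 0.1], stored_models)
no_match = find_model([1.0, 1.0], stored_models)
```

Here `match` selects the `"face"` model, while `no_match` is `None` because neither stored vector clears the assumed threshold.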

In step S805, when the image data transmission processing apparatus 100 determines that a deep neural network model for compressing the image data to be transmitted is stored in the first storage unit, it selects from the first storage unit the deep neural network model to be used for the target image data.

In step S807, when the image data transmission processing apparatus 100 determines that no deep neural network model for compressing the image data to be transmitted is stored in the first storage unit, it selects from the database image data similar to the target image data to be transmitted.

In step S809, the image data transmission processing apparatus 100 generates a deep neural network model by learning, as training data, the selected similar image data and the characteristics of objects included in the similar image data.
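
The branch formed by steps S803 to S809 (use a stored model if one exists, otherwise train a new one on similar image data) can be sketched as follows. Everything here is a simplified assumption: models are represented by strings, the database lookup by key stands in for a real similarity search, and `train` is a hypothetical callable supplied by the caller.

```python
from typing import Callable, Dict, List


def select_or_train(
    key: str,
    first_storage: Dict[str, str],
    database: Dict[str, List[str]],
    train: Callable[[List[str]], str],
) -> str:
    """Return the model to use for compressing the target image data."""
    # S803 / S805: a model for this key is already stored, so select it.
    if key in first_storage:
        return first_storage[key]
    # S807: choose similar image data from the database (toy keyed lookup).
    similar_clips = database.get(key, [])
    # S809: generate a new model from the similar image data.
    return train(similar_clips)


def toy_train(clips: List[str]) -> str:
    # Placeholder for real training; only records how much data was used.
    return f"model-trained-on-{len(clips)}-clips"


storage = {"user-1": "dnn-user-1"}
db = {"user-2": ["clip-a", "clip-b", "clip-c"]}
selected = select_or_train("user-1", storage, db, toy_train)
generated = select_or_train("user-2", storage, db, toy_train)
```

In this toy run `selected` comes straight from storage, while `generated` is produced by the training fallback.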

In step S811, the image data transmission processing apparatus 100 generates compressed image data by compressing the target image data to be transmitted, using either the deep neural network model selected from the first storage unit or the newly generated deep neural network model. Here, the image data transmission processing apparatus 100 may convert the image data into image frame groups, each group including an intra frame and inter frames. In compressing the target image data with the deep neural network model, it may compress the intra frame of each image frame group by applying the deep neural network model and compress the inter frames with reference to the intra frame, thereby generating the compressed image data. The deep neural network model may be either a deep learning-based super-resolution model or an autoencoder model.
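
The frame-group handling in step S811 can be illustrated with a toy example. This is a sketch under stated assumptions, not the patented codec: frames are flat lists of pixel values, the first frame of each group is treated as the intra frame, inter frames are stored as pixel-wise differences from that intra frame, and `downsample` is a hypothetical stand-in for the deep neural network encoder.

```python
from typing import Callable, List, Tuple

Frame = List[int]  # toy frame: a flat list of pixel values


def split_into_groups(frames: List[Frame], group_size: int) -> List[List[Frame]]:
    """Cut the frame sequence into consecutive image frame groups."""
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]


def compress_group(
    group: List[Frame],
    encode_intra: Callable[[Frame], Frame],
) -> Tuple[Frame, List[Frame]]:
    """Encode the intra frame with the model and the inter frames relative to it."""
    intra = group[0]
    # Assumption: "compressed based on the intra frame" is modeled as a residual.
    residuals = [[p - q for p, q in zip(frame, intra)] for frame in group[1:]]
    return encode_intra(intra), residuals


def downsample(frame: Frame) -> Frame:
    # Stand-in for the DNN encoder: keep every second pixel.
    return frame[::2]


frames = [[10, 10, 20, 20], [11, 10, 20, 21], [50, 50, 60, 60], [50, 51, 60, 60]]
compressed = [compress_group(g, downsample) for g in split_into_groups(frames, 2)]
```

The residuals are mostly zeros when consecutive frames are similar, which is what makes the inter frames cheap to transmit.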

In step S813, the image data transmission processing apparatus 100 may transmit to the image data receiving and processing apparatus 200 one of the following: the compressed image data alone; the compressed image data together with the deep neural network model; or the compressed image data together with the training manual used to train the deep neural network model.

In step S815, the image data receiving and processing apparatus 200 may receive the compressed image data alone, the compressed image data together with the deep neural network model, or the compressed image data together with the training manual used to train the deep neural network model.

In step S817, the image data receiving and processing apparatus 200 determines whether the deep neural network model that will restore the compressed image data is a complete deep neural network model. The model may be determined to be complete when a deep neural network model corresponding to the user account included in the compressed image data is stored in the second storage unit, or when the deep neural network model is received together with the compressed image data. When the model is not complete, the image data receiving and processing apparatus 200 may have received, from the image data transmission processing apparatus 100, the training manual used to train the deep neural network model.

In step S819, when the received deep neural network model is not a complete deep neural network model, the image data receiving and processing apparatus 200 uses the training manual that was used to train the deep neural network model to build a deep neural network model identical to the one used for compression.

In step S821, the image data receiving and processing apparatus 200 restores the compressed image data to the original image data using the deep neural network model selected from the second storage unit, the deep neural network model received from the image data transmission processing apparatus 100, or the deep neural network model built using the received training manual.
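
The receiver-side choice among the three model sources in steps S817 to S821 can be sketched as a simple fallback chain. The order of the checks, the string representation of models, and the names `resolve_model` and `rebuild` are assumptions for illustration; the text lists the three sources without fixing a priority among them.

```python
from typing import Callable, Dict, Optional


def resolve_model(
    user_account: str,
    second_storage: Dict[str, str],
    received_model: Optional[str],
    training_manual: Optional[str],
    rebuild: Callable[[str], str],
) -> str:
    """Pick the model used to restore the compressed image data."""
    # Complete model: one for this user account is already in the second storage unit.
    if user_account in second_storage:
        return second_storage[user_account]
    # Complete model: it arrived together with the compressed image data.
    if received_model is not None:
        return received_model
    # Incomplete model (S819): rebuild an identical model from the training manual.
    if training_manual is not None:
        return rebuild(training_manual)
    raise LookupError("no model available to restore the compressed image data")


def toy_rebuild(manual: str) -> str:
    # Placeholder for re-running training as described by the manual.
    return f"rebuilt-from-{manual}"


local = resolve_model("user-1", {"user-1": "dnn-user-1"}, None, None, toy_rebuild)
received = resolve_model("user-2", {}, "dnn-sent", None, toy_rebuild)
rebuilt = resolve_model("user-3", {}, None, "manual-7", toy_rebuild)
```

Whichever branch is taken, the returned model is then applied to the compressed image data in step S821.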

The embodiments of the present invention described above may be implemented in the form of a computer program that can be executed on a computer through various components, and such a computer program may be recorded on a computer-readable medium. The medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

The computer program may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer programs include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

In the specification of the present invention (particularly in the claims), the term "said" and similar referential terms may refer to both the singular and the plural. In addition, where a range is described in the present invention, the invention includes embodiments to which each individual value within that range is applied (unless stated otherwise), which is equivalent to stating each individual value constituting the range in the detailed description of the invention.

Unless an order is explicitly stated for the steps constituting the method according to the present invention, or there is a statement to the contrary, the steps may be performed in any suitable order. The present invention is not necessarily limited by the order in which the steps are described. The use of any example or exemplary term (for example, "etc.") in the present invention is merely intended to describe the present invention in detail, and the scope of the present invention is not limited by such examples or exemplary terms unless limited by the claims. In addition, those skilled in the art will appreciate that various modifications, combinations, and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Accordingly, the spirit of the present invention should not be limited to the embodiments described above, and not only the claims set out below but also all scopes equal to, or equivalently modified from, those claims fall within the scope of the spirit of the present invention.

100: image data transmission processing apparatus
200: image data receiving and processing apparatus
300: network

Claims

An image data transmission processing apparatus, comprising:
a database that stores image data;
a storage unit that stores deep neural network models that perform compression for each characteristic of an object, according to situation and time, included in the image data stored in the database;
a determination unit that determines, based on information on target image data to be transmitted, whether a deep neural network model to be used for compressing the image data is stored in the storage unit;
a compression unit that, when the determination unit determines that the deep neural network model to be used for the target image data to be transmitted is stored in the storage unit, selects from the storage unit the deep neural network model with which to compress the image data and generates compressed image data by compressing the target image data to be transmitted using the selected deep neural network model; and
a transmission unit that transmits the compressed image data to an image data receiving and processing apparatus,
the apparatus further comprising a generation unit that, when the determination unit determines that the deep neural network model to be used for the target image data to be transmitted is not stored in the storage unit, selects from the database image data similar to the target image data to be transmitted and generates a new deep neural network model by learning, as training data, the selected similar image data and the characteristics of objects included in the similar image data.

The image data transmission processing apparatus of claim 1,
wherein the deep neural network models are learning models trained to perform compression using image data stored in advance for each user account that generates image data, and include a first deep neural network model that is pre-trained, in order to compress image data of a first user account, using existing image data previously generated under the first user account as training data.

The image data transmission processing apparatus of claim 1,
wherein the information on the image data includes at least one of a user account that generates the image data, a time at which the image data is generated, and a location where the image data is generated.

The image data transmission processing apparatus of claim 1,
wherein the storage unit stores, for those of the trained deep neural network models that were trained using image data having similar features as training data, similarity information calculated by extracting features of the training data together with the trained deep neural network model, and
wherein the determination unit is configured to determine whether a deep neural network model whose similarity information is most similar to similarity information calculated by extracting features of the target image data to be transmitted is stored in the storage unit.

The image data transmission processing apparatus of claim 1,
wherein the compression unit is configured to convert the target image data to be transmitted into image frame groups, each group including an intra frame and an inter frame, and, in compressing the target image data to be transmitted using the deep neural network model, to compress the intra frame in each image frame group by applying the deep neural network model and to compress the inter frame with reference to the intra frame, thereby generating the compressed image data.

The image data transmission processing apparatus of claim 5,
wherein the deep neural network model is one of a deep learning-based super-resolution model and an autoencoder model.

(Deleted)

The image data transmission processing apparatus of claim 1,
wherein the compression unit is configured to generate compressed image data by compressing the target image data to be transmitted using the deep neural network model generated by the generation unit, and
wherein the transmission unit is configured to transmit the compressed image data and the deep neural network model generated by the generation unit to the image data receiving and processing apparatus.

The image data transmission processing apparatus of claim 1,
wherein the transmission unit is configured to transmit, to the image data receiving and processing apparatus, the compressed image data and a training manual used when training the selected deep neural network model.

(Deleted)

An image data transmission processing method, comprising:
constructing a database that stores image data;
constructing a storage unit that stores deep neural network models that perform compression for each characteristic of an object, according to situation and time, included in the image data stored in the database;
determining, based on information on target image data to be transmitted, whether a deep neural network model to be used for compressing the image data is stored in the storage unit;
when it is determined that the deep neural network model to be used for the target image data to be transmitted is stored in the storage unit, selecting from the storage unit the deep neural network model with which to compress the image data and generating compressed image data by compressing the target image data to be transmitted using the selected deep neural network model;
transmitting the compressed image data to an image data receiving and processing apparatus;
when it is determined that the deep neural network model to be used for the target image data to be transmitted is not stored in the storage unit, selecting from the database image data similar to the target image data to be transmitted and generating a new deep neural network model by learning, as training data, the selected similar image data and the characteristics of objects included in the similar image data;
generating compressed image data by compressing the target image data to be transmitted using the deep neural network model generated with the selected similar image data as training data; and
transmitting, to the image data receiving and processing apparatus, the compressed image data and the deep neural network model generated with the selected similar image data as training data.

The image data transmission processing method of claim 14,
wherein the deep neural network models are learning models trained to perform compression using image data stored in advance for each user account that generates image data, and include a first deep neural network model that is pre-trained, in order to compress image data of a first user account, using existing image data previously generated under the first user account as training data.

The image data transmission processing method of claim 14,
wherein generating the compressed image data comprises:
converting the target image data to be transmitted into image frame groups, each group including an intra frame and an inter frame;
in compressing the target image data to be transmitted using the deep neural network model, compressing the intra frame in each image frame group by applying the deep neural network model; and
generating the compressed image data by compressing the inter frame with reference to the intra frame.

(Deleted)