KR102313651B1

KR102313651B1 - Video streaming method between server and client

Info

Publication number: KR102313651B1
Application number: KR1020200056604A
Authority: KR
Inventors: 류지훈; 백두인; 강한길
Original assignee: 수니코리아 엘엘씨(한국뉴욕주립대학교)
Priority date: 2020-05-12
Filing date: 2020-05-12
Publication date: 2021-10-15

Abstract

According to an embodiment of the present invention, a server comprises: a communication unit for streaming a video image to a client device; at least one processor; and a memory for storing instructions executed by the processor. The instructions stored in the memory divide the video image to be streamed into a plurality of first image groups, extract a first I-frame of each of the first image groups, reduce the size of the first I-frame to generate a second I-frame, perform super-resolution through a trained deep neural network (DNN) to generate a third I-frame from the second I-frame, encode a first P-frame and a first B-frame included in each of the first image groups using the third I-frame to generate a second P-frame and a second B-frame, and regroup the second I-frame, the second P-frame, and the second B-frame into a second image group that is transmitted to the client device. The present invention is an innovative technology that performs super-resolution only on the I-frames rather than on every frame of the video content, so that a service with image quality equal or superior to that of high-capacity video content can be provided to a client device with only a small amount of data transferred at the client end.

Description

{Video streaming method between server and client}

The present invention relates to a video streaming method between a server and a client device and, more particularly, to a new encoding and decoding method that uses super-resolution to effectively reduce network usage, and to a related streaming method between a server and a client device.

The image quality of video content has improved steadily over time and will continue to improve. Quality once regarded as high now feels very low to users, and likewise today's high quality will clearly be regarded as low in the future. This holds for conventional video, and the experience of degraded quality is aggravated further in VR content viewed through a head-mounted display (HMD), where the user's level of concentration is much higher. Streaming at a satisfactory quality requires either a breakthrough in network technology or a breakthrough in video encoding/decoding technology. Current content calls for streaming services at 8K resolution.

However, with current Internet service it is physically impossible to support the bandwidth required to stream 360-degree video at 8K resolution unless a breakthrough technology is developed. Accordingly, a technology is needed that can stream 4K/8K video content under current network conditions.

The technical problem to be solved by the present invention is to provide a new encoding and decoding method that uses super-resolution to effectively reduce network usage, and a related streaming method between a server and a client device.

In order to solve the above technical problem, a server according to an embodiment of the present invention includes: a communication unit for streaming a video image to a client device; at least one processor; and a memory for storing instructions executed by the processor. The instructions stored in the memory include instructions for dividing the video image to be streamed into a plurality of first image groups, extracting a first I frame of each first image group, reducing the size of the first I frame to generate a second I frame, performing super-resolution through a trained deep neural network (DNN) to generate a third I frame from the second I frame, encoding a first P frame or a first B frame included in each first image group using the third I frame to generate a second P frame or a second B frame, and regrouping the second I frame and at least one of the second P frame and the second B frame into a second image group and transmitting it to the client device.

In addition, the instructions stored in the memory may include instructions for generating the second I frame from the first I frame using bicubic interpolation.

In addition, the deep neural network may be trained so that the third I frame generated by performing super-resolution on the second I frame approaches the first I frame.
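The text only states that the third I frame should approach the first I frame and does not name a loss function. As an illustrative assumption, the training objective could be written as a mean squared error, where $I^{(1)}_k$ denotes the first I frame of the $k$-th image group, $D(\cdot)$ the size reduction that yields the second I frame, and $f_\theta$ the deep neural network with parameters $\theta$ (all symbols are introduced here and do not appear in the source):

```latex
\theta^{*} \;=\; \arg\min_{\theta} \; \frac{1}{K} \sum_{k=1}^{K}
  \left\lVert f_{\theta}\!\left( D\!\left( I^{(1)}_{k} \right) \right) - I^{(1)}_{k} \right\rVert_{2}^{2}
```

Under this reading, $D(I^{(1)}_k)$ is the second I frame and $f_\theta(D(I^{(1)}_k))$ is the third I frame; other distance measures would fit the description equally well.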

In addition, the video image is composed of three-dimensional images, and the instructions stored in the memory may include instructions for dividing the second I frame into six faces to generate a fourth I frame for each face, and generating the third I frame using the fourth I frames of the respective faces.
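The six-face step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the second I frame is stored as a 3x2 grid of equally sized faces (the source does not specify the layout), represents a frame as a list of pixel rows, and uses nearest-neighbor enlargement as a placeholder for the trained deep neural network. The names `split_into_faces`, `merge_faces`, and `upscale_nearest` are hypothetical.

```python
def split_into_faces(frame, cols=3, rows=2):
    """Split a 2D frame (list of pixel rows) into rows*cols equally sized faces."""
    height, width = len(frame), len(frame[0])
    if height % rows or width % cols:
        raise ValueError("frame size must be divisible by the face grid")
    fh, fw = height // rows, width // cols
    faces = []
    for r in range(rows):
        for c in range(cols):
            face = [row[c * fw:(c + 1) * fw] for row in frame[r * fh:(r + 1) * fh]]
            faces.append(face)
    return faces


def merge_faces(faces, cols=3, rows=2):
    """Reassemble faces (in the order produced by split_into_faces) into one frame."""
    merged = []
    for r in range(rows):
        row_faces = faces[r * cols:(r + 1) * cols]
        for y in range(len(row_faces[0])):
            line = []
            for face in row_faces:
                line.extend(face[y])
            merged.append(line)
    return merged


def upscale_nearest(face, factor=2):
    """Placeholder for the trained DNN: enlarge a face by pixel repetition."""
    out = []
    for row in face:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


# A tiny 4x6 "second I frame" whose six faces are 2x2 each.
second_i = [[r * 6 + c for c in range(6)] for r in range(4)]
fourth_i_frames = split_into_faces(second_i)             # one fourth I frame per face
third_i = merge_faces([upscale_nearest(f) for f in fourth_i_frames])
```

Processing each face separately keeps every super-resolution input small and uniform in size, which is one plausible reason for the split.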

In order to solve the above technical problem, a client device according to an embodiment of the present invention includes: a communication unit for receiving a video image streamed from a server as second image groups; at least one processor; and a memory for storing instructions executed by the processor. The instructions stored in the memory include instructions for performing super-resolution through a trained deep neural network to generate a third I frame from the second I frame included in the second image group, decoding a second P frame or a second B frame included in the second image group using the third I frame to generate a first P frame or a first B frame, and generating a video image using the third I frame and at least one of the first P frame and the first B frame.

In addition, the video image is composed of three-dimensional images, and the instructions stored in the memory may include instructions for dividing the second I frame into six faces to generate a fourth I frame for each face, and generating the third I frame using the fourth I frames of the respective faces.

In addition, the deep neural network may be received from the server, or may be trained so that the third I frame generated by performing super-resolution on the second I frame approaches the first I frame.

In order to solve the above technical problem, a server-side video streaming method between a server and a client device according to an embodiment of the present invention includes: dividing, by the server, a video image into a plurality of first image groups; extracting, by the server, a first I frame of each first image group and reducing the size of the first I frame to generate a second I frame; performing, by the server, super-resolution through a trained deep neural network (DNN) to generate a third I frame from the second I frame; encoding, by the server, a first P frame or a first B frame included in each first image group using the third I frame to generate a second P frame or a second B frame; and regrouping, by the server, the second I frame and at least one of the second P frame and the second B frame into a second image group and transmitting it to the client device.

In addition, the video image is composed of three-dimensional images, and generating the third I frame may include: dividing the second I frame into six faces to generate a fourth I frame for each face; and generating the third I frame using the fourth I frames of the respective faces.

In order to solve the above technical problem, a client-side video streaming method between a server and a client device according to an embodiment of the present invention includes: receiving, by the client device, a second image group from the server; performing, by the client device, super-resolution through a trained deep neural network to generate a third I frame from the second I frame included in the second image group; decoding, by the client device, a second P frame or a second B frame included in the second image group using the third I frame to generate a first P frame or a first B frame; and generating, by the client device, a video image using the third I frame and at least one of the first P frame and the first B frame.

According to the embodiments of the present invention, super-resolution is performed only on the I frames rather than on every frame of the video content, so that a service with image quality equal or superior to that of high-capacity video content can be provided to client equipment with only a small amount of data transferred at the client end. In addition, high-capacity video images can be streamed. Furthermore, streaming of 8K 360-degree high-resolution video images becomes possible.

FIG. 1 is a block diagram of a server and a client device according to an embodiment of the present invention.
FIGS. 2 to 6 are diagrams for explaining a video streaming process between a server and a client device according to an embodiment of the present invention.
FIG. 7 is a graph showing user feedback for different streaming methods.
FIG. 8 is a flowchart of a server-side video encoding and streaming method between a server and a client device according to an embodiment of the present invention.
FIGS. 9 and 10 are flowcharts of a server-side video encoding and streaming method between a server and a client device according to another embodiment of the present invention.
FIG. 11 is a flowchart of a client-side video decoding and streaming method between a server and a client device according to an embodiment of the present invention.
FIGS. 12 and 13 are flowcharts of a client-side video decoding and streaming method between a server and a client device according to another embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

However, the technical idea of the present invention is not limited to the embodiments described and may be implemented in various different forms. Within the scope of the technical idea of the present invention, one or more of the components of the embodiments may be selectively combined or substituted.

In addition, unless explicitly and specifically defined otherwise, the terms used in the embodiments of the present invention (including technical and scientific terms) may be interpreted with the meaning generally understood by a person of ordinary skill in the art to which the present invention pertains, and commonly used terms, such as those defined in dictionaries, may be interpreted in light of their contextual meaning in the related art.

In addition, the terms used in the embodiments of the present invention are for describing the embodiments and are not intended to limit the present invention.

In this specification, the singular form may also include the plural form unless the wording specifically states otherwise, and the expression "at least one (or one or more) of A, B, and C" may include one or more of all combinations that can be formed from A, B, and C.

In addition, terms such as first, second, A, B, (a), and (b) may be used in describing the components of the embodiments of the present invention. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components.

When a component is described as being 'connected', 'coupled', or 'joined' to another component, this includes not only the case where the component is directly connected, coupled, or joined to the other component, but also the case where it is connected, coupled, or joined through yet another component located between the two.

In addition, when a component is described as being formed or disposed "on (above)" or "under (below)" another component, this includes not only the case where the two components are in direct contact with each other, but also the case where one or more other components are formed or disposed between them. The expression "on (above)" or "under (below)" may also cover the downward direction as well as the upward direction with respect to one component.

To carry out video streaming between the server 100 and the client device 200, the server 100 and the client device 200 according to an embodiment of the present invention each include a communication unit 110, 210, at least one processor 120, 220, and a memory 130, 230.

The server 100 is a video providing server that streams a video image to the client device 200, and may be any server capable of transmitting a video image by streaming. Here, the video image may be a high-capacity, high-resolution 4K/8K video image, a high-capacity, high-resolution 360-degree video image, or a VR video image. The server 100 may provide the streaming service using HTTP (HyperText Transfer Protocol).

The server 100 according to an embodiment of the present invention includes a communication unit 110 for streaming a video image to the client device 200, at least one processor 120, and a memory 130 for storing instructions executed by the processor 120.

The communication unit 110 streams the video image generated by the processor 120 to the client device 200.

More specifically, the video image is transmitted to the client device 200 in a streaming manner. Rather than transmitting every frame of the video image one after another as a full frame, the communication unit 110 streams each I frame together with the P frames and B frames that depend on it, in image group order.

The processor 120 executes the instructions stored in the memory 130 to generate the video image to be streamed to the client device 200.

More specifically, the processor 120 controls the components of the server 100, including the communication unit 110 and the memory 130, and processes the data stored in or generated by each component. The processor 120 may include one or more processors, and when there are multiple processors they may operate independently or dependently. The processor 120 may read the instructions stored in the memory 130 and carry out the corresponding operations, and may control the communication unit 110 to stream the video image processed by the processor 120 to the client device 200.

The memory 130 stores the instructions executed by the processor 120.

More specifically, the instructions stored in the memory 130 include instructions for encoding the video image so that it can be streamed to the client device 200. The memory 130 may consist of one or more memories and may store the video image to be transmitted to the client device 200. Alternatively, the video image may be stored in a separate database.

The instructions stored in the memory 130 may include instructions for dividing the video image to be streamed into a plurality of first image groups, extracting a first I frame of each first image group, reducing the size of the first I frame to generate a second I frame, performing super-resolution through a trained deep neural network (DNN) to generate a third I frame from the second I frame, encoding the first P frames and first B frames included in each first image group using the third I frame to generate second P frames and second B frames, and regrouping the second I frame, the second P frames, and the second B frames into a second image group and transmitting it to the client device.

First, the video image to be streamed is divided into a plurality of first image groups. Here, an image group is formed of I frames, P frames, and B frames. Consecutive images of the video image are placed into one group, and the frames of that image group are formed as I frames, P frames, and B frames. When the change between the current frame and the previous frame is equal to or greater than a threshold, the current frame and the previous frame may be assigned to different image groups. In other words, image groups may be separated between frames that are not continuous, that is, frames at which the scene changes.
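The threshold rule above can be sketched in a few lines. This is only an illustration under stated assumptions: frames are flat lists of pixel intensities, the change between frames is measured as the mean absolute pixel difference (the source does not specify the measure), and `mean_abs_diff`, `split_into_groups`, and the threshold value are hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_into_groups(frames, threshold):
    """Start a new image group whenever the change from the previous frame
    is equal to or greater than the threshold."""
    groups = []
    for frame in frames:
        if groups and mean_abs_diff(groups[-1][-1], frame) < threshold:
            groups[-1].append(frame)   # continuous with the previous frame
        else:
            groups.append([frame])     # first frame or scene change
    return groups


frames = [
    [10, 10, 10, 10],
    [11, 10, 10, 11],
    [12, 11, 10, 11],
    [200, 190, 210, 205],   # scene change
    [201, 191, 209, 205],
]
groups = split_into_groups(frames, threshold=20)
```

With these sample values the first three frames fall into one group and the last two into another, because only the jump between the third and fourth frames exceeds the threshold.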

At this time, the video image may be divided into the first image groups using dynamic adaptive streaming over HTTP. Streaming a high-capacity video image frame by frame requires a very large bandwidth, so streaming is performed using Dynamic Adaptive Streaming over HTTP (MPEG-DASH) in order to reduce the amount of data to be streamed. The H.265 standard provides video encoding for MPEG-DASH and can encode video frames into a plurality of image groups (GoP, Group of Pictures) consisting of I frames, P frames, and B frames. Here, the I frames, P frames, and B frames are determined according to the patterns of redundancy between consecutive frames in a way that maximizes compression efficiency. That is, only the difference from the previous frame is detected, and consecutive frames can be represented with that information alone.

An I frame is a reference frame, a P frame references only the preceding frame, and a B frame references both the preceding and following frames. With the I frame as the reference frame, the subsequent frames are converted into P frames or B frames that carry only information relative to the preceding frame, or to the preceding and following frames, which reduces the amount of data.
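The three frame types can be illustrated with a toy residual coder. This is a deliberately simplified sketch: real codecs such as H.265 use block-based motion compensation and entropy coding, whereas here a P frame stores the per-pixel difference from the preceding reference and a B frame stores the difference from the integer average of the preceding and following references. The function names are hypothetical.

```python
def encode_ibp(f0, f1, f2):
    """Encode three consecutive frames as I, B, P (display order f0, f1, f2)."""
    i_frame = list(f0)                                  # I: complete frame
    p_res = [c - a for a, c in zip(f0, f2)]             # P: references f0 only
    b_res = [b - (a + c) // 2                           # B: references f0 and f2
             for a, b, c in zip(f0, f1, f2)]
    return i_frame, b_res, p_res


def decode_ibp(i_frame, b_res, p_res):
    """Rebuild the three frames; the P frame must be decoded before the B frame."""
    f0 = list(i_frame)
    f2 = [a + d for a, d in zip(f0, p_res)]
    f1 = [(a + c) // 2 + d for a, c, d in zip(f0, f2, b_res)]
    return f0, f1, f2


f0 = [100, 100, 100, 100]
f1 = [102, 101, 100, 100]
f2 = [104, 102, 100, 100]
i_frame, b_res, p_res = encode_ibp(f0, f1, f2)
decoded = decode_ibp(i_frame, b_res, p_res)
```

Because the middle frame lies exactly between its neighbors in this sample, the B residual is all zeros, which is the kind of redundancy that makes P and B frames much smaller than I frames.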

After the video image is divided into the first image groups, the first I frame of each first image group may be extracted and its size reduced to generate a second I frame. Because an I frame contains the information of a complete frame, it is larger than the other frames. For example, I frames account for only 6.7% of all frames but 42.8% of the data, a considerably larger share relative to their number than P frames (17.4%) and B frames (39.7%). The total amount of data can therefore be reduced by shrinking the I frames, which have the largest data size. To this end, the first I frame of each first image group may be extracted and reduced in size to generate the smaller second I frame. Here, the second I frame may be generated from the first I frame using bicubic interpolation. Bicubic interpolation is an interpolation method used when resizing an image, and it can be used to reduce the size of the first I frame and generate the second I frame. The second I frame may also be generated by sampling the first I frame, or by any of various other methods of reducing the size of the first I frame. A second I frame generated by bicubic interpolation or the like holds less information than the first I frame, and therefore less data. Streaming the second I frame instead of the first I frame can thus significantly reduce the amount of streamed data.
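A rough back-of-the-envelope estimate shows why shrinking only the I frames matters. The sketch below relies on assumptions that are not in the source: the encoded I-frame size is taken to scale with pixel count, the frames are reduced by a hypothetical factor of 2 in each dimension, and the P and B shares are treated as unchanged even though they are re-encoded against the third I frame. The function name `estimated_stream_share` is hypothetical.

```python
def estimated_stream_share(i_share, p_share, b_share, scale):
    """Estimated share of the original data volume (in percent) after the
    I frames are reduced by `scale` in each dimension.

    Assumes I-frame data scales with pixel count and P/B data is unchanged.
    """
    return i_share / scale ** 2 + p_share + b_share


# Data shares quoted in the text: I 42.8 %, P 17.4 %, B 39.7 %.
remaining = estimated_stream_share(42.8, 17.4, 39.7, scale=2)
print(f"estimated remaining data: {remaining:.1f} % of the original")
```

Under these assumptions roughly two thirds of the original data volume would remain; the actual saving depends on the codec, the content, and the size of the re-encoded P and B frames.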

After the second I frame is generated, super-resolution is performed through a trained deep neural network (DNN) to generate a third I frame from the second I frame, and the first P frame or first B frame included in each first image group is encoded using the third I frame to generate a second P frame or a second B frame.

A first image group includes one first I frame and may further include first P frames or first B frames. Depending on the continuity of the frames, it may include only the one first I frame, may additionally include at least one of a first P frame and a first B frame, or may include both first P frames and first B frames. A first image group may include one or more first P frames or first B frames.

As described above, the first P frames or first B frames that make up the first image group are generated using the first I frame as the reference frame. Because the second I frame, generated by reducing the size of the first I frame, differs in size from the first I frame, it is difficult to reconstruct the original frames from the first P frames or first B frames using the second I frame as the reference frame. Moreover, the second I frame has lost a considerable amount of the information in the first I frame, so even if an enlarged I frame is generated from the second I frame, that enlarged I frame differs considerably from the first I frame. Consequently, image frames generated from the first P frames and first B frames, which were produced with reference to the first I frame, would inevitably contain a large amount of noise. In other words, if the client device 200 receives the second I frame, the first P frames, and the first B frames and simply enlarges the second I frame, it is difficult to generate good-quality image frames from the first P frames and first B frames.

Therefore, so that a good-quality video image can be generated from the P frames and B frames streamed together with the second I frame even though the second I frame is what is streamed, resolution conversion is applied to the second I frame. That is, super-resolution is performed on the second I frame to generate a third I frame of the same size as the first I frame, and the first P frames and first B frames are encoded using the third I frame to generate second P frames and second B frames. Because the third I frame can be generated from the second I frame, when the second P frames and second B frames, which use the third I frame as their reference frame, are streamed together with the second I frame, the receiver can generate the third I frame from the second I frame and use it as the reference frame to generate the video image from the second P frames and second B frames.
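The key point of this paragraph, that server and client derive the same third I frame from the same second I frame, can be sketched with a toy pipeline. This is an illustration under stated assumptions, not the patented implementation: frames are flat lists of integers, block averaging stands in for bicubic reduction, pixel repetition stands in for the trained deep neural network, and every non-I frame is coded as a plain residual against the third I frame. All function names are hypothetical.

```python
def downscale(frame, factor=2):
    """Stand-in for bicubic reduction: average each block of `factor` pixels."""
    return [sum(frame[i:i + factor]) // factor
            for i in range(0, len(frame), factor)]


def super_resolve(small, factor=2):
    """Stand-in for the trained DNN: enlarge by repeating each pixel."""
    return [p for p in small for _ in range(factor)]


def server_encode(first_group):
    """First image group -> second image group (second I frame + residuals)."""
    first_i, *others = first_group
    second_i = downscale(first_i)
    third_i = super_resolve(second_i)          # same reference the client will build
    residuals = [[x - r for x, r in zip(frame, third_i)] for frame in others]
    return second_i, residuals


def client_decode(second_i, residuals):
    """Second image group -> displayable frames."""
    third_i = super_resolve(second_i)
    decoded = [[r + d for r, d in zip(third_i, res)] for res in residuals]
    return [third_i] + decoded


first_group = [[10, 12, 20, 22], [11, 13, 21, 23], [12, 14, 22, 24]]
second_i, residuals = server_encode(first_group)
video = client_decode(second_i, residuals)
```

In this toy setting the non-I frames are recovered exactly, because the residuals were computed against the very reference the client reconstructs, while the displayed I frame is only the super-resolved approximation of the original. A real codec quantizes the residuals, so recovery would be approximate there as well.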

To generate the third I frame from the second I frame, the processor 120 performs super resolution through a trained deep neural network (DNN). To this end, as shown in FIG. 2, the processor 120 may include a deep neural network 121 and super-resolution parameters 122 for performing super resolution. The deep neural network 121 or the super-resolution parameters 122 may be stored in the memory 130.

Here, a deep neural network (DNN) is an artificial neural network (ANN) that includes multiple hidden layers between an input layer and an output layer.

Because it includes multiple hidden layers, a deep neural network can learn a variety of nonlinear relationships. Depending on the algorithm, deep neural networks include deep belief networks (DBN) and deep autoencoders, which are based on unsupervised learning, convolutional neural networks (CNN) for processing two-dimensional data such as images, and recurrent neural networks (RNN) for processing time-series data. The third I frame may be generated from the second I frame using, for example, a convolutional neural network trained to perform super resolution.

Here, super resolution (SR) is a function that performs a digital zoom, converting an image to a higher resolution. The deep neural network receives a first-resolution frame 123 as input and generates and outputs a second-resolution frame 124 from it. The second resolution may be higher than the first resolution. Conversely, the first resolution may of course be higher than the second resolution.

To this end, the deep neural network may be trained so that the third I frame generated by performing super resolution on the second I frame approaches the first I frame. Training may be performed using the first I frame and the second I frame. The deep neural network receives the second I frame as input data, its output (the third I frame) is compared with the first I frame serving as the ground truth (GT), and the super-resolution parameters 122 are adjusted using a loss function and an optimizer, with training repeated until the output third I frame approaches the GT first I frame. The deep neural network may be trained to scale a low-resolution image up by as much as 8x to generate a high-resolution image at 8K resolution.
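The training loop described above (input the second I frame, compare the output with the ground-truth first I frame, adjust parameters via a loss function and an optimizer) can be sketched in miniature. This is a deliberately tiny illustration, not the patent's network: the "model" is a hypothetical two-parameter function (a gain and a bias applied after pixel repetition) standing in for the super-resolution parameters 122, the loss is mean squared error, and the optimizer is plain gradient descent. All names and hyperparameters are assumptions.

```python
def upscale_nearest(frame, scale=2):
    """Repeat each sample `scale` times (1-D frame for brevity)."""
    return [v for v in frame for _ in range(scale)]


def mse(pred, target):
    """Mean squared error, used here as the loss function."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def train(second_i, first_i, lr=0.05, steps=2000):
    """Fit gain and bias so that gain * upscale(second_i) + bias approaches first_i."""
    gain, bias = 1.0, 0.0
    x = upscale_nearest(second_i)
    n = len(first_i)
    for _ in range(steps):
        errors = [gain * xi + bias - yi for xi, yi in zip(x, first_i)]
        # Gradients of the MSE loss with respect to each parameter.
        grad_gain = 2 * sum(e * xi for e, xi in zip(errors, x)) / n
        grad_bias = 2 * sum(errors) / n
        gain -= lr * grad_gain
        bias -= lr * grad_bias
    return gain, bias


second_i = [1.0, 2.0, 3.0]                  # low-resolution input
first_i = [2.0, 2.0, 4.0, 4.0, 6.0, 6.0]    # ground truth (GT)

initial_loss = mse(upscale_nearest(second_i), first_i)
gain, bias = train(second_i, first_i)
third_i = [gain * v + bias for v in upscale_nearest(second_i)]
final_loss = mse(third_i, first_i)
print(initial_loss, final_loss)
```

A real super-resolution network has many more parameters and would typically be trained with a deep learning framework, but the structure of the loop (forward pass, loss against GT, parameter update, repeat) is the same.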

As described above, the third I frame is generated from the second I frame through the trained deep neural network, and the first P frame and first B frame included in the first image group are encoded using the third I frame to generate the second P frame and the second B frame.

Thereafter, the second I frame, the second P frame, and the second B frame are regrouped into a second image group and transmitted to the client device 200. As described above, the second P and second B frames are encoded using the third I frame, which can be generated from the second I frame. Transmitting them with the second I frame as a single image group, the second image group, allows the client device 200 to generate the third I frame from the second I frame and then generate the video image from the third I frame, the second P frame, and the second B frame.

The video image to be streamed may be an ordinary image or may consist of 360-degree three-dimensional images. When the video image is a three-dimensional image, the instructions stored in the memory 130 may include instructions for dividing the second I frame into six faces to generate a fourth I frame for each face and generating the third I frame using the fourth I frames of the faces.

In the case of a three-dimensional image, that is, a 360-degree image, each frame may be structured as a cuboid forming three dimensions. In generating the third I frame from a second I frame composed of a three-dimensional image, the second I frame may be divided into six faces, and the third I frame may be generated from the fourth I frames of the six faces. That is, the deep neural network 121 may perform super resolution on each fourth I frame and join the results again to generate a high-resolution third I frame. The six-face case is described here only as an example; if each frame is a three-dimensional image of a shape other than a cuboid, the second I frame can of course be divided according to that shape to generate the fourth I frames, from which the third I frame is generated.
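The split, per-face super resolution, and rejoin steps can be sketched as follows. Two things here are assumptions made for illustration: the six faces are laid out side by side in a single row (the patent does not specify a layout), and a pixel-repeating upscaler stands in for the trained deep neural network.

```python
def split_faces(frame, n_faces=6):
    """Cut a frame whose faces sit side by side into a list of per-face frames."""
    face_w = len(frame[0]) // n_faces
    return [[row[i * face_w:(i + 1) * face_w] for row in frame]
            for i in range(n_faces)]


def upscale_nearest(face, scale):
    """Stand-in for the DNN: enlarge a face by repeating pixels."""
    out = []
    for row in face:
        wide = [v for v in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def concat_faces(faces):
    """Join faces back side by side into one frame."""
    return [sum((face[y] for face in faces), []) for y in range(len(faces[0]))]


# Second I frame: six 2x2 faces, each filled with its own face index.
second_i = [[i for i in range(6) for _ in range(2)] for _ in range(2)]

fourth_i = split_faces(second_i)                          # one frame per face
sr_faces = [upscale_nearest(f, 2) for f in fourth_i]      # per-face upscaling
third_i = concat_faces(sr_faces)                          # rejoined frame
print(len(third_i), len(third_i[0]))
```

Processing each face separately keeps the input to the model a fixed, small size regardless of how many faces the projection uses.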

The process by which the processor 120 of the server 100 processes the video image in order to stream it to the client device 200 may be performed as shown in FIG. 3.

First, the video image to be streamed is divided into consecutive GoPs (Groups of Pictures, 310 to 310n). Here, a GoP corresponds to the first image group described above, and each GoP consists of a first I frame (I Frame, 321), a first P frame (P Frame, 322), and a first B frame (B Frame, 323).
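As a rough illustration of this grouping step, the sketch below splits a sequence of frame types into groups, opening a new group at every I frame. The rule that a group begins at each I frame is an assumption for the example; the patent does not state how group boundaries are chosen, and `split_into_gops` is a hypothetical helper name.

```python
def split_into_gops(frame_types):
    """Group a flat sequence of frame types so that each group starts at an I frame."""
    gops = []
    for frame_type in frame_types:
        # Open a new group at every I frame (or for the very first frame).
        if frame_type == "I" or not gops:
            gops.append([])
        gops[-1].append(frame_type)
    return gops


sequence = list("IPBBIPBIB")
gops = split_into_gops(sequence)
print(gops)
```

In practice each element would carry the frame data as well as its type; only the types are kept here so the grouping logic stays visible.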

Next, the first I frame 321 of each GoP 310 is extracted (311), and bicubic interpolation is performed (312) to reduce its size and generate the second I frame (313).

Next, super resolution is performed (314) on the second I frame through the deep neural network (DNN). Here, the deep neural network may be trained (316) to perform super resolution, and the trained deep neural network may be stored (315) for use.

The third I frame (I* Frame, 324) is generated (317) from the second I frame through the deep neural network. In generating the third I frame, the second I frame may be divided into fourth I frames for the six faces, and the resolution may be changed by performing super resolution on each fourth I frame through the trained deep neural network. Each such face may be called an SR-Face. The super-resolved faces may then be concatenated to generate the third I frame.

Next, inter-frame encoding is performed (319) using the third I frame 324 to generate the second P frame (P* Frame, 325) and the second B frame (B* Frame, 326) from the first P frame and first B frame (318) included in the first image group.

Finally, the second image groups 320 to 320n, each formed by regrouping the second I frame 313, the second P frame 325, and the second B frame 326, are streamed to the client device.
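The server-side flow of FIG. 3 can be summarized in a toy sketch. Everything concrete here is an assumption made to keep the example short: frames are 1-D lists of samples, pair averaging stands in for bicubic downscaling, sample repetition stands in for the trained deep neural network, and inter-frame encoding is modeled as a simple residual against the reference frame (real codecs use motion-compensated prediction, and B frames reference more than one frame).

```python
from dataclasses import dataclass


def downscale(frame):
    """Stand-in for size reduction: average adjacent pairs of samples."""
    return [(frame[i] + frame[i + 1]) / 2 for i in range(0, len(frame), 2)]


def super_resolve(frame):
    """Stand-in for the trained DNN: repeat each sample twice."""
    return [v for v in frame for _ in range(2)]


def encode_against(reference, frame):
    """Toy inter-frame encoding: keep only the difference from the reference."""
    return [f - r for f, r in zip(frame, reference)]


@dataclass
class SecondImageGroup:
    second_i: list   # reduced-size I frame
    second_p: list   # P frames re-encoded against the third I frame
    second_b: list   # B frames re-encoded against the third I frame


def encode_gop(first_i, first_p, first_b):
    """Build the group that is streamed: small I frame plus re-encoded P and B frames."""
    second_i = downscale(first_i)
    third_i = super_resolve(second_i)   # the reference the client can also rebuild
    second_p = [encode_against(third_i, f) for f in first_p]
    second_b = [encode_against(third_i, f) for f in first_b]
    return SecondImageGroup(second_i, second_p, second_b)


group = encode_gop(first_i=[10, 12, 20, 22],
                   first_p=[[11, 13, 21, 23]],
                   first_b=[[10, 12, 21, 22]])
print(group)
```

The key design point is that the P and B frames are encoded against the third I frame rather than the first: the third I frame is the only full-size reference the client can reproduce from what it receives.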

Through the above process, as shown in FIG. 4, a first image group of the video image consisting of the first I frame, the first P frame, and the first B frame can be streamed as a second image group consisting of the second I frame, the second P frame, and the second B frame. Because the second image group carries less data than the first image group, high-capacity video can be streamed.

Because the I frame is reduced in size before streaming, the amount of streamed data can be cut considerably, and despite the reduced data size, high-quality video streaming remains possible through super resolution with a deep neural network.

The client device 200 receives the video image from the server 100 as a stream of second image groups and generates the video image. The client device 200 may be any device, such as a user terminal or a PC, on which a user can receive streamed video from the server 100 and play it.

The client device 200 may include a communication unit 210 that receives the video image streamed from the server 100 as second image groups, at least one processor 220, and a memory 230 that stores instructions executed by the processor 220. It may further include a video player (not shown) that plays the video image, a display (not shown) that displays the played video image, or various input units (not shown), such as a touch screen or keyboard, that receive user input.

The communication unit 210 may receive the second image group from the server 100. The communication unit 210 may also receive the deep neural network from the server 100. As described above, generating the video image from the second P and second B frames requires the third I frame that was used when encoding them, and generating the third I frame from the second I frame requires the deep neural network used by the server 100. The communication unit 210 may therefore receive the deep neural network from the server 100 so that it can be used to generate the third I frame from the second I frame.

The processor 220 executes the instructions stored in the memory 230 to generate the video image from the second image group received from the server 100.

More specifically, the processor 220 controls the components of the client device 200, including the communication unit 210 and the memory 230, and processes the data stored in or generated by each component. The processor 220 may comprise one or more processors, and multiple processors 220 may operate independently of one another or dependently. The processor 220 may read the instructions stored in the memory 230 and carry out the corresponding operations, and may control the video player and the display to play the video image it has processed.

The memory 230 stores instructions executed by the processor 220.

More specifically, the instructions stored in the memory 230 may include instructions for decoding the video image from the second image group received from the server 100 by: performing super resolution through a trained deep neural network to generate a third I frame from the second I frame included in the second image group; decoding the second P frame or second B frame included in the second image group using the third I frame to generate a first P frame or first B frame; and generating the video image using the third I frame and at least one of the first P frame and the first B frame.

When the second image group is received, super resolution is performed through the trained deep neural network to generate the third I frame from the second I frame included in the second image group. Here, the deep neural network may be received from the server 100, or may be trained so that the third I frame generated by performing super resolution on the second I frame approaches the first I frame. As described above, the deep neural network that the communication unit 210 received from the server 100 may be used to generate the third I frame from the second I frame. Using the same deep neural network as the server to generate the third I frame makes it possible to generate a high-quality video image from the second P or second B frame with the third I frame as the reference frame. Alternatively, the third I frame may be generated from the second I frame using a separate deep neural network trained through the training process described above.

After super resolution is performed through the trained deep neural network to generate the third I frame from the second I frame included in the second image group, the second P frame or second B frame included in the second image group is decoded using the third I frame to generate the first P frame or first B frame. Depending on the number of consecutive frames, the second image group may include only a single second I frame, or a single second I frame together with at least one of a second P frame and a second B frame. Because the second P and second B frames were encoded using the third I frame before being transmitted from the server 100, they are decoded using the third I frame generated from the second I frame to generate the first P frame or first B frame.

Thereafter, the video image is generated using the third I frame and at least one of the first P frame and the first B frame. The third I frame, the first P frame, and the first B frame correspond to the first image group of the video image, and the video image can be generated from them. The generated video image may then be played in a video player and provided to the user as a stream.

The video image may consist of three-dimensional 360-degree images. In that case, in generating the third I frame from the second I frame, the second I frame may be divided into six faces to generate a fourth I frame for each face, and the third I frame may be generated using the fourth I frames of the faces. As described above, the deep neural network may be trained on a single face, so the third I frame can be generated by dividing the six-faced second I frame into its six faces, applying super resolution to each face, and recombining them.

The process by which the processor 220 of the client device 200 processes the second image group in order to stream the video image processed by and received from the server 100 may be performed as shown in FIG. 5.

On receiving from the server 100 a second image group formed of the second I frame, the second P frame, and the second B frame, the client device extracts (511) the second I frame included in the group and performs super resolution (512) on it through the deep neural network (DNN). Here, the deep neural network may be received from the server 100 and stored, or may be trained (514) to perform super resolution, and the trained deep neural network is loaded (513) for use.

The third I frame (I* Frame, 324) is generated (515) from the second I frame through the deep neural network. In generating the third I frame, the second I frame may be divided into fourth I frames for the six faces, and the resolution may be changed by performing super resolution on each fourth I frame through the trained deep neural network. Each such face may be called an SR-Face. The super-resolved faces may then be concatenated to generate the third I frame.

Next, inter-frame decoding is performed (517) using the third I frame 324 to generate the first P frame 322 and the first B frame 323 from the second P frame and second B frame (516) included in the second image group.

Next, the third I frame 324, the first P frame 322, and the first B frame 323 are grouped, and the video image is generated using each of the second image groups 518 to 518n. The generated video image is played by the video player and presented to the user through the display.
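The client-side flow of FIG. 5 can be sketched in the same toy terms. As before, the details are assumptions for illustration only: frames are 1-D lists, sample repetition stands in for the trained deep neural network, inter-frame decoding is modeled as adding a residual back onto the reference frame, and the output order (I, then P, then B) ignores real display ordering.

```python
def super_resolve(frame):
    """Stand-in for the trained DNN: repeat each sample twice."""
    return [v for v in frame for _ in range(2)]


def decode_against(reference, residual):
    """Toy inter-frame decoding: add the residual back onto the reference."""
    return [r + d for r, d in zip(reference, residual)]


def decode_gop(second_i, second_p, second_b):
    """Rebuild displayable frames from one received image group."""
    third_i = super_resolve(second_i)                       # regenerate the reference
    first_p = [decode_against(third_i, r) for r in second_p]
    first_b = [decode_against(third_i, r) for r in second_b]
    return [third_i] + first_p + first_b                    # simplified frame order


# A received group: reduced-size I frame plus residual-coded P and B frames.
frames = decode_gop(second_i=[11.0, 21.0],
                    second_p=[[0, 2, 0, 2]],
                    second_b=[[-1, 1, 0, 1]])
print(frames)
```

Decoding is exact for the P and B frames only because the client rebuilds the same third I frame the server encoded against, which is why the description stresses using the same deep neural network on both sides.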

Through the above process, as shown in FIG. 6, the second image group 510 of the video image, consisting of the second I frame, the second P frame, and the second B frame, can be streamed as a video image 518 consisting of the third I frame, the first P frame, and the first B frame. The client device receives a second image group with a reduced amount of data yet can generate and play high-capacity video from it. In this way, high-capacity video can be streamed.

Video streaming using trained super resolution between the server 100 and the client device 200 according to an embodiment of the present invention reduces bandwidth relative to other streaming approaches: compared with Cube, which uses no resolution change, bandwidth falls to 62.96% when x4 super resolution is applied and to 62.55% when x8 is applied. There are also considerable benefits in processing latency at the client device and in the quality of the decoded video image.

FIG. 7 is a graph of user feedback by streaming method, comparing video streaming between the server 100 and the client device 200 according to an embodiment of the present invention with other streaming methods.

The comparison covers Cube, Bicubic (Bi-Cx4, Bi-Cx8), which uses bicubic interpolation for upscaling, and super resolution through a trained deep neural network according to an embodiment of the present invention (SRx4, SRx8). The values 1 to 5 on the horizontal axis denote 1 Bad, 2 Poor, 3 Fair, 4 Good, and 5 Excellent. Bi-Cx8 scores lowest, with an average of 1.8, and Bi-Cx4 scores 3.0. Cube (4.044), SRx4 (4.093), and SRx8 (4.040) all achieve high quality, with no significant difference among them.

Accordingly, video streaming between the server 100 and the client device 200 according to an embodiment of the present invention reduces the amount of data, and thus the required bandwidth, while still providing the user with high-quality video.

A video streaming system according to an embodiment of the present invention may be implemented with the server 100 and the client device 200 according to embodiments of the present invention. Its detailed description corresponds to the detailed description of the server 100 and the client device 200 given with reference to FIGS. 1 to 7.

FIG. 8 is a flowchart of a video encoding and streaming method performed by a server, between the server and a client device, according to an embodiment of the present invention, and FIGS. 9 and 10 are flowcharts of such a method according to other embodiments. The detailed description of each step in FIGS. 8 to 10 corresponds to the detailed description of the server given with reference to FIGS. 1 to 7, so redundant description is omitted below.

In the video streaming method between a server and a client device, the server divides the video image into a plurality of first image groups in step S11; extracts the first I frame of each first image group and reduces its size to generate a second I frame in step S12; and performs super resolution through a trained deep neural network (DNN) to generate a third I frame from the second I frame in step S13.

The video image may consist of three-dimensional 360-degree images. In that case, in generating the third I frame from the second I frame, the second I frame may be divided into six faces to generate a fourth I frame for each face in step S21, and the third I frame may be generated using the fourth I frames of the faces in step S22.

The deep neural network may be trained in step S31 so that the third I frame generated by performing super resolution on the second I frame approaches the first I frame.

After the third I frame is generated from the second I frame, the first P frame or first B frame included in each first image group is encoded using the third I frame in step S14 to generate a second P frame or second B frame.

Then, in step S15, the second I frame and at least one of the second P frame and the second B frame are regrouped into a second image group and transmitted to the client device.

FIG. 11 is a flowchart of a video decoding and streaming method performed by a client device, between a server and the client device, according to an embodiment of the present invention, and FIGS. 12 and 13 are flowcharts of such a method according to other embodiments. The detailed description of each step in FIGS. 11 to 13 corresponds to the detailed description of the client device given with reference to FIGS. 1 to 7, so redundant description is omitted below.

In the video streaming method between a server and a client device, the client device receives the second image group from the server in step S41 and performs super resolution through a trained deep neural network to generate a third I frame from the second I frame included in the second image group.

The video image may consist of three-dimensional 360-degree images. In that case, in generating the third I frame from the second I frame, the second I frame may be divided into six faces to generate a fourth I frame for each face in step S51, and the third I frame may be generated using the fourth I frames of the faces in step S52.

In step S61, the deep neural network may be received from the server, or may be trained so that the third I frame generated by performing super resolution on the second I frame approaches the first I frame.

After the third I frame is generated from the second I frame, the client device decodes the second P frame or second B frame included in the second image group using the third I frame in step S43 to generate a first P frame or first B frame, and in step S44 generates the video image using the third I frame and at least one of the first P frame and the first B frame.

After the video image is generated, it may be played in the video player in response to user input or the like and streamed to the user through the display.

Meanwhile, embodiments of the present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system.

Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the present invention pertains.

Those of ordinary skill in the art related to this embodiment will understand that the invention may be implemented in modified forms without departing from the essential characteristics described above. Therefore, the disclosed methods should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within an equivalent scope should be construed as being included in the present invention.

100: server
110, 210: communication unit
120, 220: processor
130, 230: memory
200: client device

Claims

A server comprising:
a communication unit for streaming a video image to a client device;
at least one processor; and
a memory for storing instructions executed by the processor,
wherein the instructions stored in the memory include instructions for:
dividing the video image to be streamed into a plurality of first image groups;
extracting a first I frame of each first image group and reducing the size of the first I frame to generate a second I frame;
performing super resolution through a trained deep neural network (DNN) to generate a third I frame from the second I frame;
generating a second P frame or a second B frame by encoding a first P frame or a first B frame included in each first image group using the third I frame; and
regrouping at least one of the second I frame, the second P frame, and the second B frame into a second image group and transmitting the second image group to the client device,
wherein the video image is composed of a three-dimensional image, and
wherein the instructions stored in the memory include instructions for dividing the second I frame into six faces to generate a fourth I frame for each face, and generating the third I frame by using the fourth I frame of each face.

The server of claim 1,
wherein the instructions stored in the memory include an instruction for generating the second I frame from the first I frame using bi-cubic interpolation.

The server of claim 1,
wherein the deep neural network is trained such that the third I frame generated by performing super resolution on the second I frame approximates the first I frame.

delete

A client device comprising:
a communication unit for receiving a video image streamed from a server as a second image group;
at least one processor; and
a memory for storing instructions executed by the processor,
wherein the instructions stored in the memory include instructions for:
performing super resolution through a trained deep neural network to generate a third I frame from a second I frame included in the second image group;
decoding a second P frame or a second B frame included in the second image group using the third I frame to generate a first P frame or a first B frame; and
generating a video image by using at least one of the third I frame, the first P frame, and the first B frame,
wherein the video image is composed of a three-dimensional image, and
wherein the instructions stored in the memory include instructions for dividing the second I frame into six faces to generate a fourth I frame for each face, and generating the third I frame by using the fourth I frame of each face.

delete

The client device of claim 5,
wherein the deep neural network is received from the server, or is trained such that the third I frame generated by performing super resolution on the second I frame approximates the first I frame.

A method for streaming video between a server and a client device, the method comprising:
dividing, by the server, a video image into a plurality of first image groups;
extracting, by the server, a first I frame of each of the first image groups and reducing the size of the first I frame to generate a second I frame;
generating, by the server, a third I frame from the second I frame by performing super resolution through a trained deep neural network (DNN);
encoding, by the server, a first P frame or a first B frame included in each of the first image groups by using the third I frame to generate a second P frame or a second B frame; and
regrouping, by the server, at least one of the second I frame, the second P frame, and the second B frame into a second image group and transmitting the second image group to the client device,
wherein the video image is composed of a three-dimensional image, and
wherein the step of generating the third I frame comprises:
dividing the second I frame into six faces to generate a fourth I frame for each face; and
generating the third I frame using the fourth I frame of each face.

delete

The method of claim 8,
wherein the deep neural network is trained such that the third I frame generated by performing super resolution on the second I frame approximates the first I frame.

A method for streaming video between a server and a client device, the method comprising:
receiving, by the client device, a second image group from the server;
generating, by the client device, a third I frame from a second I frame included in the second image group by performing super resolution through a trained deep neural network;
generating, by the client device, a first P frame or a first B frame by decoding a second P frame or a second B frame included in the second image group by using the third I frame; and
generating, by the client device, a video image by using at least one of the third I frame, the first P frame, and the first B frame,
wherein the video image is composed of a three-dimensional image, and
wherein the step of generating the third I frame comprises:
dividing the second I frame into six faces to generate a fourth I frame for each face; and
generating the third I frame using the fourth I frame of each face.

delete

The method of claim 11,
wherein the deep neural network is received from the server, or is trained such that the third I frame generated by performing super resolution on the second I frame approximates the first I frame.