KR20020072478A

KR20020072478A - Streaming method by moving picture compression method using SPEG

Info

Publication number: KR20020072478A
Application number: KR1020010012490A
Authority: KR
Inventors: 김정길; 신승환; 홍성렬
Original assignee: 블럭엠 주식회사
Priority date: 2001-03-10
Filing date: 2001-03-10
Publication date: 2002-09-16

Abstract

PURPOSE: A streaming method using a motion picture compression method based on SPEG (Stream Picture Expert Group) is provided. It eliminates the buffering process and relieves streaming-server overload, and it minimizes line load by minimizing the bandwidth of the image and voice signals. CONSTITUTION: The streaming method using an SPEG-based motion picture compression method comprises the following steps. Threshold values are set for the frames to be compressed, and the frames are compressed individually. An index value is assigned to each image, and the images are arranged in sequence. After the individual images are compressed, overlapping parts are removed while a single index value is retained. When clients connect after the above steps, internal operators of the streaming server are placed in the clients' user groups, so that mutual synchronization is performed without any additional player. The internal operators are sent directly, using Java, to an applet or class running in an ordinary web browser.

Description

Streaming method by moving picture compression method using SPEG

The present invention relates to a streaming method that uses an SPEG-based video compression method. More particularly, it relates to a method of compressing multimedia data, including audio and video, and transmitting and receiving it over the Internet in real time without separate equipment or a dedicated player.

Streaming is a technology for receiving audio and video over the Internet in real time. It is an Internet solution that delivers multimedia content such as audio and video on the web. Streaming transfers multimedia data to a PC over the Internet, and it minimizes the time previously required to send large video files over the Internet.

The biggest drawback of streaming services is that they are affected by network bandwidth. Because they also require more powerful PCs, they have been limited in scope. Recently, however, dedicated Internet lines have expanded rapidly and PCs have become more powerful, so these problems are gradually disappearing.

The market for Internet streaming includes several segments:
- emerging Internet media companies built around existing broadcasters and newspapers;
- Internet broadcasting;
- in-house broadcasting over local area networks (LANs);
- distance learning.

Streaming enables both real-time broadcasting over the Internet and video-on-demand (VOD) services, so it has become a core technology in Internet broadcasting. The field can be divided into the underlying compression technology, which is at the heart of streaming, and multimedia streaming technology. Most current multimedia streaming technologies are based on the MPEG scheme.

However, chronic buffering, service outages, and the high price of streaming servers remain major obstacles. There is also still no solution to server overload.

First, multimedia streaming technology, which underlies streaming, will be described. Multimedia streaming is a technology for transmitting and receiving multimedia data, including audio and video, over the Internet in real time rather than by download. Streaming does not receive and process all of the information at once. Instead, it continuously receives small portions, each sufficient for processing, and processes them as they arrive.

This is where buffering comes in. During buffering, multimedia streaming technology synchronizes the audio signal with the video signal. All of the audio and video signals cannot be connected at once, so the user is made to wait at regular intervals; in other words, the buffering process is repeated.

Multimedia streaming is described in more detail below. It is generally used for on-demand (VOD) services and live broadcast services of multimedia data. On a personal computer, the user can play a multimedia stream directly without downloading the entire file. Simply clicking a button on a web page starts the event within a few seconds, because the buffering time is shortened. Multimedia streaming events can be provided in real time, or they can be stored for viewing at any time.

The streaming server is responsible for this role, and the web server is what makes connecting to the streaming server possible. This is why a high-capacity server is needed. On-demand services let users view what they need immediately, and playback is continuous. Even so, when a user connects to the server remotely, buffering still takes from a few seconds to several minutes depending on the line. In severe cases the user is asked to reconnect.

Multimedia streaming is used for the following services.

(1) On-Demand Service

An on-demand service provides multimedia data whenever the client requests it. A multimedia streaming (streaming media) system is used to transmit, in real time, large volumes of multimedia data stored on disk.

(2) Live Broadcasting

A live broadcast service delivers video from a camcorder or audio from a microphone to clients in real time. It can be used to set up a virtual Internet broadcasting station.

Application fields of multimedia streaming technology

Multimedia streaming technology can be used in many fields, including Internet broadcasting stations.

Internet broadcasting stations have recently attracted particular attention. They can use a live broadcasting service to operate a virtual broadcasting station over the Internet. VOD solutions can also be applied to virtual education through remote lectures. Web-page advertising that currently takes the form of image banners can be converted into multimedia advertising, and video and audio can be used actively in electronic commerce.

Streaming can also be used together with corporate intranets, for example in in-house training and information systems and in-house broadcasting. More advanced application solutions are now being announced and applied directly to current web solutions such as advertisements, video mail, and bulletin boards. IMT-2000 technology has recently become a hot topic and is another major application.

Nevertheless, three problems remain the biggest unless the underlying compression is addressed: the chronic buffering that many users experience with video streaming, the high cost of streaming servers, and the heterogeneity of network lines. Why, then, do developers not consider other compression techniques, and why do they insist on MPEG?

The two compression approaches are compared in more detail below, but the short answer is that MPEG is the designated standard. MPEG also provides appropriate interoperability between audio and video signals, and it compresses efficiently.

This stage is called rendering: the process of converting digital information received from the stream into sound and pictures. In the rendering stage, the client computer receives the data, and the player associated with the streaming server (RealAudio, Media Player, etc.) is recognized. The content is then presented in one of three ways:
- played in a standalone player;
- embedded in a web page using plug-ins supported by various consoles and operating systems;
- synchronized as an "advanced presentation" using a language such as SMIL, which aligns text, images, audio, and video with other items on the web page.

MPEG currently supports all of these functions, and continued research has made even today's integrated advanced presentations possible. However, chronic buffering remains, and it is still unavoidable whether the content is large or small.

The compression method is described below. To understand it, one must first understand the concept of the frame. The frame is the key to compression and the most important factor in evaluating streaming technology.

How many frames per second can be achieved? This number determines the speed and image quality of a streaming technology.

A. Understanding Frames and Compression

1) Frame

A frame is one of the individual pictures that make up a video. The picture on a movie or TV screen does not actually move. It is a series of still images that appears to move because of an optical illusion of the human eye. For the illusion to work, there must be about 25 to 30 pictures (frames) per second; about 30 images must be shown in succession every second for motion to look natural. The number of frames per second is expressed as FPS (frames per second).

2) The need for compression

In general, one frame at a resolution of 640 × 480 with 256 colors (8 bits per pixel) takes 640 × 480 × 8 ÷ 8 = 307,200 bytes, or about 300 KB.

At this frame size, one minute of video requires 300 KB × 30 frames × 60 seconds = 540 MB. Even a 1 GB hard disk could therefore store only about two minutes. Compression, which encodes parts that share the same value more compactly, is used to extend playing time. For example, a Video CD (VCD) holds about 74 minutes of video in 680 MB by using such compression. This is where the underlying compression technology comes into play.

Most existing streaming technologies have adopted MPEG, the international standard, and remarkable research results have appeared in this area one after another. MPEG may have solved the capacity problem, but it could not reduce the delay caused by buffering, which is inherent in decoding a stream.
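The storage arithmetic above can be checked with a few lines of Python. This sketch uses exact byte counts, so it gives 307,200 bytes per frame (the text rounds this to 300 KB) and about 553 million bytes per minute rather than the rounded 540 MB. The function names are illustrative only.

```python
def raw_frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size in bytes of one uncompressed frame."""
    return width * height * bits_per_pixel // 8


def raw_video_bytes(width: int, height: int, bits_per_pixel: int,
                    fps: int, seconds: int) -> int:
    """Size in bytes of an uncompressed clip: frame size x frame rate x duration."""
    return raw_frame_bytes(width, height, bits_per_pixel) * fps * seconds


frame = raw_frame_bytes(640, 480, 8)             # 256 colors -> 8 bits per pixel
one_minute = raw_video_bytes(640, 480, 8, 30, 60)
minutes_per_gb = 1_000_000_000 / one_minute      # decimal gigabyte

print(frame)                     # 307200
print(one_minute)                # 552960000
print(round(minutes_per_gb, 2))  # 1.81
```

The result of roughly 1.8 minutes per gigabyte matches the text's "about two minutes".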

The high compression ratio came at a price: buffering on the decoding side was unavoidable.

B. Principles and Types of Compression

1) The principle of compression

Data compression must reduce the size of the data while preserving its original content. Compression creates new data from the original data, and decompression restores it to its original state. For example, the data 33333333 can be written simply as 38 (the digit 3 repeated 8 times), which reduces 8 bytes to 2 bytes. PKZIP, LHA, and ARJ each use more sophisticated principles to compress data and restore the original.
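The 33333333 → 38 example is a simple form of run-length encoding. Below is a minimal sketch. It assumes each run is written as the symbol followed by a single-digit count, so runs longer than 9 are split. The function names are hypothetical, and this is not the scheme used by PKZIP, LHA, or ARJ.

```python
def rle_encode(data: str) -> str:
    """Write each run as <symbol><count>; runs longer than 9 are split so counts stay one digit."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 9:
            j += 1
        out.append(f"{data[i]}{j - i}")
        i = j
    return "".join(out)


def rle_decode(encoded: str) -> str:
    """Inverse of rle_encode: read (symbol, count) pairs two characters at a time."""
    return "".join(encoded[k] * int(encoded[k + 1]) for k in range(0, len(encoded), 2))


packed = rle_encode("33333333")
print(packed)               # 38
print(rle_decode(packed))   # 33333333
```

Decoding restores the input exactly, which is the property the paragraph requires of any compression method.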

2) Types of compression techniques

-Lossy compression

The human eye can distinguish only a limited range of color and resolution in pictures and images. Beyond a certain resolution, differences become imperceptible and images look nearly identical. Lossy compression exploits this by removing data beyond what humans can perceive, which reduces the amount of data. Audio and video can both be compressed lossily, and most streaming technologies use this approach.
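Quantization is the usual way a lossy coder discards detail. The sketch below maps sample values to coarser levels. The function names, the sample row, and the step size of 8 are hypothetical. The restored values differ from the originals by at most half a step, and that difference can never be recovered.

```python
def quantize(samples, step):
    """Map each sample to the index of the nearest multiple of step (information is lost)."""
    return [round(s / step) for s in samples]


def dequantize(levels, step):
    """Rebuild approximate samples from the quantization indices."""
    return [q * step for q in levels]


pixels = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical row of 8-bit pixel values
step = 8
restored = dequantize(quantize(pixels, step), step)
max_error = max(abs(a - b) for a, b in zip(pixels, restored))

print(restored)    # [48, 56, 64, 64, 72, 64, 64, 72]
print(max_error)   # 4, never more than step / 2
```

A larger step gives fewer distinct levels, and therefore fewer bits, at the cost of larger errors. This is the trade-off a lossy coder tunes against what the eye can notice.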

-Lossless compression

Lossless compression compresses data without altering it at all. Data such as numbers, documents, and programs must never be altered, because altered data becomes unusable. Widely used compression utilities rely on lossless compression. Lossless compression is also used as a still-image compression technique.
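Python's standard `zlib` module illustrates lossless compression. It implements DEFLATE, the method used in ZIP archives. The sample text here is hypothetical. The key point is that decompression returns exactly the original bytes.

```python
import zlib

# Highly repetitive sample text (hypothetical) compresses well without any loss.
original = ("Data such as numbers, documents and programs must never be damaged. " * 20).encode("utf-8")
packed = zlib.compress(original, 9)          # DEFLATE at maximum compression level
restored = zlib.decompress(packed)

assert restored == original                  # bit-for-bit identical
print(len(original), len(packed))            # compressed size is far smaller
```
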

-DCT compression technology

1974 was the year of a landmark invention that made today's multimedia revolution possible. Three researchers, including Professor Rao of the University of Texas, published a paper in an IEEE journal on a new orthogonal transform called the Discrete Cosine Transform (DCT). The DCT performs particularly well for image compression. It has become a core element of today's international multimedia standards H.261, JPEG, and MPEG.

Typical compression ratios depend on the kind of data:
- Text, graphics, and general data can be recovered completely when compressed losslessly, but the compression ratio averages only about 2:1.
- Video, speech, and audio compressed with losses so small that human eyes and ears barely notice them easily reach 10:1 or more.
- Video has both redundancy between frames and redundancy between pixels within a frame. By exploiting the characteristics of human vision, compression of 30:1 or more is easily obtained, as MPEG video compression shows.
- A still image has redundancy only between pixels within the picture. Being a single picture, it has no inter-frame redundancy, so its compression ratio is somewhat lower than MPEG's, as JPEG shows.
- Images are highly redundant three-dimensional (video) or two-dimensional (still image) data and compress well. Speech and audio are one-dimensional data with relatively less redundancy, so their compression ratios are much lower.
- VSELP, the speech compression method used in North American mobile communications, achieves about 8:1.
- Dolby AC-3 and MPEG audio compression achieve about 6:1 for a single channel. They reach about 10:1 for stereo or multichannel audio with high inter-channel redundancy, such as the 5.1 channels used for theatrical films.
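A direct implementation of the orthonormal 8 × 8 DCT-II shows why the transform suits image compression. This is a minimal, deliberately unoptimized sketch; real codecs use fast factorized transforms. The test block is a hypothetical smooth ramp, and the function name is illustrative. For correlated content like this, almost all of the energy ends up in a few low-frequency coefficients.

```python
import math

N = 8  # block size used by H.261, JPEG and MPEG


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block, written directly for clarity (O(N^4))."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# A smooth brightness ramp: highly correlated, like much natural-image content.
block = [[100 + 4 * y for y in range(N)] for _ in range(N)]
coeffs = dct_2d(block)

energy = sum(c * c for row in coeffs for c in row)   # equals the pixel-domain energy
low = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
print(round(coeffs[0][0], 3))   # 912.0 (8 x the mean value 114)
print(round(low / energy, 4))   # fraction of energy in the 2 x 2 lowest frequencies
```

The transform is orthonormal, so total energy is unchanged. Only its distribution changes, and that concentration is what the quantizer exploits.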
Transform coding is the most widely used lossy technique for compressing image data effectively. An orthogonal transform splits spatially highly correlated data into frequency components ranging from low to high, and each component is quantized differently. After the transform, the frequency components are nearly uncorrelated, and the signal energy is concentrated at low frequencies.

At the same bit rate, the gain of transform coding over simple PCM equals the ratio of the arithmetic mean to the geometric mean of the variances of the frequency components. The more the energy is concentrated toward low frequencies, the higher the compression efficiency. Simple PCM of spatial data represents every sample with the same number of bits (e.g., m bits/sample), which gives a signal-to-quantization-noise ratio of about 6m dB. After an orthogonal transform into the frequency domain, components with more energy (that is, larger variance) are allocated more bits, so they are represented more faithfully. One extra bit is allocated each time the variance increases fourfold (that is, the amplitude doubles). This gives every frequency component the same quantization-error characteristic.

Among the orthogonal transforms, the Karhunen-Loève transform (KLT) theoretically concentrates image-signal energy best, so it is the most effective for compression. It cannot be used in practice, however, because its transform functions must be redefined for each image. Professor Rao's team set out to find a practical transform whose performance comes close enough to the KLT, and the result was the DCT. The DCT is now a core technology in many international standards, and it uses a block of 8 × 8 pixels as the unit of transformation.
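The coding-gain and bit-allocation rules above can be checked numerically. The sketch below uses the classic high-rate allocation b_k = b̄ + ½·log2(σ_k² / GM), where GM is the geometric mean of the variances. Under that rule a fourfold variance earns exactly one more bit. The variance values are hypothetical and the function names are illustrative.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def transform_coding_gain(variances):
    """Gain over plain PCM at the same bit rate: arithmetic mean / geometric mean of the variances."""
    return (sum(variances) / len(variances)) / geometric_mean(variances)


def allocate_bits(variances, avg_bits):
    """High-rate bit allocation b_k = avg + 0.5 * log2(var_k / GM); may give fractional bits."""
    gm = geometric_mean(variances)
    return [avg_bits + 0.5 * math.log2(v / gm) for v in variances]


variances = [64.0, 16.0, 4.0, 1.0]      # energy concentrated in the low-frequency component
gain = transform_coding_gain(variances)
bits = allocate_bits(variances, avg_bits=4)
db_per_bit = 10 * math.log10(4)         # one more bit halves the step, quartering noise power

print(round(gain, 3))                   # 2.656
print([round(b, 2) for b in bits])      # [5.5, 4.5, 3.5, 2.5]
print(round(db_per_bit, 2))             # 6.02
```

The bits still average 4 per component, but they shift toward the high-variance components. The 6.02 dB figure is the source of the "about 6m dB" rule for m-bit PCM.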
A larger block gives higher compression efficiency but makes the transform harder to implement. Experimentally, 8 × 8 was chosen as a compromise between performance and ease of implementation.

DCT coefficients can be quantized by scalar quantization (SQ) or vector quantization (VQ). VQ is usually effective when the coefficients are highly correlated, but it is more complex than SQ. DCT coefficients are already nearly uncorrelated, so current international standards adopt SQ. SQ itself comes in two forms: linear techniques, which are easy to implement, and nonlinear techniques, which have better characteristics. When the quantized coefficients then undergo (lossless) entropy coding, the performance difference between the two becomes small. Because entropy coding follows quantization in current standards, H.261, JPEG, and MPEG-1 use only linear quantization. MPEG-2 also adopts nonlinear quantization for a slight performance improvement.

For lossless compression based on the statistics of the quantized DCT coefficients, current international standards combine run-length coding and Huffman coding. Image compression thus combines many techniques: the DCT, quantization, run-length coding, Huffman coding, and motion-compensated DPCM (for video only).
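The Huffman step can be sketched with Python's `heapq`: repeatedly merge the two least frequent subtrees. The (run, level) symbols and their counts below are hypothetical. Real standards use fixed, pre-designed variable-length code tables rather than building a code per image.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(frequencies):
    """Build a prefix code in which more frequent symbols get shorter codewords."""
    if len(frequencies) == 1:                       # degenerate case: one symbol
        return {sym: "0" for sym in frequencies}
    tiebreak = count()                              # keeps heap entries comparable
    heap = [(freq, next(tiebreak), {sym: ""}) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)           # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical (run, level) symbols from quantized DCT blocks.
stream = [(0, 1)] * 10 + [(0, -1)] * 6 + [(1, 1)] * 3 + [(2, 1)]
codes = huffman_code(Counter(stream))
total_bits = sum(len(codes[s]) for s in stream)
print(codes)
print(total_bits)   # 34
```

A fixed 2-bit code for these four symbols would need 40 bits. The Huffman code needs 34 because the most frequent symbol gets a 1-bit codeword.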

-Motion-compensated compression

Human vision perceives a sequence as continuous natural motion when 16 or more pictures are shown per second. For moving pictures, then, 16 pictures per second is the minimum sampling frequency (the Nyquist frequency) that samples the signal while preserving its information. With this in mind, film captures natural scenes at 24 pictures per second, and television at 25 or 30 pictures per second.

Film and television record and display pictures differently:
- Film records each picture instantaneously on the film stock and projects it onto the screen one whole picture at a time.
- Television must transmit images by radio waves. Each picture is scanned as several hundred scan lines for capture and transmission, and the CRT also displays the image by scanning. Facsimile is similar in that it scans and transmits.

The NTSC color TV system, adopted in the United States, Japan, and Korea, transmits 30 pictures per second (precisely 29.97) with 525 scan lines per picture. The PAL and SECAM systems, adopted in Europe and elsewhere, transmit 25 pictures per second with 625 lines.

Television also uses so-called interlaced scanning to display motion more effectively with a limited number of scan lines. Each picture (frame) is divided into an even field, made of the even-numbered scan lines, and an odd field, made of the odd-numbered scan lines, and the two are transmitted alternately. NTSC therefore has 60 fields per second and PAL or SECAM 50, which lets them follow fast motion such as sports scenes well. When a movie is broadcast on TV, each film frame is scanned and transmitted by a converter called a telecine (a blend of "television" and "cinema").
Film and TV run at different picture rates. If film is simply played at the TV rate, PAL and SECAM show little visible problem, since their 25 pictures per second differ little from film's 24. NTSC runs at 30 pictures per second, however, so the viewer would see fast motion and hear high-pitched, fast speech. When film is transferred to NTSC TV, the rates must therefore be matched: 60 fields must be obtained from 24 pictures per second, that is, 5 fields from every 2 pictures. A simple, practical, and widely used method scans 3 fields from the first of the two pictures and 2 fields from the other. This is called "3:2 pulldown."

From a data-compression standpoint, capturing dozens of pictures per second, as film and TV do, produces very high redundancy between pictures (that is, along the time axis). In a static scene every picture has the same content. After the first picture is sent, the following pictures can be conveyed completely by the simple message "same as the previous picture." Even in a scene with motion, the background is often stationary. For the moving parts, sending information about "which part moved where" greatly reduces the amount of data. Compression across pictures fails only at a scene change, when two pictures are uncorrelated. In that case the later picture must be compressed using only its own content, without information from the earlier picture.

In the mid-1970s, a video telephone using telephone lines was introduced under the name "Picturephone." It was a breakthrough technology at the time, but the market was small and semiconductor technology could not yet support it adequately. It was therefore inevitably expensive and ultimately failed. Nevertheless, it triggered serious research into video compression.
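The 3:2 pulldown cadence can be written as a short mapping from film frames to interlaced fields. In this sketch, labeling fields "odd"/"even" and the function name are illustrative choices. The alternating 3-then-2 pattern and the continuous field-parity alternation are the parts that matter.

```python
def pulldown_32(frames):
    """3:2 pulldown: alternately emit 3 and 2 interlaced fields per film frame.

    Returns (frame, field_parity) pairs; parity alternates odd/even continuously,
    as interlaced video requires.
    """
    fields = []
    parity = "odd"
    for i, frame in enumerate(frames):
        for _ in range(3 if i % 2 == 0 else 2):
            fields.append((frame, parity))
            parity = "even" if parity == "odd" else "odd"
    return fields


one_second_of_film = list(range(24))        # 24 film frames
fields = pulldown_32(one_second_of_film)
print(len(fields))                          # 60, one second of NTSC fields
print([f for f, _ in fields[:10]])          # [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
```

Because 5 fields come from every 2 frames, 24 frames yield exactly 60 fields. This is how the 24-to-60 conversion described above works out.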
One method tried at that time to reduce inter-picture redundancy segmented neighboring pictures into moving and stationary regions. It sent the region information and updated contents of the moving regions and sent nothing for the stationary regions. Because the moving regions had to be segmented, this method was difficult to implement in real-time systems such as video telephones.

Block-based motion estimation and compensation emerged in the early 1980s to overcome this problem, and it is still widely used today in MPEG, H.261, and other standards. The picture is divided into units of fixed size. These are usually 16 × 16 pixels, called macroblocks, each consisting of four 8 × 8 blocks (the DCT unit). For each unit, a motion vector is found that indicates where in the previous picture the unit moved from, and this vector is used for motion compensation. Encoding only the difference between the current macroblock and the motion-compensated macroblock of the previous picture greatly reduces the amount of data. The motion vectors must also be transmitted so that the receiver can reconstruct the picture; they are compressed losslessly with DPCM and Huffman coding. Motion-compensated compression is what makes video compression such as MPEG much more efficient than still-image compression such as JPEG.
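Block-based motion estimation is commonly done by exhaustive block matching with a sum-of-absolute-differences (SAD) cost. The sketch below uses whole-pixel steps, as the H.261 discussion describes. The ±7 search range and the synthetic test pattern are hypothetical choices. The current picture is the reference shifted by a known amount, so the search should recover that shift with zero residual.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n=16, search_range=7):
    """Exhaustive block matching: try every whole-pixel displacement within
    +/- search_range and return the motion vector with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue                      # candidate block falls outside the picture
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost


def texture(x, y):
    # Arbitrary non-repeating test pattern (hypothetical).
    return (x * x + 3 * y * y + x * y) % 251


W = H = 48
ref = [[texture(x, y) for x in range(W)] for y in range(H)]
cur = [[texture(x - 3, y + 2) for x in range(W)] for y in range(H)]   # cur(x, y) = ref(x - 3, y + 2)
mv, cost = full_search(cur, ref, bx=16, by=16)
print(mv, cost)   # (-3, 2) 0
```

Only the residual (here all zeros) and the vector (-3, 2) would need to be coded. The vector itself would then be DPCM-coded against its neighbors.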

-H.261

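H.261 is an international standard produced by the ITU-T (formerly CCITT). It is a video compression method for video telephony and videoconferencing over the Integrated Services Digital Network (ISDN). Together with JPEG, the international standard for still-image compression, it can be considered the parent of MPEG-1 and MPEG-2, the standards at the center of today's multimedia revolution.

The telephone, invented by Alexander Graham Bell in 1876, has since become a principal means of human communication. ISDN emerged to integrate several separate, intricately intertwined networks:
- the telephone network, the core network carrying telephone and fax;
- the telex network, used mainly for business communications;
- digital data networks, including circuit-switched and packet-switched networks.

ISDN began to be deployed in several countries in the 1980s, when international standards were established as the ITU-T I-series. ISDN defines three types of channels:
- the 64 kbps B channel, for basic information such as voice and fax;
- H channels, for video and high-speed data (H0 at 384 kbps and H11 at 1,536 kbps);
- the D channel (16 kbps or 64 kbps), for various control signals.

In practice, these channels are combined as appropriate to form a basic access or a primary-rate access. Basic access is delivered to homes over the same two-wire twisted copper pair as today's telephone lines. It time-division multiplexes two B channels and one 16 kbps D channel (2B+D), for a total data rate of 144 kbps. In-home wiring is a four-wire bus that can connect up to eight terminals. Telephone, fax, and low-speed computer communication can therefore be used at the same time, giving a household two telephone lines.

One service built on ISDN is G4 fax, a faster, higher-resolution successor to the G3 fax used over today's telephone lines. Another is the video telephone, which transmits face images and voice together. Voice is coded with G.711 (64 kbps PCM) or G.728 (16 kbps LD-CELP), and face images with H.261. These streams are multiplexed with H.221, and adding a network interface yields H.320, the ISDN video telephone terminal. ISDN basic access provides only 144 kbps, so very few bits remain for video after voice and control. This severely limits the number of pictures per second and the picture quality. The limitation reflects the nature of video telephony, in which voice carries the main information and video plays a supporting role.

Office ISDN connections are often primary-rate accesses, with much higher rates than basic home access: 1,536 kbps in North America and Japan and 2,048 kbps in Europe. The B, H, and D channels can be combined within them in various ways. One use of this high rate is videoconferencing, which lets geographically distant parties hold meetings while saving time and expense. Several people usually confer while watching a large screen, so picture and sound quality must be much better than on a video telephone, where two people talk while viewing a small LCD. The audio codec is therefore G.722 (64 kbps or less, SB-ADPCM), which provides quality close to AM broadcasting. The video codec is H.261, as in video telephony, but with many more bits available.

In short, H.261 is a video compression standard for p × 64 kbps video telephony and videoconferencing, designed for ISDN basic and primary-rate access. The parameter p takes values from 1 to 30. Video telephones typically use p = 1 to 2 and videoconferencing p = 6 or more; an H0 channel corresponds to p = 6. Okubo of Japan's NTT (now with GCT) led the standardization of H.261 at the ITU-T. His approach was to build a so-called "Reference Model" and improve it at every meeting.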
즉 참여사들이 이 모델과 새로이 제안하고자 하는 기술을 비교하여 객관적 우수성을 검증한 후 그 기술을 모델에 추가하고 이 과정을 반복함으로써 많은 요소기술들을 단 시간안에 수렴하여 우수한 알고리듬을 만들 수 있었다. 이 수법은 그후 MPEG에서도 채택되었으므로 H.261은 영상압축의 기술적 내용뿐 아니라 표준화를 하는 효과적 프로세스로서도 MPEG의 탄생에 모태역할을 했던 것이다. H.261은 ISDN의 기본접속이나 1차군접속에 연결되는 영상전화 및 회의용 터미널 전송속도는 p×64Kbps, p=1∼30)에 내장되는 동영상 압축.신장에 관한 표준이다. H.261은 1984년 표준화 작업이 시작되어 "RM(Reference Model) 8"까지 방식이 개정된 끝에 1988년 기술적 내용이 완성되었고, 마침내 1990 년 ITU-T의 최종 승인을 얻어 권고로서 확정되었다. H.261은 효과적인 동영상 압축을 위하여 여러가지 손실/무손실 압축 기법들 을 결합하고 있다. 우선 영상의 입출력 포맷에 대해 살펴보기로 하자. TV방식에 있어서 525/30 /2:1(화면구성은 480×720, 주로 NTSC)을 쓰는 북미, 일본, 한국등과 625/ 25/2:1(화면구성은 576×720, 주로 PAL과 SECAM)을 쓰는 유럽간에 차이가 있어, 카메라와 모니터의 수평및 수직 동기 주파수가 다르다. 영상전화의 카메라와 모니터도 경제성으로 인해 기존 TV와 호환성이 있는 것을 사용하므로 서로 다른 나라간의 직접적인 영상통신이 불가능하다. 따라서 H.261에서는 두 시스템간의 변환을 위해 CIF(Common Intermediate Format)라 는 공통 양식을 만들어 코덱의 영상 입출력 포맷으로 사용한다. CIF 포맷은 방송용의 1/4크기를 취해 288×360의 화면구성에 초당 30장의 순행주사로 이루어진다(이는 휘도의 경우이고 두색 신호는 수평 수직 각각2:1로 추림을 하여 전체적으로 4:2:0 형식임). 또 이의 1/4 크기인 QCIF(Quarter CIF)가 쓰이기도 한다. 카메라 출력은 전처리를 거쳐 CIF로 변환된 후 H.261부호기에 인가되며, H.261복호기에 의해 재생된 CIF화면은 후처리를 거쳐 정상적 TV신호로 바뀐후 모니터에 표시된다. 화면내 압축을 위해서는 블록(8×8화소)단위의 DCT를 행하여 블록의 에너지를 저주파 성분에 집중시킨 후 양자화된다. 양자화는 시스템을 간단하게 하기 위해 균일 양자화기를 사용한다. 이때 인간의 시각 특성이 고주파 성분의 양자화 잡음을 상대적으로 덜 느끼는 점을 이용하여, DCT 계수가 고주파일수록 양자화 스텝을 크게 한다. 따라서 고주파 성분은 크기도 작은데 양자화 스텝도 커서 대부분이 0이 된다. 화면간 상관성을 이용하여 압축률을 높이기 위해서는 화면을 매크로블록(4개의 휘도블록과 두개의 색블록)단위로 나누어 각 매크로블록마다 이전 화면의 어느 위치에서 움직여 왔는지를 나타내는 움직임 벡터를 구한다. 움직임 추정방법으로는 이전화면의 각 화소 위치마다 현 매크로블록과의 에러를 구하여 이것이 최소가 되는 위치를 찾는 블록 정합 알고리듬이 널리 쓰인다. H.261에서는 화소단위로 움직임 벡터를 찾는데 비해 보다 고화질이 요구되는 MPEG에서는 반화소 단위로 찾는다. 움직임 벡터는 화면 재구성을 위해 수신 측에 전송되어야 하는데, 비트를 절약하기 위해 벡터간에 DPCM을 행하고 그 결과를 허프만 부호화하고 있다. 양자화된 DCT계수는 0이 많은데 화면내 부호화의 경우에 비해 화면간 부호화된 블록은 더욱 그러하다. 이를 효율적으로 압축하기 위해 런길이 부호화를 쓰고 있다. 즉 DC로부터 출발하여 지그재그 주사를 하면서 0이 몇개 반복되고 0이 아닌 값이 나오는지를 (런, 레벨)의 형태로 나타낸다. 이렇게 하면 연속되는 많은 0들이 "런"에 한꺼번에 수용되어 데이터 압축이 이루어진다. 이 (런, 레벨)심벌들은 발생 빈도가 각각 다르므로 발생 빈도가 높은, 즉 런이나 레벨 값이 작은 심벌을 짧은 부호로 부호화하는 허프만 부호를 써서 더욱 압축하고 있다. 
이런 과정을 거쳐 발생되는 데이터의 양은 시간에 따라 변하는데 복잡한 부분이나 화면내 부호화되는 부분에서 많다. 채널을 통한 데이터 전송 속도가 일정한 경우에는 발생되는 데이터를 써 넣었다가 일정 속도로 읽어내기 위한 버퍼가 필요하다. 수신측에는 일정 속도로 써 넣고 가변속도로 읽어 복호화 하기 위해 송신측과 같은 크기의 버퍼가 필요하다. 버퍼는 시간에 따라 충만도가 변하는 데 넘치거나 완전히 비면 비트열의 연속성이 끊겨 복호가 일시 중단된다. 이를 피하기 위해서는 버퍼의 상태를 궤환시켜 양자화 스텝을 조절함으로써 비트 발생량을 제어해야 한다. H.261은 이렇게 당시까지의 많은 손실/무손실 데이터 압축 기법을 결합하여 높은 압축률을 얻음으로써 실시간 영상통신의 길을 열었고, 이는 후에 MPEG- 1과 2로 이어져 멀티미디어 혁명이 본격화되는 계기가 되었다.H.261, an international standard created by ITU-T (formerly CCITT), is a video compression method for video telephony and video conferencing using an Integrated Services Digital Network (ISDN). This, together with JPEG, the international standard for still picture compression, is the birthplace of MPEG-1 and 2, the international standards that are the center of the multimedia revolution today. The telephone, invented by Graham Bell in 1876, has since become the main means of human communication. ISDN emerged to integrate individual intertwined communication networks, including the telephone network, which is the backbone network that accommodates telephones and facsimile, the telex network that is used primarily for enterprise communications, and the digital and network switched circuits for data communications. to be. ISDN began to be practical in several countries in the 80s, when international standards were established as ITU-T's I-family. Channels defined by ISDN include B channel 64Kbps for basic information transmission such as voice and fax, H channel for high speed data transmission (384Kbps for H, 1,536Kbps for H ^{3 3} ), and various control signals. There are three types of D-channels (16kbps or 64Kbps). In practical use, these three channels can be combined as appropriate to form a basic or primary group connection. 
Basic access to the home uses two-wire twisted-pair copper, like an ordinary telephone line. It time-division multiplexes two B channels and one 16 kbps D channel (2B+D) for a total data rate of 144 kbps. In-home wiring is a four-wire bus that can connect up to eight terminals. Telephone, fax, and low-speed computer communication can therefore run at the same time, and a household can have two telephone lines. Two of the services built on ISDN are G4 fax and the videophone. G4 fax is a faster, higher-resolution successor to G3 fax, which runs over ordinary telephone lines. The videophone transmits face images and voice together. In the videophone, voice is encoded with G.711 (64 kbps PCM) or G.728 (16 kbps LD-CELP), and the face image is encoded with H.261. The two are multiplexed with H.221 and a network interface is added, which yields the H.320 ISDN videophone terminal. ISDN basic access provides only 144 kbps, so very few bits remain for video once the voice and control bits are subtracted. This severely limits the frame rate and picture quality. The trade-off reflects the nature of video telephony: information is conveyed mainly by voice, and video plays a supporting role. Office ISDN connections are often primary rate accesses, with much higher transmission rates than home basic access. The primary rate is 1,536 kbps in North America and Japan and 2,048 kbps in Europe, and the B, H, and D channels can be combined in various ways. Video conferencing is one way to exploit this higher rate. It lets distant parties hold meetings while saving time and money. Several people typically watch each other on a large screen, so conferencing needs much better picture and sound quality than a videophone, where two people talk while watching small LCD screens.
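The channel arithmetic above can be checked with a few lines of Python. This is an illustrative sketch only, and the function names are hypothetical. `video_budget_kbps` assumes that user data travels on the B channels only and ignores the H.221 framing and control overhead. The text notes that this overhead also consumes bits.

```python
B_KBPS = 64        # B channel rate
D_BASIC_KBPS = 16  # D channel rate on basic access

def basic_access_kbps() -> int:
    """2B+D: two 64 kbit/s B channels plus one 16 kbit/s D channel."""
    return 2 * B_KBPS + D_BASIC_KBPS

def b_channel_multiple(rate_kbps: int) -> int:
    """Number of 64 kbit/s B-channel equivalents in a given rate."""
    if rate_kbps % B_KBPS:
        raise ValueError(f"{rate_kbps} is not a multiple of {B_KBPS}")
    return rate_kbps // B_KBPS

def video_budget_kbps(audio_kbps: int, b_channels: int = 2) -> int:
    """Rate left for video on the B channels once audio is carried.
    Assumption: H.221 framing and control overhead is ignored."""
    return b_channels * B_KBPS - audio_kbps

print(basic_access_kbps())        # 144
print(b_channel_multiple(384))    # 6  (H0)
print(b_channel_multiple(1536))   # 24 (H11)
print(video_budget_kbps(64))      # 64  with G.711 audio
print(video_budget_kbps(16))      # 112 with G.728 audio
```

Even in the best case (G.728), about 112 kbps is left for video, and the real figure is lower once framing is counted.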
For conferencing, the audio codec is therefore G.722 (64 kbps or less, SB-ADPCM), which provides sound quality close to AM broadcasting. The video codec is H.261, as in the videophone, but with far more bits available. H.261 is a video compression standard for p × 64 kbps video telephony and conferencing, designed around ISDN basic and primary rate access. The parameter p takes values from 1 to 30: typically 1 to 2 for videophones and 6 or more for conferencing (an H0 channel corresponds to p = 6).

Standardization of H.261 at ITU-T was led by Okubo of Japan's NTT (now with GCT). His method was to build a so-called "Reference Model" and improve it at every meeting. Participants compared each newly proposed technique against the model and verified whether it was objectively better. Winning techniques were added to the model, and the cycle repeated, so that many component technologies converged into an excellent algorithm in a short time. MPEG later adopted this method, so H.261 was a forerunner of MPEG in two ways: in its technical content and as an effective standardization process.

H.261 specifies video compression and decompression for videophone and conferencing terminals attached to ISDN basic or primary rate access, operating at p × 64 kbps (p = 1 to 30). Standardization began in 1984. After revisions up to Reference Model 8 (RM8), the technical content was completed in 1988, and ITU-T gave final approval as a recommendation in 1990. H.261 combines several lossy and lossless compression techniques for effective video compression.

Consider first the video input and output format. Television systems differ by region:
- North America, Japan, Korea, and others use 525/30/2:1 (480 × 720, mainly NTSC);
- Europe uses 625/25/2:1 (576 × 720, mainly PAL and SECAM).

As a result, the horizontal and vertical sync frequencies of cameras and monitors differ between the regions. For economy, videophone cameras and monitors are compatible with existing TVs, so countries using different systems cannot exchange video directly. H.261 therefore defines a common format for conversion between the two systems, the Common Intermediate Format (CIF), and uses it as the codec's video input/output format. CIF takes one quarter of the broadcast picture size, 288 × 360, with 30 progressively scanned frames per second. These figures are for luminance; the two color-difference signals are subsampled 2:1 both horizontally and vertically, giving an overall 4:2:0 format. A quarter-size format, QCIF (Quarter CIF), is also used. The camera output is preprocessed into CIF and fed to the H.261 encoder. The CIF picture reproduced by the H.261 decoder is postprocessed back into a normal TV signal and displayed on the monitor.

For intra-frame compression, a DCT is applied to each 8 × 8 pixel block to concentrate the block's energy in the low-frequency components, and the coefficients are then quantized. A uniform quantizer is used to keep the system simple. Human vision is relatively insensitive to quantization noise in high-frequency components, so the quantization step is made larger for higher-frequency DCT coefficients. High-frequency coefficients are small to begin with and are quantized with large steps, so most of them become zero. Compression can be raised further by exploiting the correlation between frames. The picture is divided into macroblocks (four luminance blocks and two color blocks). For each macroblock, a motion vector is found that indicates where in the previous frame the macroblock has moved from.
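The transform-and-quantize step described above can be sketched in plain Python as a direct 8 × 8 DCT-II followed by a uniform quantizer whose step grows with frequency. This is an illustration under stated assumptions, not the recommendation's procedure. The step table `step_size` is hypothetical. Also, the H.261 recommendation itself applies one step size, set by its QUANT parameter, to all AC coefficients of a macroblock. Frequency-weighted tables like the one sketched here are characteristic of JPEG and MPEG.

```python
import math

N = 8  # block size used by H.261, JPEG and MPEG

def _scale(k: int) -> float:
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct_2d(block):
    """Direct (unoptimized) 2-D DCT-II of an N x N block of pixel values."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = _scale(u) * _scale(v) * s
    return out

def step_size(u: int, v: int, base: int = 8, slope: int = 2) -> int:
    """Hypothetical step table: the step grows with spatial frequency u + v."""
    return base + slope * (u + v)

def quantize(coeffs):
    """Uniform quantization: divide by the step and round to the nearest integer."""
    return [[round(coeffs[u][v] / step_size(u, v)) for v in range(N)]
            for u in range(N)]

# A smooth vertical ramp: its energy lands in a few low-frequency coefficients.
ramp = [[16 * x for _ in range(N)] for x in range(N)]
q = quantize(dct_2d(ramp))
zeros = sum(1 for row in q for c in row if c == 0)
print(q[0][0], zeros)  # DC is 56; at least 60 of the 64 values are zero
```

A real codec uses a fast DCT rather than this quadruple loop. The direct form is kept here so that it matches the textbook definition.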
Motion estimation widely uses the block matching algorithm. For each pixel position in the previous frame, the error against the current macroblock is computed, and the position with the smallest error is selected. H.261 searches for motion vectors in whole-pixel units. MPEG, which targets higher picture quality, searches in half-pixel units. The motion vectors must be sent to the receiver for picture reconstruction. To save bits, DPCM is applied between successive vectors and the result is Huffman-coded.

The quantized DCT coefficients contain many zeros, and inter-coded blocks contain even more than intra-coded ones. Run-length coding compresses them efficiently. Starting from the DC coefficient and scanning in zigzag order, each nonzero value is paired with the number of zeros before it, giving a (run, level) pair. Long stretches of consecutive zeros are thus absorbed into a single "run". The (run, level) symbols occur with different frequencies, so they are further compressed with a Huffman code. Frequent symbols, that is, those with small run or level values, get short codewords.

The amount of data this process generates varies over time. It rises in complex regions and in intra-coded parts of the picture. When the channel transmits at a constant rate, the encoder needs a buffer. Generated data is written into this buffer and read out at a constant rate. The receiver needs a buffer of the same size, written at a constant rate and read at a variable rate for decoding. The buffer's fullness varies over time. If it overflows or empties completely, the bitstream loses its continuity and decoding halts temporarily. To avoid this, the buffer state is fed back to adjust the quantization step, which controls the number of bits generated.
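The block matching search described above can be sketched as an exhaustive (full) search. This is an illustrative sketch with hypothetical function names. It assumes the sum of absolute differences (SAD) as the error measure, since the text only says "error", and it searches whole-pixel displacements as H.261 does. The default ±15 pixel range matches H.261's motion vector range, and `size=16` corresponds to a 16 × 16 luminance macroblock.

```python
import random

def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def block_match(cur, ref, bx, by, size=16, search=15):
    """Full search over integer displacements in [-search, search]; returns
    the motion vector (dx, dy) with the smallest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= w
                    and 0 <= by + dy and by + dy + size <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

# Demo: the current frame is the reference moved 3 pixels right and 2 pixels up.
rng = random.Random(1)
W = H = 24
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[y + 2][x - 3] if y + 2 < H and x >= 3 else 0 for x in range(W)]
       for y in range(H)]
print(block_match(cur, ref, 8, 8, size=8, search=4))  # (-3, 2)
```

The returned vector points from the current block to its best match in the previous frame. Full search is the simplest choice but also the most expensive. Practical encoders often use faster, approximate search patterns.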
H.261 combined many of the lossy and lossless data compression techniques known at the time to achieve a high compression ratio. In doing so it opened the way to real-time video communication, and it led in turn to MPEG-1 and MPEG-2, which set the multimedia revolution in full motion.
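The buffer feedback described above can be illustrated with a toy simulation. The quantizer is chosen from the current buffer fullness, and the channel drains a fixed number of bits per frame. This is a sketch under explicit assumptions. The linear fullness-to-quantizer mapping and the rate model "bits = complexity // quant" are hypothetical. The range 1 to 31 matches the range of H.261's QUANT parameter.

```python
def choose_quant(fullness: int, capacity: int, qmin: int = 1, qmax: int = 31) -> int:
    """Fuller buffer -> coarser quantizer (assumed linear mapping)."""
    q = qmin + round((qmax - qmin) * fullness / capacity)
    return max(qmin, min(qmax, q))

def simulate(complexities, drain_per_frame: int, capacity: int):
    """Toy encoder buffer: each frame adds complexity // q bits (a hypothetical
    rate model), and the constant-rate channel removes drain_per_frame bits."""
    fullness = capacity // 2
    trace = []
    for c in complexities:
        q = choose_quant(fullness, capacity)
        fullness += c // q - drain_per_frame
        if fullness > capacity:
            raise OverflowError("encoder buffer overflow")
        fullness = max(0, fullness)  # an empty buffer would need stuffing bits
        trace.append((q, fullness))
    return trace

# Complex frames push the quantizer up; simple frames let it fall again.
print([q for q, _ in simulate([100_000] * 4, 5_000, 40_000)])  # [16, 17, 18, 18]
print([q for q, _ in simulate([1_000] * 3, 5_000, 40_000)])    # [16, 12, 9]
```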

- H.263

Ever since Bell invented the telephone in 1876, people have dreamed of a videophone that lets them see each other's faces while talking. A first attempt, a videophone called the "Picturephone," appeared in the mid-1970s. It failed in the market because the technology of the time could not meet price and performance requirements. When ITU-T (formerly CCITT) standardized ISDN in the 1980s, videophones over ISDN were considered as one of its applications. This videophone, the H.320 terminal, targets one of two interfaces:
- the basic interface that connects homes to ISDN (two 64 kbps channels and one 16 kbps channel);
- the primary rate interface used in offices (up to thirty 64 kbps channels).

It therefore operates at p × 64 kbps (p = 1 to 30). Its international video compression standard is H.261, introduced earlier. Like the G4 facsimile, the H.320 videophone is designed to connect to ISDN and cannot be used on the ordinary telephone network. This has made commercialization difficult while ISDN is not yet widespread.

In the early 1990s, U.S. carriers such as AT&T and MCI developed and marketed videophones that work over the telephone network. Compatibility is essential for communication equipment, however, and these videophones were not compatible with one another. MPEG-4 was originally started to meet this demand. Its scope gradually expanded until it focused mainly on multimedia database access and wireless multimedia communication, and its standardization is not expected to be completed until 1998.
Against this background, ITU-T SG15 began developing an international standard for videophones that operate at up to 64 kbps over the telephone network. It finalized the technical content at the end of last year. The aim was to finish quickly, so the work improved on the H.320 videophone rather than adopting new algorithms as MPEG-4 does. The result is the H.324 terminal. H.324 videophones connect to the telephone network through the recently standardized V.34 modem, which at 28.8 kbps is the fastest telephone-line modem. They compress video with H.263, a substantial improvement on H.261, and compress voice with the CELP-based G.723.

H.263 improves on H.261 in the following ways:
- **Motion vector prediction.** A macroblock's motion vector is highly correlated with those of neighboring macroblocks. Each vector is therefore predicted from the median of three neighboring vectors, which reduces the data by about 10%.
- **Per-block motion estimation.** Motion can also be estimated for each block, subdividing the motion within a macroblock.
- **Improved quantizer.** A better quantizer for the DCT coefficients saves about 3% of the data.
- **Three-dimensional VLC.** Variable-length coding of the quantized coefficients replaces the two-dimensional codes of H.261, JPEG, MPEG-1, and MPEG-2 with a three-dimensional code that also takes intra- and inter-frame information into account. This saves about another 5%.
- **Simpler syntax.** The bitstream syntax is much simpler than that of H.261, which reduces overhead beyond the pure information bits.

Two further techniques, syntax-based adaptive arithmetic coding and PB frames, improve performance but are complex, so their use is optional. Syntax-based adaptive arithmetic coding is complex but saves 5 to 14%.
The other techniques aim to save bits. PB frames instead spend slightly more bits to double the number of frames per second, producing visibly steadier and smoother motion. Several other tools were proposed, but those whose gains were not clearly worth their complexity were excluded. With all these elements, H.263 is a very high-performing algorithm. In the recent MPEG-4 picture quality evaluation, it served as the reference algorithm against which the proposed schemes were measured, and, surprisingly, it outperformed most of them. It is now possible to build a reasonably good videophone over ordinary telephone lines. The specification is already complete, so videophones that connect directly to a telephone line are expected to appear from the second half of this year. Such a videophone can also be implemented on a PC. This is expected to take the existing multimedia PC, with its communication and fax functions over the telephone line, a step further. Before long, every household will be able to make video calls over the telephone line using its PC.
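The median motion-vector prediction described above can be sketched as follows, with H.261-style DPCM shown for comparison. This is a simplified illustration with hypothetical function names. It assumes the left, above, and above-right neighbors as the three candidates. It omits the standard's special rules for macroblocks at picture or group boundaries.

```python
def median3(a: int, b: int, c: int) -> int:
    """Median of three integers without sorting."""
    return max(min(a, b), min(max(a, b), c))

def predict_mv(left, above, above_right):
    """Component-wise median of the three neighbouring macroblock vectors."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))

def mvd_median(mv, left, above, above_right):
    """H.263-style difference: vector minus its median prediction."""
    px, py = predict_mv(left, above, above_right)
    return (mv[0] - px, mv[1] - py)

def mvd_previous(mv, left):
    """H.261-style DPCM: vector minus the previous (left) macroblock's vector."""
    return (mv[0] - left[0], mv[1] - left[1])

# The left neighbour moves differently (for example, it lies on an object edge).
left, above, above_right = (8, 0), (2, 1), (2, 1)
mv = (2, 1)
print(mvd_previous(mv, left))                    # (-6, 1)
print(mvd_median(mv, left, above, above_right))  # (0, 0)
```

A single outlier neighbor cannot pull the median far. As a result, the residual to be Huffman-coded tends to be small, which is where the reported saving of about 10% comes from.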

- JPEG

The sheer volume of image data is a heavy burden when images are stored on disk or transmitted over a communication channel. Consider a 480 × 640 pixel natural-color image, a size common on computer screens, with 8 bits each of R (red), G (green), and B (blue) per pixel. One such picture occupies about 0.9 MB. A typical PC hard disk today holds 560 MB and a 3.5-inch floppy disk holds 1.44 MB, so this is a considerable amount of data.

JPEG (Joint Photographic Experts Group) is the international standard (ISO/IEC 10918) for efficiently compressing still images for storage and transmission. It is used in computers, digital cameras, color fax machines, color printers, and similar devices, and the name is also that of the working group responsible for the standard. During standardization, Europe's DCT, America's arithmetic coding, and Japan's vector quantization competed fiercely before a dramatic compromise on the DCT approach. The technical content was completed in 1988. During the deliberations, it was judged that binary (bi-level) images, such as those handled by facsimile, needed their own effective compression method. A separate working group was formed for this and completed the JBIG (Joint Bi-level Image Experts Group) standard.

JPEG compression divides broadly into a lossless mode and a lossy mode. The lossless mode is used where the original must not be degraded at all, as in medical imaging. The lossy mode is adopted in the many applications that raise the compression ratio while allowing visually imperceptible loss. In the lossless mode, each pixel is predicted spatially from adjacent, previously coded pixels. The prediction error is then Huffman-coded according to its statistical frequency.
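The spatial prediction of the lossless mode can be illustrated with the seven predictor formulas JPEG defines. Here a is the left neighbor, b the pixel above, and c the pixel above-left. This is a simplified sketch: the function name `residuals` is hypothetical, and it skips the first row and column, for which the standard has special rules. Huffman coding of the residuals is not shown.

```python
# Predictors 1-7 of JPEG lossless mode: a = left, b = above, c = above-left.
PREDICTORS = {
    1: lambda a, b, c: a,
    2: lambda a, b, c: b,
    3: lambda a, b, c: c,
    4: lambda a, b, c: a + b - c,
    5: lambda a, b, c: a + ((b - c) >> 1),
    6: lambda a, b, c: b + ((a - c) >> 1),
    7: lambda a, b, c: (a + b) >> 1,
}

def residuals(img, selector):
    """Prediction errors for interior pixels (row 0 and column 0 are skipped
    here for brevity; the standard defines special rules for them)."""
    p = PREDICTORS[selector]
    return [[img[y][x] - p(img[y][x - 1], img[y - 1][x], img[y - 1][x - 1])
             for x in range(1, len(img[0]))]
            for y in range(1, len(img))]

ramp = [[10 * x + 3 * y for x in range(5)] for y in range(4)]
print(residuals(ramp, 1)[0])  # [10, 10, 10, 10]: left-only prediction misses the slope
print(residuals(ramp, 4)[0])  # [0, 0, 0, 0]: a + b - c follows a planar gradient exactly
```

Residuals cluster near zero when the predictor fits the local image structure. That clustering is what makes the subsequent frequency-based Huffman coding effective.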
The lossy mode raises the compression ratio by combining lossy coding (DCT plus quantization) with lossless coding (DPCM, run-length coding, Huffman coding, and arithmetic coding). The picture is divided into 8 × 8 pixel blocks. A DCT is applied to each block to concentrate its energy in the low-frequency coefficients, and the coefficients are then quantized. The human eye is less sensitive to quantization noise in high-frequency components, so higher-frequency coefficients get larger quantization steps. The high-frequency DCT coefficients are small to begin with and are quantized with large steps, so most become zero.

The lossy mode is divided into the baseline and extended methods. The baseline method further compresses the quantized DCT coefficients with run-length and Huffman coding. It starts from the DC coefficient, scans in zigzag order, and represents how many zeros precede each nonzero value as a (run, level) pair. These symbols occur with different probabilities, so they are further compressed with a two-dimensional Huffman code. The DC coefficient corresponds to the block's average value. It strongly affects picture quality, so it must be represented more faithfully. The DC coefficient is therefore quantized more finely than the AC coefficients. The difference between a block's quantized DC and that of the previous block is then coded with a one-dimensional Huffman code.

The extended method uses arithmetic coding to compress the quantized DCT coefficients more efficiently. Arithmetic coding is slightly more efficient than Huffman coding but correspondingly more complex. Several companies, IBM in particular, also hold patents on it. For these reasons the royalty-free, easier-to-implement baseline method is in fact more widely used. The extended method also includes progressive coding, which lets the receiver display a picture that becomes gradually sharper.

Computers currently use a variety of image storage formats and compression methods. The widely used GIF and TIFF formats are based on lossless compression with the Lempel-Ziv algorithm.
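The zigzag scan, (run, level) symbols, and DC differencing described above can be sketched as follows. This is a simplified illustration with hypothetical function names. It stops at the symbol stage and leaves out the Huffman coding. It also omits baseline JPEG details such as capping runs at 15 with a ZRL symbol and coding levels as size categories plus extra bits.

```python
def zigzag_order(n: int = 8):
    """(row, col) positions in zigzag order: walk the anti-diagonals r + c = s,
    alternating direction between odd and even diagonals."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

ZIGZAG = zigzag_order()

def run_level(block):
    """(run, level) pairs for the 63 AC coefficients of a quantized 8x8 block.
    Trailing zeros are replaced by a single end-of-block marker."""
    symbols, run = [], 0
    for r, c in ZIGZAG[1:]:  # index 0 is the DC coefficient, coded separately
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            symbols.append((run, level))
            run = 0
    if run:
        symbols.append("EOB")
    return symbols

def dc_differences(dc_values):
    """DPCM for DC: each block's DC is coded as the difference from the previous block."""
    out, prev = [], 0
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][0], block[0][2] = 50, 3, -2, 1, 4
print(run_level(block))              # [(0, 3), (0, -2), (0, 1), (1, 4), 'EOB']
print(dc_differences([50, 52, 49]))  # [50, 2, -3]
```

The 58 trailing zeros of this block cost a single EOB symbol. This is the effect that the text attributes to run-length coding of mostly-zero coefficient blocks.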
JPEG offers a far better compression ratio and a wider range of applications than these formats. It has recently begun to be widely adopted on PCs, producing files with the .jpg extension. There is also M-JPEG (Motion JPEG), which compresses every frame of a video as a JPEG image. M-JPEG lacks motion compensation, so its compression ratio is somewhat lower than MPEG's. In exchange, it is much easier to implement and, unlike MPEG files, allows frame-by-frame editing. JPEG and H.261 were completed at about the same time, and both are based on the DCT. Together they became the forerunners of MPEG-1 and MPEG-2, which later became core multimedia technologies. In other words, MPEG took DCT-based intra-frame coding from JPEG and motion-compensated DCT inter-frame coding from H.261, then combined and refined them.

C. MPEG

1) What is MPEG (Moving Picture Experts Group)?

MPEG is a group of moving-picture experts: a working group of the multimedia standards committee under ISO.

The group was formed to establish standards for digital video, and "MPEG" also refers to the video and audio compression techniques it standardized. Video is the most demanding part of multimedia. Unlike still images, video involves a large amount of data, so a compression technique to reduce its size was needed. MPEG compresses video together with speech or other sounds: the video and audio are compressed separately and then synchronized. It is the most common streaming compression method and the underlying technology of Media Player and RealNetworks products.

The group is particularly concerned with transmission. The MPEG-1 standard has a transmission rate of 1.5 Mbps, the same as a compact disc (CD). Further standards are in preparation: MPEG-2 at 4 and 9 Mbps, MPEG++ at 20 Mbps, and MPEG-3 at 40 Mbps.

2) MPEG video compression method

MPEG compresses video using lossy compression. Its main principles are as follows.

(1) Compress one image.

(2) Compress each following image after removing redundancy.

- This is where the number of frames is determined, a task usually handled by the streaming server.

(3) After each object's image is compressed, compress the overlapping parts again.

- Consider a movie scene in which a person speaks for a few seconds. The person's clothing and the background do not change; only the shape of the mouth and the facial expression change. If the first frame is stored in full, then from the second frame onward only the changes in mouth shape and expression need to be stored. Playback then reproduces video almost identical to the original. In such a case the amount of data can be reduced by a factor of several tens compared with the original.
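The idea in step (3) can be sketched in a deliberately simplified form: keep the first frame whole and, for each later frame, keep only the pixels that changed. This is not how MPEG actually codes frames. Real MPEG works on motion-compensated blocks with DCT-coded residuals rather than individual pixels. The function names and the `threshold` parameter are hypothetical.

```python
def encode(frames, threshold=0):
    """Keep frame 0 whole; for each later frame keep only (index, value) pairs
    for pixels that differ from the decoder's current picture by more than
    `threshold`."""
    key = list(frames[0])
    recon = list(key)  # what the decoder will hold; tracked to avoid drift
    deltas = []
    for frame in frames[1:]:
        changes = [(i, v) for i, (old, v) in enumerate(zip(recon, frame))
                   if abs(v - old) > threshold]
        for i, v in changes:
            recon[i] = v
        deltas.append(changes)
    return key, deltas

def decode(key, deltas):
    """Rebuild every frame by applying each frame's changes to the previous one."""
    cur = list(key)
    frames = [list(cur)]
    for changes in deltas:
        for i, v in changes:
            cur[i] = v
        frames.append(list(cur))
    return frames

# Frames as flat pixel lists: only "the mouth" (pixels 1 and 3) changes.
frames = [[10, 10, 10, 10], [10, 12, 10, 10], [10, 12, 10, 30]]
key, deltas = encode(frames)
print(deltas)                         # [[(1, 12)], [(3, 30)]]
print(decode(key, deltas) == frames)  # True
```

With `threshold=0` the reconstruction is exact. A positive threshold drops small changes, which trades accuracy for data in the same way the lossy scheme above does. Because the encoder tracks the decoder's picture, the error never exceeds the threshold.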

However, the MPEG approach involves heavy buffering, and the stream sometimes goes down partway through. The resolution and color of the picture are also lower than the original, although most viewers barely notice.

It also overloads the streaming server, which must interpret the data, so expensive high-specification servers are required. A dedicated player codec is needed as well: the dedicated streaming players such as RealAudio and Media Player.

3) MPEG standards

- MPEG- I : 90년 제정하여 92년부터 사용하는 초기의 MPEG 규격으로, CD, DAT(Digital Audio Tape), 방송용 VCR 등에서 사용되고, VHS 비디오 정도의 영상을 제공하며 ,1.5Mbps의 속도를 제공한다. 이는 이미비디오CD나 CD-I/FMV(CD-Interactive/Full Motion Video)에 채택되어 상품화되었다. 비디오CD는 93년 네덜란드의 필립스、 일본의 소니 마쓰시타 JVC 등 4개사가 합의하여 만든 규격이고、 CD-I/FMV는 필립스가 이미 발매하였던 최초의 멀티미디어 기기로서 애니메이션 수준의 영상을 보여주는 CD-I에 완전한 동영 상기능을 첨가한 것이다. CD-I/FMV에 비해 다소 단조로운 재생기능만을 가졌던 비디오 CD는 94년 대화형 기능을 강화시킨 버전 2.0이 나왔다. 또 그동안 93년에 만들어진 레벨 2 규격을 바탕으로 애니메이션 수준의 영상 과 음향을 제공하여 폭발적 성장을 기록했던 멀티미디어PC가 최근MPEG1 카드를 장착하여 자연스런 동화를 재생함으로써 CD에 담긴 영화를 PC를 통해서도 즐길 수 있게 되었다. 더 나아가 펜티엄급 이상의 고성능 PC에서는 소프트웨어만으로 MPEG1 복호기를 구현하려는 시도가 생기고 있다. 역사적으로 볼 때 지난 88년은 장래 멀티미디어의 방향을 결정하는 중요한시기였다. 우선 하이파이 오디오를 겨냥하여 82년 등장하였던 CD가 85년부터 는 CD롬이라는 이름하에 1.5Mbps 재생속도를 갖는 대용량(6백40MB、 74분) 데이터 저장매체로 응용되기에 이르렀다. 또한 영상전화 및 회의를 위한 동화 압축표준인 H.261과 정지화 압축표준인 JPEG가 이 무렵 완성을 눈앞에 두고 있었다. 이런 배경하에서 그동안 표준화를 이끌어왔던 전문가들의 다음목표는 자연스럽게 CD롬에 동영상과 음향을 담기 위한 표준 압축방법을 제정 하는 것으로 의견이 모아졌다. MPEG는 이 목적으로 결성된 위원회인 ISO-IEC JTC1/SC29/WG11의 별칭이다. MPEG안에는효율적 작업을 위해 여러 소그룹을 두었다. 즉 동영상의 압축방식을 담당하는 비디오 소그룹、스테레오 음향의 압축방식을 담당하는 오디 오 소그룹、압축된 비디오.오디오 비트열의 패킷화.다중화.동기화 등을 담당하는 시스템 소그룹、표준화를 위한 여러가지 요구조건을 제시하는 요구 사항 소그룹、비디오.오디오 소그룹에서 제안된 알고리듬 구현의 난이도를 평가하는 구현 소그룹、그리고 성능 테스트를 담당하는 테스트 소그룹 등이있다. 이 여러 소그룹들이 그룹별 혹은 합동으로 회의를 하면서 표준을 완성해간다. MPEG 표준화는 제안된 압축 알고리듬 중 어느 하나를 선택하는 것이 아니다. 각각의제안방식이 그 나름의 장점을 가질 수 있는 점을 감안하여 좋은 요소 들을 하나하나 흡수하여 알고리듬을 진화시키고 있다. 이 방법은 H.261에서 Reference Model을 설정하여 8차까지 개정해가면서 성공적으로 표준화를이끌었던 것을 거울삼아 택한 것으로 MPEG1에서는 이 기준을 시뮬레이션 모델이 라 한다. MPEG1 비디오의 경우 30여개의 제안들이 융합되어 결국 JPEG의정지 화 압축기법과 H.261의 동화 압축기법을 합성 개선한 하이브리드방식(움직임 보상 DPCM+DCT+양자화+런길이 부호화+허프만 부호화)이 되었다. 이와 관련된 시뮬레이션 모델은 3차까지 개정된 끝에 비로소 위원회의 표준안이 완성되었다. 또 MPEG1 오디오는 크게 4개 컨소시엄의 방식이 경합을 벌여 필립스의 MUSICAM에 기초한 서브밴드 부호화가 제 1、 2계층을 이루고 AT&T의 보다 복잡하고 효율적인 방식이 제3계층을 이루고 있다. 음질과 복잡도간의 균형을 고려 、 제2계층이 가장 널리 쓰이고 있다. MUSICAM방식은 MPEG1뿐 아니라 최근상품화된 휴대형 디지털 오디오 기기인 DCC(Digital Compact Cassette 에도채택되고 있다. 
MPEG1 시스템은 압축된 비디오와 오디오의 패킷화.다중화.동기화 등을 위한 데이터 포맷에 관한 규정으로 데이터의 다양한 다중화가 가능한 것을 비롯하여 전송 저장 처리 등에 있어 여러가지 이점을 갖는다. MPEG1은 비디오 CD、 CD-I/FMV、 차세대 멀티미디어 PC뿐 아니라 전화선을 이용한 주문형 비디오인 VDT(Video Dialtone)에도 이용되고 있다. 우리나라의 경우 현재 한국통신이 서울 반포전화국에서 2백50명의 가입자를 대상으로 시험서비스중인데 1.5Mbps 전송속도에서 MPEG1으로 영상과 음향을 압축한 후ADSL(Asymmetric Digital Subscriber Line)기술에 의해 데이터를 기존 전화 선(이중나선형 구리선)으로 전송한다. 화질이 만족스럽지 않기 때문에 장차6 Mbps의 MPEG2로 끌어올릴 것도 고려중이다. MPEG1은 또한 뒤를 잇는 훨씬 더 다양한 기능에 효율적인 방식인 MPEG2에의 징검다리 역할을 함으로써멀티미디어 시대를 앞당기는 역할을 하였다고 평가할 수 있다. MPEG1의 동영상 압축방식은 JPEG의 화면내 부호화기법과 H.261의 화면간 부호화 기법을 결합하여 개선한 것이다. H.261로부터 MPEG1으로의 진화과정을 살펴보면 다음과 같다.-MPEG-I: The first MPEG standard established in 1990 and used since 1992. It is used in CD, DAT (Digital Audio Tape), broadcasting VCR, etc., and provides VHS video and provides 1.5Mbps speed. . It has already been adopted and commercialized in video CD or CD-I / FMV (CD-Interactive / Full Motion Video). Video CD is a standard agreed by four companies such as Philips in the Netherlands and Sony Matsushita JVC in Japan in 1993. CD-I / FMV is the first multimedia device that Philips has already released. The complete dynamic recall was added. Video CD, which had only slightly monotonous playback function compared to CD-I / FMV, was released in version 2.0, which enhanced the interactive function in 1994. In addition, based on the Level 2 standard created in 1993, the multimedia PC, which recorded the explosive growth by providing animation-quality video and sound, can play movies on a CD on a PC by playing natural fairy tales with the latest MPEG1 card. It became. Furthermore, attempts are being made to implement MPEG1 decoders in software on Pentium-class or higher-performance PCs. Historically, the last 88 years have been an important time deciding the direction of future multimedia. 
First, the CD was introduced in 1982 for high-fidelity audio. From 1985 it was also applied, under the name CD-ROM, as a large-capacity data storage medium (640 MB, 74 minutes) with a playback rate of 1.5 Mbps. Around the same time, two standards were nearing completion: H.261, a moving-picture compression standard for video telephony and conferencing, and JPEG, a still-image compression standard. Against this background, the experts who had led these standardization efforts agreed that their natural next goal was a standard compression method for storing video and sound on CD-ROM. MPEG is the nickname of ISO/IEC JTC1/SC29/WG11, the committee formed for this purpose. For efficiency, MPEG is divided into several subgroups:

- a video subgroup responsible for video compression;
- an audio subgroup responsible for stereo audio compression;
- a systems subgroup responsible for packetizing, multiplexing, and synchronizing the compressed video and audio bitstreams;
- a requirements subgroup that sets out the requirements for standardization;
- an implementation subgroup that evaluates how difficult the algorithms proposed by the video and audio subgroups are to implement;
- a test subgroup responsible for performance testing.

These subgroups meet separately or jointly to complete the standard. MPEG standardization does not simply pick one of the proposed compression algorithms. Each proposal may have its own merits, so the algorithm evolves by absorbing good elements one at a time. This approach followed the example of H.261, which successfully reached a standard by setting up a Reference Model and revising it eight times. In MPEG-1 the corresponding baseline is called the Simulation Model.
In MPEG-1 video, about 30 proposals were merged into a hybrid method (motion-compensated DPCM + DCT + quantization + run-length coding + Huffman coding). It combines and improves on JPEG's still-image compression and H.261's moving-picture compression. The associated Simulation Model was revised three times before the committee's draft standard was completed. For MPEG-1 audio, the methods of four consortia competed. Subband coding based on Philips' MUSICAM forms Layers I and II, and a more complex and more efficient method from AT&T forms Layer III. Balancing sound quality against complexity, Layer II is the most widely used. The MUSICAM method is used not only in MPEG-1 but also in the recently commercialized portable digital audio device DCC (Digital Compact Cassette).

The MPEG-1 system specification defines the data format for packetizing, multiplexing, and synchronizing compressed video and audio. It allows flexible multiplexing of data and offers several advantages for transmission, storage, and processing. MPEG-1 is used for Video CD, CD-I/FMV, and next-generation multimedia PCs. It is also used for VDT (Video Dialtone), video on demand over telephone lines. In Korea, KT is currently running a trial service for 250 subscribers at the Seoul Banpo telephone office. Video and audio are compressed with MPEG-1 at 1.5 Mbps and then carried over existing telephone lines (twisted-pair copper) using ADSL (Asymmetric Digital Subscriber Line) technology. Because the picture quality is not satisfactory, moving to 6 Mbps MPEG-2 is under consideration. MPEG-1 also brought the multimedia era closer by serving as a stepping stone to MPEG-2, a more efficient method with far more diverse features. MPEG-1 video compression combines and improves on JPEG's intra-frame coding and H.261's inter-frame coding. The evolution from H.261 to MPEG-1 is as follows.
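The intra-frame half of the hybrid chain above (DCT, quantization, run-length coding) can be sketched for a single 8x8 block as follows. This is a simplified illustration, not the normative MPEG-1 procedure. It assumes a single uniform quantizer step (the hypothetical parameter `step`) instead of MPEG-1's weighting matrices and quantizer scale. It codes the DC coefficient like any other coefficient, and it stops before the Huffman (variable-length code) stage. The function names are made up for this sketch.

```python
import math

N = 8  # JPEG and MPEG-1 both transform 8x8 pixel blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as 8 lists of 8 numbers."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def zigzag_order(n=N):
    """(row, col) positions in zigzag order: low frequencies first."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order


def run_length(levels):
    """(zero-run, level) pairs; trailing zeros are dropped, as an end-of-block code would do."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


def encode_block(block, step=16):
    """DCT -> uniform quantization (hypothetical single step) -> zigzag -> run-length."""
    coeffs = dct_2d(block)
    quantized = [[round(coef / step) for coef in row] for row in coeffs]
    scanned = [quantized[i][j] for i, j in zigzag_order()]
    return run_length(scanned)
```

A flat block shows why the chain compresses well. All of its energy lands in the DC coefficient, so `encode_block([[10] * 8 for _ in range(8)])` reduces 64 samples to the single pair `(0, 5)`.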

H.261 is a video compression standard for video telephony and video conferencing. The pictures H.261 handles are therefore basically so-called head-and-shoulder images, containing a person's face and shoulders. A videophone camera is usually fixed, and the speaker talks while looking at the other party's face on the screen. Such images are characterized by a fixed background and some facial motion, with the eyes and mouth moving the most.

Therefore, when there is no scene change, neighboring pictures are highly correlated and inter-picture coding is very efficient. In H.261, every macroblock of the first picture is intra-coded (I picture). Each later picture is forward-predicted from the previous one (P picture) to raise compression efficiency, so the sequence takes the form IPPP... For each macroblock of a P picture, the encoder first performs motion estimation and compensation. It then chooses whichever mode compresses better: intra-picture coding (intra mode) or inter-picture coding (inter mode). In motion-compensated predictive coding, however, an error keeps propagating to later pictures once it occurs, and the picture is never restored. To prevent this, H.261 requires each macroblock to be forcibly coded in intra mode at least once every 132 pictures.

MPEG-1 targets general movie and TV pictures, which, unlike in H.261, are not limited to a human face. They therefore contain more motion, and scene changes occur frequently. H.261-based video telephony mainly serves spoken communication. MPEG applications, by contrast, are mainly entertainment, where picture quality matters. Error propagation must therefore be limited and the picture must recover quickly. A random-access function is also required so that playback can start from any picture in a video sequence. Fully satisfying these conditions would require every picture to be intra-coded. One example is Motion JPEG (M-JPEG), used in some broadcast studios, which codes each picture with JPEG. This allows picture-by-picture editing, and errors do not propagate to the next picture. However, M-JPEG does not exploit the correlation between pictures, so its compression efficiency is lower.
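The H.261 forced-intra rule described above can be sketched as a per-macroblock counter. This is an illustrative model only. The class and method names are hypothetical, and the boolean `inter_is_cheaper` stands in for the encoder's real intra/inter cost comparison. The counter assumes tracking starts right after an intra-coded macroblock. It forces intra mode once a macroblock has been inter-coded in 131 consecutive pictures, so the 132nd is always intra.

```python
FORCED_UPDATE_LIMIT = 132  # from the text: intra at least once every 132 pictures


class IntraRefreshTracker:
    """Counts, per macroblock, how many consecutive pictures it has been inter-coded."""

    def __init__(self, num_macroblocks, limit=FORCED_UPDATE_LIMIT):
        self.limit = limit
        self.inter_run = [0] * num_macroblocks

    def choose_mode(self, mb, inter_is_cheaper):
        """Return 'intra' or 'inter' for macroblock mb in the current P picture."""
        # Use intra if it compresses better, or if one more inter picture would break the limit.
        if not inter_is_cheaper or self.inter_run[mb] + 1 >= self.limit:
            self.inter_run[mb] = 0
            return "intra"
        self.inter_run[mb] += 1
        return "inter"


# Even if inter coding always looks cheaper, the macroblock is refreshed periodically.
tracker = IntraRefreshTracker(num_macroblocks=1)
modes = [tracker.choose_mode(0, inter_is_cheaper=True) for _ in range(264)]
```

Bounding the run of inter-coded pictures bounds how long a transmission error can persist in any one macroblock.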
MPEG-1 takes a middle course between H.261 and M-JPEG. Intra-coded I pictures are placed at regular intervals, and the pictures between them are predictively coded. The I picture is the basic unit for random access and for picture recovery after an error or at power-on. Coding an I picture is naturally very similar to JPEG. More specifically, like the JPEG baseline system, it combines DCT, quantization, run-length coding, and Huffman coding. MPEG-1 predictively coded pictures are of two kinds. The first is the P picture, as in H.261, which is motion-compensated and predicted from the previous I or P picture. The second is the B picture, a groundbreaking concept newly introduced in MPEG-1. It is predicted with bidirectional motion compensation from the nearest I or P pictures before and after it, and the prediction error is then coded. With two B pictures between anchors and one I picture every nine pictures, the sequence takes the form IBBPBBPBB IBB... P and B pictures are coded with a combination of motion-compensated DPCM, DCT, quantization, run-length coding, and Huffman coding. The difference is that P pictures use forward motion compensation, while B pictures use bidirectional motion compensation. A B picture carries twice as many motion vectors (forward and backward). Its prediction error, however, is much smaller than that of a P picture, which raises compression efficiency. On the other hand, the I or P pictures on both sides of a B picture must be coded before the B picture itself, so the transmission order of pictures differs from their display order. Consequently, a delay of several pictures is unavoidable in encoding and decoding, and the amount of picture-storage memory increases considerably.
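The picture pattern and the display-versus-transmission reordering described above can be sketched as follows. The helper names and the parameters `gop_size` (9) and `b_between` (2) are assumptions chosen to reproduce the IBBPBBPBB example. The reordering rule sends each I or P anchor first, followed by the B pictures that lie before it in display order. As a simplification, any B pictures left at the very end are simply appended; a real encoder would avoid ending a sequence on B pictures.

```python
def gop_display_order(n_pictures, gop_size=9, b_between=2):
    """Picture types in display order, e.g. IBBPBBPBB IBB..."""
    types = []
    for i in range(n_pictures):
        pos = i % gop_size
        if pos == 0:
            types.append("I")
        elif pos % (b_between + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return types


def coded_order(types):
    """Display indices in transmission order: each anchor precedes the B pictures it closes."""
    order, pending_b = [], []
    for idx, kind in enumerate(types):
        if kind == "B":
            pending_b.append(idx)   # needs the *next* anchor, so it must wait
        else:
            order.append(idx)       # I or P is coded first...
            order.extend(pending_b) # ...then the B pictures that sit before it
            pending_b = []
    order.extend(pending_b)  # simplification: trailing B pictures sent last
    return order


types = gop_display_order(12)   # I B B P B B P B B I B B
sent = coded_order(types)       # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8, 10, 11]
```

The decoder must hold both anchors of a B picture before it can decode it, which is why B pictures add delay and frame-store memory.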
For example, in current digital TV broadcasting, 8 Mb of memory is needed without B pictures, but 16 Mb with them. B pictures contribute greatly to picture quality, so a trade-off between picture quality and system complexity is needed. Recently, advances in semiconductor technology have lowered implementation cost, and many systems now use B pictures. MPEG-1 is based on a low transmission rate of 1.5 Mbps, so the picture quality of MPEG-1-based Video CD and CD-I/FMV is somewhat lower than that of current broadcasts. Nevertheless, its picture and sound quality are regarded as acceptable for multimedia PCs, electronic entertainment, and watching movies at home. In that respect, it is likely to remain more widely used than the expensive, high-quality, multifunctional MPEG-2 for quite some time.

- MPEG-1 audio

MPEG-1 is an international standard for compressing and multiplexing video and sound at 1.5 Mbps for CD-ROM applications. The audio part, MPEG-1 audio, compresses mono, hi-fi stereo, or bilingual sound by roughly 6:1. Existing applications include FM stereo broadcasting, hi-fi stereo CD audio, and multiplex sound on color TV. With these three modes, MPEG-1 audio meets the audio requirements of many such applications. In MPEG-1 audio, the methods of four consortia competed. A method based on Philips' MUSICAM forms Layers I and II, and a method proposed by AT&T and others forms Layer III. Considering performance and complexity, Layer II is the most widely used.

Video is three-dimensional data: two-dimensional pictures that change over time. The spatio-temporal correlation between pixels is therefore high, and compression ratios of several tens to one are possible. Sound, by contrast, is basically a one-dimensional signal with low correlation between samples, so its compression ratio is usually no better than about 10:1. The ratio can be raised slightly for stereo, where the correlation between the left and right channels can be exploited. MPEG-1 audio accepts three input sampling frequencies:

- 44.1 kHz for CD;
- 48 kHz for DAT and production systems;
- 32 kHz for digital processing of FM audio (15 kHz bandwidth).

The number of bits per sample ranges from 16 to 24. The most common is 16 bits (2 bytes), which is used on CD and DAT and is convenient for computers. The uncompressed data rate of CD audio, for example, is 44.1 kHz × 16 bits × 2 channels = 1,411,200 bps, or about 1.4 Mbps.
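The uncompressed-rate arithmetic above can be checked for all three MPEG-1 sampling frequencies with a few lines. The sketch assumes 16-bit stereo PCM and uses 256 kbps as an example compressed rate. The helper name `pcm_bitrate` is made up for this illustration.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# The three MPEG-1 audio sampling frequencies, 16-bit stereo.
for fs in (32_000, 44_100, 48_000):
    rate = pcm_bitrate(fs, 16, 2)
    print(f"{fs / 1000:g} kHz: {rate / 1e6:.3f} Mbps, {rate / 256_000:.1f}:1 at 256 kbps")
# 32 kHz: 1.024 Mbps, 4.0:1 at 256 kbps
# 44.1 kHz: 1.411 Mbps, 5.5:1 at 256 kbps
# 48 kHz: 1.536 Mbps, 6.0:1 at 256 kbps
```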
In Layer II, MPEG-1 audio outputs 64 to 384 kbps for stereo. Results depend on the input, but experiments indicate that the difference from the original is almost imperceptible up to roughly 6:1 compression (256 kbps). MPEG-1 audio exploits two characteristics of human hearing for effective compression.

The first concerns single tones. Across the audible band of 20 Hz to 20 kHz, the ear's sensitivity varies with frequency and can be plotted as a curve. This curve gives, as a sound pressure level, the minimum threshold at which a sound becomes audible. It is lowest near 1 kHz, where the ear is most sensitive, and rises steeply toward low frequencies near 20 Hz and high frequencies near 20 kHz.

The second concerns interactions between components. When one frequency component is strong, it creates a hill-shaped masking curve around itself. Signals below this threshold are inaudible, and only larger signals are heard.

Combining the two characteristics, the spectrum of the sound can be analyzed to obtain a masking threshold (the inaudible range) at each frequency. If the quantization noise of each frequency component is kept below this threshold, it cannot be heard, and bits are saved accordingly. This is the fundamental principle of MPEG-1 audio compression, which, although lossy, is almost indistinguishable from the original.

Unlike video, which uses the DCT, MPEG-1 audio uses so-called subband coding. The input signal is first passed through a filter bank of 32 equal-width band-pass filters, which splits it into components from low to high frequency. In parallel, a finer Fourier spectral analysis determines the masking threshold, and from it the allowable quantization noise in each of the 32 bands. Once the bit rate is fixed, the number of available bits after subtracting overhead is known. These bits are distributed among the 32 bands, and each band is then quantized.
Bits are assigned one at a time, starting with the bands that need the most bits, that is, where quantization noise must be reduced most. In this way, each band receives as many bits as it needs within the total budget. Each additional bit halves the quantization noise in that band, improving the signal-to-noise ratio by about 6 dB. For each band, the encoder sends three items in a defined format: the allocated number of bits, the scale factor (information about the signal amplitude), and the quantized sample values. MPEG-1 audio compression is also used in the recently released DCC (Digital Compact Cassette). Unlike most MPEG-1 audio applications, a DCC deck contains both the encoder and the decoder. To keep the encoder easy to implement, DCC therefore uses a much simpler model of hearing than MPEG-1 audio.
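The greedy bit-allocation loop described above can be sketched as follows. This is a simplified model, not the normative Layer II allocation tables. It assumes the psychoacoustic analysis has already produced a signal-to-mask ratio (`smr_db`) for each band, and that each added bit lowers a band's noise by 20·log10(2) ≈ 6.02 dB. Each step costs `samples_per_band` bits (12 here, an assumed default), and scale-factor and side-information bits are ignored. As a simplification, the loop also stops once every band's noise is below its mask.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # one more bit halves the noise amplitude: about 6.02 dB


def allocate_bits(smr_db, budget_bits, samples_per_band=12, max_bits=15):
    """Greedy allocation: always give the next bit to the band whose noise is most audible."""
    alloc = [0] * len(smr_db)
    remaining = budget_bits
    while True:
        # Noise-to-mask ratio of each band that can still take another bit.
        candidates = [(smr_db[b] - DB_PER_BIT * alloc[b], b)
                      for b in range(len(smr_db)) if alloc[b] < max_bits]
        if not candidates:
            break
        worst_nmr, band = max(candidates)
        if worst_nmr <= 0 or remaining < samples_per_band:
            break  # all noise is masked, or the bit budget is exhausted
        alloc[band] += 1
        remaining -= samples_per_band
    return alloc
```

With a generous budget, `allocate_bits([20, 5, -3], 1000)` gives `[4, 1, 0]`: the loudest band gets the most bits, and a band already under its mask gets none. With only 24 bits, the whole budget goes to the first band.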

- MPEG-1 system

MPEG-1 is an international standard for compressing video and sound and multiplexing them within 1.5 Mbps onto digital storage media (DSM) such as CD-ROM.

The part of the standard concerned with multiplexing is the MPEG-1 system specification. The word "system" generally has a very broad meaning. In the MPEG standards, however, the system is the technical specification of two things: how the bitstreams produced by compressing video and sound, together with other auxiliary data, are synchronized, and in what form (for example, as packets) they are multiplexed.

Digital signals are multiplexed by time-division multiplexing (TDM), in which the channel signals are separated in time and transmitted alternately. The simplest scheme divides time into fixed slots from the start, so that each channel occupies its own slot. The receiver also knows each channel's time allocation, so it can select the signal of the desired channel. For example, a telephone office's electronic switch converts the voice signals of many subscribers into PCM digital data and multiplexes them in this way. This method is relatively simple because channel assignment is fixed, but it is correspondingly inflexible. It cannot accommodate a channel's data volume varying from moment to moment, or channels being added or removed. Nor can it meet the varied processing and transmission demands of complex digital networks.

The solution is packet multiplexing, which the MPEG-1 system also adopts. A packet is a bundle of bits. Packets in an MPEG-1 system bitstream may have fixed or variable length. The so-called header at the front of each packet records the packet's attributes, so that the receiver can process the packet accordingly.
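The packet-multiplexing idea above can be sketched with a toy header that tells the receiver which stream a packet belongs to and how long it is. The 4-byte header layout, the sync value, and the stream ids are hypothetical and exist only for this illustration; this is not the MPEG-1 packet syntax. The sketch also shows the overhead trade-off: with a fixed-size header, smaller payloads waste a larger fraction of every packet.

```python
import struct
from itertools import zip_longest

# Hypothetical 4-byte header for illustration only; this is NOT the MPEG-1 packet syntax.
# Fields: sync marker (1 byte), stream id (1 byte), payload length (2 bytes, big-endian).
SYNC = 0xA5
HEADER = struct.Struct(">BBH")
VIDEO, AUDIO = 0xE0, 0xC0  # stream ids chosen for this sketch


def packetize(stream_id, data, max_payload):
    """Split one elementary stream into header-prefixed packets of at most max_payload bytes."""
    packets = []
    for start in range(0, len(data), max_payload):
        chunk = data[start:start + max_payload]
        packets.append(HEADER.pack(SYNC, stream_id, len(chunk)) + chunk)
    return packets


def overhead_ratio(payload_len):
    """Fraction of each packet spent on the fixed-size header."""
    return HEADER.size / (HEADER.size + payload_len)


def demultiplex(mux):
    """Walk the multiplexed bytes header by header and regroup payloads by stream id."""
    streams, pos = {}, 0
    while pos < len(mux):
        sync, stream_id, length = HEADER.unpack_from(mux, pos)
        if sync != SYNC:
            raise ValueError(f"lost packet sync at byte {pos}")
        pos += HEADER.size
        streams[stream_id] = streams.get(stream_id, b"") + mux[pos:pos + length]
        pos += length
    return streams


# Interleave video and audio packets as they are produced, then recover both streams.
video = packetize(VIDEO, b"V" * 10, max_payload=4)
audio = packetize(AUDIO, b"A" * 6, max_payload=4)
mux = b"".join(p for pair in zip_longest(video, audio, fillvalue=b"") for p in pair)
streams = demultiplex(mux)
```

Unlike fixed TDM slots, nothing here fixes the number of streams or their data rates; the receiver learns both from the headers.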
For example, the header carries a synchronization code marking the start of the packet, whether the packet is video or audio, the packet length, and how errors are to be handled. Headers are a form of overhead, but they make processing correspondingly more flexible. The header length is fixed, so shorter packets raise the proportion of overhead and lower transmission efficiency. In exchange, they reduce both the size of the buffer that temporarily holds packets for processing and the processing delay. If the video, audio, and auxiliary data bitstreams were independent and unrelated, the three kinds of packets could simply be sent one after another as they are produced. An MPEG-1 system bitstream, however, is basically a single program, so its video, audio, and auxiliary data are closely related. In particular, lip sync is required, so that picture and sound do not drift apart in time.

For temporal synchronization between the individual bitstreams, the MPEG-1 system uses so-called time stamps. For each video picture and each audio frame, the time at which it entered the encoder is recorded as a tag and transmitted with it. This tells the decoder when the decoded data should be presented on the monitor or speaker. There are two kinds of time stamps: the DTS (Decoding Time Stamp), which gives the decoding time, and the PTS (Presentation Time Stamp), which gives the presentation time. To synchronize video and audio with time stamps, the receiver needs a clock of its own and must process the bitstream while comparing the time stamps against that clock. The principle is the same as two people synchronizing their watches before agreeing to meet at a certain time. In the MPEG-1 system, this reference time is called the SCR (System Clock Reference). The encoder sends it to the decoder from time to time so that the decoder's clock stays matched to the encoder's.
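The SCR/PTS mechanism above can be modeled with a receiver clock that is re-anchored whenever an SCR arrives and then compared against each unit's presentation time stamp. In MPEG-1 these values count ticks of a 90 kHz clock. Everything else here is a hypothetical simplification: the class names, the `local_seconds` input from the receiver's own timer, and the absence of any drift filtering.

```python
from dataclasses import dataclass

CLOCK_HZ = 90_000  # MPEG-1 SCR and PTS/DTS values count ticks of a 90 kHz clock


@dataclass
class AccessUnit:
    kind: str  # "video" picture or "audio" frame
    pts: int   # presentation time stamp, in 90 kHz ticks


class DecoderClock:
    """Receiver clock slaved to the encoder through SCR values (no drift filtering)."""

    def __init__(self):
        self.scr = None
        self.local_at_scr = None

    def on_scr(self, scr_ticks, local_seconds):
        # A pack header delivered an SCR: re-anchor the local clock to it.
        self.scr = scr_ticks
        self.local_at_scr = local_seconds

    def now(self, local_seconds):
        """Current system time in 90 kHz ticks, extrapolated from the last SCR."""
        if self.scr is None:
            raise RuntimeError("no SCR received yet")
        return self.scr + round((local_seconds - self.local_at_scr) * CLOCK_HZ)


def due_for_presentation(units, clock, local_seconds):
    """Units whose PTS has been reached; audio and video share one clock, giving lip sync."""
    t = clock.now(local_seconds)
    return [u for u in units if u.pts <= t]


clock = DecoderClock()
clock.on_scr(900_000, local_seconds=0.0)  # encoder says: it is now 10 s
units = [AccessUnit("video", 930_000), AccessUnit("audio", 945_000), AccessUnit("video", 960_000)]
ready = due_for_presentation(units, clock, local_seconds=0.5)  # clock reads 945_000
```

Because video and audio are both judged against the same SCR-derived clock, a picture and the sound that accompanies it are released at the same moment.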
To do this, several packets are grouped into a unit called a pack for transmission, and the SCR is sent in the pack header whenever needed. The pack header also carries the information the receiver needs to start decoding partway through an MPEG-1 system bitstream, not only from its beginning. MPEG-1 was designed from the outset for CD-ROM applications. The MPEG-1 system therefore gives little consideration to errors and relies on each application's own error-correction scheme. The MPEG-1 system was later refined into the program stream of MPEG-2 systems. MPEG-2 systems also define a transport stream, which multiplexes multiple programs, rather than one, in error-prone environments.

- Applications of MPEG-1

MPEG-1 is an international standard for compressing video and sound (MPEG-1 video and MPEG-1 audio, respectively) and multiplexing them (MPEG-1 system) within 1.5 Mbps on digital storage media (DSM). This formal title can cause confusion, so let us clarify what it means; doing so will also make MPEG-1's usefulness in various applications clearer. Digital storage media include not only CD-ROM but also DAT (Digital Audio Tape). Many systems also use bit rates far above 1.5 Mbps, such as digital broadcasting at about 8 Mbps per channel. Judged by the MPEG-1 specification alone, there is no particular restriction on applications. It can be used, with only a few parameters changed, in many multimedia applications, including digital broadcasting. In fact, DirecTV, the US digital satellite broadcasting system, initially compressed both video and audio with MPEG-1.

There are, however, facts to keep in mind for such applications. MPEG-1 was created to record and play back moving pictures and sound on CD-ROM. Its compression algorithm was therefore optimized for rates within the CD-ROM data rate of 1.5 Mbps (strictly, 150 KB/s, or about 1.2 Mbps). The MPEG-1 system data format was likewise designed for a single program on a CD-ROM. It gives less consideration to error correction and to multiple programs, as in digital broadcasting. For this reason, even DirecTV did not adopt the MPEG-1 system multiplexing scheme as-is, even in its early days. In the end, the most appropriate application of MPEG-1 is CD-ROM.
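The "150 KB/s, about 1.2 Mbps" figure above follows from the CD-ROM sector rate. The sketch assumes CD-ROM Mode 1 sectors carrying 2048 user bytes, read at 75 sectors per second at single speed; other sector modes carry a different amount of user data. The function name is made up for this illustration. It also covers the double- and quad-speed drives mentioned for the MPC specifications.

```python
SECTORS_PER_SECOND_1X = 75    # single-speed CD-ROM reads 75 sectors per second
USER_BYTES_PER_SECTOR = 2048  # Mode 1 user data per sector


def cdrom_rate_bytes(speed=1):
    """User-data rate in bytes per second for an N-speed drive."""
    return speed * SECTORS_PER_SECOND_1X * USER_BYTES_PER_SECTOR


for speed in (1, 2, 4):
    rate = cdrom_rate_bytes(speed)
    print(f"{speed}x: {rate // 1024} KB/s = {rate * 8 / 1e6:.2f} Mbps")
# 1x: 150 KB/s = 1.23 Mbps
# 2x: 300 KB/s = 2.46 Mbps
# 4x: 600 KB/s = 4.92 Mbps
```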
In the early days, consumer electronics companies were reluctant to commercialize MPEG-1. Its picture quality was somewhat lower than that of existing VHS tape, and a 74-minute playing time could not hold a full movie. However, in the fall of 1992, JVC's MPEG-1-based karaoke system became a sensation in Japan, and MPEG-1 commercialization again drew industry attention. Eventually, the consumer electronics leaders Philips, Sony, and Matsushita joined forces to develop this karaoke format into today's Video CD standard. In 1994 it was upgraded to version 2.0, which strengthened its functions as an interactive device. Philips developed and released the similar but slightly different CD-I FMV, which adds MPEG-1-based full motion video to CD-I, Philips' existing interactive device.

MPEG-1 adoption is also spreading in the computer and game-machine industries. The computer industry, an early leader in implementing multimedia, released the multimedia PC (MPC) specifications Level 1 in 1991 and Level 2 in 1993. Level 2, based on a double-speed CD-ROM drive (300 KB/s data transfer rate), is being succeeded this year by Level 3. MPC Level 3 requires a 75 MHz Pentium or faster CPU, a quad-speed CD-ROM drive and, above all, an MPEG-1 decoder. Separately, as CPU performance improves, software MPEG-1 decoders are becoming more common, and systems that integrate MPEG-1 decoding into the VGA card are appearing. Game-machine makers have also recently shown a clear tendency to combine their machines with CD-ROM. They are turning them into multi-players: general entertainment devices that can play all kinds of CDs, including game CDs and Video CDs. Computers and games have different picture-quality requirements from ordinary TV. A computer displays images on a monitor smaller than a TV screen, and smaller still when they are shown in a window. Games, by their nature, can tolerate lower spatio-temporal resolution than natural images.
MPEG-1 can deliver sufficient picture and sound quality for this level of application far more cheaply than MPEG-2. It is therefore likely to remain in use for a long time despite the arrival of MPEG-2. Digital TV broadcasting, of course, imposes stricter picture-quality requirements, so it generally adopts MPEG-2, which this series will cover later.

- MPEG-II: A standard established in 1993 as an improvement on MPEG-I. It provides broadcast-studio picture quality, is used on DVD, and operates at 5 to 10 Mbps. H.261 is the international video compression standard for video telephony and video conferencing over ISDN at p × 64 kbps (p = 1 to 30), and MPEG1 is the international standard for compressing and multiplexing video and audio within 1.5 Mbps on digital storage media such as CDs. Until about 1990, then, digital image compression was confined to communication media and storage media. Discussion of MPEG-2 began at the September 1990 meeting in Santa Clara, USA, where the standardization of MPEG-1 was wrapped up. MPEG-1 mainly targets video CDs at about 1.5 Mbps; because its picture quality stays at VHS level and one disc holds only 74 minutes, a standard delivering higher quality at higher bit rates was needed. The initial goal of MPEG-2 was to achieve current TV quality at around 5 to 10 Mbps. It was agreed that, once MPEG-2 was complete, MPEG-3 would be standardized as a follow-on to achieve HDTV (High Definition Television) quality. Like H.261 and MPEG-1, MPEG-2 was standardized in two phases: competition, then cooperation. The turning point between the two was the MPEG meeting held in Kurihama, near Tokyo, Japan, in November 1990.
JVC's laboratory evaluated the quality of the images produced by the algorithms each participating organization had proposed. Compared with MPEG-1, the number of proposing organizations doubled from 16 to 32, and because broadcasting was the main application, research institutes from the broadcasting field began to take part. After this evaluation, standardization proceeded in earnest through cooperation built on a reference model called the Test Model. Around this time an event occurred that had a decisive impact on MPEG-2 standardization: the standardization effort of the FCC (U.S. Federal Communications Commission), which was driving the development of high-definition television under the name ATV (Advanced Television). ATV was a national project begun in 1987 to select America's own next-generation broadcasting system in response to Hi-Vision, Japan's high-definition television. By its nature the project could not avoid taking on political overtones, but it also produced a major technological leap. Specifically, GI (General Instrument), one of the participants, announced its 15 Mbps DigiCipher system in 1990, showing that compression good enough to deliver HDTV quality at that rate was possible. This came as a particular shock to many of the people involved in developing Japan's Hi-Vision. U.S. ATV standardization then entered fierce competition, with GI, AT&T, Zenith, Thomson, and Philips forming three consortia and proposing four systems. Because tests of the four digital HDTV systems produced no clear winner, the FCC urged the ATV proponents to submit a unified proposal, and in spring 1993 the GA (Grand Alliance) was formed. The GA's unified, MPEG-2-based proposal was approved in October 1993, and MPEG-2 was thus formally adopted as the next-generation TV broadcasting system in the United States. Similar moves took place in Europe.
The EUREKA 95 project, based on the analog HD-MAC system, was discontinued, and in October 1993 the DVB (Digital Video Broadcasting) project was launched, centered on EU member states. The project is defining a next-generation TV system based on MPEG-2, laying the groundwork for digitizing cable TV (CATV), satellite, and terrestrial broadcasting. Keeping pace with the U.S. ATV project, and recognizing that there would be no time to develop a separate MPEG-3 for HDTV later, MPEG decided to absorb MPEG-3 and extend MPEG-2's scope up to HDTV quality (March 1992 meeting). Through these twists and turns MPEG-2 standardization entered its convergence phase: the test model was refined through TM0 (Test Model 0, Singapore meeting, January 1991), TM1 (Haifa meeting, March), and TM2 (Rio de Janeiro meeting, July), and finally, at the Seoul meeting in November 1993, the CD (Committee Draft) was completed in a form that covers HDTV, essentially finalizing the technical content of the standard. Standardized with broadcast quality as its goal and only recently completed, MPEG-2 has, thanks to its excellent performance and flexibility, been chosen for many fields, including digital satellite broadcasting (including DBS via Korea's Mugunghwa satellite), high-definition TV, digital video disc (DVD), and video on demand (VOD), making it a driving force of the multimedia revolution. In particular, the DVD format recently agreed on by the Sony/Philips and Toshiba/Matsushita camps can hold a 130-minute movie at MPEG-2 quality (6 Mbps on average) on a single disc, and is expected to have a major impact on today's video CD, VHS, and digital VCR. With a capacity of about 5 GB, it is also expected to have a large ripple effect on computer auxiliary storage such as HDDs and CD-ROMs.

-MPEG2 video

MPEG2 video is a superset of MPEG1 video, so forward compatibility is maintained. Whereas MPEG1 video stores moving pictures at a low bit rate of 1.15 Mbps on digital storage media such as CDs, MPEG2 video is used to transmit or store high-quality moving pictures on higher-bit-rate broadcasting, communication, and storage media. As the range of applications widened, so did the requirements to be met. Leading companies worldwide proposed MPEG-2 video algorithms to satisfy these requirements, and in September 1991, 30 of them were finally evaluated at JVC's research laboratory in Japan. Twenty were DCT-based, like MPEG1, five used subband coding, and the remaining five used wavelet transforms. The evaluation showed DCT-based schemes still somewhat ahead, and, taking compatibility with MPEG1 video into account as well, MPEG2 video standardization was settled on a DCT-based approach. MPEG2 video is a kind of general-purpose compression algorithm: it substantially extends MPEG1 video and provides a large set of tools that are selected as appropriate for each application. To improve compression efficiency, MPEG2 video re-examined each element of MPEG1 video and refined it step by step, which adds up to a considerable overall gain. Improvements were made in many areas, including field-based processing, motion estimation and compensation, quantization, the scanning of DCT coefficients, and variable-length coding. Although the scope of MPEG2 video is very broad, each application uses only certain functions at a particular resolution. For convenience in building encoders and decoders, MPEG2 video is therefore classified by resolution and by function. First, picture resolution is classified into four levels.
These are Low, a small picture like the one MPEG1 video targets; Main, the current TV picture size; High 1440, the picture size of European HDTV; and High, the format for U.S. high-definition TV. By function, MPEG2 video is divided into five profiles: Simple, which omits the B-frames used for bidirectional prediction to ease implementation; Main, which includes many functions and is adopted for most applications; and SNR Scalable, Spatial Scalable, and High, which add a layered structure and further extended functions. As concrete applications, digital DBS broadcasting via the Mugunghwa satellite, scheduled to start next year, and the DVD format recently agreed on by the Toshiba and Sony camps use Main Profile at Main Level; U.S. and Korean HDTV use Main Profile at High Level; European HDTV uses Spatial Scalable Profile at High-1440 Level; and U.S. digital cable broadcasting uses Simple Profile at Main Level. The main purpose of MPEG2 video is to compress current TV and HDTV efficiently. Current TV picture quality is achieved at 3 to 9 Mbps and HDTV picture quality at 17 to 30 Mbps. The bit rate is chosen according to the capacity of the given channel and the required picture quality. For example, about 7 Mbps is allocated to video for Mugunghwa satellite DBS, 17 Mbps for the U.S. Grand Alliance HDTV system, and 5 to 6 Mbps for video on demand over telephone lines (ADSL-3) and for DVD. Unlike MPEG1 video, which handles only the progressive scanning used by computers, MPEG2 video gives careful attention to the interlaced video used in TV. That is, a picture (frame) can be split into two fields (one made of the even-numbered scan lines and one of the odd-numbered scan lines) and coded with a field structure, or it can be coded with a frame structure. In scenes with a lot of motion the two fields of a frame differ greatly, so field-structure coding is effective; the closer the scene is to a still picture, the higher the correlation between the two fields and the more advantageous frame-structure coding becomes.
Even in frame-structure coding, each macroblock (16×16 pixels) can be processed on a field basis, so partial motion within the picture is handled easily. Together with many other refinements, this gives MPEG2 video far better moving-picture compression than MPEG1 video.
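The frame/field trade-off described above can be illustrated with a small Python sketch. The decision rule is a hypothetical heuristic made up for illustration (MPEG2 defines how each mode is signalled and decoded, not how an encoder must choose), and the function names and the mean-absolute-difference measure are assumptions.

```python
def split_fields(frame):
    """Split an interlaced frame (a list of pixel rows) into its two fields.

    Top field = even-numbered lines (0, 2, 4, ...),
    bottom field = odd-numbered lines (1, 3, 5, ...).
    """
    return frame[0::2], frame[1::2]


def vertical_activity(rows):
    """Mean absolute difference between vertically adjacent rows."""
    total = count = 0
    for upper, lower in zip(rows, rows[1:]):
        for a, b in zip(upper, lower):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def choose_structure(block):
    """Hypothetical encoder heuristic: pick 'field' when lines within each field
    are more alike than adjacent lines of the whole frame (typical of motion
    between the two field instants), otherwise keep 'frame'."""
    top, bottom = split_fields(block)
    frame_act = vertical_activity(block)
    field_act = (vertical_activity(top) + vertical_activity(bottom)) / 2
    return "field" if field_act < frame_act else "frame"


# A still block: every line identical, so frame coding is kept.
still = [[10, 20, 30, 40]] * 4
# A moving block: the two fields were captured at different instants,
# so alternate lines differ strongly.
moving = [[0, 0, 0, 0], [90, 90, 90, 90]] * 2
print(choose_structure(still), choose_structure(moving))  # frame field
```

Applied per 16×16 macroblock, a rule of this kind would let still regions keep frame coding while moving regions switch to field coding, which is the flexibility the text attributes to MPEG2.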

MPEG2 video refines several elements of MPEG1 video in order to achieve higher compression. As described above, both frame and field structures are supported for efficient compression of interlaced TV signals. Even with the frame structure, motion-compensated prediction can be performed selectively on a frame or field basis for each 16×16 macroblock. Dual Prime is a motion estimation and compensation method newly adopted in MPEG2. It performs field-based motion compensation while efficiently reducing the number of motion vectors this would otherwise require. When B-frames are omitted to reduce coding delay and complexity, Dual Prime serves as a complementary tool that still helps maintain good picture quality. For the DCT, even in the frame structure, frame mode or field mode can be selected per macroblock, whichever generates less data, so both high-motion and low-motion content are handled effectively. In the field structure, however, only field DCT is used. For quantization of DCT coefficients, MPEG1 uses linear quantization, in which the quantization step is constant regardless of coefficient size. MPEG2 additionally offers nonlinear quantization, in which smaller values get smaller quantization steps and are therefore quantized more finely. The nonlinear method is more complex but reduces the average quantization noise, improving quantizer performance. For scanning the quantized DCT coefficients before variable-length coding, MPEG1 uses only the zigzag scan. MPEG2 adds an alternate scan, and one of the two is selected per picture. The alternate scan reaches the vertical high-frequency DCT components relatively early, which makes it particularly effective for interlaced pictures with large motion.
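As a concrete picture of what "scanning" means here, the Python sketch below generates the conventional zigzag order for an 8×8 block and uses it to serialize quantized coefficients. It is only an illustration: MPEG2's alternate-scan table is not reproduced, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of the classic zigzag scan of an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        # Cells of this anti-diagonal, listed from top-right to bottom-left.
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()  # even diagonals are walked bottom-left to top-right
        order.extend(diag)
    return order


def scan(block, order):
    """Serialize a 2-D coefficient block into a 1-D list following the given order."""
    return [block[r][c] for r, c in order]


# Quantized coefficients cluster near the top-left (low frequencies),
# so the scan emits the non-zero values first, followed by a long run of zeros.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][0] = 52, -3, 4, 1
coeffs = scan(block, zigzag_order())
print(coeffs[:5])  # [52, -3, 4, 1, 0]
```

The long trailing run of zeros is what makes the subsequent run-length and variable-length coding efficient; a scan that visits vertical frequencies earlier, as the alternate scan does, shortens that run for interlaced material.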
For variable-length coding (VLC) of the scanned DCT coefficients, MPEG1 used only one two-dimensional VLC table. MPEG2, by contrast, allows an additional table for intra-picture coding. Because DCT coefficients in intra-picture coding are much larger than in inter-picture coding and have different statistics, a separate table reflecting those statistics was needed. Using this VLC greatly reduces the amount of data generated in intra-picture coding. Scalability is a concept newly introduced in MPEG2 and comprises spatial scalability, temporal scalability, and SNR scalability. Spatial scalability first divides the picture into a base layer with low spatial resolution (e.g., current TV) and an enhancement layer with high spatial resolution (e.g., high-definition TV). The base layer is encoded first; then the difference between the enhancement layer and the interpolated base layer is encoded, and the two coded bitstreams are sent together. A current TV receiver can then decode just the base-layer bitstream to watch an HDTV program at current TV quality, while an HDTV receiver decodes both bitstreams to reproduce the high-definition picture. In other words, as with black-and-white and color TV, both digital TV receivers and HDTV receivers can receive both digital TV and HDTV broadcasts, so full compatibility is maintained. European digital TV and HDTV are built on this framework. Korean HDTV, on the other hand, uses simulcasting, in which the same program is also broadcast as a digital TV program. Like spatial scalability, temporal and SNR scalability divide the signal into a base layer and an enhancement layer, and send the coded bitstream of the base layer together with the coded bitstream of the difference between the extended base layer and the enhancement layer.
They differ in how the two layers are divided: temporal scalability divides along the time axis (the direction in which pictures progress), while SNR scalability divides according to the precision of each pixel's bit representation. With its wider range of functions and far better moving-picture compression than MPEG1 video, MPEG2 video is used across broadcasting media including HDTV, communication media such as video on demand, and storage media represented by DVD, and has established itself as a core technology leading the multimedia era.
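The layering idea behind spatial scalability can be sketched in a few lines of Python. The sketch assumes simple 2:1 subsampling and nearest-neighbour interpolation and skips the DCT coding and quantization of each layer, so the enhancement layer reconstructs the picture exactly; MPEG2's actual filters and coding tools differ, and all names are hypothetical.

```python
def downsample(img):
    """Base layer: keep every second pixel horizontally and vertically."""
    return [row[0::2] for row in img[0::2]]


def upsample(img):
    """Interpolate the base layer back to full size (nearest-neighbour, for simplicity)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode_spatial(img):
    """Return (base layer, enhancement layer); image dimensions must be even."""
    base = downsample(img)
    pred = upsample(base)
    enhancement = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(img, pred)]
    return base, enhancement


def decode_high(base, enhancement):
    """High-resolution decoder: interpolated base layer plus the difference layer."""
    pred = upsample(base)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(pred, enhancement)]


hd = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
base, enh = encode_spatial(hd)
print(base)                          # [[1, 3], [9, 11]]  (what a base-layer-only receiver shows)
print(decode_high(base, enh) == hd)  # True
```

A receiver that ignores `enh` still gets a usable low-resolution picture from `base`, which is the compatibility property the text describes.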

-MPEG2 audio

In step with the higher quality of MPEG2 video, MPEG2 audio was standardized with multichannel, high-quality sound as its goal. MPEG2 audio was approved as International Standard (IS) 13818-3 in November 1994, completing its standardization. MPEG2 audio is based on MPEG1 audio and introduces several new techniques to improve compression efficiency. Compared with MPEG1 audio, it has the following features in particular. First, it is multichannel: the stereo capability of MPEG1 audio is extended to up to 6 channels in MPEG2 audio, so movie-theater surround sound can be reproduced as is. The six channels (that is, speaker positions) consist of five wideband signals (20 kHz), C (Center), L (Left), R (Right), LS (Left Surround), and RS (Right Surround), plus an LFE (Low Frequency Enhancement) signal that carries only the low-frequency component (up to 120 Hz) separately. This is commonly called 5.1 channels, and it gives a stronger sense of space than conventional Dolby Surround sound with its four channels L, R, LS, and RS. Second, multilingual capability was strengthened. As satellite broadcasting in many regions such as Asia and Europe shows, the world is becoming a single global village as multimedia and information and communication technologies flourish. Accordingly, MPEG2 audio can carry additional commentary in up to seven languages besides the 5.1 channels. MPEG2 audio also allows sampling frequencies of 16 kHz, 22.05 kHz, and 24 kHz, half of those used in MPEG1 audio (32, 44.1, and 48 kHz). This is because, to compress the large amount of multichannel and multilingual data effectively at a limited bit rate, it pays to lower the sampling frequency when the input signal's bandwidth is narrow.
Another consideration is backward compatibility with MPEG1 audio: an MPEG2 audio bitstream can be played, in limited form, by an MPEG1 audio receiver. Specifically, 5.1-channel MPEG2 audio is reproduced as stereo by an MPEG1 audio receiver. For example, digital broadcasting via the Mugunghwa satellite uses MPEG2 as its standard, yet an entry-level receiver fitted only with an MPEG1 audio decoder can still receive the MPEG2 bitstream and reproduce stereo sound. The concept is the same as a color TV broadcast being received in black and white on a black-and-white set, or an FM stereo broadcast being received in mono on a mono receiver. This backward compatibility in audio contrasts with video, where compatibility is forward: an MPEG2 video receiver can receive an MPEG1 video bitstream. Strictly speaking, MPEG2 audio decoders are normally designed to decode MPEG1 audio bitstreams as well, so in practice compatibility holds in both directions. To achieve this compatibility, MPEG2 audio places the stereo components in the audio data part of the MPEG1 audio bitstream and carries the additional MPEG2 audio components in the ancillary data part that follows. When there are not enough spare bits, the bitstream format (syntax) is extended to carry the remaining data. As a result, the MPEG2 audio bitstream format became very inefficient, which is one factor degrading MPEG2 audio performance. To overcome this, MPEG is standardizing an NBC (Non Backward Compatible) mode that gives up backward compatibility with MPEG1 in exchange for better performance. NBC is scheduled to be completed as an international standard in 1997, and its candidates include Dolby's AC-3, which has been adopted as a standard by the U.S. HDTV system and the film industry and is also likely to be adopted for the new digital video disc.
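The backward-compatible scheme relies on placing a stereo version of the multichannel signal where an MPEG1 decoder expects it. A minimal Python sketch of such a 5.1-to-stereo downmix follows; the 1/√2 coefficient and the choice to leave out the LFE channel are assumptions for illustration, not values taken from the MPEG2 audio standard, and the function name is hypothetical.

```python
import math

K = 1 / math.sqrt(2)  # assumed attenuation (about -3 dB) for centre and surround channels


def downmix_stereo(l, r, c, ls, rs, k=K):
    """Matrix one 5.1 sample frame down to a compatible stereo pair (Lo, Ro).

    Assumptions for illustration: the centre is split equally into both outputs,
    each surround goes to its own side, and the LFE channel is simply left out.
    """
    lo = l + k * c + k * ls
    ro = r + k * c + k * rs
    return lo, ro


print(downmix_stereo(l=0.5, r=0.0, c=0.0, ls=0.0, rs=0.0))  # (0.5, 0.0)
```

An MPEG1 decoder would play only (Lo, Ro); an MPEG2 decoder would additionally read the extra channels from the ancillary data and undo the matrixing to recover all five main channels.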

-MPEG2 system

The MPEG system is the standard for combining an MPEG video bitstream and an MPEG audio bitstream into one for transmission or storage. When multiplexing into a single bitstream, the format must suit the protocol of the communication channel or the storage format of the storage medium. Another important role of the MPEG system is to provide a means of synchronizing video and audio (lip sync). MPEG systems include the MPEG 1 system, already discussed, and the MPEG 2 system. The MPEG 1 system multiplexes a single program for an error-free channel environment, so it is used in a relatively narrow range of applications such as video CD. More precisely, errors are corrected by the channel's own error-correction capability, so the MPEG1 system need not consider them. The MPEG 2 system, in contrast, serves a wide range of applications, including broadcasting, communication, and storage media, and its format is correspondingly far more complex. The MPEG 2 system has two kinds of multiplexing. One, the Program Stream (PS), multiplexes a single program for an error-free channel environment and is a slight improvement on the MPEG-1 system. The other, the Transport Stream (TS), multiplexes multiple programs for an error-prone channel environment. Because it multiplexes multiple programs into one bitstream, the TS suits digital TV broadcasting in the multimedia era, and a scrambling function for conditional access (encrypting the bitstream so that only paying subscribers can watch) can be added. Directory information and information about the individual bitstreams can also be carried to facilitate random access.
Since the PS is similar to the MPEG 1 system, the discussion here focuses on the TS. The MPEG 2 system adopts the packet multiplexing used in time division multiplexing (TDM). Each of the video and audio bitstreams is first divided into segments of suitable length called packets, forming a Packetized Elementary Stream (PES). To accommodate various applications, PES packets may be up to 64 KB long, and each packet may be either fixed or variable in length. Variable bit rates and discontinuous transmission are also allowed. These PESs are multiplexed into a single bitstream to form a PS or a TS. Packet length depends heavily on the transmission channel or medium. For example, ATM (Asynchronous Transfer Mode), the protocol of the broadband integrated services digital network (B-ISDN), uses 53-byte packets (cells). Of these, the header carrying the cell's basic information takes 5 bytes, so the actual user information (payload) is 48 bytes. Short packets like these devote a relatively large share to headers, which lowers the transmission efficiency of user information, but they offer short delay and need little buffer memory. With interworking with ATM in mind, the TS packet has a relatively short fixed length of 188 bytes. If one byte of an ATM cell's 48-byte payload is used for the AAL (ATM Adaptation Layer), 47 bytes of user information remain, so one TS packet can be carried in exactly four ATM cells (4 × 47 = 188). The first 4 bytes of each TS packet are the header, so the remaining 184 bytes carry the actual video, audio, or other user data. Many applications add an error-correcting code, and the TS packet length was chosen with this in mind as well: to apply a Reed-Solomon code, which has excellent performance as a block error-correcting code, the TS packet should be comfortably shorter than 255 bytes, so 188 bytes was chosen because it also satisfies ATM interworking.
For example, digital broadcasting via the Mugunghwa satellite uses RS(204, 188), which appends a 16-byte error-correcting code to each TS packet, so the receiver can correct errors in up to 8 of every 204 bytes. In many cases the Reed-Solomon code, which is robust against burst errors, is used together with a convolutional code, which is robust against random errors, or with TCM (Trellis Coded Modulation), which combines a convolutional code with the modulator to improve performance.
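The packet-size figures quoted above can be checked with a few lines of arithmetic. The Python below only restates numbers from the text (188-byte TS packet, 4-byte header, 48-byte ATM payload with one AAL byte, RS(204, 188)) plus the general rule that an RS(n, k) code corrects up to (n − k)/2 symbol errors; the constant and function names are illustrative.

```python
TS_PACKET = 188    # bytes in one MPEG2 transport stream packet
TS_HEADER = 4      # bytes of TS header
ATM_PAYLOAD = 48   # user bytes in a 53-byte ATM cell (53 - 5 header)
AAL_BYTES = 1      # byte per cell taken by the adaptation layer, as assumed in the text


def ts_payload_bytes():
    return TS_PACKET - TS_HEADER


def atm_cells_per_ts_packet():
    per_cell = ATM_PAYLOAD - AAL_BYTES   # 47 usable bytes per cell
    return -(-TS_PACKET // per_cell)     # ceiling division


def rs_correctable_bytes(n, k):
    """An RS(n, k) byte code adds n - k parity bytes and corrects up to (n - k) // 2 byte errors."""
    return (n - k) // 2


print(ts_payload_bytes(), atm_cells_per_ts_packet(), rs_correctable_bytes(204, 188))  # 184 4 8
```

Because 188 is an exact multiple of 47, no ATM cell is left partly empty, which is the interworking property the text cites as the reason for the 188-byte length.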

The MPEG2 system handles two kinds of multiplexed bitstream. The Program Stream is the multiplexing method used when one broadcast program (video + audio + subtitles) is carried over an error-free channel, or over a medium such as CD whose own error-correction function is relied on as is; the Transport Stream (TS) is the multiplexing method used to send several broadcast programs simultaneously over an error-prone channel. For example, a program stream is used to store a single program, as on a video CD, while a transport stream is used for multi-program digital broadcasting via the Mugunghwa satellite. Let us look at the functions of the transport stream more concretely, taking Mugunghwa satellite broadcasting as an example. Mugunghwa is a combined broadcasting and communications satellite with 3 broadcasting transponders and 12 communications transponders (although Unit 1's lifetime was shortened by a launch failure, so Unit 2, still to be launched, will take over its role). In satellite broadcasting, an analog FM system such as those used by Japan's satellite broadcasts or Hong Kong's Star TV can carry only one channel per transponder, whereas a digital system using MPEG2 can carry 4 to 8 channels per transponder. In Korea, given the shortage of programming and picture-quality considerations, 4 channels per transponder are being considered. Multiplexing in this satellite broadcasting proceeds in the following steps. First, each station's program is compressed, the video with the MPEG2 video algorithm to about 30:1 and the audio with the MPEG2 audio algorithm to about 6:1.
This compressed bit string is packed into packets and transformed into video packets and audio packets, respectively. They are then loaded one by one into several transport stream packets with a fixed length of 188 bytes. One transport packet can carry 184 bytes of actual load, except for the 4 bytes of the header. It's like a taxi seating up to five people, but without the driver, you can ride four passengers. The header contains 13 bits of Program Identification (PID), which is used to indicate what station (ie video or audio) information is contained in the packet in this packet. In this way, a transport packet primarily multiplexed by each broadcasting station is secondly transported by multiple broadcasting stations, forming a bit string, which can be transmitted through one repeater. This final bit string is needed by the number of repeaters. Therefore, in digital satellite broadcasting, multiplexing is combined with time division multiplexing (TDM) and frequency division multiplexing (FDM), ie, repeaters operate in the form of FDM, each having a bandwidth of 27 MHz. However, four broadcasters share one repeater in TDM mode, with transport streams on each repeater receiving Reedsolomon codes and convolutional codes for error correction and transmission between terrestrial and satellite via QPSK modulation. When the transport stream is decoded, the above reverse process is performed, first select the repeater containing the broadcast to be received, QPSK demodulate the signal, and perform error correction. Select only the transport packet of the station you want to receive, and the double video packet In the coder, audio packets are decoded in the audio decoder to reproduce video and sound, respectively, and several program specific information (PSI) programs are required for this multi-step operation. 
It is a packet with PID = 0, which allocates a transport packet for each program, and when you go to the designated packet, it tells which packet contains the video and audio bit streams that make up the program. Map Table) is divided into PAT and PMT and described in tree form. If you describe all as one table, this table becomes too large and the memory to store the table becomes large. Because it takes a long time to access the program's information. The like NIT (Network Inform ation Table) that holds the link information between programs and (Conditional Access Table) that holds the CAT condition information received is used in the additional information table for the system operation.
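To make the PAT/PMT lookup concrete, here is a minimal Python sketch of a demultiplexer. It parses the 4-byte header fields defined in MPEG-2 Systems (sync byte 0x47, 13-bit PID from the low 5 bits of byte 1 plus byte 2, adaptation-field control, continuity counter), then follows a program's PAT entry to its PMT and keeps only that program's video and audio packets. Assumptions: the PAT and PMT are shown as already-parsed Python dicts with made-up PID values (real tables are binary sections carried in packets, the PAT on PID 0), packets carry no adaptation field, and `make_packet` is a test helper, not a real API.

```python
TS_PACKET_SIZE = 188   # fixed packet length
HEADER_SIZE = 4        # leaves 184 payload bytes when there is no adaptation field
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Decode the fields of the 4-byte TS header used for demultiplexing."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("expected a 188-byte packet starting with 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "payload_unit_start": bool(b1 & 0x40),
        "pid": ((b1 & 0x1F) << 8) | b2,          # 13-bit packet identifier
        "adaptation_field_control": (b3 >> 4) & 0x3,
        "continuity_counter": b3 & 0x0F,
    }


def make_packet(pid: int, payload: bytes = b"") -> bytes:
    """Build a payload-only TS packet (test helper), padded with 0xFF."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    body = payload[:TS_PACKET_SIZE - HEADER_SIZE]
    return header + body + b"\xff" * (TS_PACKET_SIZE - HEADER_SIZE - len(body))


# Hypothetical tables, shown already parsed; real PAT/PMT are binary sections.
PAT = {1: 0x0100, 2: 0x0200}                     # program_number -> PMT PID
PMTS = {
    0x0100: {"video": 0x0101, "audio": 0x0102},
    0x0200: {"video": 0x0201, "audio": 0x0202},
}


def demux(stream: bytes, program_number: int) -> dict:
    """Collect the video and audio payloads of one program from a TS."""
    pmt = PMTS[PAT[program_number]]              # PAT -> PMT -> stream PIDs
    kind_by_pid = {pid: kind for kind, pid in pmt.items()}
    out = {kind: bytearray() for kind in pmt}
    for i in range(0, len(stream), TS_PACKET_SIZE):
        packet = stream[i:i + TS_PACKET_SIZE]
        kind = kind_by_pid.get(parse_ts_header(packet)["pid"])
        if kind is not None:                      # drop other programs' packets
            out[kind] += packet[HEADER_SIZE:]     # assumes no adaptation field
    return out


stream = (make_packet(0x0101, b"V1") + make_packet(0x0201, b"other")
          + make_packet(0x0102, b"A1"))
prog1 = demux(stream, 1)
print(len(prog1["video"]), bytes(prog1["audio"][:2]))
```

The two-level lookup in `demux` mirrors the tree structure described above: the receiver only needs the small PAT to find the one PMT it cares about.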

- MPEG-IV: A technology that improves on MPEG-II, under development for videoconferencing over telephone lines and for video data transmission. It can be seen as the direction in which most streaming companies are heading, and it allows each object in a video to be compressed separately.

An international meeting to evaluate the subjective picture quality of MPEG-4, the next-generation international multimedia standard that has recently attracted attention, was held in Dallas, USA. Held with great success amid worldwide interest, the meeting was important in determining the future direction of MPEG-4, and in Korea, electronics companies such as Samsung, Hyundai, and Daewoo, unlike their stance up to MPEG-2, actively participated by submitting proposals. Whereas the existing standards JPEG, H.261, MPEG-1, and MPEG-2 are all based on the DCT, quantization, motion-compensated DPCM, Huffman coding, and the like, MPEG-4 is a next-generation compression standard aimed mainly at multimedia communication and scheduled for completion in 1998. (MPEG-3, originally planned as a standard for HDTV, was absorbed into MPEG-2.) Since the existing standards already covered many applications, early MPEG-4 focused on low-bit-rate coding aimed at little more than videotelephony over the public telephone network. MPEG-4 then gradually broadened its scope and functions; its main applications include interactive, computer-like access to AV data such as TV and movies, and wireless communication.

The functions of MPEG-4 fall broadly into three groups: object-oriented interactivity, high-efficiency compression, and universal access. Object-oriented interactivity means that, when accessing multimedia (mainly AV) data, the object elements of a picture or sound are handled independently and combined through links, so that users can freely compose the picture or sound. For example, replacing only the main character while leaving the background unchanged was previously possible only at the production stage, but MPEG-4 makes it possible at the user stage. As a next-generation standard, MPEG-4 must deliver better compression ratios than existing methods. Universal access, in turn, must remain robust even with many channel errors, in consideration of wireless communication.

Because no single algorithm can realistically satisfy all of these functions, MPEG-4 incorporates many compression elements into the standard as a menu, to be selected according to the application. That is, the tools needed for compression are defined, tools are combined into various compression algorithms, and one or more algorithms are bundled into profiles chosen according to the application. This hierarchy of tools, algorithms, and profiles is defined by a newly created language, MSDL (MPEG-4 Syntactic Description Language). Data exchange between MPEG-4 terminals therefore begins by checking which profile, algorithm, and tool decoders the other party has and communicating in a mode it can decode; if necessary, the program needed for decoding is downloaded first and the content is sent afterward.

In the recent MPEG-4 subjective picture-quality evaluation, designated video sequences ranging from simple to complex scenes were evaluated function by function at several selected bit rates between 10 kbps and 1 Mbps. As the baseline for comparing the degree of technical innovation, H.263 (an international standard for videoconferencing over telephone lines that improves on H.261), a relatively well-optimized existing standard, was chosen. Many proposals were variants of H.263, and some applied new techniques such as fractal or wavelet transforms, but the evaluation found few techniques that outperformed H.263, confirming that technical progress had not yet reached the level originally expected. As for the room for future innovation, 92 video proposals were received from 27 organizations, along with 19 audio proposals. The video compression algorithms proposed from Korea generally showed mid-ranked picture quality, but compared with the period up to MPEG-2, when Korea submitted no notable proposals and limited itself to gathering material and tracking trends, this active participation in MPEG-4 represents remarkable progress.

MPEG-4 will continue to solicit and evaluate new tools and algorithms. Considering that the patents included in MPEG-1 and MPEG-2 have recently returned large rewards to their holders (a patent fee of $3 per MPEG receiver), MPEG-4 deserves even more attention in the future.

C. M-JPEG

1) What is M-JPEG (Motion JPEG; JPEG stands for Joint Photographic Experts Group)?

M-JPEG is a streaming compression technique based on the standard produced by the still-image experts group. Using the high compression ratio and intensive computation worked out by that group, it raises the frame rate to enable faster streaming. Its underlying compression algorithm, JPEG, compresses data on the CPU without any separate hardware, and M-JPEG took advantage of this to eliminate the buffering process.

However, because the original JPEG method was, as the name says, designed specifically for still images, streaming with it was practically impossible: a still image literally means processing pixels that do not move. It is true that several companies developed and used M-JPEG at about 3 to 4 frames per second, but high-quality streaming at 15 frames per second or more was in fact impossible. Audio was an even bigger problem, because although a high compression ratio is possible, there is almost no room to insert streaming elements such as operations on redundant data values.

2) Compression ratio and video quality

Whatever compression technique is used, image quality and compression ratio trade off against each other: the higher the compression efficiency, the lower the image quality, and conversely, the lower the efficiency, the better the image quality.

Modern image compression techniques can compress an image 10 to 50 times without visible loss.
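The trade-off can be illustrated with a uniform quantizer, the lossy step shared by most image codecs: a coarser step needs fewer bits per sample but leaves a larger reconstruction error. A minimal Python sketch on a synthetic signal (not data from this document); bits are counted as a fixed-length code over the levels actually used, which is a simplifying assumption:

```python
import math


def quantize(samples, step):
    """Uniform quantizer: map each sample to the nearest multiple of `step`."""
    return [step * round(s / step) for s in samples]


def mse(a, b):
    """Mean squared reconstruction error."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bits_per_sample(recon):
    """Fixed-length code size needed for the distinct levels actually used."""
    levels = len(set(recon))
    return math.ceil(math.log2(levels)) if levels > 1 else 0


# Synthetic test signal in the 8-bit pixel range (28..228).
signal = [128 + 100 * math.sin(2 * math.pi * i / 256) for i in range(256)]
results = {}
for step in (2, 8, 32):
    recon = quantize(signal, step)
    results[step] = (bits_per_sample(recon), mse(signal, recon))
    print(f"step={step:2d}  bits/sample={results[step][0]}  MSE={results[step][1]:.2f}")
```

As the step grows, the bits per sample fall while the MSE rises, which is the inverse relationship described above.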

Compression technology is currently advancing remarkably, and techniques that raise compression efficiency without degrading the quality of the original image are being developed one after another.

Compression ratio and image quality compare as follows.

To solve the problems described above, an object of the present invention is to provide a method that, without using a conventional MPEG-based compression method or a separate video player, eliminates the buffering process so that video plays in real time over the Internet; removes the overload on the streaming server; minimizes the bandwidth of the video and audio signals to reduce the load on the line; eliminates unnecessary downloads and the need for a separate player; reduces cost by minimizing the required server specifications; and runs when ported to a general-purpose browser using Java classes or applets.

The objectives of developing this new compression and streaming technology are as follows.

1) It can replace MPEG-based streaming solutions on the wired Internet.

2) Because it replaces the streaming server of Internet broadcasting sites, it reduces server and line costs, and by delivering smooth video it lets users watch comfortably even on slow client connections.

3) It offers a way to drastically relieve the current Internet traffic problem caused by multimedia video, and is a breakthrough technology that could save trillions of won nationally in line costs and infrastructure construction.

4) It is best suited as a streaming solution for wireless Internet terminals. Technology that delivers high-quality video at low data volume will be the most important core technology for next-generation wireless mobile communication, and this is the most suitable technology for CDMA2000, IMT2000, PDAs, HPCs, and the like.

5) Develop a streaming solution applicable to all video solutions on wired and wireless Internet terminals.

6) Develop encoding and decoding chips for this compression algorithm and, with them, wireless Internet video transmission equipment.

7) The next stage of this development is to make the new compression and streaming technologies a global standard and to continue developing core technologies for the wireless Internet market.

FIG. 1 is a diagram of basic DCT transform coding.

FIG. 2 is a diagram showing the configuration of subband (band-division) coding.

FIG. 3 is a diagram showing the method of combining band division and the DCT within an SPEG frame.

FIG. 4 is a diagram showing the configuration of a motion-compensated interframe encoder.

FIG. 5 is a diagram showing motion displacement measurement.

FIG. 6 is a diagram showing motion-compensated predictive coding using the DCT.

FIG. 7 is a diagram showing preprocessing and encoding for standardization.

FIG. 8 is a diagram showing encoding through standardized hierarchical resolution.

FIG. 9 is a diagram showing video encoding in a circuit-switched network and in an SPEG packet-switched network.

FIG. 10 is a diagram showing clock-synchronous reproduction at a variable bit rate using a DPLL.

FIG. 11 is a diagram showing the definition of y[i,j].

FIG. 12 is a diagram showing the compression of one block.

FIG. 13 is a diagram showing the initialization of a combination array.

FIG. 14 is a diagram showing an implementation of the Convolve function.

FIG. 15 is a diagram showing a strategy for image manipulation.

FIG. 16 is a diagram showing a C implementation of the Dissolve operation.

FIG. 17 is a diagram showing a C code implementation.

A new compression algorithm and streaming technique, conceptually entirely different from the MPEG method that underlies current video compression and streaming technology, are developed to enable more efficient video streaming.

The features of the present invention are as follows.

1) There is no buffering process, which has always been a problem when watching video on the Internet.

2) It solves the overload problem of the video-serving (streaming) server.

3) A new compression method drastically reduces the data volume and minimizes the load on the line.

4) There is no need to download a separate player or program.

5) Java classes or applets can be used, improving portability across browsers.

By developing a new compression method and streaming technology with these advantages, current MPEG-based streaming servers can be replaced.

To achieve the objects described above and eliminate the drawbacks of the prior art, the present invention provides a method for compressing and streaming video over the Internet using lossy compression based on SPEG, comprising:

a first step of setting a threshold value for the frames to be compressed and then compressing each object individually;

a second step of assigning an index value to each image and then arranging the images in order;

a third step of, after the object images are compressed, removing the overlapping parts while keeping only a single index value;

a fourth step of, when a client connects after the above steps, having the internal operators of the streaming server reside in the client's user group so that mutual synchronization is achieved without a separate player; and

a fifth step of using Java so that the internal operators can be displayed directly as an applet or class in a general-purpose web browser.
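The text does not define the index format or how overlapping parts are detected. One possible reading of steps 1 to 3, sketched here purely under stated assumptions, is block-level deduplication: each frame is split into fixed-size blocks, each block is compressed, every distinct compressed block is stored once in an index table, and frames are described as ordered lists of index values. `zlib` stands in for the unspecified per-object compressor; the block size and function names are hypothetical.

```python
import zlib


def encode_frames(frames, block_size=64):
    """Hypothetical reading of steps 1-3: compress each block of each frame,
    keep every distinct compressed block once in an index table, and
    describe each frame as a list of indices into that table."""
    table = []        # index value -> compressed block, stored only once
    index_of = {}     # compressed block -> its index value
    encoded = []      # per frame: ordered list of index values
    for frame in frames:
        refs = []
        for start in range(0, len(frame), block_size):
            block = zlib.compress(frame[start:start + block_size])
            idx = index_of.get(block)
            if idx is None:                  # first occurrence: store it
                idx = len(table)
                table.append(block)
                index_of[block] = idx
            refs.append(idx)                 # duplicates keep only the index
        encoded.append(refs)
    return table, encoded


def decode_frames(table, encoded):
    """Rebuild the frames by looking every index value up in the table."""
    return [b"".join(zlib.decompress(table[i]) for i in refs) for refs in encoded]


frames = [bytes(128), bytes(64) + bytes(range(64)), bytes(range(64)) + bytes(64)]
table, encoded = encode_frames(frames)
print(len(table), encoded)   # 2 [[0, 0], [0, 1], [1, 0]]
```

Six blocks are referenced but only two are stored, which is the kind of saving the third step describes.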

The first step, setting a threshold value for the frames to be compressed and then compressing each object individually, is as follows:

Rather than fixing the number of frames in advance, a threshold value is set and the computation is performed internally; at the same time, different operators each take charge of a region, and multiple frame signals are then compressed.

The second step, assigning an index value to each image and then arranging the images in order, comprises:

a step of removing duplicate computed values by pushing them out beyond the threshold computation value; and

a step of determining the number of frames from the preceding step, with frame computation performed mainly by the internal operators and compression performed by the CPU, thereby eliminating buffering.
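The text says a threshold, rather than a frame count, is fixed, and that the number of frames is then determined from it. One hedged interpretation, not confirmed by the source, is threshold-based frame selection: a frame is sent only if it differs enough from the last frame sent, so near-duplicate frames are dropped and the frame count follows from the threshold. Frames are modeled as flat lists of pixel values; the mean-absolute-difference measure and the function name are assumptions.

```python
def select_frames(frames, threshold):
    """Keep a frame only when its mean absolute pixel difference from the
    last kept frame exceeds `threshold`, so the number of frames sent
    follows from the threshold instead of being fixed in advance."""
    kept, last = [], None
    for n, frame in enumerate(frames):
        if last is not None:
            diff = sum(abs(a - b) for a, b in zip(frame, last)) / len(frame)
            if diff <= threshold:
                continue                  # too similar: drop this frame
        kept.append(n)
        last = frame
    return kept


frames = [[0] * 4, [1] * 4, [10] * 4, [10] * 4, [30] * 4]
print(select_frames(frames, threshold=5))    # [0, 2, 4]
print(select_frames(frames, threshold=0.5))  # [0, 1, 2, 4]
```

Lowering the threshold keeps more frames, trading bandwidth for smoother motion.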

The third step, removing overlapping parts after the object images are compressed while keeping only a single index value, comprises the following:

After the compression step, each index value is recomputed one at a time by a different internal operator for each section of the compression format. This unifies the sizes of the video and audio signals so that the synchronization between them takes place at this point, and this mutual synchronization tightens the bandwidth limit.

The sizes of the video and audio signals are unified by binding an internal operator to the video signal within the GSM mono compression scheme, so that the internal filter itself is controlled and overload of the streaming server is prevented.

In this control method, the operators divide their roles in the following steps:

a step in which a signal-recognition operator accepts only the incoming signals, while a video-signal processing operator and an audio-signal processing operator, each dedicated to processing, perform the basic computation;

a step of keeping a dedicated internal filtering operator, which filters the two signals after that computation, resident at a main-memory address; and

a step in which the CPU fetches the dedicated operator whenever it is needed and processes the computation.

GSM is a speech compression scheme completed by a specialist German speech-recognition user group, and it is the speech compression technique most commonly used in telephones. Because its sound quality is limited to mono, however, streaming specialists have avoided it despite its strong compression ratio.
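The document does not say how audio and video are aligned in practice. As a concrete, hedged illustration: GSM 06.10 full-rate speech codes 8 kHz audio in 20 ms frames of 160 samples (260 bits each, about 13 kbit/s; the widely used libgsm library packs each frame into 33 bytes). Each video frame can therefore be mapped to the GSM frames whose start times fall within its display interval. The function names are hypothetical; only the GSM frame parameters come from the codec.

```python
import math
from fractions import Fraction

SAMPLE_RATE = 8000        # Hz: GSM full-rate speech is 8 kHz narrowband
SAMPLES_PER_FRAME = 160   # one GSM 06.10 frame = 160 samples = 20 ms
BYTES_PER_FRAME = 33      # frame size as packed by the libgsm library


def gsm_bitrate() -> int:
    """Bit rate of the 33-byte packed stream, in bits per second."""
    frames_per_second = SAMPLE_RATE // SAMPLES_PER_FRAME     # 50
    return BYTES_PER_FRAME * 8 * frames_per_second


def audio_frames_for_video_frame(n: int, fps: int) -> tuple:
    """Half-open range [first, last) of GSM frame indices whose start times
    fall inside the display interval of video frame n. Exact fractions avoid
    floating-point drift at non-integer ratios such as 30 fps."""
    start = Fraction(n * SAMPLE_RATE, fps) / SAMPLES_PER_FRAME
    end = Fraction((n + 1) * SAMPLE_RATE, fps) / SAMPLES_PER_FRAME
    return math.ceil(start), math.ceil(end)


print(gsm_bitrate())                              # 13200
for n in range(3):
    print(n, audio_frames_for_video_frame(n, fps=30))
```

At 30 fps, three video frames (100 ms) span exactly five 20 ms GSM frames, so the mapping alternates between two audio frames and one audio frame per video frame.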

The configuration and operation of an embodiment of the present invention are described in detail below.

The present invention started from the observation that the compression methods used by existing players such as Media Player and RealPlayer cannot overcome the buffering limitation built into their foundations, and this led to the conception of an entirely different compression method.

The research therefore set out to eliminate the chronic buffering that occurs with existing methods, so that users could view streaming data more comfortably, faster, and more stably.

To that end, the existing MPEG approach was boldly abandoned, and compression was approached anew from its underlying technology.

The M-JPEG method was borrowed as a starting point, but it too has many problems; after several studies, the result is a streaming method with a more stable and efficient compression ratio in which buffering is barely noticeable.

A more detailed description follows.

The object of the present invention is achieved by providing a new concept, SPEG, which gives each frame more complete continuity according to its properties. Because continuity between frames is handled entirely by external operators, more powerful streaming data can be created and handled even by a low-capacity server.

In this compression technique, the internal operators take full charge of streaming buffering and synchronization. An internal operator is, literally, an operator dedicated to complex streaming computations, and the present invention eliminates buffering through these internal operators.

An internal operator is perhaps easiest to understand as a process. The internal process resides on the CPU and, each time audio and video signals arrive, immediately performs on the CPU the complex load computations that the streaming server used to handle.

Because this places a load on the CPU, the basic processes of the internal operators reside at main-memory addresses and assist the internal operators with their complex computations.

This resolved the problem of expensive streaming servers and made faster computation possible.

The internal operators were programmed in assembly language, the language closest to machine code, and for lighter-weight computation their components are loaded at main-memory addresses.

To avoid conflicts with existing browsers, the screen-rendering methods of Netscape, Internet Explorer, and Mozilla were supported as fully as possible, and for faster and more powerful computation, separate signal-processing, pixel-processing, and audio-processing operators were created to control the processes.

As a result, it was possible to complete progressive motion JPEG (SPEG), a new compression technology that goes far beyond M-JPEG, which builds on existing JPEG.

Video compression technology

Before describing the image processing method of the present invention, note that basic video information is presented to the viewer as a series of images, or "frames", and that motion in a scene appears as small changes between successively displayed frames.

In this compression method, dense image compression is applied to each of the frames that make up the video.

Because frames are presented very rapidly, at about 30 frames per second, the successive changes between them look to the human eye like naturally moving scenes. A video signal consists of spatial and temporal information: spatial information is contained within each frame, and temporal information is carried by the changes between frames displayed over time.

This is what determines the compression ratio in streaming: the lighter each frame, the smaller the burden that streaming places on the system.

To reduce that burden, each frame is divided into pixels.

In a digital video system, each frame is sampled in units of "pixels". For a black-and-white image, the sample value representing a pixel's brightness is quantized to 8 bits per pixel. In a color image, each pixel carries three intensity values representing color information, for example RGB (red, green, blue), each quantized to 8 bits. Because video information structured this way is extremely large, image compression (or coding) is needed to transmit or store it; compression works mainly by removing redundant information in the spatial and temporal domains.
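A quick calculation shows how large uncompressed video is. The sketch below assumes CIF resolution (352 x 288) at 30 frames per second, a format chosen for illustration rather than taken from the source, and uses the 8-bit and 24-bit pixel sizes given above.

```python
def raw_bitrate(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


# Assumed example format: CIF (352 x 288) at 30 frames/s.
color = raw_bitrate(352, 288, 30, 24)   # RGB, 8 bits per component
gray = raw_bitrate(352, 288, 30, 8)     # black-and-white, 8 bits per pixel
ratio_for_modem = color / 56_000        # compression needed for a 56 kbit/s line

print(f"color: {color / 1e6:.1f} Mbit/s, gray: {gray / 1e6:.1f} Mbit/s")
print(f"required compression for 56 kbit/s: about {ratio_for_modem:.0f}:1")
```

Roughly 73 Mbit/s of raw color video would need on the order of 1300:1 compression to fit a 56 kbit/s line, far beyond the 10 to 50 times that is visually lossless; this is why temporal redundancy must be removed as well as spatial redundancy.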

In general, spatial redundancy arises because adjacent pixels within a frame differ little, and temporal redundancy arises because object motion produces only small differences between frames. Removing spatial redundancy is called intraframe coding, which divides broadly into predictive coding (DPCM: Differential Pulse Code Modulation), transform coding, and subband coding. Temporal redundancy can be removed by interframe coding, which includes motion estimation/compensation, in which the motion of objects is estimated and compensated. Both intraframe and interframe coding additionally use run-length coding and variable-length coding, which exploit the statistical properties of the data to compress further without loss.
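Run-length coding, one of the lossless stages mentioned above, exploits long runs of identical values, such as flat image regions or the zero runs left after quantization. A minimal Python sketch (function names are illustrative):

```python
def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


row = [12, 12, 12, 12, 0, 0, 0, 0, 0, 0, 7, 12, 12]   # one row of pixel values
print(rle_encode(row))   # [(12, 4), (0, 6), (7, 1), (12, 2)]
```

Thirteen values become four pairs, and decoding restores the row exactly, which is what makes this stage lossless.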

Intraframe coding

Predictive coding

Predictive coding is the oldest image compression technique. Its underlying idea is that the prediction error is very small when the current pixel is predicted from its neighboring pixels. A predictive coder quantizes the difference between the current pixel value and the predicted value (the prediction error) and then encodes it. Using many adjacent pixels for the prediction can reduce the prediction error and improve performance. However, the gain is small relative to the added complexity, so the number of neighboring pixels used for prediction is usually four or fewer.
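
The following is a minimal DPCM sketch. It assumes a one-dimensional previous-pixel predictor, a uniform quantizer with step `step` and an initial prediction of 128. These choices and the function names are illustrative and are not taken from SPEG. The encoder predicts from its own reconstructed values, so the decoder can form exactly the same prediction.

```python
def dpcm_encode(row, step):
    """Quantize the error between each pixel and the previous *reconstructed* pixel."""
    codes = []
    prediction = 128                      # assumed starting prediction
    for pixel in row:
        q = round((pixel - prediction) / step)                # quantized prediction error
        codes.append(q)
        prediction = min(255, max(0, prediction + q * step))  # same value the decoder will get
    return codes

def dpcm_decode(codes, step):
    """Rebuild pixels by adding each dequantized error to the running prediction."""
    out = []
    prediction = 128
    for q in codes:
        prediction = min(255, max(0, prediction + q * step))
        out.append(prediction)
    return out

row = [100, 102, 104, 110, 140, 141, 139, 138]
codes = dpcm_encode(row, step=4)
recon = dpcm_decode(codes, step=4)
```

With `step=4`, every reconstructed pixel stays within 2 of the original. Most transmitted codes are small integers, which the lossless stage can code compactly.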

Predictive coding can degrade image quality through granular noise, slope overload and edge busyness. Granular noise is caused by a quantization step that is too large, and slope overload by a step that is too small, so the two trade off against each other. Within each frame, both appear in the reconstructed image as noise and as distortion along object boundaries. When the video is displayed as a sequence in time, edge busyness (flickering edges) also appears. This happens because pixels on object boundaries are quantized differently in neighboring frames.

Transform coding

FIG. 1 shows a basic DCT transform encoder. The basic idea of transform coding is to apply an orthogonal transform to the image data. This removes the correlation present in the image, so the data can be compressed at a high ratio.

If the image data are assumed to be statistically stationary, the Karhunen-Loeve Transform (KLT) is known to be the optimal transform in the mean-squared-error sense. However, the KLT basis functions depend on the data, so they must be sent to the decoder, and practical fast computation is difficult. This makes the KLT very hard to use in applications that require real-time processing. The discrete cosine transform (DCT), the orthogonal transform that comes closest to the KLT, is therefore used instead. It keeps good performance even when the statistical assumptions about the image do not hold, and many fast algorithms for computing it have been developed.
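
The sketch below makes the energy-compaction argument concrete. It builds the orthonormal 8-point DCT-II basis (the one-dimensional form of the transform applied to 8×8 blocks) and applies it to a smooth row of pixel values. The sample values are made up for illustration.

```python
import math

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows

def transform(matrix, signal):
    """Matrix-vector product: the DCT coefficients of `signal`."""
    return [sum(m * s for m, s in zip(row, signal)) for row in matrix]

C = dct_matrix()
signal = [50, 52, 55, 59, 64, 70, 77, 85]      # smooth, strongly correlated samples
coeffs = transform(C, signal)
energy = [c * c for c in coeffs]
share_first_two = sum(energy[:2]) / sum(energy)
```

The transform is orthonormal, so total energy is preserved. Even so, more than 99% of it lands in the first two coefficients, which is why the remaining ones can be quantized coarsely.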

Coding scheme of the present invention (subband coding)

FIG. 2 is a block diagram of subband coding.

As shown in FIG. 2, subband coding consists of two main steps.

The first step is band splitting, which divides the video signal into frequency bands. The second step is encoding, which compresses each band according to its characteristics.

(1) Perfect-reconstruction subband filters

Subband coding uses an analysis filter bank in the encoder and a synthesis filter bank in the decoder. The analysis filter bank splits the input into several frequency bands, each sampled at its own reduced rate. The synthesis filter bank raises the sampling rate of the decoder input and combines the bands back into the desired signal. Subband coding shortens the processing time per band but requires several processing units (one per band).
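
The simplest perfect-reconstruction filter bank is the two-band Haar pair, shown below as an illustration. The text does not specify which analysis and synthesis filters SPEG actually uses. Each band is produced at half the input sampling rate, and the synthesis step restores the original samples exactly. For images, the same split is applied to rows and then to columns.

```python
def analyze(x):
    """Split an even-length signal into low and high bands, each at half the sampling rate."""
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) / 2 for k in range(half)]    # local averages
    high = [(x[2 * k] - x[2 * k + 1]) / 2 for k in range(half)]   # local differences
    return low, high

def synthesize(low, high):
    """Upsample and recombine the two bands into the original sampling rate."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out

x = [10, 12, 14, 13, 90, 92, 91, 89]
low, high = analyze(x)
```

The high band of this signal holds only values of magnitude 1 or less. These are exactly the kind of small values the dead-zone quantizer described next can discard.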

(2) Coding of the subband images

FIG. 3 schematically shows the subband/DCT combination process within an SPEG frame.

The human eye is poor at detecting small changes in pixel values in the high-frequency bands. A simple non-uniform quantizer with a dead zone, which maps small values around zero to zero, can therefore be used without serious visual degradation. An even higher compression ratio can be obtained by run-length coding the nonzero values together with the lengths of the zero runs. The quantizer sets the many high-band values that fall inside the dead zone to zero, so it produces longer runs of zeros.
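
Below is a minimal sketch of a dead-zone quantizer followed by zero-run coding. The dead-zone half-width `dead_zone` and the uniform step `step` outside it are hypothetical parameters. Small high-band values collapse to zero and merge into long runs.

```python
def dead_zone_quantize(values, step, dead_zone):
    """Zero every |v| < dead_zone; quantize larger magnitudes with a uniform step.
    A dead zone wider than the step gives the non-uniform characteristic."""
    out = []
    for v in values:
        if abs(v) < dead_zone:
            out.append(0)
        else:
            level = int((abs(v) - dead_zone) // step) + 1   # first level lies just outside the dead zone
            out.append(level if v > 0 else -level)
    return out

def zero_runs(levels):
    """Code levels as (zeros_before, nonzero_level) pairs plus the trailing zero count."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs, run

band = [3, -2, 1, 0, 25, -1, 2, 0, 0, -18, 1, 0]   # high band: mostly small values
levels = dead_zone_quantize(band, step=8, dead_zone=6)
pairs, trailing = zero_runs(levels)
```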

After band splitting, each band has a lower sampling rate. Subband coding is therefore widely used where ultra-high-speed encoding would be difficult without band splitting, as in high-definition TV.

Interframe coding of the present invention

FIG. 4 is a block diagram of a typical interframe encoder. This basic configuration involves two procedures: first, motion estimation and compensation; second, compression of the motion-compensated result. Object motion is estimated by computing the displacement of the image data relative to the corresponding data in the previous frame. This is similar to predictive coding, in which the current data is predicted from the previous frame.

Motion displacement estimation

FIG. 5 illustrates the principle of motion displacement estimation.

Each frame is first divided into blocks of N×N pixels. Several matching criteria are used for motion estimation, such as the mean squared error and the mean absolute difference. As shown in FIG. 5, the search covers a given area of (N+2L)×(N+2L) pixels, where L is the maximum allowed displacement. Within that area, the displacement that minimizes the difference between the block in the current frame and a block in the previous frame is taken as the motion estimate and used for compensation.
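
Here is a full-search block-matching sketch that uses the sum of absolute differences (SAD). SAD is one of the absolute-difference criteria mentioned above and is chosen here only for simplicity. The test frames are synthetic: the current frame is the reference frame with its content shifted, so the search should recover the displacement exactly.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))

def full_search(cur, ref, bx, by, n, L):
    """Try every displacement in [-L, L] x [-L, L]; return (dx, dy, cost) of the best match."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-L, L + 1):
        for dx in range(-L, L + 1):
            # skip displacements whose block would leave the reference frame
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

ref = [[16 * y + x for x in range(16)] for y in range(16)]          # every value unique
cur = [[ref[y - 1][x + 2] if y >= 1 and x + 2 < 16 else 0 for x in range(16)]
       for y in range(16)]                                           # content moved by (dx=2, dy=-1)
best = full_search(cur, ref, bx=4, by=4, n=4, L=3)
```

Full search always finds the global minimum within the window, but it costs (2L+1)² block comparisons. This is why practical encoders often use faster, approximate search patterns.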

Compression coding after motion displacement estimation

FIG. 6 shows a motion-compensated predictive encoder using the DCT.

A motion-compensated predictive encoder such as that of FIG. 6 estimates motion displacement in order to predict the current image data (or block) from the previous frame or from adjacent frames. Temporal redundancy can then be reduced easily. The most widely used technique for reducing temporal redundancy is motion-compensated predictive coding. It encodes the prediction error, defined as the difference between the current block and the block compensated using the estimated motion displacement. The more accurately the current block can be predicted from the previous frame, the smaller the prediction error and the higher the compression ratio.

In general, the compression efficiency of a motion-compensated coding scheme depends on the following factors:

o The range of motion displacement that can be compensated for in the image (the size of the motion search area)

o The accuracy of the motion estimation method

Standardization trends in image compression technology

We acknowledge that existing streaming methods follow these standardization trends.

The compression scheme of the present technology is likewise based on these technical standards.

However, as described above, the present scheme substantially reduces the processing load of compressing each intra frame and of handling video and audio. We believe it also compares well with the various compression technologies regarded as standards, as explained below.

Image compression can be divided into three stages. Signals from various video sources are first converted by preprocessing into the standard Common Intermediate Format (CIF), and the result is then compressed. The compressed image is transmitted over a digital channel, and the receiver decodes it back into the common format. It is then displayed after post-processing. FIG. 7 schematically shows the preprocessing and encoding process for standardization.

Video sources take many forms. These include video telephony/conferencing, TV signals (NTSC, PAL, SECAM), high-definition TV signals (1050 or 1125 scan lines, etc.), VTR tape signals (VHS, S-VHS) and film. Even within the same application, these sources differ in format, resolution and the bandwidth needed for transmission. The common format is a digital video format into which the different sources are converted by preprocessing. It makes communication between different systems easier, and the formats form the hierarchy shown in FIG. 8.

FIG. 8 schematically shows encoding with standardized hierarchical resolutions.

Standardization trends in video coding

Note, however, that the present compression scheme achieves compression and transmission rates fully comparable to those described here.

In countries that use 525-line TV (including Korea), the smallest common format is QCIF (Quarter CIF). It is the video-telephony format, with a size of 180×120 pixels.

CIF, four times the size of QCIF, is used for video telephony/conferencing and for multimedia services based on storage media. As shown in Table 2, international recommendations for it have been established, or are nearly complete, through CCITT H.261 and ISO MPEG-1. The CCIR format is a digital format for the currently analog CATV and TV signals and is four times the size of CIF. It is expected to be applied to digital TV transmission, digital VTRs, high-quality video telephony/conferencing and multimedia services. Standardization of a still-picture compression method based on the CCIR format has been completed in MPEG, and a standard for moving-picture compression is being drafted in MPEG-2.

The present SPEG, however, achieves an even higher and more compact image compression ratio.

SPEG packet video

FIG. 9 shows video encoders for a circuit-switched network and for an SPEG packet-switched network.

When a compressed video signal is transmitted over a circuit-switched network, a buffer memory is used to match the signal to a fixed bandwidth. As shown in FIG. 9, a device that reports the buffer state is connected to the encoder. When the buffer is about to overflow, the compression ratio is raised to reduce the amount of compressed data and prevent the overflow. When the buffer empties, the compression ratio is lowered to increase the amount of data, or meaningless (stuffing) bits are inserted. In this way, the bandwidth passed from the buffer into the circuit-switched network matches the given constant bit rate. The drawbacks are that image quality varies with the buffer state and that the given bandwidth is hard to use efficiently.
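
The buffer-feedback loop can be illustrated with a toy simulation. The bits-per-frame model `bits = complexity / q`, the linear mapping from buffer fullness to quantizer step, and all parameter values are assumptions for illustration only.

```python
def rate_control(complexities, channel_bits, capacity, q_min=2.0, q_max=62.0):
    """Toy encoder-buffer loop for a constant-rate channel.

    Assumed model: a frame of a given complexity costs complexity / q bits,
    where q is the quantizer step (larger q means higher compression)."""
    fullness = 0.0
    log = []
    for complexity in complexities:
        # Buffer-state feedback: a fuller buffer selects a coarser step.
        q = q_min + (q_max - q_min) * fullness / capacity
        bits = complexity / q
        fullness += bits - channel_bits      # the channel drains a fixed amount per frame
        stuffing = 0.0
        if fullness < 0:                     # buffer ran dry: pad with meaningless bits
            stuffing, fullness = -fullness, 0.0
        log.append((q, bits, fullness, stuffing))
    return log

busy = rate_control([4000] * 200, channel_bits=500, capacity=10000)
idle = rate_control([100] * 3, channel_bits=500, capacity=10000)
```

In the busy run, the step settles where each frame costs exactly the channel's 500 bits. In the idle run, the encoder must insert stuffing bits. These are the quality variation and bandwidth waste that constant-rate transmission forces on the encoder.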

In contrast, a packet network that can carry a variable bit rate needs no buffer to hold the bandwidth constant. The encoder design is therefore simpler, and constant image quality can be maintained. Moreover, for the same quality requirement, statistical multiplexing of multiple video signals can give a packet network much better bandwidth efficiency than a circuit-switched network. On these results alone, packet transmission appears to satisfy both "constant image quality" and "efficient bandwidth use".

Problems of video transmission over packet networks

A packetizer forms the compressed video signal into packets by adding headers, and this introduces delay. The delay depends on how much the bandwidth of the compressed signal varies. For an average compressed bandwidth of 1 Mbps to 100 Mbps, it is on average about 5 ms to 400 ms. In a circuit-switched network, encoding also incurs a buffer delay of about 10 ms to 100 ms to convert the variable bit rate into a constant bit rate.

After packetization, the video packets pass through several packet multiplexers and packet-switched networks to reach the receiving terminal. Depending on the traffic in the network, packet jitter and packet loss occur, and the receiving terminal must compensate for them.

Packet loss in the network has two main causes. The first is the limited buffer memory of each packet switch and multiplexer, which loses packets as traffic increases. The second is a bit error in the transmission system that hits the address field of a packet header, which prevents the packet from reaching its intended receiving terminal.

Clock synchronization recovery under packet jitter

FIG. 10 illustrates the principle of clock synchronization recovery at a variable bit rate using a DPLL.

Decoding the arriving video packets requires synchronization between the transmitting and receiving terminals. The receiving terminal must therefore reproduce the transmitter's clock. Because the clock cannot be sent separately, it must be recovered from the arriving packets. Two things complicate this: packet arrivals are irregular because of jitter, and packets are sometimes lost in the network.

If the arriving packets have a constant bit rate, the transmit clock can be extracted with a digital PLL (phase-locked loop) by observing the receiver buffer, even in the presence of packet jitter. When video packets are sent at a variable bit rate, however, the receiver buffer state does not reflect the actual transmit clock. Clock information therefore cannot be extracted directly from it. The usual approach is instead to include clock information in the transmitted packets and recover it at the receiver. The receiver counts the clock patterns detected in the arriving packets over a fixed period. It compares this count with the number of clock patterns generated by its own clock (VCO) over the same period, then passes the difference through a filter to adjust the receiver clock.

The method shown in FIG. 10 is a time-averaging method. Over a long time span, jitter averages to zero, whereas the difference between the transmitter and receiver clocks keeps accumulating. The receiver clock is therefore adjusted by this accumulated difference until it converges to the transmitter clock. The memory shown in FIG. 10 keeps adding up the differences in clock-pattern counts. Some jitter remains in the receiver clock during the first few periods (about one minute in this case), but over time the receiver clock converges to the transmitter clock.
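
The time-averaging idea can be sketched as a first-order loop. The text does not specify the filter used in FIG. 10, so several pieces of this sketch are assumptions:

o the clock-pattern counts per measurement interval
o the zero-mean repeating jitter sequence
o the loop gain
o the integral-only loop filter

```python
def recover_clock(f_tx, f_nominal, jitter, gain, intervals):
    """Time-averaging clock recovery (sketch).

    Each interval: count the clock patterns carried by arriving packets (the true
    transmit count plus zero-mean jitter), count the local VCO cycles, add the
    difference to a running memory, and retune the VCO from that memory."""
    f_rx = f_nominal
    memory = 0.0
    errors = []
    for n in range(intervals):
        detected = f_tx + jitter[n % len(jitter)]   # patterns seen in the packets
        memory += detected - f_rx                   # accumulated count difference
        f_rx = f_nominal + gain * memory            # integral control of the VCO
        errors.append(f_rx - f_tx)
    return f_rx, errors

f_rx, errors = recover_clock(f_tx=1000.5, f_nominal=1000.0,
                             jitter=[-4, -2, 0, 2, 4], gain=0.02, intervals=3000)
```

A small gain averages the jitter over many intervals, at the cost of slow initial convergence. This mirrors the start-up period mentioned above.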

Clock synchronization recovery allows the present compression technique to perform more compact and finer-grained image processing.

Next, let us examine the compression method of this technique in terms of the video media described above.

The internal compression structure of this technique is outlined below.

Algorithms for Manipulating Compressed Images (for Motion JPEG processing)

The new technique performs operations directly on compressed JPEG data. This gives a 50-100× performance improvement over algorithms that must decompress the image before the operation and then recompress the result. Multimedia applications that operate on audio and video data will enable many new uses of computers.

For example, collaborative work systems include video-conferencing windows, and hypermedia education systems include educational audio and video elements.

Most research on multimedia applications has focused on compression standards, synchronization, storage media and software architecture. Few techniques have been reported for manipulating digital video data in real time, for example to implement special effects and image compositing.

For example, suppose a brute-force algorithm adjusts the brightness of a compressed image. It decompresses the image, modifies every pixel value and recompresses the result. Implementing this approach on current workstations runs into problems from two sources. The first is the amount of data to be manipulated: 26.3 Mbytes per second for uncompressed 640×480 24-bit video at 30 frames per second. The second is the computational complexity of image compression and decompression.
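
The data-rate figure quoted above can be checked with a few lines of arithmetic. The 25:1 ratio used for the compressed estimate is the typical figure given in the paragraphs that follow and is taken here as an assumption.

```python
width, height = 640, 480
bytes_per_pixel = 3            # 24-bit colour
frames_per_second = 30

bytes_per_second = width * height * bytes_per_pixel * frames_per_second
mib_per_second = bytes_per_second / 2**20              # megabytes counted as 2^20 bytes
compressed_bytes_per_second = bytes_per_second / 25    # assumed 25:1 compression
```

The raw rate is 27,648,000 bytes per second, or about 26.4 MiB/s. The 26.3 figure in the text corresponds to truncating rather than rounding. At 25:1, a compressed-domain algorithm touches only about 1.1 million bytes per second.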

The technique described here uses algorithms that operate directly on compressed digital images.

These algorithms perform many traditional image-manipulation operations 50-100 times faster than their brute-force counterparts. The speedup comes from operating directly on the compressed data, since compression typically reduces the amount of data by a factor of 25 or more. Besides the gain from processing less data, most of the computation associated with compression and decompression is eliminated, and memory traffic is reduced.

The following describes how to apply the technique and how to evaluate the performance of representative algorithms.

First, the compression model is described.

Next, we show how to implement pixel-wise algebraic operations on compressed images, namely scalar addition and multiplication, which underlie many image transformations.
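
The two scalar operations can be sketched directly on (skip, value) run-length blocks of quantized coefficients, without an inverse DCT. The sketch makes three assumptions:

o All 64 zigzag coefficients, including the DC term at index 0, are run-length coded as described later in this section.
o (0,0) ends the block.
o Pixel saturation at 0 and 255 is ignored.

Scalar multiplication scales every stored value. Scalar addition touches only the DC term: with the JPEG DCT normalization (the factor 1/4·C(u)·C(v), C(0) = 1/√2), adding s to all 64 pixels raises Y[0,0] by 8·s. The function names are hypothetical.

```python
def rle_scale(block, alpha):
    """Multiply every quantized coefficient by alpha without decoding the block.
    Values that round to zero are dropped and their positions merged into the next skip."""
    out, carry = [], 0
    for skip, value in block:
        if (skip, value) == (0, 0):          # end-of-block marker
            break
        scaled = round(value * alpha)
        if scaled == 0:
            carry += skip + 1                # this position becomes one more zero
        else:
            out.append((skip + carry, scaled))
            carry = 0
    out.append((0, 0))
    return out

def rle_add_constant(block, s, q00):
    """Add s to every pixel: only the DC level changes, by round(8 * s / q00)."""
    delta = round(8 * s / q00)
    if delta == 0:
        return list(block)
    first_skip, first_value = block[0]
    if first_skip == 0 and first_value != 0:          # nonzero DC stored as the first pair
        new_dc = first_value + delta
        if new_dc != 0:
            return [(0, new_dc)] + block[1:]
        rest = block[1:]                               # DC cancelled: its slot becomes a zero
        if rest[0] == (0, 0):
            return rest
        return [(rest[0][0] + 1, rest[0][1])] + rest[1:]
    if (first_skip, first_value) == (0, 0):            # all-zero block
        return [(0, delta), (0, 0)]
    return [(0, delta), (first_skip - 1, first_value)] + block[1:]   # DC was zero

scaled = rle_scale([(0, 12), (0, -3), (2, 1), (0, 0)], 0.4)
brighter = rle_add_constant([(0, 12), (0, -3), (2, 1), (0, 0)], s=16, q00=16)
```

Both functions walk the run-length pairs once, so their cost grows with the number of nonzero coefficients rather than with the 64 pixels of the block.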

These operations are then used to implement two common video transformations: overlapping (dissolving) one video sequence into another, and fading.
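
Both transformations rest on the linearity of the DCT. A dissolve computed coefficient by coefficient equals the DCT of the same dissolve computed pixel by pixel. A fade is the special case where the second block is black (all zero after normalization), which reduces to scalar multiplication. The sketch checks this numerically on two made-up 8×8 blocks, using the JPEG DCT normalization. In the compressed domain, the coefficients would first have to be brought to a common quantization table.

```python
import math

def dct2(block):
    """Forward 8x8 DCT with the JPEG normalisation (1/4 * C(u) * C(v))."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for i in range(8):
                for j in range(8):
                    s += (block[i][j] * math.cos((2 * i + 1) * u * math.pi / 16)
                          * math.cos((2 * j + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def dissolve_coefficients(ya, yb, alpha):
    """alpha * A + (1 - alpha) * B, computed on transform coefficients."""
    return [[alpha * a + (1 - alpha) * b for a, b in zip(ra, rb)] for ra, rb in zip(ya, yb)]

a = [[(i * 8 + j) % 50 for j in range(8)] for i in range(8)]
b = [[100 - 3 * i + j for j in range(8)] for i in range(8)]
alpha = 0.3
mixed_pixels = [[alpha * pa + (1 - alpha) * pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
lhs = dct2(mixed_pixels)                                  # dissolve, then transform
rhs = dissolve_coefficients(dct2(a), dct2(b), alpha)      # transform, then dissolve
max_diff = max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))
```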

The operations are implemented, and their performance is compared with the brute-force approach.

Finally, we discuss the limitations of the technique, its extension to other compression standards, and the relationship of this work to other work in the field.

Compression model

This section describes the compression model used in transform-based coding.

We start with a general overview of transform coding and then briefly describe the CCITT Joint Photographic Experts Group (JPEG) standard for transform-based image coding.

A detailed description of the JPEG algorithm can be found elsewhere. Foley and van Dam give a detailed description of image formats, and Lim discusses other transform coding techniques. All results in this description appear in Lim's book and are stated here without proof.

Transform-based coding

A common technique for image compression is transform-based coding. A typical transform coder treats the pixels of an image as a set of matrices. It applies a linear transform, such as the discrete cosine transform, to each matrix to produce a new matrix of coefficients.

To recover the original image, the inverse linear transform is applied.

The transform has two effects.

First, it concentrates the energy of the image so that many of the transformed coefficients are nearly zero.

Second, it decomposes the image spectrally into high and low frequencies. The human visual system is less sensitive to some frequencies than to others. Some coefficients can therefore be approximated more coarsely than others without serious image degradation.

A common way to exploit the second property is to quantize the coefficients. A simple way to quantize a coefficient is to truncate the low-order bits of the integer, for example with an arithmetic right shift. A method that gives finer control over the data loss is to divide the value by a constant, the quantization value, and round the result to the nearest integer.

An approximation of the original value can be recovered by multiplying the result by the quantization value.

The larger the quantization value, the coarser the approximation, but the fewer bits are needed to represent it.
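
The two quantizers can be compared side by side. The shift amount (3 bits) and the quantization value (10) are arbitrary choices for illustration.

```python
def quantize_shift(c, bits):
    """Drop the low `bits` bits (arithmetic right shift, which floors toward minus infinity)."""
    return c >> bits

def dequantize_shift(level, bits):
    return level << bits

def quantize_divide(c, q):
    """Divide by the quantization value and round to the nearest integer."""
    return round(c / q)

def dequantize_divide(level, q):
    return level * q

coeffs = [-37, -6, 0, 3, 14, 100]
shifted = [dequantize_shift(quantize_shift(c, 3), 3) for c in coeffs]
divided = [dequantize_divide(quantize_divide(c, 10), 10) for c in coeffs]
```

The shift can only divide by powers of two and always rounds downward. Division by an arbitrary quantization value with rounding keeps every reconstruction error within half a step.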

When the transform coefficients are quantized this way, most of them are typically zero.

For example, experiments with 24:1 compression show that about 90% of the coefficients in the transformed image are zero.

JPEG algorithm

The JPEG standard is one of the standards for transform coding of still images. The rest of this section describes the relevant properties of JPEG and introduces the related terminology.

Assume the original image is a 640×480-pixel, 24-bit image with three components: one luminance component (Y) and two chrominance components (I and Q). That is, each pixel of the original image is associated with a triple of 8-bit values (Y, I, Q).

Each component is handled similarly, so the algorithm is described for the Y component only.

The Y component is divided into contiguous blocks 8 pixels wide and 8 pixels high. Each block is an 8×8 matrix of integers in the range 0 to 255. The first step of the algorithm is called the normalization step. It brings all values into the range -128 to 127 by subtracting 128 from each pixel in the matrix. (This step is skipped for the I and Q components, since they already lie between -128 and 127.) The resulting matrix is y[i,j], 0 ≤ i, j ≤ 7. FIG. 11 shows the relationship of y[i,j] to the whole image.

The second step of the algorithm (the DCT step) applies the discrete cosine transform to this 8×8 matrix, producing a new 8×8 matrix. Like the Fourier transform, the DCT produces values related to frequency: the lowest-frequency component of the resulting 8×8 matrix is at the upper left and the highest-frequency component at the lower right. Calling the new matrix Y[u,v], 0 ≤ u, v ≤ 7, the definition of the DCT gives:

$$Y[u,v] = \frac{1}{4}\,C(u)\,C(v)\sum_{i=0}^{7}\sum_{j=0}^{7} y[i,j]\cos\frac{(2i+1)u\pi}{16}\cos\frac{(2j+1)v\pi}{16}, \qquad C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases} \tag{1}$$
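
The following is a direct, equation-level sketch of the forward and inverse 8×8 DCT with the JPEG normalization (factor 1/4·C(u)·C(v), C(0) = 1/√2). It is written for clarity rather than speed; production codecs use fast factorizations instead. The sample block is a made-up ramp.

```python
import math

def _c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def _cos(a, b):
    return math.cos((2 * a + 1) * b * math.pi / 16)

def fdct(y):
    """Forward 8x8 DCT: Y[u][v] from the normalized block y[i][j]."""
    return [[0.25 * _c(u) * _c(v) * sum(y[i][j] * _cos(i, u) * _cos(j, v)
                                        for i in range(8) for j in range(8))
             for v in range(8)] for u in range(8)]

def idct(Y):
    """Inverse 8x8 DCT: y[i][j] from Y[u][v]."""
    return [[0.25 * sum(_c(u) * _c(v) * Y[u][v] * _cos(i, u) * _cos(j, v)
                        for u in range(8) for v in range(8))
             for j in range(8)] for i in range(8)]

pixels = [[60 + 10 * i + 5 * j for j in range(8)] for i in range(8)]
y = [[p - 128 for p in row] for row in pixels]      # normalization step
Y = fdct(y)
back = idct(Y)
flat = fdct([[12] * 8 for _ in range(8)])           # constant normalized block
```

For a flat block whose normalized value is 12 everywhere, all the energy lands in Y[0][0] = 8 × 12 = 96, and every other coefficient is zero.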

The third step of the algorithm quantizes each element of Y[u,v] by a frequency-dependent value q[u,v], producing YQ[u,v]. The quantization step is defined as:

$$YQ[u,v] = \operatorname{round}\!\left(\frac{Y[u,v]}{q[u,v]}\right) \tag{2}$$

The matrix of integer quantization values q[u,v] is called the quantization table (QT). Different QTs are typically used for the luminance and chrominance components. The choice of QT determines both the amount of compression and the quality of the decompressed image. The JPEG standard includes recommended luminance and chrominance QTs derived from human-factors studies. A common practice is to scale these basic QT values to obtain different image qualities. In particular, given two images with QTs q1[u,v] and q2[u,v], there is often a constant γ such that, for all u, v:

$$q_1[u,v] = \gamma\, q_2[u,v] \tag{3}$$
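
Scaling a base table by a constant is easy to illustrate. The sketch below uses the example luminance table from Annex K of the JPEG standard and γ = 2; the coefficient block is made up. Because the two tables are related by the constant γ described above, levels quantized with the coarser table can be rewritten in units of the finer table by integer multiplication alone.

```python
# Example luminance quantization table from Annex K of the JPEG standard.
LUMA_QT = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(Y, qt):
    """YQ[u][v] = round(Y[u][v] / q[u][v])."""
    return [[round(Y[u][v] / qt[u][v]) for v in range(8)] for u in range(8)]

def to_finer_table(levels, gamma):
    """Express levels quantized with q1 = gamma * q2 in units of q2 (no DCT needed)."""
    return [[lv * gamma for lv in row] for row in levels]

def count_nonzero(levels):
    return sum(1 for row in levels for lv in row if lv != 0)

gamma = 2
q2 = LUMA_QT
q1 = [[gamma * q for q in row] for row in q2]                  # coarser, lower-quality table
Y = [[400 / (1 + u + v) for v in range(8)] for u in range(8)]  # hypothetical coefficient block
levels_q2, levels_q1 = quantize(Y, q2), quantize(Y, q1)
converted = to_finer_table(levels_q1, gamma)
```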

This value is used later.

The fourth step of the algorithm converts the 8×8 matrix from equation (2) into a 64-element vector, using the zigzag order shown in Figure 2.

This ordering is a heuristic that groups the low-frequency components near the beginning of the vector and the high-frequency components near the end.

우리는 그 벡터를 지그-재그(zig-zag)벡터라 부르고 이 단계를 지그재그 스캔 단계라 한다.We call it a zig-zag vector and this step is called the zig-zag scan step.
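The zigzag order can be generated rather than tabulated. This sketch walks the 15 anti-diagonals (row + col = s), alternating direction. It stores positions as row*8 + col, which is an assumption of the sketch.

```c
/* Build zz[k] = row * 8 + col of the k-th coefficient in JPEG zigzag order. */
static void build_zigzag(int zz[64]) {
    int k = 0;
    for (int s = 0; s < 15; s++)                  /* s = row + col: one anti-diagonal */
        for (int t = 0; t <= s; t++) {
            int row = (s % 2 == 0) ? s - t : t;   /* even diagonals run upward, odd downward */
            int col = s - row;
            if (row < 8 && col < 8) zz[k++] = row * 8 + col;
        }
}

/* Step 4: flatten the quantised 8x8 block YQ into the 64-element zigzag vector Yzz. */
void zigzag_scan(int YQ[8][8], int Yzz[64]) {
    int zz[64];
    build_zigzag(zz);
    for (int k = 0; k < 64; k++)
        Yzz[k] = YQ[zz[k] / 8][zz[k] % 8];
}
```

The first positions visited are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), and the last is (7,7).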

In most pictures, the vector Yzz contains many consecutive zeros. The next step of the algorithm is therefore run-length encoding, which encodes the vector as a sequence of (skip, value) pairs. skip is the number of indices in Yzz that must be skipped to reach the next nonzero value, and that value is stored in value. By convention, the pair (0,0) indicates that all remaining values in Yzz are zero.

At this point the block is called a run-length-encoded (RLE) block, and each (skip, value) pair is called an RLE value. An RLE block is denoted YRLE[x], where YRLE[x].skip and YRLE[x].value are the skip and value of the x-th element of the array. Our algorithms operate on RLE blocks.
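The (skip, value) encoding with its (0,0) terminator can be sketched as below. The struct name `RLEValue` and the 65-slot output buffer (64 values plus the terminator) are assumptions of this sketch.

```c
typedef struct { int skip; int value; } RLEValue;   /* one (skip, value) pair */

/* Run-length encode a 64-element zigzag vector. skip counts the zeros passed
   over before the next nonzero value; the pair (0,0) terminates the block.
   Returns the number of pairs written, including the terminator. */
int rle_encode(const int Yzz[64], RLEValue out[65]) {
    int n = 0, skip = 0;
    for (int x = 0; x < 64; x++) {
        if (Yzz[x] == 0) { skip++; continue; }
        out[n].skip = skip;
        out[n].value = Yzz[x];
        n++;
        skip = 0;
    }
    out[n].skip = 0; out[n].value = 0;   /* end of block: the rest is all zeros */
    return n + 1;
}
```

For a vector holding 5 at index 0 and -2 at index 3, the block is (0,5), (2,-2), (0,0).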

In the last step, a conventional entropy coding method, such as arithmetic coding or Huffman coding, compresses the RLE blocks. Figure 12 shows schematically all the steps in compressing a single block.

Tracing the steps of Figure 12 backwards (that is, from right to left) shows how the data is decompressed. The first decompression step recovers the RLE block from the entropy-coded bit stream. A single pass over the RLE block YRLE[x] restores the zigzag vector Yzz[x]. From Yzz[x], YQ[u,v] is recovered by reversing the zigzag scan. Each element of YQ[u,v] is then multiplied by q[u,v] from the appropriate QT to recover an approximation of Y[u,v]. In the last step, the picture block y[i,j] is obtained from Y[u,v] using the inverse DCT (IDCT).
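The decoding path just described, up to but not including the IDCT, can be sketched in one pass. The sketch assumes the `RLEValue` layout and zigzag generator used in the earlier sketches, and an integer QT. The names are hypothetical.

```c
typedef struct { int skip; int value; } RLEValue;   /* (skip, value); (0,0) ends a block */

/* Build zz[k] = row * 8 + col of the k-th coefficient in zigzag order. */
static void build_zigzag(int zz[64]) {
    int k = 0;
    for (int s = 0; s < 15; s++)
        for (int t = 0; t <= s; t++) {
            int row = (s % 2 == 0) ? s - t : t, col = s - row;
            if (row < 8 && col < 8) zz[k++] = row * 8 + col;
        }
}

/* RLE block -> zigzag vector -> YQ[u][v] -> approximate Y[u][v] = YQ * q. */
void rle_decode_dequantize(const RLEValue *rle, int q[8][8], double Y[8][8]) {
    int Yzz[64] = {0}, zz[64];
    int x = -1;                                    /* zigzag index of the last value */
    for (; !(rle->skip == 0 && rle->value == 0); rle++) {
        x += rle->skip + 1;                        /* jump over the run of zeros */
        if (x > 63) break;                         /* malformed block: stop */
        Yzz[x] = rle->value;
    }
    build_zigzag(zz);
    for (int k = 0; k < 64; k++) {
        int u = zz[k] / 8, v = zz[k] % 8;          /* undo the zigzag scan */
        Y[u][v] = (double)Yzz[k] * q[u][v];        /* undo the quantisation */
    }
}
```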

The IDCT is

$$y[i,j] = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,Y[u,v]\cos\frac{(2i+1)u\pi}{16}\cos\frac{(2j+1)v\pi}{16} \qquad (4)$$

Equation (4) closely resembles equation (1), except that the summation runs over u and v rather than i and j.

With this technique, the compression ratio can be adjusted by changing the QT values.

Experience shows that a compression ratio of about 24:1 (that is, about 1 bit per pixel) can be achieved without significant loss of picture quality. At 10:1, the decompressed picture is usually indistinguishable from the original. Entropy coding reduces the data size by a factor of about 2.5. Hence, for an overall compression of 25:1, the RLE blocks are typically about 10 times smaller than the original picture.
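The figures above fit together as follows. This assumes a 24-bit colour source (8 bits per channel), which the text implies through "24:1, i.e. 1 bit per pixel" but does not state.

```latex
\frac{24\ \text{bits/pixel}}{24} = 1\ \text{bit/pixel},
\qquad
\frac{25\ \text{(overall ratio)}}{2.5\ \text{(entropy coding)}} = 10\ \text{(ratio at the RLE-block stage)}
```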

Algebraic operations

This section shows how four algebraic operations are performed on RLE blocks: scalar addition, scalar multiplication, pixel-wise addition of two pictures, and pixel-wise multiplication of two pictures. In the following calculations,

HRLE = Φ(FRLE, GRLE)

where FRLE and GRLE are the RLE representations of the input pictures, HRLE is the RLE representation of the output picture, and Φ is a real-valued function. In an implementation, the values stored in the HRLE data structure are integers, so the value returned by Φ must be rounded to the nearest integer. To simplify the notation in the following calculations, this rounding is left implicit.

To simplify the notation further, all calculations are performed on the quantized arrays FQ[u,v], GQ[u,v] and HQ[u,v]. Because an RLE block is simply a data structure representing such an array, the derived equations remain valid on the RLE blocks, provided the appropriate index conversion is performed. The other notational conventions are:

1. Uppercase letters such as F, G, H, indexed by u, v, w, denote compressed pictures.

2. Lowercase letters such as f, g, h, indexed by i, j, k, denote decompressed pictures.

3. Greek letters such as α, β and γ denote scalars.

4. QTs are written as arrays with a subscript naming the picture.

For example, the QT of the compressed picture H is qH[u,v].

5. The letters x, y, z denote zigzag-ordered indices.

A QT is often indexed with a single zigzag index (such as x). In that case the conversion to an index pair such as [u,v] is implicit and clear from context.

Scalar multiplication

Consider the scalar multiplication of pixel values. If the value of a pixel in the original picture is f[i,j], the value of the corresponding pixel in the output picture h[i,j] is

$$h[i,j] = \alpha\, f[i,j] \qquad (5)$$

From the JPEG compression algorithm and the linearity of equation (5), it is easy to show that the quantized coefficients of the output picture, HQ[u,v], are simply a scaled copy of FQ[u,v].

Specifically, using equations (1) through (5),

$$H_Q[u,v] = \alpha\,\frac{q_F[u,v]}{q_H[u,v]}\,F_Q[u,v] \qquad (6a)$$

Here qF[u,v] and qH[u,v] are the QTs of the input and output pictures, respectively, and rounding to an integer is implicit on the right-hand side. In other words, scalar multiplication of a compressed picture can be performed directly on the quantized coefficients, as long as the QTs of the pictures are taken into account. If the QTs of the pictures are proportional (as in equation (3), with qF = γ qH), the equation simplifies to

$$H_Q[u,v] = \alpha\gamma\, F_Q[u,v] \qquad (6b)$$

If the input and output pictures have the same quality, this is the special case γ = 1. If a value FQ[u,v] in the input is zero, the corresponding output value is also zero. The operation can therefore be implemented on an RLE block simply by scaling the values stored in the data structure.

There is no need to rebuild the quantized array, or even the zigzag vector. Implemented this way, the operation is very fast, because it avoids the unnecessary multiplications wherever FQ[u,v] is zero.
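Equation (6b) can be applied directly to the stored (skip, value) pairs, as in the minimal C sketch below. The `RLEValue` layout is the assumed one from the earlier sketches. One detail the text does not spell out: if a scaled value rounds to zero, its pair must be folded into the next pair's skip, so that (0,0) stays the only terminator.

```c
typedef struct { int skip; int value; } RLEValue;   /* (skip, value); (0,0) ends a block */

static int round_nearest(double x) { return (int)(x >= 0.0 ? x + 0.5 : x - 0.5); }

/* Equation (6b) on an RLE block: every stored value is multiplied by alpha*gamma;
   zero coefficients are never visited. A value that rounds to zero is dropped and
   its run is carried into the next pair's skip. Returns pairs written, incl. (0,0). */
int rle_scale(const RLEValue *in, double alpha_gamma, RLEValue *out) {
    int n = 0, carry = 0;
    for (; !(in->skip == 0 && in->value == 0); in++) {
        int v = round_nearest(alpha_gamma * in->value);
        if (v == 0) { carry += in->skip + 1; continue; }  /* position becomes a zero */
        out[n].skip = in->skip + carry;
        out[n].value = v;
        n++;
        carry = 0;
    }
    out[n].skip = 0; out[n].value = 0;
    return n + 1;
}
```

Scaling (0,5), (2,-2), (0,0) by 0.5 gives (0,3), (2,-1), (0,0). Scaling it by 0.2 drops the second pair entirely.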

Scalar addition

Now consider scalar addition. If the value of a pixel in the original picture is f[i,j], the value of the corresponding pixel in the output picture h[i,j] is

$$h[i,j] = f[i,j] + \beta \qquad (7)$$

Adding a constant to every pixel changes only the mean (that is, the DC component), which the DCT stores in the [0,0] term, so only that coefficient is affected. This is easy to prove from equations (1) through (4), equation (7) and the properties of the DCT. Since Y[0,0] is 8 times the block mean, the result of that calculation is

$$H_Q[u,v] = \begin{cases} \dfrac{q_F[0,0]\,F_Q[0,0] + 8\beta}{q_H[0,0]} & (u,v) = (0,0) \\[2ex] \dfrac{q_F[u,v]}{q_H[u,v]}\,F_Q[u,v] & \text{otherwise} \end{cases} \qquad (8a)$$

If the QTs of the two pictures are proportional (as in equation (3)), this equation takes a particularly simple form:

$$H_Q[u,v] = \begin{cases} \gamma F_Q[0,0] + \dfrac{8\beta}{q_H[0,0]} & (u,v) = (0,0) \\[2ex] \gamma F_Q[u,v] & \text{otherwise} \end{cases} \qquad (8b)$$

Again, scalar addition can be performed directly on the quantized coefficients. More importantly, in the common case γ = 1 (the output picture has the same quality as the input picture), the operation requires far less computation than the corresponding operation on decompressed pictures, because only the (0,0) coefficient of the quantized matrix is affected.
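For the γ = 1 case, adding β touches only the DC entry of the RLE block. The C sketch below assumes the `RLEValue` layout used earlier. The fiddly part is bookkeeping: the DC pair may appear or disappear, which shifts the skip of the first AC pair by one. All later pairs are copied unchanged.

```c
typedef struct { int skip; int value; } RLEValue;   /* (skip, value); (0,0) ends a block */

static int round_nearest(double x) { return (int)(x >= 0.0 ? x + 0.5 : x - 0.5); }
static int is_end(const RLEValue *p) { return p->skip == 0 && p->value == 0; }

/* Scalar addition with gamma = 1: the quantised DC term changes by 8*beta/qH[0,0].
   out needs room for 66 pairs. Returns pairs written, including the terminator. */
int rle_add_scalar(const RLEValue *in, double beta, int qh_dc, RLEValue *out) {
    int has_dc = in[0].skip == 0 && in[0].value != 0;   /* is position 0 stored? */
    int dc = has_dc ? in[0].value : 0;
    const RLEValue *rest = has_dc ? in + 1 : in;        /* the AC pairs */
    int new_dc = round_nearest(dc + 8.0 * beta / qh_dc);
    int n = 0;
    /* Skip of the first AC pair, measured from position 1 (just after the DC term). */
    int first_skip = is_end(rest) ? 0 : (has_dc ? rest->skip : rest->skip - 1);
    if (new_dc != 0) {
        out[n].skip = 0; out[n].value = new_dc; n++;
    } else if (!is_end(rest)) {
        first_skip += 1;                                /* position 0 is now a zero */
    }
    for (const RLEValue *p = rest; !is_end(p); p++) {
        out[n].skip = (p == rest) ? first_skip : p->skip;
        out[n].value = p->value;
        n++;
    }
    out[n].skip = 0; out[n].value = 0;
    return n + 1;
}
```

With qH[0,0] = 16 and β = 8, a DC value of 5 becomes 9. With β = -10, it becomes 0, and the DC pair is removed from the block.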

Pixel addition

Figure 13 shows the initialization of the combination array.

Pixel addition is described by the equation

$$h[i,j] = f[i,j] + g[i,j] \qquad (9)$$

As with scalar multiplication, the desired operation is linear. Since the JPEG compression algorithm is also linear, the quantized coefficients of the output picture, HQ[u,v], are a scaled sum of FQ[u,v] and GQ[u,v]. Specifically, using equations (1) through (4) and equation (9), one can show that

$$q_H[u,v]\,H_Q[u,v] = q_F[u,v]\,F_Q[u,v] + q_G[u,v]\,G_Q[u,v]$$

that is,

$$H_Q[u,v] = \frac{q_F[u,v]}{q_H[u,v]}\,F_Q[u,v] + \frac{q_G[u,v]}{q_H[u,v]}\,G_Q[u,v] \qquad (10a)$$

Once again, as long as the QTs of the pictures are taken into account, the operation can be performed directly on the quantized coefficients. Moreover, if the QTs of all the pictures are proportional (with proportionality constants γF and γG, that is, qF = γF qH and qG = γG qH), the equation simplifies to

$$H_Q[u,v] = \gamma_F\,F_Q[u,v] + \gamma_G\,G_Q[u,v] \qquad (10b)$$
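Pixel addition in the proportional-QT case can be sketched as below. Each block's stored coefficients are scattered into a zigzag accumulator and then rounded, leaving a vector ready for re-encoding. The function names are hypothetical, and the `RLEValue` layout is the one assumed in the earlier sketches.

```c
typedef struct { int skip; int value; } RLEValue;   /* (skip, value); (0,0) ends a block */

static int round_nearest(double x) { return (int)(x >= 0.0 ? x + 0.5 : x - 0.5); }

/* Add gamma * (stored values of one RLE block) into a zigzag accumulator. */
static void accumulate(const RLEValue *rle, double gamma, double acc[64]) {
    int x = -1;
    for (; !(rle->skip == 0 && rle->value == 0); rle++) {
        x += rle->skip + 1;                 /* zigzag index of this nonzero value */
        acc[x] += gamma * rle->value;
    }
}

/* HQ[x] = gammaF * FQ[x] + gammaG * GQ[x]; Hzz gets the rounded result. */
void rle_add_pixels(const RLEValue *F, double gammaF,
                    const RLEValue *G, double gammaG, int Hzz[64]) {
    double acc[64] = {0.0};
    accumulate(F, gammaF, acc);
    accumulate(G, gammaG, acc);
    for (int x = 0; x < 64; x++)
        Hzz[x] = round_nearest(acc[x]);
}
```

Only nonzero input coefficients are visited. The final rounding loop over all 64 entries is where the zigzag vector serves as the intermediate form.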

Pixel multiplication

Finally, pixel multiplication is expressed by the equation

$$h[i,j] = \alpha\, f[i,j]\, g[i,j] \qquad (11)$$

where α is a scalar.

Although mathematically redundant, the scalar α is convenient for scaling the pixel values when they are multiplied. For example, this formula is used when picture g contains pixel values in the range [0..255] that we want to interpret in the range [0..1], as when g is a mask. This is achieved by setting α to 1/256.

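Let F(v1,v2), G(w1,w2) and H(u1,u2) be the quantized values of the compressed pictures for f, g and h, respectively. Using equations (1), (2) and (11), the value of H(u1,u2) can be calculated as follows:

$$H(u_1,u_2) = \alpha \sum_{v_1,v_2}\sum_{w_1,w_2} F(v_1,v_2)\,G(w_1,w_2)\,W_Q(v_1,v_2,w_1,w_2,u_1,u_2) \qquad (12a)$$

$$W_Q(v_1,v_2,w_1,w_2,u_1,u_2) = \frac{q_F[v_1,v_2]\,q_G[w_1,w_2]}{q_H[u_1,u_2]}\,W(v_1,w_1,u_1)\,W(v_2,w_2,u_2) \qquad (12b)$$

$$W(a,b,c) = \frac{1}{8}\,C(a)\,C(b)\,C(c)\sum_{i=0}^{7}\cos\frac{(2i+1)a\pi}{16}\cos\frac{(2i+1)b\pi}{16}\cos\frac{(2i+1)c\pi}{16} \qquad (12c)$$

Two facts make it possible to compute this efficiently.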

1. For typical compressed pictures, F(v1,v2) and G(w1,w2) are zero for most values of (v1,v2) and (w1,w2).

2. Of the 256K (8^6 = 262,144) entries of the function WQ(v1,v2,w1,w2,u1,u2), only about 4% are nonzero. In other words, the matrix WQ is very sparse.

When implementing this technique, only the terms that actually contribute to the sum should be computed. The first fact is exploited by implementing the technique on RLE blocks, in which zeros are skipped easily. To exploit the second fact, we use the data structure described in the next section.

Because the algorithm operates on RLE blocks, zigzag-ordered indices are used to refer to data elements. By convention, x, y and z denote the zigzag-ordered indices of the pairs (v1,v2), (w1,w2) and (u1,u2), respectively. With this substitution, equation (12a) can be rewritten as

$$H(z) = \alpha \sum_{x=0}^{63}\sum_{y=0}^{63} F(x)\,G(y)\,W_Q(x,y,z) \qquad (13)$$

where the sums over x and y run from 0 to 63.

To compute equation (13) efficiently, we introduce the following data structure.

A combination element is a pair (z, W), where z is an integer and W is a floating-point value. A combination list is a list of combination elements, and the combination array is a 64×64 array of combination lists. The C code in Figure 13 initializes the combination array comb[x,y]; on entry to that code, the array contains empty lists. The function ZigZag(u1,u2) returns the zigzag index associated with element (u1,u2).

The function AddCombElt(z, W, comb[x,y]) inserts the combination element (z, W) into the combination list stored at comb[x,y] in the global combination array and returns the modified list (the storage order is not important). The array W[8][8][8] is assumed to have been initialized with the values of the W function of equation (12c).

Using the initialized combination array, the C code shown in Figure 14 efficiently implements equation (13) on two RLE blocks f and g. We call this the convolution algorithm. Note that comb[] is constant in the code: it is computed only once for a given set of QTs and can then be applied to any number of pictures. The code works as follows. The array hzz, which represents the output zigzag vector, is assumed to be initially all zero. For each pair of RLE values in the two input pictures f and g, the zigzag indices x and y are computed, and the product of their data values is stored in tmp. The z value of each combination element in the list stored in comb[x,y] determines which entry of the output array hzz is affected, and the product W*tmp is added to that entry.

In this way, only products of nonzero values are accumulated into hzz. When the algorithm is used in a program, a final pass performs integer rounding, run-length encodes the zeros, and produces the resulting RLE block. (Integer arithmetic would of course improve performance, but for clarity we describe a floating-point implementation.)

Summary of operations

We have shown how pixel addition, pixel multiplication, scalar addition and scalar multiplication can be implemented on quantized arrays. As noted above, these transformations can be computed directly on RLE blocks. Table 1 summarizes how each picture operation maps to an operation on RLE blocks. In the table, the symbol γF,H(x) is defined as

$$\gamma_{F,H}(x) = \frac{q_F(x)}{q_H(x)}$$

The function Convolve(F, G, α, qF, qG, qH) is defined by equation (13) and implemented in Figures 13 and 14.

Applications

Video data is typically transmitted as a sequence of compressed pictures.

Entropy-coded data cannot be manipulated directly, but we have shown how several operations can be performed on RLE blocks. Referring to Figure 12, the entropy coding of a picture is undone, the operation is performed on the RLE blocks, and the result is entropy coded again. This bypasses most of the work of picture compression and decompression, which is why the algorithms are faster.

The basic operations in the table can be combined to form more powerful operations, such as Dissolve (simultaneously fading out one sequence of pictures and fading in another) and subtitling. Implementing these operations typically involves computing an output picture that is an algebraic combination of one or more input pictures; Porter and Duff discuss many examples of such operations. One way to evaluate such a combination of RLE blocks is to use the zigzag vector as an intermediate representation while the expression is computed. For example, to multiply two RLE blocks and add a third, we call the Convolve function of Figure 14 on the first two RLE blocks, add the third RLE block to the resulting zigzag vector, and then perform run-length encoding and entropy coding. Figure 15 illustrates this strategy schematically.

Dissolve operation

To simplify the presentation, the entropy encoding and decoding steps are omitted.

We want to dissolve a sequence of pictures S1[t] into a sequence S2[t] over a time Δt (typically 0.25 seconds). In other words, S1[0] should be displayed at t = 0, S2[Δt] at t = Δt, and a linear combination of the pictures in between:

$$D[t] = \alpha(t)\,S_1[t] + \{1-\alpha(t)\}\,S_2[t] \qquad (14)$$

where α(t) is a linear function that equals 1 at t = 0 and 0 at t = Δt.

Using Table 1, this operation can be mapped to the corresponding operation on RLE blocks as follows. The table shows that the scalar multiplications can be performed directly on RLE blocks if the coefficients in the first half of the expression are scaled by γS1,D(x) = qS1(x)/qD(x), with a similar substitution for the multiplication by {1 − α(t)}. The table also shows that the resulting coefficients can be added together directly to obtain the desired result, because the two new RLE blocks share the same QT, qD(x). The expression in equation (14) can therefore be implemented as

$$D_Q(x) = \alpha\,\frac{q_{S_1}(x)}{q_D(x)}\,S_{1,Q}(x) + (1-\alpha)\,\frac{q_{S_2}(x)}{q_D(x)}\,S_{2,Q}(x)$$

This equation can be implemented efficiently by noting that the RLE format of the data skips zero terms. The C code implementing this operation on RLE blocks is shown in Figure 16. The function Zero sets the array passed to it to zero, and the function RunLengthEncode performs run-length encoding of h. The arrays gamma1 and gamma2 hold precomputed values defined as

gamma1[x] = α qS1(x) / qD(x),  gamma2[x] = (1 − α) qS2(x) / qD(x)

These values can be computed once per picture, or once for a sequence of pictures sharing the same QT, whereas the Dissolve function is called for every RLE block in a picture. To test the performance of this implementation, we wrote a program that runs both the brute-force algorithm and the RLE algorithm on pictures held in main memory and compares them.
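A minimal C sketch of the Dissolve step follows. It assumes QTs as `double[64]` in zigzag order and the `RLEValue` layout used earlier. The names `dissolve_init`, `dissolve_block` and `dissolve_alpha` are hypothetical, as is the particular linear ramp chosen for α(t).

```c
typedef struct { int skip; int value; } RLEValue;   /* (skip, value); (0,0) ends a block */

static int round_nearest(double x) { return (int)(x >= 0.0 ? x + 0.5 : x - 0.5); }

/* alpha(t): 1 at t = 0, falling linearly to 0 at t = dt. */
double dissolve_alpha(double t, double dt) { return 1.0 - t / dt; }

/* Per-frame precomputation of gamma1[x] and gamma2[x]. */
void dissolve_init(double alpha, const double qS1[64], const double qS2[64],
                   const double qD[64], double gamma1[64], double gamma2[64]) {
    for (int x = 0; x < 64; x++) {
        gamma1[x] = alpha * qS1[x] / qD[x];
        gamma2[x] = (1.0 - alpha) * qS2[x] / qD[x];
    }
}

/* One block: hzz[x] = round(gamma1[x] * S1[x] + gamma2[x] * S2[x]),
   visiting only the stored nonzero coefficients of each input block. */
void dissolve_block(const RLEValue *s1, const RLEValue *s2,
                    const double gamma1[64], const double gamma2[64], int hzz[64]) {
    double acc[64] = {0.0};
    int x = -1;
    for (; !(s1->skip == 0 && s1->value == 0); s1++) {
        x += s1->skip + 1;
        acc[x] += gamma1[x] * s1->value;
    }
    x = -1;
    for (; !(s2->skip == 0 && s2->value == 0); s2++) {
        x += s2->skip + 1;
        acc[x] += gamma2[x] * s2->value;
    }
    for (x = 0; x < 64; x++)
        hzz[x] = round_nearest(acc[x]);
}
```

Halfway through a 0.25 s dissolve (α = 0.5), DC values of 10 and 20 blend to 15.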

The algorithms were run on 25 distinct pairs of pictures on a Sparcstation 1+ with 28 MB of memory. The test pictures were compressed to approximately 1 bit per pixel (24:1 compression). Table 2 summarizes the results: the speedup over the brute-force algorithm was more than 100 to 1.

Subtitle operation

The second example operation overlays a subtitle on a compressed picture f. Although a workstation can support this operation in many ways (such as displaying the subtitle text in a separate window), we chose it for two reasons. First, it is a common operation that most people are familiar with. Second, it is a concrete instance of image masking, a common operation used when part of one picture is to be combined with another.

Let S denote a compressed picture of the subtitle, with white letters on a black background, where white and black have pixel values of 127 and −128, respectively. The output picture is created by multiplying f by a mask that is black in the region of f where the text will be displayed, and adding S to the result. The subtitled output picture h is given by

Using Table 1, the corresponding operation on RLE blocks is

The C code in Figure 17 implements this operation. The code is divided into two stages: the SubtitleInit function is called once when the QT is defined for a picture or a sequence of pictures, and the Subtitle function is then called for each RLE block in the picture.

Like the Dissolve operation, the Subtitle function uses a zigzag vector, hzz, to store intermediate results. As with Dissolve, we compared the performance of programs implementing the brute-force and RLE algorithms on pictures stored in main memory, using the same test parameters. Table 3 summarizes the results, showing a speedup of nearly 50 to 1 over the brute-force algorithm.

This technology can be applied to all areas of multimedia streaming, including on-demand services and live broadcasting.

The main features that distinguish the present invention from the existing MPEG scheme are the following seven.

1. Elimination of the buffering process

2. Elimination of streaming-server overload

3. Minimized bandwidth for video and audio signals (minimizing the load on the line)

4. Elimination of unnecessary downloads

5. No separate player required

6. Minimized server-specification requirements (lower cost)

7. Browser portability (usable as Java classes and applets)

The SPEG video compression method of the present invention is as follows.

SPEG also compresses images using lossy compression. Its main principle is as follows.

① A threshold value is set for the frames to be compressed, and each object is then compressed individually. Rather than fixing the number of frames, a threshold is set and the computation is performed internally.

At the same time, different operators each take charge of a region and compress several frame signals, which increases the number of frames that can be produced per second.

As a result, the present invention delivers more detailed video, and the streaming server does not need specifications as high as those required by other compression technologies.

② Each image is assigned its own index value, and the images are then sorted in order.

In this sorting stage, duplicate computed values are removed and pushed outside the threshold computation value.

The number of frames is determined at this point: frame computation is performed mainly by the internal operators, while the CPU handles the compression.

This allows denser compression, and because no resources are drawn from outside, the cause of buffering is eliminated.

When a client connects and is linked with the streaming server of the present invention, the internal operators of the present invention reside in the client's user group, so that mutual synchronization is achieved. As a result, from the user's point of view, no buffering appears to occur.

In addition, no separate player is needed for this linkage.

Portability on the web has also been improved for users' convenience: these operators can be launched directly as applets or classes using Java.

Thus, once synchronization through Java is established in the user's preferred Internet browser, the video is displayed immediately, without buffering or external computation.

③ After the image of each object is compressed, overlapping parts are removed, leaving only a single index value. This enables lighter compression, and each index value is recomputed, one by one, by the internal operators.
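The text gives no data structure for step ③. One possible reading, sketched here only as an illustration, is a store that keeps each distinct compressed payload once and lets repeats reuse its index. `FrameStore`, `store_index`, the byte-for-byte comparison and the fixed capacity are all hypothetical and are not taken from the patent.

```c
#include <string.h>

#define MAX_FRAMES 64

/* Hypothetical frame store: each distinct compressed payload is kept once.
   It stores pointers only, so callers must keep the payloads alive. */
typedef struct {
    const unsigned char *data[MAX_FRAMES];
    size_t len[MAX_FRAMES];
    int count;
} FrameStore;

/* Return the index of an identical payload already stored, or add it as a new
   entry. Returns -1 if the store is full. */
int store_index(FrameStore *st, const unsigned char *data, size_t len) {
    for (int k = 0; k < st->count; k++)
        if (st->len[k] == len && memcmp(st->data[k], data, len) == 0)
            return k;                     /* duplicate: keep only the one index */
    if (st->count == MAX_FRAMES) return -1;
    st->data[st->count] = data;
    st->len[st->count] = len;
    return st->count++;
}
```

A sequence A, B, A is then transmitted as indices 0, 1, 0, with only two payloads stored.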

The mutual synchronization of the video and audio signals takes place at this stage.

Compared with the existing MPEG infrastructure, synchronizing the signals at the very point where the signal structure is formed allows more compact use of the available bandwidth.

Therefore, the first priority is to unify the sizes of the video and audio signals.

After considerable effort on this point, we succeeded in binding the internal operators to the video signal using GSM mono audio compression. This made it possible to control the internal filter itself and thereby realize basic live broadcasting.

Because most of the streaming server's overload is eliminated at this stage, the server specifications need not be very high.

Instead, the more powerful the CPU environment, whether single or dual processor, the lighter the burden on the internal operators.

This type of compression does not depend on dedicated equipment and enables more fundamental compression.

Within the limits of CPU resources, the present invention avoids dependence on computation by having the internal operators handle the various filtering tasks.

Rather than using a single operator, a different operator is assigned to each section of the compression format, so that the limitations of any one operator are shared among several operators.

This made it possible to retain the advantages of the more stable M-JPEG.

After much trial and error, various combinations of the audio and video signals were attempted, which eventually made it possible to overcome the limits imposed by computational errors.

따라서 본 발명은 보다 조밀해진 압축 능력과 쓸데없는 필터링 등의 제거는 나아가 IMT-2000 인프라로 개발 될 수도 있는 것이다.Therefore, the present invention can be further developed into the IMT-2000 infrastructure by eliminating more compact compression and unnecessary filtering.

별다른 외부 장비의 필요성이 없어지는 본 압축 기술로 IMT-2000 기술에 접목을 생각해 볼 때 기존의 MPEG 4의 개체별 압축 능력보다 빈도면이나 효율성 있는 내부 CPU의 탑재만 가능해 진다면, 본 압축 기술로 IMT-2000에 접목시킬 수 있다.Considering the integration of IMT-2000 technology with this compression technology, which eliminates the need for external equipment, the IMT-based compression technology can only install an internal CPU that is more frequent or efficient than the existing MPEG 4 object-specific compression capability. Can be combined with -2000.

본 압축 기술의 미래는 보다 다양한 내부 필터링의 제어와 보다 조밀화된 음성 신호의 창조 등 등 많은 부분이 있을 수 있겠다.The future of this compression technology could be much more, including more control of internal filtering and the creation of denser speech signals.

This compression technology can be applied to a wide range of multimedia streaming solutions and can also serve as a more personal streaming technology. For example, integrating it into personal broadcasting stations, which have so far been limited to conventional radio-style broadcasting, would let them offer streaming broadcasts as well, meeting users' demand for such services.

It also enables a variety of applications such as wireless communication, education, and telemedicine, and by widening users' choices beyond streaming technology that has been limited to existing MPEG, it can contribute to the advancement of streaming technology.

The present invention is not limited to the specific preferred embodiments described above. Anyone with ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

As described above, the present invention offers the following advantages: it eliminates the buffering process; it relieves overload on the streaming server; it minimizes the bandwidth of the video and audio signals and thereby the load on the line; it eliminates unnecessary downloads; it removes the need for a separate player; it reduces cost by minimizing server specification requirements; and it can be ported to and run in a general-purpose browser using Java classes or applets. It is therefore expected to be highly effective for exchanging multimedia over the Internet.

Claims

A method of compressing and streaming video over the Internet using a lossy SPEG-based compression method, comprising:

a first step of determining a limit value for the frames to be compressed and then compressing each object individually;

a second step of assigning an index value to each image and then sorting the images in order;

a third step of removing the overlapping parts after the images of the objects are compressed, leaving only one index value;

a fourth step of having the internal operators of the streaming server reside in the user group of a client when the client connects, so that mutual synchronization is achieved without a separate player; and

a fifth step of displaying the internal operators as an applet or class in a general-purpose web browser using Java; the streaming method using a video compression method based on SPEG being characterized by the above steps.
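Steps 2 and 3 can be illustrated with the sketch below, under the assumption that "removing the overlapping part while leaving one index value" means storing identical frames once and letting later positions refer to the first copy by its index. The exact-match SHA-256 comparison and the function names are hypothetical; the claim does not say how overlap is detected.

```python
import hashlib


def index_sort_dedupe(tagged_frames):
    """tagged_frames: iterable of (index, frame_bytes) pairs, possibly out of order.

    Returns (store, refs):
      store: {index: frame}, one copy of each distinct frame
      refs:  for each position in index order, the index of the stored copy to show
    """
    store: dict[int, bytes] = {}
    first_index_of: dict[str, int] = {}  # content digest -> index of first occurrence
    refs: list[int] = []
    # Step 2: sort by the assigned index so the images are in sequence.
    for index, frame in sorted(tagged_frames, key=lambda pair: pair[0]):
        digest = hashlib.sha256(frame).hexdigest()
        # Step 3: keep only the first copy of identical content;
        # later copies are reduced to a reference to that single index.
        if digest not in first_index_of:
            first_index_of[digest] = index
            store[index] = frame
        refs.append(first_index_of[digest])
    return store, refs


def rebuild(store, refs):
    """Expand the references back into the full frame sequence."""
    return [store[r] for r in refs]


if __name__ == "__main__":
    store, refs = index_sort_dedupe([(2, b"B"), (0, b"A"), (1, b"A"), (3, b"B")])
    print(refs)  # [0, 0, 2, 2]
```

Only two frames are stored for four positions; `rebuild` restores the original sequence from the references.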

The method of claim 1,

wherein the first step, in which the limit value of the frames to be compressed is determined and each object is then compressed individually,

comprises, instead of fixing the number of frames, setting a limit value, performing internal operations, assigning a different operator to each delimited section, and then compressing multiple frame signals; the streaming method using a video compression method based on M-JPEG being characterized by this step.
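One plausible reading of "setting a limit value instead of fixing the number of frames" is a change threshold: a frame is kept only when it differs enough from the last kept frame, so the frame count follows the content. The sketch below assumes equal-length 8-bit frame buffers and mean absolute difference as the measure; both the measure and the name `select_by_threshold` are assumptions, not the patent's stated method.

```python
def select_by_threshold(frames: list[bytes], threshold: float) -> list[int]:
    """Return the indices of the frames kept under a change threshold.

    A frame is kept when its mean absolute difference from the last kept
    frame exceeds `threshold`. Frames are assumed to be equal-length
    8-bit buffers (iterating over bytes yields ints 0..255).
    """
    kept: list[int] = []
    last = None
    for i, frame in enumerate(frames):
        if last is None:
            kept.append(i)  # always keep the first frame
            last = frame
            continue
        diff = sum(abs(a - b) for a, b in zip(frame, last)) / len(frame)
        if diff > threshold:
            kept.append(i)
            last = frame
    return kept


if __name__ == "__main__":
    frames = [bytes([0] * 4), bytes([1] * 4), bytes([10] * 4), bytes([11] * 4)]
    print(select_by_threshold(frames, threshold=5))  # [0, 2]
```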

The method of claim 1,

wherein the second step, in which an index value is assigned to each image and the images are then sorted in order, comprises:

removing duplicate operation values and pushing them out beyond the limit operation value; and

in the above step, determining the number of frames, performing the frame operations mainly with an internal operator, and performing compression in the CPU so as to eliminate buffering; the streaming method using a video compression method based on M-JPEG being characterized by these steps.

The method of claim 1,

wherein the third step, in which the overlapping parts are removed after the images of the objects are compressed, leaving only one index value, comprises:

after the compression step, recalculating each index value, one at a time, with a separate internal operator for each section of the compression format so as to unify the sizes of the video and audio signals, whereby mutual synchronization (sync) is performed at this point and the bandwidth limit is made denser; the streaming method using a video compression method based on M-JPEG being characterized by this step.
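Synchronizing video and audio in a single stream is commonly done by ordering both kinds of data on a shared time base. The sketch below does that for fixed-duration units; the 100 ms video frame (10 fps) and 20 ms audio chunk (one GSM frame) are illustrative assumptions, and `interleave_av` is a hypothetical name rather than the claimed procedure.

```python
def interleave_av(video_frames, audio_chunks, frame_ms=100, chunk_ms=20):
    """Merge video frames and audio chunks into one stream ordered by start time.

    Each output record is (start_ms, kind, payload). At equal start times
    video sorts before audio, so each frame is followed by the audio that
    begins with it.
    """
    events = [(i * frame_ms, 0, "V", f) for i, f in enumerate(video_frames)]
    events += [(j * chunk_ms, 1, "A", a) for j, a in enumerate(audio_chunks)]
    # Sort on (time, kind order) only, so payloads are never compared.
    events.sort(key=lambda e: (e[0], e[1]))
    return [(t, kind, payload) for t, _, kind, payload in events]


if __name__ == "__main__":
    stream = interleave_av([b"v0", b"v1"], [b"a"] * 10)
    print("".join(kind for _, kind, _ in stream))  # VAAAAAVAAAAA
```

With these durations, every video frame is followed by exactly five audio chunks, which is the kind of fixed size relationship the claim aims for.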

The method of claim 4, wherein

the method of unifying the sizes of the video and audio signals comprises: controlling the internal filter itself by binding an internal operator to the video signal, using the GSM mono compression scheme;

performing basic operations with a video signal processing operator and an audio signal processing operator dedicated to signal-recognition processing that accepts only signals;

keeping a memory filtering operator, which filters the two signals after the operation, resident at an internal memory address; and

processing operations by loading an expert operator from the CPU at the moment it is needed, thereby preventing overload of the streaming server; the streaming method using a video compression method based on M-JPEG being characterized by these steps.
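The claim names GSM mono compression for the audio path. Assuming this means GSM 06.10 full-rate (the claim does not say which GSM codec), the arithmetic below shows the resulting audio bandwidth and how many audio frames fall within one video frame. A GSM 06.10 frame covers 160 samples at 8 kHz (20 ms) and carries 260 bits, commonly packed into 33 bytes.

```python
from fractions import Fraction

# GSM 06.10 full-rate, mono, 8 kHz (assumed codec variant)
SAMPLE_RATE_HZ = 8000
SAMPLES_PER_FRAME = 160   # one 20 ms frame
BITS_PER_FRAME = 260      # codec payload per frame
BYTES_PER_FRAME = 33      # common packed size (264 bits)


def gsm_frames_per_second() -> int:
    return SAMPLE_RATE_HZ // SAMPLES_PER_FRAME            # 8000 / 160 = 50


def gsm_codec_bitrate_bps() -> int:
    return gsm_frames_per_second() * BITS_PER_FRAME       # 50 * 260 = 13000


def gsm_packed_bitrate_bps() -> int:
    return gsm_frames_per_second() * BYTES_PER_FRAME * 8  # 50 * 264 = 13200


def gsm_frames_per_video_frame(fps: int) -> Fraction:
    """Number of GSM frames that span one video frame at the given frame rate."""
    return Fraction(gsm_frames_per_second(), fps)


if __name__ == "__main__":
    print(gsm_codec_bitrate_bps(), gsm_packed_bitrate_bps())  # 13000 13200
    print(gsm_frames_per_video_frame(10))                     # 5
```

At 10 fps each video frame lines up with exactly five GSM frames; at 15 fps the ratio is 10/3, so audio and video boundaries only coincide every three video frames.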

A streaming method according to any one of the preceding claims, wherein the compression technique is SPEG (Stream Picture Expert Group).

A streaming method that achieves complete frame continuity by means of external operators.
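The claim does not describe how continuity is ensured. One common technique, shown here only as an assumption, is to fill any missing frame index with the most recent frame that did arrive, so playback never stalls waiting for a lost frame. The function name `fill_gaps` and the empty-bytes fallback for a missing first frame are hypothetical choices.

```python
def fill_gaps(received: dict[int, bytes], first: int, last: int) -> list[bytes]:
    """Return a continuous frame list for indices first..last (inclusive).

    Missing indices repeat the previous frame; if the very first frame is
    missing, an empty frame (b"") is used as a placeholder.
    """
    out: list[bytes] = []
    previous = b""
    for idx in range(first, last + 1):
        frame = received.get(idx, previous)
        out.append(frame)
        previous = frame
    return out


if __name__ == "__main__":
    print(fill_gaps({0: b"A", 1: b"B", 3: b"D"}, 0, 4))
    # [b'A', b'B', b'B', b'D', b'D']
```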

A streaming method that eliminates buffering by handling streaming with the internal operator resident on the CPU, while keeping the internal operator's processors resident in memory so that the CPU load is greatly reduced.
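Keeping operators resident in memory while creating them only when first needed (as the claims describe for the "expert operator") can be sketched as a lazily populated registry. The class name, the factory mapping, and the `loads` counter are hypothetical; the point is only that each operator is built once and then reused.

```python
from typing import Callable, Dict


class OperatorRegistry:
    """Creates each operator on first use, then keeps it resident for reuse."""

    def __init__(self, factories: Dict[str, Callable[[], Callable]]):
        self._factories = factories                 # name -> zero-argument factory
        self._resident: Dict[str, Callable] = {}    # operators already in memory
        self.loads = 0                              # how many factories actually ran

    def get(self, name: str) -> Callable:
        op = self._resident.get(name)
        if op is None:
            op = self._factories[name]()            # load on demand
            self._resident[name] = op               # then keep it resident
            self.loads += 1
        return op


if __name__ == "__main__":
    reg = OperatorRegistry({"audio": lambda: (lambda x: x * 2)})
    print(reg.get("audio")(3), reg.get("audio")(4), reg.loads)  # 6 8 1
```

Separate names for the video, audio, and filtering operators keep them independently controllable while sharing the same on-demand loading.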

A streaming method in which the processors of a signal processing operator, a pixel processing operator, and a voice processing operator are controlled separately.