KR20140102286A

KR20140102286A - Encoding and decoding using perceptual representations

Info

Publication number: KR20140102286A
Application number: KR1020147018978A
Authority: KR
Inventors: Sean T. McCarthy; Vijay Kamarshi
Original assignee: General Instrument Corporation
Priority date: 2011-12-09
Filing date: 2012-12-07
Publication date: 2014-08-21
Also published as: JP6117818B2; CA2858413A1; AU2016201449B2; KR101656160B1; CN103988510A; EP2789164A1; CA2858413C; US20130148731A1; MX2014006895A; MX342362B; AU2016201449A1; AU2012347602A1; CN103988510B; WO2013086319A1; US9503756B2; JP2015506144A

Abstract

Encoding a video signal that includes pictures includes generating perceptual representations based on the pictures. Reference pictures are selected, and motion vectors are generated based on the perceptual representations and the reference pictures. The motion vectors and pointers to the reference pictures are provided in an encoded video signal. Decoding includes receiving pointers to the reference pictures and motion vectors based on perceptual representations of the reference pictures. Decoding the pictures in the encoded video signal may include selecting the reference pictures using the pointers and determining predicted pictures based on the motion vectors and the selected reference pictures. Decoding may include generating reconstructed pictures from the predicted pictures and residual pictures.

Description

ENCODING AND DECODING USING PERCEPTUAL REPRESENTATIONS

Motion processing includes the use of motion vectors in both motion estimation and motion compensation. Motion estimation is the process of determining motion vectors. A motion vector generally describes the transformation of objects from one two-dimensional image to another, typically between adjacent frames or pictures in a video sequence. Motion compensation is the process of applying the determined motion vectors to objects in one picture to synthesize the transformation of those objects into a subsequent picture in the video sequence. The combination of motion estimation and motion compensation is a key part of video compression and is often highly demanding in terms of processing cost.
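As an illustration of the two terms (not code from this disclosure), the following minimal Python sketch represents a picture as a list of rows of luma values and performs motion compensation for one block: the block at `(top, left)` in the picture being predicted is copied from the reference picture at the position displaced by a motion vector `(dy, dx)`. The function name `motion_compensate_block`, the integer-pel motion, and the requirement that the displaced block lie inside the reference picture are all assumptions of this sketch.

```python
def motion_compensate_block(reference, top, left, size, mv):
    """Return the size x size block predicted from `reference`.

    The block at (top, left) in the picture being predicted is taken from
    (top + dy, left + dx) in the reference picture. Assumes integer-pel
    motion and that the displaced block lies inside the reference picture.
    """
    dy, dx = mv
    return [
        row[left + dx:left + dx + size]
        for row in reference[top + dy:top + dy + size]
    ]


# A 4x4 reference picture with a bright 2x2 "object" at rows 1-2, cols 1-2.
reference = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
# If the object moved one pixel down and one right, the block at (2, 2) in
# the next picture is predicted from (1, 1), i.e. mv = (-1, -1).
print(motion_compensate_block(reference, 2, 2, 2, (-1, -1)))  # [[9, 9], [9, 9]]
```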

In motion processing, motion vectors are determined by methods that can be classified as direct or indirect. In practice, direct methods that rely on pyramid and block-based searches are typically used in video encoders. Direct methods often require increased processing power and processing cost to increase the accuracy and/or precision of the motion vectors they determine.
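A direct, block-based search can be sketched as an exhaustive (full) search that scores every candidate displacement with the sum of absolute differences (SAD), one common fidelity metric. This is a hypothetical, minimal Python sketch: the helper names (`sad`, `get_block`, `full_search`) are illustrative, and it omits the pyramid (multi-resolution) stage and the sub-pel refinement that practical encoders add.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def get_block(picture, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def full_search(target, reference, top, left, size, search_range):
    """Exhaustive block search for the motion vector of one target block.

    Every integer displacement (dy, dx) within +/- search_range that keeps the
    candidate inside the reference picture is scored with SAD; the first
    displacement with the lowest cost is returned together with that cost.
    """
    height, width = len(reference), len(reference[0])
    current = get_block(target, top, left, size)
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate would fall outside the reference picture
            cost = sad(current, get_block(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# 5x5 pictures: a 2x2 object at (1, 1) in the reference moves to (2, 2).
reference = [[0] * 5 for _ in range(5)]
target = [[0] * 5 for _ in range(5)]
for r in range(2):
    for c in range(2):
        reference[1 + r][1 + c] = 9
        target[2 + r][2 + c] = 9

print(full_search(target, reference, 2, 2, 2, 2))  # ((-1, -1), 0)
```

The inner loops visit up to (2 × `search_range` + 1)² candidates per block, so doubling the search range roughly quadruples the work, which is the processing-cost pressure described above.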

Indirect methods for determining motion vectors often use statistical functions, applied over a local or global area within a picture, to identify matches between the estimated movement occurring in the pictures and the generated motion vectors. Fidelity metrics are commonly used in attempting to identify and eliminate false matches that do not correspond to actual motion. However, fidelity metrics often produce opportunistic best matches that are errors, and motion vector outliers that are inefficient because they require more bits to code. These limitations tend to reduce video compression quality and efficiency.

In addition, existing methods that rely on fidelity metrics tend to favor high-contrast regions in a picture. This often yields poor motion estimates for low-texture regions and generally produces markedly inaccurate motion in those regions. Fidelity metrics also often fail to distinguish motion from other changes in a video sequence, such as changes in contrast, brightness, blur, added noise, artifacts, and other differences that may occur during fades, dissolves, and compression. These limitations also tend to reduce video compression quality and efficiency.
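The failure under a brightness change can be shown with SAD as the fidelity metric. In this hypothetical Python example, the target block is the reference object brightened by 40; plain SAD scores that true match worse than an unrelated flat block. Removing each block's mean is shown only as a familiar point of comparison, not as the perceptual representation of this disclosure; it restores the correct ranking for this particular case.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def mean_removed(block):
    """Subtract the block's own mean from every sample (comparison only)."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    return [[v - mean for v in row] for row in block]


true_match = [[30, 60], [60, 30]]                        # object in the reference picture
target = [[v + 40 for v in row] for row in true_match]  # same object, 40 brighter
flat_block = [[85, 85], [85, 85]]                        # unrelated flat region

print(sad(target, true_match))  # 160
print(sad(target, flat_block))  # 60  (plain SAD prefers the wrong block)
print(sad(mean_removed(target), mean_removed(true_match)))  # 0.0
print(sad(mean_removed(target), mean_removed(flat_block)))  # 60.0
```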

The weaknesses of fidelity metrics in any of these circumstances can often be mitigated by increasing motion processing capability, which raises processing cost. Even so, where fidelity metrics are less effective, motion processing using existing methods often requires a trade-off between achieving more accurate/precise motion vectors in video compression and lowering processing cost.

According to embodiments of the invention, there are systems, methods, and computer readable media (CRM) that provide for encoding and decoding using perceptual representations when determining or utilizing motion vectors. Using perceptual representations produces motion vectors with improved accuracy and/or precision. Perceptual representations may be used to increase the accuracy and/or precision of motion vectors for low-texture regions in a picture and/or for pictures in a transition sequence. The accuracy and precision of the motion vectors increase particularly for video sequences that include changes in contrast, brightness, blur, added noise, artifacts, and other differences that may occur during fades, dissolves, and compression. Using perceptual representations in determining or utilizing motion vectors yields improved compression efficiency and lowers motion processing requirements and/or processing cost.

According to an embodiment, a system for encoding includes an interface configured to receive a video signal including an original picture in a video sequence of pictures. The system includes a processor configured to generate a target perceptual representation based on the received original picture, select a reference picture from a plurality of reference pictures, and determine motion vector information based on the target perceptual representation and the selected reference picture. The motion vector information is determined based on attributes of the reference picture and the target perceptual representation. The system encodes the motion vector information and encodes a pointer associated with the selected reference picture.

According to another embodiment, a method for encoding includes receiving a video sequence of pictures including an original picture; generating a target perceptual representation based on the received original picture; selecting a reference picture from a plurality of reference pictures; determining, using a processor, motion vector information based on the target perceptual representation and the reference picture, wherein the motion vector information is determined based on attributes of the reference picture and the target perceptual representation; and encoding the determined motion vector information and encoding a pointer associated with the reference picture.

The method for encoding may be implemented by computer readable instructions stored on a non-transitory computer readable medium. The instructions may be executed by a processor to perform the method.

According to another embodiment, a system for decoding includes an interface configured to receive motion vector information. The motion vector information may be based on a target perceptual representation, which is based on an original picture from a video sequence of pictures, and on a reference picture associated with the target perceptual representation. The interface is also configured to receive a pointer associated with the reference picture and a residual picture associated with the received motion vector information. The system also includes a processor configured to select the reference picture from a plurality of reference pictures using the received pointer, determine a predicted picture based on the received motion vector information and the selected reference picture, and generate a reconstructed picture based on the predicted picture and the residual picture.

According to another embodiment, a method for decoding includes receiving motion vector information, wherein the motion vector information is based on a target perceptual representation, which is based on an original picture from a video sequence of pictures, and on a reference picture associated with the target perceptual representation; receiving a pointer associated with the respective reference picture; receiving a residual picture associated with the received motion vector information; selecting the reference picture from a plurality of reference pictures using the respective received pointer; determining, using a processor, a predicted picture based on the received motion vector information and the respective selected reference picture; and generating a reconstructed picture based on the determined predicted picture and the received residual picture.

The method for decoding may be implemented by computer readable instructions stored on a non-transitory computer readable medium. The instructions may be executed by a processor to perform the method.

Features of the examples and the disclosure are apparent to those skilled in the art from the following description with reference to the figures.
FIG. 1 is a block diagram illustrating a perceptual encoding system utilizing perceptual representations, according to an example.
FIG. 2 is a block diagram illustrating a perceptual decoding system utilizing perceptual representations, according to an example.
FIG. 3 is a photographic image showing an original picture and a perceptual representation, according to an example.
FIG. 4 is a flow diagram illustrating calculations in a process for generating a perceptual representation, according to an example.
FIG. 5 is a photographic image showing an original picture and a series of perceptual representations based on different companding factors, according to an example.
FIG. 6 is a photographic image showing the resilience of perceptual representations to a change in contrast applied to an original picture, according to an example.
FIG. 7 is a photographic image showing the resilience of perceptual representations to a change in brightness applied to an original picture, according to an example.
FIG. 8 is a block diagram illustrating a motion estimation flow process in a system for encoding utilizing perceptual representations, according to an example.
FIG. 9 is a block diagram illustrating a content distribution system, according to an example.
FIG. 10 is a flow diagram illustrating a method for encoding utilizing perceptual representations, according to an example.
FIG. 11 is a flow diagram illustrating a method for decoding utilizing perceptual representations, according to an example.
FIG. 12 is a block diagram illustrating a computer system providing a platform for a system for encoding and/or a system for decoding, according to an example.

For simplicity and illustrative purposes, the present invention is described by referring mainly to embodiments and examples thereof. In the following description, numerous specific details are set forth to provide a thorough understanding of the examples. However, it is apparent that the present invention may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the description. Furthermore, different embodiments are described below, and the examples may be used or performed together in different combinations. As used herein, the terms "includes" and "including" mean "includes but is not limited to" and "including but not limited to." The term "based on" means based at least in part on.

As shown in the following examples, there are a perceptual engine, encoding and decoding systems, methods, and machine readable instructions stored on computer-readable media (CRM) for encoding and decoding motion vector information based on perceptual representations. A perceptual representation includes a map of frames and/or pictures, such as those in a video sequence. The map in a perceptual representation may include calculated values associated with units in a picture, such as pixels. The calculated values in the map may be developed based on a model of human perception. Further details on perceptual representations, and on how they are generated and used in encoding and decoding, are provided below.

Referring to FIG. 1, a perceptual encoding system 100 is shown, which may be found, for example, in an apparatus at a headend for distributing content in a compressed bitstream, such as a transport stream. According to an example, the perceptual encoding system 100 receives a video sequence, such as video sequence 101. The video sequence may be included in a video bitstream. The video sequence 101 may include frames or pictures, which may be located or stored as original pictures in a memory associated with the perceptual encoding system 100, such as memory 102. The memory 102 may include one or more buffers or higher-capacity storage. Pictures from the video sequence 101 may be converted into perceptual representations by the perceptual engine 104 and stored in the memory 102. The detailed steps and parameters by which a perceptual representation may be generated are described in more detail below, for example with respect to FIG. 4.

A target picture, such as target picture 103, may be retrieved from the memory 102 for compression and encoding. The target picture 103 may be an original picture from the video sequence 101. A reference picture 106 for determining a motion vector may also be retrieved from the memory 102. The target picture 103 and the reference picture 106 are signaled to the motion compensator 116 to generate a predicted picture 110 and a motion vector 113. As described below, the motion vector 113 may be generated from perceptual representations of the target picture 103 and the reference picture 106, shown as the target perceptual representation 105 and the reference perceptual representation 108.

A pointer 114 may be associated with the reference picture 106. The pointer 114 may identify the reference picture 106 or an attribute associated with the reference picture. The pointer 114 may be an identity, an association, an attribute, a location such as a memory address, and so on. The pointer 114 may be encoded and transmitted from the perceptual encoding system 100 for a downstream decoding process based on or associated with the reference picture 106.

According to an example, the target picture 103 is retrieved from the memory 102 and signaled to the motion compensator 116. A target perceptual representation 105, which may be generated from the target picture 103 by the perceptual engine 104, is also retrieved from the memory 102 and signaled to the motion compensator 116. The reference picture 106 is selected from the memory 102 by the selector 117 and signaled to the motion compensator 116. A reference perceptual representation 108, which may be generated from the reference picture 106 by the perceptual engine 104, is retrieved from the memory 102 and signaled to the motion compensator 116. The motion compensator 116 may include a motion estimator 109 and a predicted picture generator 115. The motion estimator 109 receives the target perceptual representation 105 and the reference perceptual representation 108 and uses both to determine the motion vector 113. The motion vector 113 may be encoded and transmitted in the compressed video bitstream, either separately from or together with the pointer 114. The motion vector 113 may be determined by scanning the reference perceptual representation 108 to identify a block similar to a block in the target perceptual representation 105 and generating a pointer to the similar block.
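The benefit of matching on perceptual representations rather than on the pictures can be sketched with a deliberately simple stand-in. In this hypothetical Python example, `toy_perceptual_representation` merely subtracts the picture mean; it is an assumption for illustration only and not the transform of FIG. 4. The object moves one pixel down and right while the whole picture brightens by 40. On the raw pictures, SAD gives the true match and a plain background block the same cost, whereas on the representations only the true match scores zero.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def block(picture, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def toy_perceptual_representation(picture):
    """HYPOTHETICAL stand-in for perceptual engine 104.

    It only subtracts the mean of the whole picture, which cancels a global
    brightness offset. It is not the transform described for FIG. 4.
    """
    values = [v for row in picture for v in row]
    m = sum(values) / len(values)
    return [[v - m for v in row] for row in picture]


# 6x6 pictures: a textured 2x2 object on a flat background.
reference = [[50] * 6 for _ in range(6)]
target = [[90] * 6 for _ in range(6)]  # whole target picture is 40 brighter
pattern = [[20, 80], [80, 20]]
for r in range(2):
    for c in range(2):
        reference[1 + r][1 + c] = pattern[r][c]
        target[2 + r][2 + c] = pattern[r][c] + 40  # object moved by (+1, +1)

target_rep = toy_perceptual_representation(target)
reference_rep = toy_perceptual_representation(reference)

# The block at (1, 1) is the true match (motion vector (-1, -1));
# the block at (0, 4) is plain background.
raw_true = sad(block(target, 2, 2, 2), block(reference, 1, 1, 2))
raw_background = sad(block(target, 2, 2, 2), block(reference, 0, 4, 2))
rep_true = sad(block(target_rep, 2, 2, 2), block(reference_rep, 1, 1, 2))
rep_background = sad(block(target_rep, 2, 2, 2), block(reference_rep, 0, 4, 2))
print(raw_true, raw_background)  # 160 160  (raw SAD cannot tell them apart)
print(rep_true, rep_background)  # 0.0 120.0
```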

The predicted picture generator 115 generates the predicted picture 110 using the motion vector 113 determined by the motion estimator 109 and the reference picture 106. The subtractor 111 may receive and process the predicted picture 110 together with the target picture 103 to generate a residual picture 112. The residual picture 112 is reduced for encoding and transmission downstream to a decoding system. The residual picture 112 may exclude motion-estimated regions of the target picture 103, such as regions associated with the motion vector 113. The residual picture 112 is an encoded picture transmitted from the perceptual encoding system 100 for a downstream decoding process based on or associated with the target picture 103.
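Using toy numbers like those of a brightened, moved object, the following hypothetical Python sketch shows the roles of the predicted picture generator 115 and the subtractor 111 at block level: a motion vector of `(-1, -1)` (assumed here to have been found on the perceptual representations) is applied to the original reference picture, and the residual is the element-wise difference between target and prediction. The function names are illustrative assumptions.

```python
def predict_block(reference, top, left, size, mv):
    """Predicted picture generator 115 (sketch): copy the displaced block."""
    dy, dx = mv
    return [row[left + dx:left + dx + size] for row in reference[top + dy:top + dy + size]]


def residual_block(target_block, predicted_block):
    """Subtractor 111 (sketch): element-wise target minus prediction."""
    return [[t - p for t, p in zip(rt, rp)] for rt, rp in zip(target_block, predicted_block)]


# Original reference picture: textured 2x2 object at (1, 1) on a background of 50.
reference = [[50] * 6 for _ in range(6)]
reference[1][1:3] = [20, 80]
reference[2][1:3] = [80, 20]

target_block = [[60, 120], [120, 60]]  # same object at (2, 2), 40 brighter
predicted = predict_block(reference, 2, 2, 2, (-1, -1))
print(predicted)                                # [[20, 80], [80, 20]]
print(residual_block(target_block, predicted))  # [[40, 40], [40, 40]]
```

Because the prediction is taken from the original reference picture, the brightness change ends up in the residual, here as a flat block that a block transform such as the DCT would represent with a single DC coefficient.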

Referring to FIG. 2, a perceptual decoding system 200 is shown, which may be found, for example, in an apparatus such as a set-top box, transcoder, handset, personal computer, or other client device for receiving content in a compressed bitstream, such as a transport stream. According to an example, the perceptual decoding system 200 receives the residual picture 112, the motion vector 113, and the pointer 114. Any of these may be located or stored in a memory associated with the perceptual decoding system 200, such as memory 201. The perceptual decoding system 200 may use the pointer 114 to select a reference picture, such as reference picture 202, from the memory 201. The reference picture 202 corresponds to or is associated with the reference picture 106. The relationship between the reference picture 202 and the reference picture 106 may be determined or identified through the pointer 114.

According to an example, a motion compensator in the perceptual decoding system 200, such as motion compensator 205, may receive both the reference picture 202 and the motion vector 113. The motion compensator 205 may generate a predicted picture, such as predicted picture 206, based on the reference picture 202 and the motion vector 113. The predicted picture 206 may be signaled to an adder, such as adder 207. The adder 207 may generate a reconstructed picture, such as reconstructed picture 208, based on both the predicted picture 206 and the residual picture 112.
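The decoder-side steps can be sketched at block level in Python. In this hypothetical example the pointer is modeled as a key into a table of decoded reference pictures; the key `"ref-0"` and the dictionary are assumptions, since the description allows a pointer to be an identity, an attribute, a memory address, and so on. The adder 207 is an element-wise sum of prediction and residual.

```python
def decode_block(reference_pictures, pointer, top, left, size, mv, residual):
    """Reconstruct one block from a pointer, a motion vector, and a residual."""
    reference = reference_pictures[pointer]  # select reference picture 202 via pointer 114
    dy, dx = mv
    # Motion compensator 205: copy the displaced block from the reference picture.
    predicted = [row[left + dx:left + dx + size] for row in reference[top + dy:top + dy + size]]
    # Adder 207: prediction plus residual gives the reconstructed block.
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(predicted, residual)]


reference = [[50] * 6 for _ in range(6)]
reference[1][1:3] = [20, 80]
reference[2][1:3] = [80, 20]
reference_pictures = {"ref-0": reference}  # hypothetical pointer -> picture table

print(decode_block(reference_pictures, "ref-0", 2, 2, 2, (-1, -1), [[40, 40], [40, 40]]))
# [[60, 120], [120, 60]]
```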

The perceptual representation of an original picture, rather than the original picture itself, is the basis for determining motion vectors. Referring to FIG. 3, an original picture 300 and a corresponding perceptual representation 301 are provided. The perceptual representation 301 mimics the adaptive contrast constancy of human vision. Regions 302 and 303 of the perceptual representation 301 show enhanced imaging of low-contrast regions associated with low-level textures appearing in the original picture 300. The enhanced imaging in regions 302 and 303 of the perceptual representation 301 may improve block-based motion matching in motion estimation for these regions.
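Contrast constancy can be illustrated, loosely, with a swapped-in technique: local contrast normalization, which subtracts the mean of a patch and divides by its standard deviation. This hypothetical Python sketch is not the perceptual representation of this disclosure; it only shows how a representation of this general kind lifts a faint texture to the same scale as a strong one, the property that can help block matching in low-texture regions.

```python
from statistics import mean, pstdev


def contrast_normalize(values, eps=1e-6):
    """Subtract the patch mean and divide by its population standard deviation.

    `eps` avoids division by zero for a perfectly flat patch.
    """
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / (s + eps) for v in values]


low_contrast = [100, 102, 100, 102]  # faint texture
high_contrast = [80, 120, 80, 120]   # same texture, 20x the contrast

print([round(v, 3) for v in contrast_normalize(low_contrast)])   # [-1.0, 1.0, -1.0, 1.0]
print([round(v, 3) for v in contrast_normalize(high_contrast)])  # [-1.0, 1.0, -1.0, 1.0]
```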

Region 304 of the perceptual representation 301 shows a region of the original picture 300 that may be associated with the "Mach bands" phenomenon. Mach bands are a perceptual phenomenon, named after the physicist Ernst Mach, in which the human eye perceives light or dark stripes next to the boundary between two regions of an image that differ in brightness. The Mach band effect is due to spatial high-boost filtering performed by the human visual system on the luminance channel of the image captured by the retina. This filtering is performed largely in the retina itself, by lateral inhibition among neurons. The Mach band phenomenon and similar texture masking arise through this retinal filtering, which may occur near high-contrast edges and features. Region 304 of the perceptual representation 301 illustrates how gradients, such as those of the Mach band phenomenon and similar texture masking, are captured in the perceptual representation. These gradients may not otherwise be available in the original picture for block-based motion vector matching.
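The overshoot and undershoot that high-boost filtering produces at an edge can be reproduced in one dimension. In this hypothetical Python sketch, a 3-tap box blur with edge replication stands in for the retina's lateral inhibition, and `k` (an assumed gain) scales how much high-pass detail is added back; it is a rough illustration, not the perceptual transform of this disclosure.

```python
def box_blur(signal, radius=1):
    """Moving average over 2*radius+1 samples, replicating the end samples."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def high_boost(signal, k=1.0, radius=1):
    """Add k times the high-pass detail (signal minus its blur) back to the signal."""
    blurred = box_blur(signal, radius)
    return [x + k * (x - b) for x, b in zip(signal, blurred)]


step = [10] * 5 + [50] * 5  # dark region next to a bright region
print([round(v, 2) for v in high_boost(step)])
# [10.0, 10.0, 10.0, 10.0, -3.33, 63.33, 50.0, 50.0, 50.0, 50.0]
```

For the step from 10 to 50, the sample just before the edge dips below 10 and the sample just after it rises above 50, the dark and light stripes that correspond to Mach bands, while the flat regions are left unchanged.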

Region 305 of the perceptual representation 301 shows that high-contrast features of the original picture 300 are preserved in the perceptual representation.

A process for generating a perceptual representation from an original picture is now described. Referring to FIG. 4, an example of generating a perceptual representation from an original picture is shown in flow diagram 400. The original picture has a Y value assigned to each pixel. For example, $Y_{i,j}$ is the luma value of the pixel at coordinates $i, j$ of an image of size $M \times N$.

The Y pixel values referenced in flow diagram 400 are associated with the original picture. These Y values are transformed into eY values in a spatial detail map. The spatial detail map is a weighting map that forms a processed picture from the original picture. The spatial detail map may be generated by the perceptual encoding system 100 using a model of the human visual system that takes into account the statistics of natural images and the response functions of cells in the retina. The weighting map may be a pixel map of the original picture based on the model of the human visual system. The weighting map may include, for each pixel, a value or weight that identifies a level of difficulty for visual perception and/or a level of difficulty for compression. The level of difficulty for compression may be a continuous scale measuring the number of bits needed to encode a pixel or region of an image. Similarly, the level of difficulty for visual perception is a continuous scale measuring the number of bits needed to encode a pixel or region of an image, as associated with a viewer's ability to track details within the pixel or region. The process of generating a weighting map is described in more detail in U.S. Patent Application No. 12/761,581, entitled "System for Reducing Noise in Video Processing," filed April 16, 2010, the entire contents of which are incorporated by reference.

By way of example, a model associated with the human visual system that may be used to generate the weighting map includes an integrated perceptual guide (IPeG) system. The IPeG system implements an IPeG transform that uses certain kinds of predictable ensemble-average statistics, such as the scale invariance of natural images, to generate an "uncertainty signal" associated with the processing of data. The IPeG transform models the behavior of certain classes of cells in the human retina. The IPeG transform may be achieved by a 2D spatial convolution followed by a summation step. A refinement of the IPeG transform may be achieved by adding a low spatial frequency correction, which may in turn be approximated by decimation followed by interpolation, or by other low-pass spatial filtering. Pixel values provided in a computer file or from a scanning system may be provided to the transform to generate a spatial detail map. The IPeG system is described in more detail in U.S. Patent No. 6,014,468, entitled "Apparatus and Methods for Image and Signal Processing," issued January 11, 2000; U.S. Patent No. 6,360,021, entitled "Apparatus and Methods for Image and Signal Processing," issued March 19, 2002; U.S. Patent No. 7,046,857, entitled "Apparatus and Methods for Image and Signal Processing," a continuation of U.S. Patent No. 6,360,021, issued May 16, 2006; and International Application No. PCT/US98/15767, entitled "Apparatus and Methods for Image and Signal Processing," filed January 28, 2000, the entire contents of which are incorporated by reference. The IPeG system provides information including a set of signals that organizes visual details by perceptual significance, and a metric indicating a viewer's ability to track specific video details.
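The low spatial frequency correction described above (decimation followed by interpolation) can be illustrated with a short Python sketch. The IPeG transform itself is not reproduced here. As an assumption, the sketch approximates a spatial detail map eY by subtracting a decimated-and-interpolated (low-pass) copy of the luma picture from the picture itself. The function names and the block factor `f` are hypothetical.

```python
from typing import List

Picture = List[List[float]]  # Picture[i][j] holds the luma value Y_{i,j}


def decimate(img: Picture, f: int) -> Picture:
    """Box-filter decimation: average each non-overlapping f x f block."""
    rows, cols = len(img), len(img[0])
    out = []
    for bi in range(0, rows, f):
        row = []
        for bj in range(0, cols, f):
            block = [img[i][j]
                     for i in range(bi, min(bi + f, rows))
                     for j in range(bj, min(bj + f, cols))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def interpolate(small: Picture, f: int, rows: int, cols: int) -> Picture:
    """Nearest-neighbour interpolation of a decimated picture back to rows x cols."""
    return [[small[i // f][j // f] for j in range(cols)] for i in range(rows)]


def spatial_detail_map(Y: Picture, f: int = 2) -> Picture:
    """Hypothetical stand-in for eY: luma minus its low-spatial-frequency estimate."""
    rows, cols = len(Y), len(Y[0])
    low = interpolate(decimate(Y, f), f, rows, cols)
    return [[Y[i][j] - low[i][j] for j in range(cols)] for i in range(rows)]
```

Because the low-pass copy is subtracted, a flat region maps to zero detail, and values of either sign appear around edges.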

The spatial detail map shown in FIG. 4 includes the values eY. For example, eY_{i,j} is the value at i, j of the IPeG transform of the Y values of the original picture. Each value eY_{i,j} may include, for each pixel, a value or weight that identifies a level of difficulty for visual perception and/or a level of difficulty for compression. Each eY_{i,j} may be positive or negative.

As shown in FIG. 4, the sign of the spatial detail map, e.g., sign(eY), and the absolute value of the spatial detail map, e.g., |eY|, are generated from the spatial detail map. According to an example, the sign information may be generated as follows:

According to another example, the absolute value of the spatial detail map is calculated as follows: |eY_{i,j}| is the absolute value of eY_{i,j}.
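The split of the spatial detail map into sign and magnitude can be sketched in Python as follows. The source does not reproduce the sign equation, so the convention used here is an assumption: +1, 0, or -1, with sign(0) = 0. The function names are hypothetical.

```python
from typing import List, Tuple

Picture = List[List[float]]


def sign(x: float) -> int:
    # Assumed signum convention: +1 for positive, -1 for negative, 0 for zero.
    return (x > 0) - (x < 0)


def split_sign_and_magnitude(eY: Picture) -> Tuple[List[List[int]], Picture]:
    """Return (sign(eY), |eY|) element-wise for a spatial detail map."""
    signs = [[sign(v) for v in row] for row in eY]
    mags = [[abs(v) for v in row] for row in eY]
    return signs, mags
```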

The companded absolute value of the spatial detail map, e.g., pY, is generated from the absolute value |eY| of the spatial detail map. According to an example, the companded absolute value information may be calculated as follows:

and

where CF (the companding factor) is a constant provided by a user or the system, and λ_Y is the overall mean of |eY_{i,j}| over the M x N picture. "Companding" is a portmanteau formed from "compressing" and "expanding." Companding describes a signal processing operation in which a set of values is mapped nonlinearly to another set of values, typically followed by quantization, sometimes referred to as digitization. When the second set of values is subjected to uniform quantization, the result is equivalent to a non-uniform quantization of the original set of values. Typically, companding results in a finer (more accurate) quantization of smaller original values and a coarser (less accurate) quantization of larger original values. Through experimentation, companding has been found to be a useful process for generating perceptual mapping functions for use in video processing and analysis, particularly when used with IPeG transforms. pY_{i,j} is a nonlinear mapping of the eY_{i,j} values, and the new set of values pY_{i,j} has a limited dynamic range. Mathematical expressions other than those shown above may be used to produce similar nonlinear mappings between eY_{i,j} and pY_{i,j}. In some cases, it may be useful to further quantize the values pY_{i,j}, for example to maintain or reduce the number of bits used in calculations.
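Companding can be sketched in Python as shown below. The source's companding equations are not reproduced, so the curve pY = 1 - exp(-|eY| / (CF * λ_Y)) is an assumed example of a compressive nonlinear mapping. It is not necessarily the expression used in the patent. λ_Y is computed as the mean of |eY_{i,j}| over the picture. The optional `bits` argument illustrates the further quantization mentioned above. All names are hypothetical.

```python
import math
from typing import List

Picture = List[List[float]]


def mean_abs(mags: Picture) -> float:
    """lambda_Y: overall mean of |eY_{i,j}| over the M x N picture."""
    values = [v for row in mags for v in row]
    return sum(values) / len(values)


def compand(mags: Picture, cf: float = 1.0, bits: int = 0) -> Picture:
    """Companded magnitude pY using an ASSUMED curve 1 - exp(-|eY| / (cf * lambda_Y)).

    `mags` holds |eY| values. Unquantized outputs lie in [0, 1); if bits > 0,
    each value is further quantized uniformly to 2**bits levels in [0, 1].
    """
    lam = mean_abs(mags)
    if lam == 0.0:  # flat picture: no detail anywhere
        return [[0.0 for _ in row] for row in mags]
    levels = 2 ** bits - 1
    out = []
    for row in mags:
        new_row = []
        for v in row:
            p = 1.0 - math.exp(-v / (cf * lam))
            if bits > 0:
                p = round(p * levels) / levels
            new_row.append(p)
        out.append(new_row)
    return out
```

The curve is concave and starts at zero. As a result, a magnitude equal to one third of another maps to more than one third of that value's companded result, which gives small values the larger share of the limited output range. Normalizing by λ_Y also makes the result independent of any overall scaling of eY.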

The perceptual representation may be generated by combining the sign of the spatial detail map with the companded absolute value of the spatial detail map as follows: pY_{i,j} × sign(eY_{i,j}). The result of pY_{i,j} × sign(eY_{i,j}) is a compressed dynamic range in which small absolute values of eY_{i,j} preferentially occupy a larger portion of the dynamic range than larger absolute values of eY_{i,j}, while the sign information of eY_{i,j} is preserved.
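The following Python sketch combines the companded magnitude with the preserved sign. The companding curve is the same assumed saturating exponential, 1 - exp(-|eY| / (CF * λ_Y)), so the output lies in the open interval (-1, 1). The function name is hypothetical.

```python
import math
from typing import List

Picture = List[List[float]]


def perceptual_representation(eY: Picture, cf: float = 1.0) -> Picture:
    """pY_{i,j} * sign(eY_{i,j}) for every pixel of a spatial detail map eY.

    pY uses the ASSUMED curve 1 - exp(-|eY| / (cf * lambda_Y)), where lambda_Y
    is the mean of |eY| over the picture, so results lie in (-1, 1).
    """
    flat = [abs(v) for row in eY for v in row]
    lam = sum(flat) / len(flat)
    if lam == 0.0:
        return [[0.0 for _ in row] for row in eY]
    out = []
    for row in eY:
        out.append([
            # copysign re-applies sign(eY); exact zeros stay zero
            math.copysign(1.0 - math.exp(-abs(v) / (cf * lam)), v) if v != 0.0 else 0.0
            for v in row
        ])
    return out
```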

Referring to FIG. 5, there is shown a flow diagram 500 depicting different perceptual representations generated from an original picture using various companding factors. Referring to FIG. 6, there is shown a flow diagram 600 including perceptual representations generated from an original picture and from the same original picture at a lower contrast, namely 10% of the contrast of the original picture. The perceptual representations of the two pictures are comparatively similar, illustrating the resilience of the perceptual representation to changes in contrast. Referring to FIG. 7, there is shown a flow diagram 700 including perceptual representations generated from an original picture and from the same original picture at a higher brightness, namely 200% of the brightness of the original picture. The perceptual representations of the two pictures are comparatively similar, illustrating the resilience of the perceptual representation to changes in brightness.
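The behavior described for FIGS. 6 and 7 can be explored with a self-contained sketch. It chains an assumed detail map (each pixel minus its 2 x 2 block mean) with the assumed companding curve. Under these assumptions, the representation is unchanged when the picture's contrast is reduced to 10% about a mid-grey level of 128, or when its luma is doubled. This happens because the detail map discards the block mean and normalizing by λ_Y cancels any overall gain. It is a property of the sketch, not a measurement of the IPeG-based representation. Real pictures also suffer clipping and quantization. All names and sample values are hypothetical.

```python
import math
from typing import List

Picture = List[List[float]]


def detail_map(Y: Picture, f: int = 2) -> Picture:
    """Stand-in for eY: each pixel minus the mean of its f x f block."""
    rows, cols = len(Y), len(Y[0])
    out = [[0.0] * cols for _ in range(rows)]
    for bi in range(0, rows, f):
        for bj in range(0, cols, f):
            cells = [(i, j) for i in range(bi, min(bi + f, rows))
                     for j in range(bj, min(bj + f, cols))]
            mean = sum(Y[i][j] for i, j in cells) / len(cells)
            for i, j in cells:
                out[i][j] = Y[i][j] - mean
    return out


def perceptual(Y: Picture, cf: float = 1.0) -> Picture:
    """Assumed pipeline: detail map -> companded magnitude -> sign re-applied."""
    eY = detail_map(Y)
    lam = sum(abs(v) for row in eY for v in row) / (len(eY) * len(eY[0]))
    if lam == 0.0:
        return [[0.0] * len(row) for row in eY]
    return [[math.copysign(1.0 - math.exp(-abs(v) / (cf * lam)), v) if v else 0.0
             for v in row] for row in eY]


def max_diff(a: Picture, b: Picture) -> float:
    """Largest element-wise absolute difference between two pictures."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


original = [[16.0, 40.0, 90.0, 120.0],
            [20.0, 35.0, 100.0, 110.0],
            [60.0, 61.0, 200.0, 10.0],
            [64.0, 59.0, 30.0, 220.0]]
low_contrast = [[128 + 0.1 * (v - 128) for v in row] for row in original]
brighter = [[2.0 * v for v in row] for row in original]

print(max_diff(perceptual(original), perceptual(low_contrast)))
print(max_diff(perceptual(original), perceptual(brighter)))
```

Both printed differences are zero up to floating-point rounding.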

Referring to FIG. 8, there is shown a flow diagram 800 illustrating a motion estimation flow process executed by a motion estimator, such as motion estimator 109, in a system for encoding using perceptual representations. In flow diagram 800, a video sequence 801 includes pictures that are signaled to a perception engine 802 and to a first-pass motion estimation ASIC 804 in the motion estimator. The perception engine 802 generates guide motion vectors 803. The guide motion vectors 803 are signaled to the first-pass motion estimation ASIC 804. There, they may be utilized in a pre-analysis process to generate motion vector "seeds" or "hints." The seeds or hints may in turn be utilized by a second-pass motion estimation ASIC 805 to generate motion vectors, such as motion vectors 113.
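The two-pass flow can be sketched with ordinary sum-of-absolute-differences (SAD) block matching. As an assumption, the first pass searches a wide window on the perceptual representations to obtain a guide vector (the seed). The second pass then refines that seed within a small window on the pictures themselves. The window radii, the SAD cost, and all function names are hypothetical and do not describe the internals of ASICs 804 and 805.

```python
from typing import List, Tuple

Picture = List[List[float]]
MV = Tuple[int, int]  # (dy, dx): reference row/col offset for a target block


def sad(target: Picture, ref: Picture, top: int, left: int, size: int, mv: MV) -> float:
    """Sum of absolute differences between a target block and a displaced reference block."""
    dy, dx = mv
    total = 0.0
    for i in range(top, top + size):
        for j in range(left, left + size):
            ri, rj = i + dy, j + dx
            if not (0 <= ri < len(ref) and 0 <= rj < len(ref[0])):
                return float("inf")  # candidate leaves the reference picture
            total += abs(target[i][j] - ref[ri][rj])
    return total


def search(target: Picture, ref: Picture, top: int, left: int, size: int,
           center: MV, radius: int) -> MV:
    """Exhaustive search over a (2*radius + 1)^2 window centred on `center`."""
    best, best_cost = center, sad(target, ref, top, left, size, center)
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            cost = sad(target, ref, top, left, size, (dy, dx))
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def two_pass_motion_vector(target_pr: Picture, ref_pr: Picture,
                           target_pic: Picture, ref_pic: Picture,
                           top: int, left: int, size: int,
                           guide_radius: int = 4, refine_radius: int = 1) -> MV:
    # Pass 1: guide vector ("seed") from a wide search on the perceptual representations.
    seed = search(target_pr, ref_pr, top, left, size, (0, 0), guide_radius)
    # Pass 2: refine around the seed with a small window on the pictures themselves.
    return search(target_pic, ref_pic, top, left, size, seed, refine_radius)
```

With the default radii, the refinement pass evaluates 9 candidate vectors around the seed, compared with 81 in the guide search.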

Referring to FIG. 9, perceptual representations may be utilized in motion estimation, and in determining and/or utilizing motion vectors, according to various video encoding formats, such as MPEG-2 and MPEG-4 AVC. FIG. 9 shows an example of a content distribution system 900 that includes an encoding apparatus 910 and a decoding apparatus 940. The encoding apparatus 910 represents any encoding system that may be utilized in compressing or transcoding a video sequence, such as those discussed above with respect to FIGS. 1 and 2. The decoding apparatus 940 represents any of the set-top boxes or other receiving devices, such as those discussed above with respect to FIGS. 1 and 2. According to an example, the encoding apparatus 910 may transmit to the decoding apparatus 940 a compressed bitstream 905 that includes motion vectors and other information associated with encoding utilizing perceptual representations.

Referring again to FIG. 9, the encoding apparatus 910 includes an interface 930 for an incoming signal 920, a controller 911, a counter 912, a frame memory 913, an encoding unit 914, a transmitter buffer 915, and an interface 935 for an outgoing compressed bitstream 905. The decoding apparatus 940 includes a receiver buffer 950, a decoding unit 951, a frame memory 952, and a controller 953. The encoding apparatus 910 and the decoding apparatus 940 are coupled to each other via a transmission path for the compressed bitstream 905. The controller 911 of the encoding apparatus 910 may control the amount of data to be transmitted on the basis of the capacity of the receiver buffer 950. It may also take into account other parameters, such as the amount of data per unit of time. The controller 911 may control the encoding unit 914 to prevent a failure of the received-signal decoding operation of the decoding apparatus 940. The controller 911 may include, for example, a microcomputer having a processor, random access memory, and read-only memory.

For example, an incoming signal 920 supplied by a content provider may include frames or pictures in a video sequence, such as video sequence 101. The frame memory 913 may have a first area used for storing pictures to be processed through a perceptual encoding system, such as perceptual encoding system 100, implemented through the encoding unit 914. Perceptual representations and motion vectors may be derived from the pictures in the video sequence 101 utilizing the controller 911. A second area in the frame memory 913 may be used for reading out the stored data and outputting it to the encoding unit 914. The controller 911 may output an area switching control signal 923 to the frame memory 913. The area switching control signal 923 may indicate whether the first area or the second area is to be used.

The controller 911 outputs an encoding control signal 924 to the encoding unit 914. The encoding control signal 924 causes the encoding unit 914 to start an encoding operation. The encoding control signal 924 from the controller 911 includes control information associated with the pictures or frames. In response to this signal, the encoding unit 914 reads out the pictures for a high-efficiency perceptual representation encoding process. It prepares the motion vectors, pointers, and residual pictures and encodes them into a compressed bitstream.

The encoding unit 914 may prepare the encoded compressed bitstream 905 as a packetized elementary stream (PES) that includes video packets and program information packets. The encoding unit 914 may map the compressed pictures into video packets using a program time stamp (PTS) and the control information.

The encoded information may be stored in the transmitter buffer 915. The counter 912 may comprise an information amount counter that is incremented to indicate the amount of data in the transmitter buffer 915. As data is retrieved and removed from the buffer, the information amount counter 912 may be decremented to reflect the amount of data remaining in the buffer. An occupied area information signal 926 is transmitted to the counter 912 to indicate whether data from the encoding unit 914 has been added to or removed from the transmitter buffer 915, so that the counter 912 can be incremented or decremented accordingly. The controller 911 controls the production of packets by the encoding unit 914 on the basis of the occupied area information 926, which the controller communicates to the encoding unit, in order to prevent an overflow or underflow from occurring in the transmitter buffer 915.

The information amount counter 912 is reset in response to a preset signal 928 generated and output by the controller 911. After the information amount counter 912 is reset, it counts the data output by the encoding unit 914 and obtains the amount of information that has been generated. The information amount counter 912 then supplies an information amount signal 929, representing the obtained amount of information, to the controller 911. The controller 911 controls the encoding unit 914 so that no overflow occurs at the transmitter buffer 915.
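The counter-based buffer control described above can be sketched as a small Python class with a policy function. The byte units, the low-water mark used to signal a risk of underflow, and the returned action strings are assumptions for illustration. The names are hypothetical.

```python
class InformationAmountCounter:
    """Hypothetical model of counter 912 tracking transmitter buffer 915 occupancy."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # buffer size in bytes (assumed unit)
        self.occupancy = 0        # bytes currently held in the buffer

    def reset(self) -> None:
        """Counterpart of preset signal 928."""
        self.occupancy = 0

    def added(self, nbytes: int) -> None:
        """Encoding unit wrote a packet into the buffer: increment."""
        self.occupancy += nbytes

    def removed(self, nbytes: int) -> None:
        """A packet left the buffer for transmission: decrement, never below zero."""
        self.occupancy = max(0, self.occupancy - nbytes)


def controller_decision(counter: InformationAmountCounter, packet_size: int,
                        low_water: int) -> str:
    """Hypothetical controller 911 policy for the next packet."""
    if counter.occupancy + packet_size > counter.capacity:
        return "throttle"  # would overflow: spend fewer bits on the next packet
    if counter.occupancy < low_water:
        return "boost"     # close to underflow: allow more bits
    return "emit"
```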

The decoding apparatus 940 includes an interface 970 for receiving a compressed bitstream, such as compressed bitstream 905, a receiver buffer 950, a controller 953, a frame memory 952, a decoding unit 951, and an interface 975 for output. The perceptual decoding system 200 shown in FIG. 2 may be implemented in the decoding unit 951. The receiver buffer 950 of the decoding apparatus 940 may temporarily store encoded information received from the encoding apparatus 910 via the compressed bitstream 905, including the motion vectors, residual pictures, and pointers. The decoding apparatus 940 counts the amount of received data and outputs a frame or picture number signal 963, which is applied to the controller 953. The controller 953 supervises the counted number of frames or pictures at predetermined intervals, for example, each time the decoding unit 951 completes a decoding operation.

When the frame number signal 963 indicates that the receiver buffer 950 is at a predetermined capacity or amount, the controller 953 may output a decoding start signal 964 to the decoding unit 951. When the frame number signal 963 indicates that the receiver buffer 950 is below the predetermined capacity, the controller 953 waits until the counted number of frames or pictures becomes equal to the predetermined amount. When the frame number signal 963 then indicates that the receiver buffer 950 is at the predetermined capacity, the controller 953 outputs the decoding start signal 964. The encoded frames, caption information, and frame difference maps may be decoded in a monotonic (that is, increasing or decreasing) order based on a presentation time stamp (PTS) in the header of the program information packets.
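The decoder-side start condition can be sketched as follows. Frames are modeled as (PTS, payload) tuples. Decoding is released once a predetermined number of frames has been buffered, and frames are then handed out in increasing PTS order. These modeling choices and the class name are assumptions.

```python
from typing import List, Optional, Tuple

Frame = Tuple[int, bytes]  # (presentation time stamp, encoded payload): assumed layout


class ReceiverController:
    """Hypothetical model of controller 953 gating decoding on buffer fullness."""

    def __init__(self, start_threshold: int) -> None:
        self.start_threshold = start_threshold  # predetermined number of frames
        self.buffer: List[Frame] = []           # stands in for receiver buffer 950
        self.started = False                    # set when decoding start signal 964 fires

    def receive(self, frame: Frame) -> None:
        self.buffer.append(frame)
        # Frame number signal 963: compare the counted frames with the threshold.
        if not self.started and len(self.buffer) >= self.start_threshold:
            self.started = True

    def next_for_decoding(self) -> Optional[Frame]:
        """Return the buffered frame with the smallest PTS, or None while waiting."""
        if not self.started or not self.buffer:
            return None
        self.buffer.sort(key=lambda frame: frame[0])  # monotonically increasing PTS
        return self.buffer.pop(0)
```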

In response to the decoding start signal 964, the decoding unit 951 may decode data 961 amounting to one frame or picture received from the receiver buffer 950. The decoding unit 951 writes a decoded video signal 962 into the frame memory 952. The frame memory 952 may have a first area into which the decoded video signal is written, and a second area used for reading out the decoded video data and outputting it to a monitor or the like.

According to an example, the encoding apparatus 910 may be incorporated in, or otherwise associated with, a headend, and the decoding apparatus 940 may be incorporated in, or otherwise associated with, a handset or set-top box. These may be utilized separately or together in methods for encoding and/or decoding that utilize perceptual representations based on original pictures in a video sequence. Various manners in which the encoding apparatus 910 and the decoding apparatus 940 may be implemented are described in greater detail below with respect to FIGS. 10 and 11, which depict flow diagrams of methods 1000 and 1100.

In other embodiments, the perceptual encoding system 100 may not be included in the same unit that performs the initial encoding, such as the unit shown in FIG. 9. For example, the perceptual encoding system 100 may be provided in a separate device that receives an encoded video signal and perceptually encodes the video signal for downstream transmission to a decoder. Furthermore, the perceptual encoding system 100 may generate metadata that can be used by downstream processing elements, such as a transcoder. The metadata may include details describing the motion vectors estimated from the perceptual representations, which the transcoder may use to control the bit rate.

Methods

Method 1000 is a method for encoding that utilizes perceptual representations. Method 1100 is a method for decoding that utilizes perceptual representations. It is apparent to those of ordinary skill in the art that methods 1000 and 1100 represent generalized illustrations. Other steps may be added, or existing steps may be removed, modified, or rearranged, without departing from the scope of methods 1000 and 1100. Methods 1000 and 1100 are repeatable, so that pictures in a video signal are continually encoded and decoded as they are received. Methods 1000 and 1100 are described with particular reference to the encoding apparatus 910 and the decoding apparatus 940 depicted in FIG. 9. It should be understood, however, that methods 1000 and 1100 may be implemented in systems and/or devices that differ from the encoding apparatus 910 and the decoding apparatus 940 without departing from the scope of methods 1000 and 1100.

With reference to method 1000 in FIG. 10, at step 1001, the encoding apparatus 910 receives, at interface 930, a video signal 920 that includes original pictures in a video sequence (e.g., video sequence 101 shown in FIG. 1). For example, the received video signal 920 may be uncompressed original pictures in a video bitstream.

At step 1002, the encoding apparatus 910 generates perceptual representations based on the received original pictures, utilizing the encoding unit 914 and the controller 911. These include perceptual representations that may be used as target perceptual representations and as reference perceptual representations.

At step 1003, the controller 911 selects one or more reference pictures from a plurality of reference pictures drawn from the original pictures stored or located in the frame memory 913.

At step 1004, the encoding unit 914 and the controller 911 determine motion vector information based on the target perceptual representations and the reference pictures. The motion vector information may be determined based on attributes of the reference pictures and the target perceptual representations, such as low-contrast features in the reference pictures and/or Mach band phenomena in the target perceptual representations. The determined motion vector information may include the motion vectors 113 shown in FIG. 1.

At step 1005, the encoding unit 914 and the controller 911 encode the original pictures using the motion vector information and the reference pictures. The residual pictures 112 shown in FIG. 1 are an example of encoded original pictures.

Also at step 1005, the encoding unit 914 and the controller 911 output the encoded original pictures, the motion vector information, and pointers associated with the selected reference pictures, such as the pointers 114 shown in FIG. 1.
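Steps 1003 to 1005 can be sketched end to end in Python under several assumptions. Motion vectors are estimated per n x n block by full-search SAD on the perceptual representations. The reference with the lowest total matching cost is the one selected, and its list index plays the role of the pointer. The residual is the target minus the motion-compensated prediction taken from the reference picture. Picture dimensions are assumed to be multiples of n, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Picture = List[List[float]]
MV = Tuple[int, int]                  # (dy, dx)
BlockMVs = Dict[Tuple[int, int], MV]  # {(top, left): motion vector}


def block_sad(a: Picture, b: Picture, top: int, left: int, n: int, mv: MV) -> float:
    """SAD between block (top, left) of `a` and the block of `b` displaced by mv."""
    dy, dx = mv
    if not (0 <= top + dy and top + dy + n <= len(b)
            and 0 <= left + dx and left + dx + n <= len(b[0])):
        return float("inf")  # displaced block falls outside the reference
    return sum(abs(a[i][j] - b[i + dy][j + dx])
               for i in range(top, top + n) for j in range(left, left + n))


def estimate(target_pr: Picture, ref_pr: Picture, n: int, radius: int) -> BlockMVs:
    """Full-search motion vector for every n x n block, on perceptual representations."""
    cands = [(dy, dx) for dy in range(-radius, radius + 1)
             for dx in range(-radius, radius + 1)]
    mvs: BlockMVs = {}
    for top in range(0, len(target_pr), n):
        for left in range(0, len(target_pr[0]), n):
            mvs[(top, left)] = min(
                cands, key=lambda mv: block_sad(target_pr, ref_pr, top, left, n, mv))
    return mvs


def predict(ref: Picture, mvs: BlockMVs, n: int) -> Picture:
    """Motion-compensated prediction built from the reference picture."""
    pred = [[0.0] * len(ref[0]) for _ in ref]
    for (top, left), (dy, dx) in mvs.items():
        for i in range(top, top + n):
            for j in range(left, left + n):
                pred[i][j] = ref[i + dy][j + dx]
    return pred


def encode_picture(target: Picture, target_pr: Picture, refs: List[Picture],
                   refs_pr: List[Picture], n: int = 2, radius: int = 2):
    """Sketch of steps 1003-1005: returns (residual, motion vectors, pointer)."""
    best = None
    for pointer, ref_pr in enumerate(refs_pr):
        mvs = estimate(target_pr, ref_pr, n, radius)
        cost = sum(block_sad(target_pr, ref_pr, tp, lf, n, mv)
                   for (tp, lf), mv in mvs.items())
        if best is None or cost < best[0]:
            best = (cost, pointer, mvs)
    _, pointer, mvs = best
    pred = predict(refs[pointer], mvs, n)
    residual = [[t - p for t, p in zip(t_row, p_row)]
                for t_row, p_row in zip(target, pred)]
    return residual, mvs, pointer
```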

With reference to method 1100 in FIG. 11, at step 1101, the decoding apparatus 940 receives, utilizing interface 970, motion vector information from the compressed bitstream 905 at the receiver buffer 950. The received motion vector information is based on target perceptual representations, which are based on original pictures from a video sequence including pictures. It is also based on reference pictures associated with the target perceptual representations.

At step 1102, the decoding apparatus 940 receives, utilizing interface 970, pointers from the compressed bitstream 905 at the receiver buffer 950. The received pointers, such as pointers 114, are associated with respective reference pictures.

At step 1103, the decoding apparatus 940 receives, utilizing interface 970, encoded residual pictures associated with the received motion vector information from the compressed bitstream 905 at the receiver buffer 950.

At step 1104, the controller 953 selects reference pictures from a plurality of reference pictures stored or located in the receiver buffer 950, utilizing the respective received pointers.

At step 1105, the controller 953 and the decoding unit 951 determine predicted pictures based on the received motion vector information and the respective selected reference pictures.

At step 1106, the controller 953 and the decoding unit 951 generate reconstructed pictures based on the determined predicted pictures and the received residual pictures.
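Decoder steps 1104 to 1106 mirror the encoder. The pointer selects a stored reference, the motion vectors produce a prediction, and the residual is added back. The Python sketch below assumes a per-block motion-vector layout of the form {(top, left): (dy, dx)} and pointers that are list indices into the reference store. These layouts are assumptions, and the names are hypothetical.

```python
from typing import Dict, List, Tuple

Picture = List[List[float]]
MV = Tuple[int, int]                  # (dy, dx)
BlockMVs = Dict[Tuple[int, int], MV]  # {(top, left): motion vector}


def motion_compensate(ref: Picture, mvs: BlockMVs, n: int) -> Picture:
    """Step 1105: copy each n x n block from its displaced position in `ref`."""
    pred = [[0.0] * len(ref[0]) for _ in ref]
    for (top, left), (dy, dx) in mvs.items():
        for i in range(top, top + n):
            for j in range(left, left + n):
                pred[i][j] = ref[i + dy][j + dx]
    return pred


def decode_picture(residual: Picture, mvs: BlockMVs, pointer: int,
                   reference_store: List[Picture], n: int) -> Picture:
    ref = reference_store[pointer]          # step 1104: pointer selects the reference
    pred = motion_compensate(ref, mvs, n)   # step 1105: predicted picture
    return [[p + r for p, r in zip(p_row, r_row)]  # step 1106: prediction + residual
            for p_row, r_row in zip(pred, residual)]
```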

Some or all of the methods and operations described above may be provided as machine-readable instructions, such as a utility or a computer program, stored on a computer-readable storage medium. The medium may be non-transitory, such as a hardware storage device or another type of storage device. For example, the instructions may exist as program(s) comprised of program instructions in source code, object code, executable code, or other formats.

Examples of computer-readable storage media include conventional computer system RAM, ROM, EPROM, EEPROM, and magnetic or optical disks or tapes. A concrete example of the foregoing is the distribution of the programs on a CD-ROM. It is therefore to be understood that any electronic device capable of executing the above-described functions may perform the functions enumerated above.

Referring to FIG. 12, there is shown a platform 1200 that may be used as a computing device in a system for encoding or decoding utilizing perceptual representations, such as the perceptual encoding system 100 and/or the encoding apparatus 910. The platform 1200 may also be used for a set-top box, a handset, a mobile phone or other mobile device, or an upstream decoding apparatus such as a transcoder. It may likewise be used for other devices and apparatuses that may utilize perceptual representations and/or motion vectors determined using perceptual representations, such as the perceptual decoding system 200 and/or the decoding apparatus 940. It is understood that the platform 1200 as illustrated is a generalized example. The platform 1200 may include additional components, and some of the components described may be removed and/or modified without departing from the scope of the platform 1200.

The platform 1200 includes a display 1202, such as a monitor. It further includes an interface 1203, such as a simple input interface and/or a network interface to a local area network (LAN), a wireless 802.11x LAN, a 3G or 4G mobile WAN, or a WiMax WAN. The interface 1203 may perform the functions of the interfaces of an encoding system or apparatus, such as interfaces 930 and 935 of the perceptual encoding system 100 and the encoding apparatus 910. It may also perform the functions of the interfaces of a decoding system or apparatus, such as interfaces 970 and 975 of the perceptual decoding system 200 and the decoding apparatus 940. The platform 1200 further includes a processor 1201, such as one or more microprocessors, microcontrollers, digital signal processors (DSPs), combinations thereof, or other such devices known to those of ordinary skill in the art. Certain operations/functions are described herein as being performed by the systems for encoding or decoding, such as the perceptual encoding system 100, the encoding apparatus 910, the perceptual decoding system 200, and the decoding apparatus 940. Apart from the functions performed by the display 1202 and the interface 1203, these operations/functions are performed by the processor 1201 of the platform. The processor does so by executing software instructions and routines stored in a computer-readable medium (CRM) 1204 associated with the processor. However, those of ordinary skill in the art will recognize that the operations/functions of the processor 1201 may alternatively be implemented in hardware on the platform. Examples include integrated circuits (ICs), application-specific integrated circuits (ASICs), and programmable logic devices such as PLDs, PLAs, FPGAs, or PALs. Based on the present disclosure, one skilled in the art will readily be able to produce and implement such software and/or hardware without undue experimentation. Each of these components may be operatively coupled to a bus 1208. For example, the bus 1208 may be an EISA, PCI, USB, FireWire, NuBus, or PDS bus.

The CRM 1204 may be any suitable medium that participates in providing instructions to the processor(s) 1201 for execution, and may include the various memories and buffers described herein, such as the memories 102 and 913 and the buffer 915 of the encoding system 100 or apparatus 910, and the memories 201 and 952 and the buffer 950 of the decoding system 200 or apparatus 940. For example, the CRM 1204 may be a non-volatile medium, such as an optical or magnetic disk; a volatile medium, such as memory; or a transmission medium, such as coaxial cables, copper wire, and optical fibers. Transmission media may also take the form of acoustic, light, or radio frequency waves. The CRM 1204 may also store other instructions or instruction sets, including word processors, browsers, email, instant messaging, media players, and telephony code.

The CRM 1204 may also store an operating system 1205, such as MAC OS, MS WINDOWS, UNIX, or LINUX; applications 1206, such as network applications, word processors, spreadsheet applications, browsers, email, instant messaging, media players, games, or mobile applications (e.g., "apps"); and a data structure management application 1207. The operating system 1205 may be multi-user, multiprocessing, multitasking, multithreading, real-time, and the like. The operating system 1205 may also perform basic tasks such as recognizing input from the interface 1203, including from input devices such as a keyboard or keypad; sending output to the display 1202 and keeping track of files and directories on the CRM 1204; controlling peripheral devices, such as disk drives, printers, and image capture devices; and managing traffic on the bus 1208. The applications 1206 may include various components for establishing and maintaining network connections, such as code or instructions for implementing communication protocols including TCP/IP, HTTP, Ethernet, USB, and FireWire.

As described above, a data structure management application, such as the data structure management application 1207, provides various code components for building/updating a computer readable system (CRS) architecture for a non-volatile memory. In certain examples, some or all of the processes performed by the data structure management application 1207 may be integrated into the operating system 1205. In certain examples, the processes may be implemented at least partially in digital electronic circuitry, or in computer hardware, firmware, code, instruction sets, or any combination thereof.

Although described specifically throughout the present disclosure, representative examples have utility across a wide range of applications, and the above discussion is not intended to be, and should not be construed as, limiting. The terms, descriptions, and figures used herein are set forth by way of illustration only and are not meant as limitations. Those skilled in the art recognize that many variations are possible within the spirit and scope of the invention. Although the present disclosure has been described with reference to examples, those skilled in the art may make various modifications to the described examples without departing from the scope of the examples as set forth in the following claims and their equivalents.

Claims

1. A system for encoding, the system comprising:
an interface configured to receive a video signal including original pictures in a video sequence including pictures; and
a processor configured to
generate target perceptual representations based on the received original pictures,
select reference pictures from a plurality of reference pictures,
determine motion vector information based on the target perceptual representations and the reference pictures, wherein the determined motion vector information is determined based on attributes of the target perceptual representations and the reference pictures,
encode the determined motion vector information, and
encode pointers associated with the reference pictures.

2. The system of claim 1,
wherein the processor is configured to generate a plurality of reference perceptual representations from the reference pictures and to determine the motion vector information utilizing the plurality of reference perceptual representations.

3. The system of claim 1,
wherein the processor is configured to generate the target perceptual representations by
generating spatial detail maps based on the respective original pictures,
determining sign information based on the respective generated spatial detail maps,
determining absolute value information based on the respective generated spatial detail maps, and
processing the determined sign information and the determined absolute value information to form the respective generated target perceptual representations.

4. The system of claim 3,
wherein the generated spatial detail maps include values associated with pixels in the original pictures.

5. The system of claim 3,
wherein the generated spatial detail maps include values determined using a model of human perceptibility of features in the respective original pictures.

6. The system of claim 3,
wherein the processor is configured to determine the absolute value information by
determining peak value maps based on the respective generated spatial detail maps, and
generating companded absolute spatial detail maps based on a companding factor, the respective determined peak value maps, and the respective generated spatial detail maps.

7. The system of claim 1,
wherein the original pictures in the video sequence are in a transition sequence of pictures in the video sequence.

8. The system of claim 7,
wherein the transition sequence is characterized by at least one of a changing contrast attribute and a changing brightness attribute of pictures among the plurality of pictures in the video sequence.

9. A method for encoding, the method comprising:
receiving a video signal including original pictures in a video sequence including pictures;
generating target perceptual representations based on the received original pictures;
selecting reference pictures from a plurality of reference pictures;
determining, utilizing a processor, motion vector information based on the target perceptual representations and the reference pictures, wherein the determined motion vector information is determined based on attributes of the target perceptual representations and the reference pictures;
encoding the determined motion vector information; and
encoding pointers associated with the reference pictures.

10. A non-transitory computer readable medium (CRM) storing computer readable instructions for carrying out the method of claim 9.

11. A system for decoding, the system comprising:
an interface configured to
receive motion vector information, wherein the motion vector information is based on target perceptual representations, which are based on original pictures from a video sequence including pictures, and on reference pictures associated with the target perceptual representations,
receive pointers associated with the reference pictures, and
receive residual pictures associated with the received motion vector information; and
a processor configured to
select reference pictures from a plurality of reference pictures utilizing the received pointers,
determine predicted pictures based on the received motion vector information and the selected reference pictures, and
generate reconstructed pictures based on the predicted pictures and the residual pictures.

12. The system of claim 11,
wherein the reference pictures are reference perceptual representations.

13. The system of claim 11,
wherein the target perceptual representations are generated from spatial detail maps generated based on the respective original pictures, and from
at least one of sign information determined based on the respective generated spatial detail maps and absolute value information determined based on the respective generated spatial detail maps.

14. The system of claim 13,
wherein the generated spatial detail maps include values associated with pixels in the original pictures.

15. The system of claim 13,
wherein the generated spatial detail maps include values determined using a model of human perceptibility of features in the original pictures.

16. The system of claim 13,
wherein the determined absolute value information is determined based on
peak value maps based on the respective generated spatial detail maps, and
companded absolute spatial detail maps generated based on a companding factor, the respective peak value maps, and the respective generated spatial detail maps.

17. The system of claim 11,
wherein the original pictures in the video sequence are within a transition sequence of pictures in the video sequence.

18. The system of claim 17,
wherein the transition sequence is characterized by at least one of a changing contrast attribute and a changing brightness attribute of pictures in the transition sequence.

19. A method for decoding, the method comprising:
receiving motion vector information, wherein the motion vector information is based on target perceptual representations, which are based on original pictures from a video sequence including pictures, and on reference pictures associated with the target perceptual representations;
receiving pointers associated with the respective reference pictures;
receiving residual pictures associated with the received motion vector information;
selecting reference pictures from a plurality of reference pictures utilizing the respective received pointers;
determining, utilizing a processor, predicted pictures based on the received motion vector information and the respective selected reference pictures; and
generating reconstructed pictures based on the determined predicted pictures and the received residual pictures.

20. A non-transitory computer readable medium (CRM) storing computer readable instructions for carrying out the method of claim 19.
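Claims 3, 6, and 13 to 16 recite forming a perceptual representation from a spatial detail map, its sign information, and companded absolute values that depend on a peak value map and a companding factor, but the claims give no formulas. The Python sketch below is one hypothetical reading for illustration only, not the method of the source. It rests on three assumptions: the spatial detail map is approximated as each pixel minus its 3x3 local mean, standing in for the human-perception model of claims 5 and 15; the peak value map is the local 3x3 maximum of the absolute detail; and companding is `1 - exp(-|d| / (CF * peak))`. All function names are hypothetical.

```python
import math

def _window(img, r, c):
    """Yield the in-bounds values of the 3x3 window centred on (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(img) and 0 <= cc < len(img[0]):
                yield img[rr][cc]

def spatial_detail_map(picture):
    # Hypothetical stand-in for a perceptual model: pixel minus local 3x3 mean.
    detail = []
    for r, row in enumerate(picture):
        out = []
        for c, value in enumerate(row):
            window = list(_window(picture, r, c))
            out.append(value - sum(window) / len(window))
        detail.append(out)
    return detail

def perceptual_representation(picture, companding_factor=1.0):
    detail = spatial_detail_map(picture)
    # Sign information: -1, 0 or +1 per pixel.
    sign = [[(d > 0) - (d < 0) for d in row] for row in detail]
    absolute = [[abs(d) for d in row] for row in detail]
    # Peak value map (assumption): local 3x3 maximum of the absolute detail.
    peak = [[max(_window(absolute, r, c)) for c in range(len(row))]
            for r, row in enumerate(absolute)]
    rep = []
    for r, row in enumerate(absolute):
        out = []
        for c, a in enumerate(row):
            p = peak[r][c]
            # Companded absolute detail in [0, 1); p == 0 only where a == 0.
            companded = 1.0 - math.exp(-a / (companding_factor * p)) if p > 0 else 0.0
            out.append(sign[r][c] * companded)
        rep.append(out)
    return rep
```

Under these assumptions, subtracting the local mean cancels a brightness offset. Dividing by the peak cancels a positive contrast gain. So `perceptual_representation([[2 * v + 10 for v in row] for row in pic])` matches `perceptual_representation(pic)` up to floating-point rounding. That loosely illustrates why such a representation may help with the contrast and brightness transitions of claims 8 and 18.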
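Claims 1, 2, and 9 recite determining motion vector information from target perceptual representations and selected reference pictures, then encoding both the motion vectors and pointers to the references. The source does not specify the search strategy or matching cost. The Python sketch below therefore assumes full-search block matching with a sum-of-absolute-differences (SAD) cost, integer-pixel motion, and references the same size as the target. The "pointer" here is simply an index into the list of candidate references. Function names are hypothetical.

```python
def block_sad(target, ref, ty, tx, ry, rx, size):
    """Sum of absolute differences between a target block and a reference block."""
    return sum(abs(target[ty + i][tx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))

def estimate_motion(target, references, ty, tx, size, search):
    """Full search over every candidate reference within +/- search pixels.

    Returns (pointer, (dy, dx), cost), where pointer indexes `references`.
    Ties keep the first candidate found.
    """
    rows, cols = len(target), len(target[0])
    best = None
    for pointer, ref in enumerate(references):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                ry, rx = ty + dy, tx + dx
                # Skip candidate blocks that fall outside the reference picture.
                if ry < 0 or rx < 0 or ry + size > rows or rx + size > cols:
                    continue
                cost = block_sad(target, ref, ty, tx, ry, rx, size)
                if best is None or cost < best[2]:
                    best = (pointer, (dy, dx), cost)
    return best
```

In the arrangement of claim 2, `target` and each entry of `references` would be perceptual representations rather than raw pixels. Only the returned pointer and `(dy, dx)` correspond to what the claims say is encoded. The SAD cost is used only to choose among candidates.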
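Claims 11 and 19 describe the decoder side. A received pointer selects a reference picture, the motion vector locates the prediction within it, and the residual is added to form the reconstructed picture. The Python sketch below shows this for a single square block. It assumes integer-pixel motion vectors and 8-bit samples clipped to [0, 255], neither of which the source specifies, and the function and parameter names are hypothetical.

```python
def reconstruct_block(references, pointer, motion_vector, ty, tx, residual, max_value=255):
    """Motion-compensated reconstruction of one square block.

    The pointer selects the reference picture, the motion vector locates the
    predicted block, and the residual is added and clipped to [0, max_value].
    """
    ref = references[pointer]
    dy, dx = motion_vector
    size = len(residual)
    predicted = [[ref[ty + dy + i][tx + dx + j] for j in range(size)]
                 for i in range(size)]
    return [[min(max_value, max(0, predicted[i][j] + residual[i][j]))
             for j in range(size)] for i in range(size)]
```

For example, with `refs[1][r][c] == 4 * r + c`, calling `reconstruct_block(refs, 1, (1, -1), 1, 1, [[1, 2], [3, -1]])` predicts the block `[[8, 9], [12, 13]]` and returns `[[9, 11], [15, 12]]`.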